Considerations when using zero-ETL integrations with Amazon Redshift

The following considerations apply to zero-ETL integrations with Amazon Redshift.

  • Your target Amazon Redshift data warehouse must meet the following prerequisites:

    • Running Amazon Redshift Serverless or an RA3 node type.

    • Encrypted (if using a provisioned cluster).

    • Has case sensitivity enabled.

  • If you delete a source that is an authorized integration source for an Amazon Redshift data warehouse, all associated integrations will go into the FAILED state. Any previously replicated data remains in your Amazon Redshift database and can be queried.

  • The destination database is read-only. You can't create tables, views, or materialized views in the destination database. However, you can use materialized views on other tables in the target data warehouse.

  • Materialized views are supported when used in cross-database queries. For information about creating materialized views with data replicated through zero-ETL integrations, see Querying replicated data with materialized views.

  • In the target data warehouse, you can query only tables that are in the Synced state. For more information, see Metrics for zero-ETL integrations.

  • Amazon Redshift accepts only UTF-8 characters, so it might not honor the collation defined in your source. The sorting and comparison rules might be different, which can ultimately change the query results.

  • Zero-ETL integrations are limited to 50 per Amazon Redshift data warehouse target.

  • Tables in the integration source must have a primary key. Otherwise, your tables can't be replicated to the target data warehouse in Amazon Redshift.

    For information about how to add a primary key to Amazon Aurora PostgreSQL, see Handle tables without primary keys while creating Amazon Aurora PostgreSQL zero-ETL integrations with Amazon Redshift in the AWS Database Blog. For information about how to add a primary key to Amazon Aurora MySQL or RDS for MySQL, see Handle tables without primary keys while creating Amazon Aurora MySQL or Amazon RDS for MySQL zero-ETL integrations with Amazon Redshift in the AWS Database Blog.

  • You can use data filtering for Aurora zero-ETL integrations to define the scope of replication from the source Aurora DB cluster to the target Amazon Redshift data warehouse. Rather than replicating all data to the target, you can define one or more filters that selectively include or exclude certain tables from being replicated. For more information, see Data filtering for Aurora zero-ETL integrations with Amazon Redshift in the Amazon Aurora User Guide.

  • For Aurora PostgreSQL zero-ETL integrations with Amazon Redshift, Amazon Redshift supports a maximum of 100 databases from Aurora PostgreSQL. Each database replicates from source to target independently.

  • Zero-ETL integration does not support transformations while replicating data from transactional data stores to Amazon Redshift. Data is replicated as-is from the source database. However, you can apply transformations to the replicated data in Amazon Redshift.

  • Zero-ETL integration runs in Amazon Redshift using parallel connections. It runs using the credentials of the user who created the database from the integration. Concurrency scaling does not apply to these connections during the sync (writes). Concurrency scaling for reads (from Amazon Redshift clients) works for synced objects.

  • You can set the REFRESH_INTERVAL for a zero-ETL integration to control the frequency of data replication into Amazon Redshift. For more information, see CREATE DATABASE and ALTER DATABASE in the Amazon Redshift Database Developer Guide.
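
As an illustration of the REFRESH_INTERVAL consideration above, the following minimal Python sketch builds the two SQL statements that set the interval. The helper names, the database name, and the integration ID are hypothetical. The statement shapes and the use of seconds as the unit are assumptions based on the CREATE DATABASE and ALTER DATABASE reference pages, so confirm them there, including any additional clauses that your source type requires.

```python
import re

# Hypothetical helpers for illustration only. They build SQL text and do not
# connect to Amazon Redshift.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _validate(db_name: str, refresh_interval_s: int) -> None:
    """Reject names and intervals that should not be placed into SQL text."""
    if not _IDENTIFIER.match(db_name):
        raise ValueError(f"unexpected database name: {db_name!r}")
    if isinstance(refresh_interval_s, bool) or not isinstance(refresh_interval_s, int):
        raise ValueError("REFRESH_INTERVAL must be an integer")
    if refresh_interval_s < 0:
        raise ValueError("REFRESH_INTERVAL must not be negative")


def create_database_sql(db_name: str, integration_id: str, refresh_interval_s: int) -> str:
    """Create the destination database and set the interval in one statement."""
    _validate(db_name, refresh_interval_s)
    if "'" in integration_id:
        raise ValueError("integration ID must not contain a quote character")
    return (
        f"CREATE DATABASE {db_name} FROM INTEGRATION '{integration_id}' "
        f"SET REFRESH_INTERVAL {refresh_interval_s};"
    )


def alter_refresh_interval_sql(db_name: str, refresh_interval_s: int) -> str:
    """Change the interval on an existing destination database."""
    _validate(db_name, refresh_interval_s)
    return f"ALTER DATABASE {db_name} INTEGRATION SET REFRESH_INTERVAL {refresh_interval_s};"


if __name__ == "__main__":
    # Placeholder values; substitute your own database name and integration ID.
    print(create_database_sql("sample_integration_db", "example-integration-id", 7200))
    print(alter_refresh_interval_sql("sample_integration_db", 600))
```

The generated statements can be run from any SQL client connected to the target data warehouse. Validating the inputs before building the text is a design choice to avoid pasting untrusted values into SQL.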

Considerations when the zero-ETL integration source is Aurora or Amazon RDS

The following considerations apply to Aurora and Amazon RDS zero-ETL integrations with Amazon Redshift.

For Aurora sources, also see Limitations in the Amazon Aurora User Guide.

For Amazon RDS sources, also see Limitations in the Amazon RDS User Guide.

Considerations when the zero-ETL integration source is DynamoDB

The following considerations apply to DynamoDB zero-ETL integrations with Amazon Redshift.

  • DynamoDB table names longer than 127 characters are not supported.

  • The data from a DynamoDB zero-ETL integration maps to a SUPER data type column in Amazon Redshift.

  • Partition key or sort key column names longer than 127 characters are not supported.

  • A zero-ETL integration from DynamoDB can map to only one Amazon Redshift database.

  • For partition and sort keys, the maximum precision and scale is (38,18). Numeric data types in DynamoDB support a maximum precision of 38. Amazon Redshift also supports a maximum precision of 38, but the default decimal precision and scale in Amazon Redshift is (38,10). This means that the scale of replicated values can be truncated.

  • For a successful zero-ETL integration, an individual attribute (consisting of name+value) in a DynamoDB item must not be larger than 64 KB.

  • On activation, the zero-ETL integration exports the full DynamoDB table to populate the Amazon Redshift database. The time it takes for this initial process to complete depends on the DynamoDB table size. The zero-ETL integration then incrementally replicates updates from DynamoDB to Amazon Redshift using DynamoDB incremental exports. This means the replicated DynamoDB data in Amazon Redshift is kept up-to-date automatically.

    Currently, the minimum latency for DynamoDB zero-ETL integration is 15 minutes. You can increase it further by setting a non-zero REFRESH_INTERVAL for a zero-ETL integration. For more information, see CREATE DATABASE and ALTER DATABASE in the Amazon Redshift Database Developer Guide.
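
To make the name-length and scale considerations above concrete, the following Python sketch checks names against the 127-character limit and shows what truncating a value to a scale of 10 looks like. The function names are hypothetical, and the truncation is a local model of the behavior described above (digits beyond the scale are dropped, not rounded), not a call into Amazon Redshift. Whether a particular column ends up with scale 10 or 18 is an assumption to verify against your own target tables.

```python
from decimal import Decimal, ROUND_DOWN, localcontext

MAX_NAME_CHARS = 127          # limit stated above for table and key column names
REDSHIFT_DEFAULT_SCALE = 10   # scale from the (38,10) default stated above
MAX_PRECISION = 38            # maximum precision stated above


def names_too_long(table_name: str, key_names: list[str]) -> list[str]:
    """Return every name that exceeds the 127-character limit."""
    return [name for name in [table_name, *key_names] if len(name) > MAX_NAME_CHARS]


def truncate_to_scale(value: Decimal, scale: int = REDSHIFT_DEFAULT_SCALE) -> Decimal:
    """Drop fractional digits beyond `scale` without rounding."""
    with localcontext() as ctx:
        # Python's default context allows 28 digits; raise it to 38 so that
        # wide values do not fail inside quantize().
        ctx.prec = MAX_PRECISION
        return value.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_DOWN)


if __name__ == "__main__":
    source_value = Decimal("1.123456789012345678")   # 18 fractional digits
    stored_value = truncate_to_scale(source_value)
    print(stored_value)                  # value with 10 fractional digits
    print(source_value - stored_value)   # the part that would be lost
    print(names_too_long("orders", ["customer_id", "order_date"]))
```

Running a check like this against source data before activating an integration can show which key values would lose fractional digits.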

For Amazon DynamoDB sources, also see Prerequisites and limitations in the Amazon DynamoDB Developer Guide.

Considerations when the zero-ETL integration source is an application, such as Salesforce, SAP, ServiceNow, or Zendesk

The following considerations apply to zero-ETL integrations with Amazon Redshift when the source is an application, such as Salesforce, SAP, ServiceNow, or Zendesk.

  • Table names and column names from application sources that are longer than 127 characters are not supported.

  • The maximum length of an Amazon Redshift VARCHAR data type is 65,535 bytes. When the content from the source does not fit into this limit, replication does not proceed and the table is put into a failed state. For more information about data type differences between zero-ETL integration application sources and Amazon Redshift databases, see Zero-ETL integrations in the AWS Glue Developer Guide.

  • The minimum latency for a zero-ETL integration with applications is 1 hour. You can increase it further by setting a non-zero REFRESH_INTERVAL for a zero-ETL integration. For more information, see CREATE DATABASE and ALTER DATABASE in the Amazon Redshift Database Developer Guide.
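
Because the VARCHAR limit above is measured in bytes rather than characters, a value can exceed it even when its character count is below 65,535. The following Python sketch shows one way to check a value before replication. The function name is hypothetical, and the sketch assumes the content is stored as UTF-8.

```python
MAX_VARCHAR_BYTES = 65_535  # maximum VARCHAR length stated above


def fits_redshift_varchar(text: str) -> bool:
    """Return True if the UTF-8 encoding of `text` fits in 65,535 bytes."""
    return len(text.encode("utf-8")) <= MAX_VARCHAR_BYTES


if __name__ == "__main__":
    ascii_value = "a" * 65_535       # 1 byte per character in UTF-8
    accented_value = "é" * 40_000    # 2 bytes per character in UTF-8
    print(fits_redshift_varchar(ascii_value))     # True
    print(fits_redshift_varchar(accented_value))  # False
```

The second value has only 40,000 characters but encodes to 80,000 bytes, so it would not fit.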

For sources of zero-ETL integrations with applications, also see Zero-ETL integrations in the AWS Glue Developer Guide.