Amazon Redshift SQL translation guide

Data types

Users moving data between ClickHouse and Redshift will immediately notice that ClickHouse offers a more extensive range of types, which are also less restrictive. While Redshift requires users to specify possible string lengths, even if variable, ClickHouse removes this restriction and burden from the user by storing strings without encoding as bytes. The ClickHouse String type thus has no limits or length specification requirements.

Furthermore, users can exploit Arrays, Tuples, and Enums - absent from Redshift as first-class citizens (although Arrays/Structs can be imitated with SUPER) and a common frustration of users. ClickHouse additionally allows the persistence, either at query time or even in a table, of aggregation states. This will enable data to be pre-aggregated, typically using a materialized view, and can dramatically improve query performance for common queries.

Below we map the equivalent ClickHouse type for each Redshift type:

Redshift	ClickHouse
`SMALLINT`	`Int8` *
`INTEGER`	`Int32` *
`BIGINT`	`Int64` *
`DECIMAL`	`UInt128`, `UInt256`, `Int128`, `Int256`, `Decimal(P, S)`, `Decimal32(S)`, `Decimal64(S)`, `Decimal128(S)`, `Decimal256(S)` - (high precision and ranges possible)
`REAL`	`Float32`
`DOUBLE PRECISION`	`Float64`
`BOOLEAN`	`Bool`
`CHAR`	`String`, `FixedString`
`VARCHAR` **	`String`
`DATE`	`Date32`
`TIMESTAMP`	`DateTime`, `DateTime64`
`TIMESTAMPTZ`	`DateTime`, `DateTime64`
`GEOMETRY`	Geo Data Types
`GEOGRAPHY`	Geo Data Types (less developed e.g. no coordinate systems - can be emulated with functions)
`HLLSKETCH`	`AggregateFunction(uniqHLL12, X)`
`SUPER`	`Tuple`, `Nested`, `Array`, `JSON`, `Map`
`TIME`	`DateTime`, `DateTime64`
`TIMETZ`	`DateTime`, `DateTime64`
`VARBYTE` **	`String` combined with `Bit` and Encoding functions

_{* ClickHouse additionally supports unsigned integers with extended ranges i.e. UInt8, UInt32, UInt32 and UInt64.}
_{**ClickHouse’s String type is unlimited by default but can be constrained to specific lengths using Constraints.}

DDL syntax

Sorting keys

Both ClickHouse and Redshift have the concept of a “sorting key”, which define how data is sorted when being stored. Redshift defines the sorting key using the SORTKEY clause:

CREATE TABLE some_table(...) SORTKEY (column1, column2)

Comparatively, ClickHouse uses an ORDER BY clause to specify the sort order:

CREATE TABLE some_table(...) ENGINE = MergeTree ORDER BY (column1, column2)

In most cases, you can use the same sorting key columns and order in ClickHouse as Redshift, assuming you are using the default COMPOUND type. When data is added to Redshift, you should run the VACUUM and ANALYZE commands to re-sort newly added data and update the statistics for the query planner - otherwise, the unsorted space grows. No such process is required for ClickHouse.

Redshift supports a couple of convenience features for sorting keys. The first is automatic sorting keys (using SORTKEY AUTO). While this may be appropriate for getting started, explicit sorting keys ensure the best performance and storage efficiency when the sorting key is optimal. The second is the INTERLEAVED sort key, which gives equal weight to a subset of columns in the sort key to improve performance when a query uses one or more secondary sort columns. ClickHouse supports explicit projections, which achieve the same end-result with a slightly different setup.

Users should be aware that the “primary key” concept represents different things in ClickHouse and Redshift. In Redshift, the primary key resembles the traditional RDMS concept intended to enforce constraints. However, they are not strictly enforced in Redshift and instead act as hints for the query planner and data distribution among nodes. In ClickHouse, the primary key denotes columns used to construct the sparse primary index, used to ensure the data is ordered on disk, maximizing compression while avoiding pollution of the primary index and wasting memory.

Data types​

DDL syntax​

Sorting keys​

Data types

DDL syntax

Sorting keys