Snowflake SQL translation guide
Data types
Numerics
Users moving data between ClickHouse and Snowflake will immediately notice that
ClickHouse offers more granularity when declaring numeric types. For example,
Snowflake offers the Number type for numerics. This requires the user to specify a
precision (total number of digits) and scale (digits to the right of the decimal point)
up to a total of 38. Integer declarations are synonymous with Number, and simply
define a fixed precision and scale with the same range. This convenience
is possible because modifying the precision (scale is 0 for integers) does not impact the
size of data on disk in Snowflake: the minimum required bytes are used for a
numeric range at write time at the micro-partition level. The scale does, however,
impact storage space and is offset by compression. A Float64 type offers a
wider range of values with a loss of precision.
Contrast this with ClickHouse, which offers multiple signed and unsigned
precisions for floats and integers. With these, ClickHouse users can be explicit about
the precision required for integers to optimize storage and memory overhead. A
Decimal type, equivalent to Snowflake's Number type, also offers twice the
precision and scale at 76 digits. In addition to an equivalent Float64 type,
ClickHouse also provides a Float32 for when precision is less critical and
compression is paramount.
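
As a minimal sketch of the difference, the declarations below show roughly equivalent columns in each system (table and column names are illustrative only):

```sql
-- Snowflake: precision and scale up to 38; integer types are aliases of NUMBER
CREATE TABLE example_sf (
    id    NUMBER(38,0),
    price NUMBER(10,2),
    ratio FLOAT              -- always 64-bit in Snowflake
);

-- ClickHouse: explicit integer widths and signedness; Decimal supports up to 76 digits
CREATE TABLE example_ch (
    id    UInt64,
    price Decimal(10, 2),
    ratio Float32            -- Float64 is also available when a wider range is needed
)
ENGINE = MergeTree
ORDER BY id;
```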
Strings
ClickHouse and Snowflake take contrasting approaches to the storage of string
data. The VARCHAR in Snowflake holds Unicode characters in UTF-8, allowing the
user to specify a maximum length. This length has no impact on storage or
performance, with the minimum number of bytes always used to store a string; it
simply provides a constraint that is useful for downstream tooling. Other types, such
as Text and NChar, are simply aliases for this type. ClickHouse, conversely,
stores all string data as raw bytes with a String type (no length specification
required), deferring encoding to the user, with query-time functions available for
different encodings. We refer the reader to "Opaque data argument" for the
motivation as to why. The ClickHouse String is thus more comparable to the
Snowflake Binary type in its implementation. Both Snowflake and ClickHouse
support “collation”, allowing users to override how strings are sorted and compared.
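
The sketch below illustrates this model on the ClickHouse side: strings with no declared length, encoding handled by query-time functions, and a per-query collation override (table and column names are illustrative only):

```sql
-- ClickHouse: strings are stored as raw bytes with no declared length
CREATE TABLE pages (
    url  String,
    body String
)
ENGINE = MergeTree
ORDER BY url;

-- Encoding is handled at query time with dedicated functions
SELECT
    isValidUTF8(body) AS is_utf8,
    lengthUTF8(body)  AS char_length
FROM pages;

-- Collation can be overridden per query when sorting
SELECT url
FROM pages
ORDER BY url COLLATE 'en';
```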
Semi-structured types
Snowflake supports the VARIANT, OBJECT, and ARRAY types for semi-structured data.
ClickHouse offers the equivalent Variant, Object (deprecated), and Array types.
Additionally, ClickHouse has the JSON type, which replaces the now deprecated
Object('json') type and is particularly performant and storage efficient in
comparison to other native JSON types.
ClickHouse also supports named Tuples and arrays of Tuples via the Nested type,
allowing users to explicitly map nested structures. This allows codecs and type
optimizations to be applied throughout the hierarchy, unlike Snowflake, which
requires the user to use the OBJECT, VARIANT, and ARRAY types for the outer
object and does not allow explicit internal typing.
This internal typing also simplifies queries on nested numerics in ClickHouse,
which do not need to be cast and can be used in index definitions.
In ClickHouse, codecs and optimized types can also be applied to substructures. This provides the added benefit that compression of nested structures remains excellent and comparable to that of flattened data. In contrast, because specific types cannot be applied to substructures, Snowflake recommends flattening data to achieve optimal compression. Snowflake also imposes size restrictions on these data types.
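
The following sketch illustrates this difference, with explicit internal typing on the ClickHouse side (schema and names are illustrative only):

```sql
-- Snowflake: the outer object is a VARIANT; internal fields are not explicitly typed
CREATE TABLE logs_sf (
    payload VARIANT
);

-- ClickHouse: nested fields are explicitly typed, so codecs apply per field and
-- numeric sub-columns can be queried without casting
CREATE TABLE logs_ch (
    ts      DateTime,
    request Tuple(method String, status UInt16),
    tags    Nested(key String, value String)
)
ENGINE = MergeTree
ORDER BY ts;

-- Nested numerics need no cast at query time
SELECT count()
FROM logs_ch
WHERE request.status >= 500;
```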
Type reference
Snowflake | ClickHouse | Note |
---|---|---|
NUMBER | Decimal | ClickHouse supports twice the precision and scale of Snowflake: 76 digits vs. 38. |
FLOAT, FLOAT4, FLOAT8 | Float32, Float64 | All floats in Snowflake are 64-bit. |
VARCHAR | String | |
BINARY | String | |
BOOLEAN | Bool | |
DATE | Date, Date32 | DATE in Snowflake offers a wider date range than ClickHouse, e.g. the minimum for Date32 is 1900-01-01 and for Date it is 1970-01-01. Date in ClickHouse provides more cost-efficient (two-byte) storage. |
TIME(N) | No direct equivalent, but can be represented by DateTime and DateTime64(N). | DateTime64 uses the same concept of precision. |
TIMESTAMP - TIMESTAMP_LTZ, TIMESTAMP_NTZ, TIMESTAMP_TZ | DateTime and DateTime64 | DateTime and DateTime64 can optionally have a TZ parameter defined for the column. If not present, the server's timezone is used. Additionally, a --use_client_time_zone parameter is available for the client. |
VARIANT | JSON, Tuple, Nested | The JSON type is experimental in ClickHouse. This type infers the column types at insert time. Tuple, Nested, and Array can also be used to build explicitly typed structures as an alternative. |
OBJECT | Tuple, Map, JSON | Both OBJECT and Map are analogous to the JSON type in ClickHouse, where the keys are a String. ClickHouse requires the values to be consistent and strongly typed, whereas Snowflake uses VARIANT. This means the values of different keys can be of different types. If this is required in ClickHouse, explicitly define the hierarchy using Tuple or rely on the JSON type. |
ARRAY | Array, Nested | ARRAY in Snowflake uses VARIANT for the elements - a super type. Conversely, these are strongly typed in ClickHouse. |
GEOGRAPHY | Point, Ring, Polygon, MultiPolygon | Snowflake imposes a coordinate system (WGS 84) while ClickHouse applies one at query time. |
GEOMETRY | Point, Ring, Polygon, MultiPolygon | |
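
As an illustration of the precision and timezone behavior noted for DateTime64 in the table above (table and column names are illustrative only):

```sql
CREATE TABLE timestamp_example (
    created_default DateTime64(3),                     -- no zone: the server's timezone applies
    created_ny      DateTime64(6, 'America/New_York')  -- zone fixed on the column
)
ENGINE = MergeTree
ORDER BY created_default;

-- Values can be converted to another zone at query time
SELECT toTimeZone(created_ny, 'UTC')
FROM timestamp_example;
```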
ClickHouse also offers several types with no direct equivalent in Snowflake:

ClickHouse Type | Description |
---|---|
IPv4 and IPv6 | IP-specific types, potentially allowing more efficient storage than Snowflake. |
FixedString | Allows a fixed length of bytes to be used, which is useful for hashes. |
LowCardinality | Allows any type to be dictionary-encoded. Useful when the cardinality is expected to be < 100k. |
Enum | Allows efficient encoding of named values in either 8- or 16-bit ranges. |
UUID | For efficient storage of UUIDs. |
Array(Float32) | Vectors can be represented as an Array of Float32 with supported distance functions. |
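
A brief sketch combining several of these types in a single table definition (schema is illustrative only):

```sql
CREATE TABLE requests (
    client_ip  IPv4,
    request_id UUID,
    country    LowCardinality(String),
    status     Enum8('ok' = 1, 'error' = 2),
    body_hash  FixedString(32),          -- raw bytes of a fixed-length hash
    embedding  Array(Float32)            -- vector usable with distance functions
)
ENGINE = MergeTree
ORDER BY (country, client_ip);
```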
Finally, ClickHouse offers the unique ability to store the intermediate state of aggregate functions. This state is implementation-specific, but it allows the result of an aggregation to be stored and later queried (with corresponding merge functions). Typically, this feature is used via a materialized view and, as demonstrated below, offers the ability to improve the performance of specific queries with minimal storage cost by storing the incremental result of queries over inserted data (more details here).
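
As a minimal sketch of this pattern (table and column names are illustrative only), intermediate states are written by a materialized view at insert time and merged at query time:

```sql
-- Source table (illustrative)
CREATE TABLE hits (
    event_date Date,
    user_id    UInt64,
    latency_ms UInt32
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);

-- Target table storing intermediate aggregation states
CREATE TABLE hits_daily (
    event_date  Date,
    users       AggregateFunction(uniq, UInt64),
    avg_latency AggregateFunction(avg, UInt32)
)
ENGINE = AggregatingMergeTree
ORDER BY event_date;

-- Materialized view computing the states as rows are inserted into hits
CREATE MATERIALIZED VIEW hits_daily_mv TO hits_daily AS
SELECT
    event_date,
    uniqState(user_id)   AS users,
    avgState(latency_ms) AS avg_latency
FROM hits
GROUP BY event_date;

-- At query time the stored states are merged to produce final results
SELECT
    event_date,
    uniqMerge(users)      AS unique_users,
    avgMerge(avg_latency) AS avg_latency_ms
FROM hits_daily
GROUP BY event_date;
```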