-
Dear community,

I have been reading the documentation on sparse primary indexes (https://clickhouse.com/docs/en/optimize/sparse-primary-indexes), which recommends against using high-cardinality columns in the primary key. However, when I read articles about building efficient time-series tables, I always see DateTime types in the ORDER BY clause, which by definition also becomes the PRIMARY KEY when there is no explicit PRIMARY KEY clause. Some of those examples can be found here: https://clickhouse.com/blog/working-with-time-series-data-and-functions-ClickHouse#querying
In my use case, I'm generating over 2k entries per second (they are batched before being inserted into ClickHouse), and each entry has a DateTime timestamp, so the entries effectively cover every millisecond. If I follow the examples above, where the timestamp is part of the ORDER BY (and therefore also the PRIMARY KEY), that would contradict the recommendation against using high-cardinality columns in the PRIMARY KEY. Yet all (or most) queries I'll run against the database will filter by time, usually over a window of a few hours to a few weeks. So my question is: should I use DateTime fields as part of the PRIMARY KEY to make my queries more efficient?
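For reference, the pattern I keep seeing in those articles looks roughly like this (a minimal sketch with invented table and column names, not copied from the linked posts):

```sql
-- Hypothetical time-series table: the DateTime column appears in the
-- sorting key, which (with no PRIMARY KEY clause) is also the primary key.
CREATE TABLE sensor_readings
(
    sensor_id UInt32,
    timestamp DateTime,
    value     Float64
)
ENGINE = MergeTree
ORDER BY (sensor_id, timestamp);
```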
Replies: 4 comments 5 replies
-
No, nothing like that. The number of granules is not related to the columns in the keys; it's defined by the number of rows: 1 granule = 8192 rows. So in the case of

order by (event_id, city, region, country)

I would definitely use a different column order. For time-series tables I suggest a very strange approach; see the discussion and examples in #33056.
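For context, the general shape of that kind of key (my own sketch, assumed from the linked discussion rather than quoted from it; all names are illustrative) leads with a truncated timestamp, so the raw high-cardinality timestamp moves to the end:

```sql
-- Hypothetical: the leading key column is a low-cardinality truncated
-- timestamp, so time-range filters prune granules, while sensor_id
-- still clusters rows within each day.
CREATE TABLE metrics
(
    sensor_id UInt32,
    timestamp DateTime,
    value     Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (toStartOfDay(timestamp), sensor_id, timestamp);
```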
-
Thanks for your answer, @den-crane.

Assuming I would use that key, I guess ClickHouse would know, the next time I run a query filtering on those columns, that it only needs to search the respective index ranges? Would you also suggest, on top of this, partitioning with something like:
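For example, something along these lines (an illustrative guess on my part, not a firm proposal):

```sql
-- Hypothetical: monthly partitions for roughly one year of retention;
-- a TTL can then drop expired data one whole partition at a time.
PARTITION BY toYYYYMM(timestamp)
```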
-
I plan to store data for approximately one year, but queries will usually span a few days to a few weeks.
-
Hey @den-crane. I am trying to figure out indexing for time-series data myself; essentially my question is similar, so sharing it here.

Clickhouse Specs

CPU - 16 vCPU

Aim

I have an observability pipeline like this.
It is expected that almost ~5 billion rows of telemetry data (metrics, logs, and traces combined) will be generated every day, all of which needs to be stored. I need to retain it for 90 days, hence the TTL of 90 days in the table definition. Attaching a screenshot of the current amount of data in the DB. I use the native ClickHouse exporter of the OTel Collector to export our data in batches. Writing the data into the DB is not an issue at all; everything works well here.

The Problem

The problem comes when reading telemetry data from Grafana, as explained below.

Table Definitions [Current]

Logs Table (only sharing the logs table definition in full)

CREATE TABLE IF NOT EXISTS otel.otel_logs ON CLUSTER otel_cluster (
Timestamp DateTime64(9) CODEC(Delta, ZSTD(1)),
TraceId String CODEC(ZSTD(1)),
SpanId String CODEC(ZSTD(1)),
TraceFlags UInt32 CODEC(ZSTD(1)),
SeverityText LowCardinality(String) CODEC(ZSTD(1)),
SeverityNumber Int32 CODEC(ZSTD(1)),
ServiceName LowCardinality(String) CODEC(ZSTD(1)),
Body String CODEC(ZSTD(1)),
ResourceSchemaUrl String CODEC(ZSTD(1)),
ResourceAttributes Map(LowCardinality(String), String) CODEC(ZSTD(1)),
ScopeSchemaUrl String CODEC(ZSTD(1)),
ScopeName String CODEC(ZSTD(1)),
ScopeVersion String CODEC(ZSTD(1)),
ScopeAttributes Map(LowCardinality(String), String) CODEC(ZSTD(1)),
LogAttributes Map(LowCardinality(String), String) CODEC(ZSTD(1)),
INDEX idx_trace_id TraceId TYPE bloom_filter(0.001) GRANULARITY 1,
INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_key mapKeys(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_scope_attr_value mapValues(ScopeAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter(0.01) GRANULARITY 1,
INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1
) ENGINE ReplicatedMergeTree('/clickhouse/tables/{shard}/{database}/{table}', '{replica}')
TTL toDateTime(Timestamp) + toIntervalDay(90)
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SeverityText, toUnixTimestamp(Timestamp), TraceId)
SETTINGS index_granularity = 8192, ttl_only_drop_parts = 1;

Traces Table

...
...
TTL toDateTime(Timestamp) + toIntervalDay(90)
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SpanName, toUnixTimestamp(Timestamp), TraceId)
SETTINGS index_granularity=8192, ttl_only_drop_parts = 1;
...
...

Query
Querying the above tables generally takes more than 60 s, and the query processes all the rows. 99% of the queries are going to be based on Timestamp, i.e. the very first WHERE condition in all the queries is always going to be a Timestamp clause. With this said, I would like to ask how I can optimise this.

Questions

Q1. Will changing the ORDER BY and PRIMARY KEY to the following help?

ORDER BY (toStartOfHour(Timestamp), ServiceName, SeverityText, TraceId, Timestamp)
PRIMARY KEY (toStartOfHour(Timestamp), ServiceName, SeverityText, TraceId)

Q2. Since there might still be a ton of data in every hour, I was also thinking of doing something like this:

ORDER BY (toStartOfHour(Timestamp), toStartOfFifteenMinutes(Timestamp), ServiceName, SeverityText, TraceId, Timestamp)
PRIMARY KEY (toStartOfHour(Timestamp), toStartOfFifteenMinutes(Timestamp), ServiceName, SeverityText, TraceId)

Does this actually help, or does it rather degrade performance?

Q3. I also have a case where the exact same data needs to be stored in another ClickHouse cluster, but with a TTL of 2182 days (~5 years). Should I change the

Any inputs will be greatly appreciated 🙏. Thanks and Regards

PS: Apologies for hijacking this thread with my own issue; I just felt they were similar in nature, and so this might help other people in the future.
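Regardless of which key I end up with, one way to check whether a candidate key actually prunes data (a sketch against the logs table above; the service name is made up) is EXPLAIN with index information enabled:

```sql
EXPLAIN indexes = 1
SELECT count()
FROM otel.otel_logs
WHERE Timestamp >= now() - INTERVAL 1 HOUR
  AND ServiceName = 'checkout-service';
```

The output reports, for the partition MinMax index, the primary key, and any skip indexes, how many parts and granules were selected out of the total, which shows directly whether a changed key reduces the scanned range.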
No, nothing like this. The number of granules is not related to the columns in the keys; it's defined by the number of rows: 1 granule = 8192 rows.

High-cardinality columns are not recommended at the leading positions of the primary key, because they devalue the other columns in the primary key, BUT only if you omit these high-cardinality columns from equality conditions in the WHERE clause. They may also reduce the compression rate.
So in the case
order by (event_id, city, region, country)
I would bluntly suggest to chan…
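To make the point about leading high-cardinality columns concrete (a hypothetical example; the table and values are invented for illustration):

```sql
-- Hypothetical table using the key order from the example above.
CREATE TABLE events
(
    event_id UInt64,                 -- high cardinality, unique per event
    city     String,
    region   String,
    country  LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (event_id, city, region, country);

-- With event_id leading the key, a filter on a trailing column cannot
-- prune granules: every country value is scattered across the table.
SELECT count() FROM events WHERE country = 'US';

-- With the order reversed, ORDER BY (country, region, city, event_id),
-- the same filter reads only the granules that contain 'US'.
```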