Add support for ClickHouse database as source and sink #342

Delphin1 · 2023-10-02T15:33:51Z

It would be great if Arroyo could also work with ClickHouse.

kzk2000 · 2023-10-07T21:49:01Z

FWIW, if your data is already on Kafka, it's trivial to sync in either direction:

Kafka to ClickHouse: https://clickhouse.com/docs/en/integrations/kafka#clickhouse-to-kafka
ClickHouse to Kafka: https://clickhouse.com/docs/en/integrations/kafka#clickhouse-to-kafka

That said, syncing an Arroyo stream to ClickHouse without Kafka would indeed be cool for long-term storage.
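For illustration, here is a rough sketch of what a Kafka-free write path could look like. It pushes batches of rows to ClickHouse over its HTTP interface using the JSONEachRow format. This is not Arroyo code: the function names, the `events` table, and the localhost URL are all assumptions made up for the example, and a real sink would also need retries, batching limits, and authentication.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a ClickHouse server with its HTTP interface on the default port 8123.
CLICKHOUSE_URL = "http://localhost:8123/"


def build_insert_request(table, rows, base_url=CLICKHOUSE_URL):
    """Build (without sending) a POST request that inserts `rows` into `table`."""
    # The INSERT statement goes in the `query` URL parameter; the data goes in the body.
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    url = base_url + "?" + urllib.parse.urlencode({"query": query})
    # JSONEachRow expects one JSON object per line.
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def flush(table, rows):
    """Send one batch; urlopen raises urllib.error.HTTPError on an error reply."""
    with urllib.request.urlopen(build_insert_request(table, rows)) as resp:
        return resp.status
```

Calling `flush("events", [{"id": 1}, {"id": 2}])` would send both rows in a single request, provided a table named `events` with a matching `id` column already exists.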

MuhtasimTanmoy · 2023-11-02T21:09:59Z

What would the high-level design and testing procedure be for implementing this feature?
Looks like a cool one.

kzk2000 · 2023-11-02T22:17:39Z

ClickHouse has various integrations for data ingestion; Kafka, as mentioned above, is just one of them.

I'm no expert, but maybe one of these https://clickhouse.com/docs/en/integrations -> search for "Data ingestion" would work well with what Arroyo.dev already has.

marvin-hansen · 2024-03-04T10:32:06Z

Maybe try remote select?

Something like:

SELECT * FROM remote('127.0.0.1', db.remote_engine_table) LIMIT 3;

With that in place, you can simply run a remote insert into a ClickHouse table via the TCP protocol.

This might be easier and faster to implement than a full-blown integration.
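As a sketch of the remote insert idea, the helper below composes an `INSERT INTO FUNCTION remote(...)` statement, which is the insert counterpart of the `remote` select above. The helper name and the `local_events` source table are hypothetical, and the statement still has to be executed on a ClickHouse server that can reach the remote address.

```python
import re

# Conservative pattern for unquoted database and table names.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def remote_insert_sql(address, database, table, select_sql):
    """Compose an INSERT that writes the result of `select_sql` to a remote table."""
    for name in (database, table):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # The address is embedded in a single-quoted SQL string literal.
    if "'" in address or "\\" in address:
        raise ValueError(f"unsafe address: {address!r}")
    return (
        f"INSERT INTO FUNCTION remote('{address}', {database}.{table}) "
        f"{select_sql}"
    )


# `local_events` is a hypothetical source table used only for illustration.
print(remote_insert_sql("127.0.0.1", "db", "remote_engine_table",
                        "SELECT * FROM local_events"))
```

The identifier check is deliberately strict so that the generated statement cannot be altered by odd table names; `select_sql` is passed through unchanged and is assumed to come from trusted code.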
