[Feature]: Support Native JSON columns in Clickhouse #112

Open
navinpai opened this issue Jul 14, 2022 · 1 comment
Labels
enhancement New feature or request

Comments

@navinpai
Requirement

As a Clickhouse analytics user, I want the clickhouse-jaeger schema to allow using Clickhouse native JSON columns, so that we can query data in Clickhouse more efficiently (both in terms of performance and query simplicity).

Problem

Currently, Clickhouse-Jaeger stores JSON span data in a String column, which makes it quite verbose to query fields within the column using Clickhouse's JSON functions, especially once you get past 2 levels of nesting.

This is very evident when you want to query the ingested data to generate your own analytics/insights. It would be nice if jaeger-clickhouse added support for Clickhouse native JSON columns.
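
To illustrate the difference in query verbosity described above, here is a minimal Python sketch that renders the two styles of field access as SQL expression strings. It assumes the span JSON lives in a column called `model`, and the nested path is made up for the example; neither is confirmed by this issue. `JSONExtractString` is one of Clickhouse's JSON functions, and dot-path access is how the experimental native JSON type exposes nested fields.

```python
def string_column_expr(column, path):
    """Render a JSONExtractString call for JSON stored in a String column."""
    keys = ", ".join("'" + key + "'" for key in path)
    return f"JSONExtractString({column}, {keys})"


def json_column_expr(column, path):
    """Render dot-path access as used with a native JSON column."""
    return ".".join([column, *path])


# Hypothetical column name and nested path, for illustration only.
column = "model"
path = ["resource", "host", "name"]

print(string_column_expr(column, path))
print(json_column_expr(column, path))
```

Every extra level of nesting adds another quoted key to the function call in the String case, while the native JSON form only grows by one path segment.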

Proposal

A solution may be to start providing support for the native JSON datatype (it's still "experimental", but the spec has been quite stable for a while).

Open questions

The major open question is how this would affect the split between protobuf-encoded and JSON-encoded data (currently, the String column supports both) and whether it would add more complexity to the project. I need to observe more to see the impact of this, but wanted to raise it with the community/maintainers to get an idea of their thoughts.

@navinpai added the enhancement label on Jul 14, 2022
@chhetripradeep
Contributor

@navinpai I agree that, from a read perspective, reading a field from a JSON datatype is faster than reading it from a String datatype. From a write perspective, however, inserting into JSON columns is more expensive, and hence slower, than inserting into String columns. Metrics, logs, and tracing systems generally do far more writes than reads, so I feel the String datatype is more appropriate. Feel free to correct me if my understanding is wrong.
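
The trade-off in this comment can be sketched with a toy model that only counts where JSON parsing happens. This is not how Clickhouse works internally and it measures nothing about real performance; the class names and workload sizes are invented for the illustration. One store keeps raw text and parses on every read, the other parses once on insert.

```python
import json


class StringStore:
    """Keeps raw JSON text; parsing happens on every read."""

    def __init__(self):
        self.rows = []
        self.parses = 0

    def insert(self, raw):
        self.rows.append(raw)

    def read_field(self, index, key):
        self.parses += 1
        return json.loads(self.rows[index])[key]


class ParsedStore:
    """Parses on insert; reads are plain dictionary lookups."""

    def __init__(self):
        self.rows = []
        self.parses = 0

    def insert(self, raw):
        self.parses += 1
        self.rows.append(json.loads(raw))

    def read_field(self, index, key):
        return self.rows[index][key]


def run(store, writes, reads):
    """Apply a workload and return how many parse operations it cost."""
    for i in range(writes):
        store.insert(json.dumps({"duration": i}))
    for i in range(reads):
        store.read_field(i % writes, "duration")
    return store.parses


# Hypothetical write-heavy workload: 1000 inserts, 10 field reads.
print(run(StringStore(), writes=1000, reads=10))
print(run(ParsedStore(), writes=1000, reads=10))
```

In this model the string-backed store pays one parse per read and the parsed store pays one parse per write, so which one is cheaper depends entirely on the write-to-read ratio of the workload.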
