Performance test (against BQ) #112

Open
danthegoodman1 opened this issue Sep 8, 2023 · 1 comment
Labels
documentation Improvements or additions to documentation

@danthegoodman1
Owner

danthegoodman1 commented Sep 8, 2023

The GitHub Events dataset has 232M rows and lots of example queries: https://ghe.clickhouse.tech/

The NYC taxi dataset (https://clickhouse.com/docs/en/getting-started/example-datasets/nyc-taxi) has 3B rows but is smaller in size and has a much less complex schema, so we could run this one too.

We should run the same queries against both. We can leave BigQuery for 5+ hours (maybe even days) to merge, and fully merge IceDB. We should also record insert and merge performance, where IceDB is probably far faster than BigQuery.
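
A minimal sketch of how insert and merge timings could be recorded, assuming each phase the benchmark drives can be wrapped in a Python callable. `PhaseTiming` and `time_phase` are hypothetical helpers, and the workload passed in is a placeholder rather than an IceDB or BigQuery API call. Since the BigQuery merge is just left to run on its own, this only fits phases the benchmark starts and waits on (for example IceDB insert and merge, or a BigQuery load).

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class PhaseTiming:
    """Wall-clock duration of one benchmark phase."""
    system: str    # e.g. "icedb" or "bigquery" (labels are illustrative)
    phase: str     # e.g. "insert" or "merge"
    seconds: float


def time_phase(system: str, phase: str, run: Callable[[], None]) -> PhaseTiming:
    """Run `run` once and measure it with a monotonic, high-resolution clock."""
    start = time.perf_counter()
    run()
    return PhaseTiming(system, phase, time.perf_counter() - start)


if __name__ == "__main__":
    # Placeholder workload; swap in the real insert or merge call.
    def placeholder_insert() -> None:
        time.sleep(0.01)

    result = time_phase("icedb", "insert", placeholder_insert)
    print(f"{result.system} {result.phase}: {result.seconds:.3f}s")
```

`time.perf_counter` is used instead of `time.time` because it is monotonic, so wall-clock adjustments during a long merge cannot skew the measurement.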

Then run the same queries and record the data scanned, the storage price, the query times, and the total query price.
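
A rough sketch of how those numbers could be rolled up, assuming queries are billed per TiB scanned and storage per GiB-month. The rates are parameters, not current list prices, and `QueryResult`, `query_cost`, and `summarize` are hypothetical names. It also ignores any per-query minimum billing or rounding rules a provider may apply.

```python
from dataclasses import dataclass

TIB = 2 ** 40  # bytes in a tebibyte
GIB = 2 ** 30  # bytes in a gibibyte


@dataclass
class QueryResult:
    name: str
    bytes_scanned: int
    seconds: float


def query_cost(bytes_scanned: int, usd_per_tib: float) -> float:
    """Scan-priced cost: bytes scanned, converted to TiB, times the per-TiB rate."""
    return bytes_scanned / TIB * usd_per_tib


def summarize(results: list[QueryResult], usd_per_tib: float,
              stored_bytes: int, usd_per_gib_month: float) -> dict:
    """Totals for one system: data scanned, query time, query price, storage price."""
    total_scanned = sum(r.bytes_scanned for r in results)
    return {
        "total_bytes_scanned": total_scanned,
        "total_query_seconds": sum(r.seconds for r in results),
        "total_query_usd": query_cost(total_scanned, usd_per_tib),
        "storage_usd_per_month": stored_bytes / GIB * usd_per_gib_month,
    }


if __name__ == "__main__":
    # Placeholder measurements and rates, for illustration only.
    runs = [QueryResult("q1", TIB // 2, 1.5), QueryResult("q2", TIB // 2, 2.5)]
    print(summarize(runs, usd_per_tib=5.0, stored_bytes=100 * GIB,
                    usd_per_gib_month=0.02))
```

With those placeholder inputs, two queries scanning half a TiB each at 5 USD/TiB total 5 USD, and 100 GiB stored at 0.02 USD/GiB-month comes to 2 USD per month.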

For IceDB, we will run everything on a single node (the largest EC2 instance we can get), connected to an S3 VPC endpoint with no auth.

danthegoodman1 added the documentation label Sep 8, 2023
@danthegoodman1
Owner Author

With IceDB, we should test:

  • ClickHouse reading all current data files directly, before merging
  • the same, through the S3 proxy
  • merging, then reading the raw data files
  • the same, through the S3 proxy
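
The four cases above are a 2×2 grid (merge stage × access path), which could be enumerated so that no combination is skipped. This is a sketch only: the labels are descriptive strings, not IceDB or ClickHouse identifiers, and `icedb_test_matrix` is a hypothetical helper.

```python
from itertools import product

# Two dimensions taken from the test list; labels are illustrative only.
STAGES = ("before merge", "after merge")
ACCESS_PATHS = ("direct S3", "S3 proxy")


def icedb_test_matrix() -> list[tuple[str, str]]:
    """Every (merge stage, access path) combination to benchmark ClickHouse against."""
    return list(product(STAGES, ACCESS_PATHS))


if __name__ == "__main__":
    for stage, path in icedb_test_matrix():
        print(f"ClickHouse reading IceDB data files {stage}, via {path}")
```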
