Expand databases benchmarked #3
I definitely want to add these. The tricky part of this is that you have to make a bunch of choices when you put the data in S3. Even if we just decide to go with Parquet, we have to decide how big to make the files and the blocks within the files. So it will take some fiddling around to be sure we're being fair to Athena/Spectrum.
I agree. Apples to apples is hard, if not impossible. Maybe I'm atypical, but as far as I'm concerned, defaults or a (very) limited parameter search is reasonable. Unlike with Redshift's sort keys, where Amazon makes a big deal about optimizing for your query type, Athena and Spectrum both seem to be advertised more or less as 'drop on S3 and go'. It seems fair to take them more or less at their word for the benchmark, or to do only a very limited search of the parameter space.
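For illustration, a very limited search over the two Parquet layout choices mentioned above (file size and block size within each file) could be enumerated roughly like this. It is only a sketch: the function name `layout_grid`, the 10 GiB dataset size, and the candidate sizes are hypothetical values chosen for the example, not settings anyone in this thread has proposed or tested.

```python
from itertools import product
from math import ceil

MIB = 1024 * 1024


def layout_grid(dataset_bytes, file_sizes, block_sizes):
    """Enumerate (file size, block size) candidates for a small search.

    All sizes are in bytes. Combinations where a block would be larger
    than the file that holds it are skipped.
    """
    candidates = []
    for file_bytes, block_bytes in product(file_sizes, block_sizes):
        if block_bytes > file_bytes:
            continue  # a block cannot be larger than its file
        candidates.append({
            "file_bytes": file_bytes,
            "block_bytes": block_bytes,
            # Number of files needed to hold the whole dataset.
            "num_files": ceil(dataset_bytes / file_bytes),
            # Number of blocks in one full-size file.
            "blocks_per_file": ceil(file_bytes / block_bytes),
        })
    return candidates


# Hypothetical inputs: a 10 GiB dataset, two file sizes, two block sizes.
grid = layout_grid(
    dataset_bytes=10 * 1024 * MIB,
    file_sizes=[128 * MIB, 512 * MIB],
    block_sizes=[64 * MIB, 256 * MIB],
)
for candidate in grid:
    print(candidate)
```

Even with two values per axis, each surviving combination means rewriting the data to S3 and rerunning the queries, which is why keeping the grid this small matters.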
@russellpierce as I mentioned in #13, Athena and Spectrum have been shown elsewhere to be so sensitive to file types that getting their performance to approach that of normal warehouses requires significant work. Even after all that optimization is done, they are at best on par with BigQuery (as you were hinting at when you said "BigQuery-like"). For that reason I will be closing this issue (for now!).
Athena and Redshift Spectrum are BigQuery-like. One wonders how they would stack up in this comparison.