Do you guys have benchmark result for Athena in the plan? #13

syang · 2021-05-13T17:15:02Z

Given that Athena is a major 'serverless' data query engine, it would be great if you guys could put it into perspective.

Any thoughts?

mike-weinberg · 2021-07-16T18:32:47Z

Hey @syang, right now the focus is on converting the benchmark to run in dbt so that it is easier to:

- re-run the benchmark yourself and make changes as you see fit for your purposes
- contribute new backends to the benchmark and get your name on an open source benchmark!

In truth, Athena's architecture means that it is certain to be slower than Redshift. In general, it may be better to think of Athena less as a serverless data warehouse and more as a serverless data lake processing engine for companies that originally built their data infrastructure on HDFS or S3. As a result, I think the decision to use Athena versus a more traditional cloud data warehouse should rest more on compatibility with existing infrastructure than on performance. Athena is not really intended to have the same performance characteristics as Redshift, Snowflake, BigQuery, et al., because it depends far more on upstream optimization decisions such as file types, file sizes, and Parquet block configuration, all of which are entirely hidden from the user in traditional warehouse systems.
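
Because Athena queries data where it sits in S3, layout choices like file size are left to whoever writes the data. As one illustration of the "file size" concern, here is a minimal Python sketch of a compaction planner that groups many small files into batches near a target size, each of which could then be rewritten as one larger file. The `DataFile` type, the `plan_compaction` function, and the 128 MiB target are hypothetical choices for illustration only; the sketch does not call any AWS API, and the actual rewrite step (for example a CTAS query or a Spark job) is out of scope.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical target size for a compacted file. 128 MiB is an
# illustrative assumption, not a value taken from this thread.
TARGET_BYTES = 128 * 1024 * 1024


@dataclass
class DataFile:
    key: str         # object key, e.g. an S3 path (hypothetical)
    size_bytes: int  # size of the object in bytes


def plan_compaction(files: List[DataFile], target_bytes: int = TARGET_BYTES) -> List[List[DataFile]]:
    """Group small files into batches whose combined size is at most target_bytes.

    Files already at or above the target are left out. Each returned batch
    is a candidate to be rewritten as a single, larger file.
    """
    # Only files smaller than the target are worth merging; largest first.
    small = sorted(
        (f for f in files if f.size_bytes < target_bytes),
        key=lambda f: f.size_bytes,
        reverse=True,
    )
    batches: List[List[DataFile]] = []
    current: List[DataFile] = []
    current_size = 0
    for f in small:
        # Close the open batch if adding this file would exceed the target.
        if current and current_size + f.size_bytes > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(f)
        current_size += f.size_bytes
    if current:
        batches.append(current)
    # A batch holding a single file gains nothing from a rewrite, so drop it.
    return [b for b in batches if len(b) > 1]


if __name__ == "__main__":
    MiB = 1024 * 1024
    sizes = [100, 60, 40, 30, 10, 200]
    files = [DataFile(f"part-{i}.parquet", s * MiB) for i, s in enumerate(sizes)]
    for batch in plan_compaction(files):
        print([f.key for f in batch], sum(f.size_bytes for f in batch) // MiB, "MiB")
```

This is a simple next-fit-decreasing heuristic rather than optimal bin packing: every batch stays at or under the target, but some batches may end up well below it.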

That being said, I don't want you to feel like I'm waving my hands to get out of writing an Athena benchmark. As you said, Athena is fully serverless, and its closest equivalent is probably BigQuery. Fundamentally, BigQuery is a very tightly controlled implementation of an architecture similar to Athena's, so we should expect a highly optimized Athena setup to perform similarly to BigQuery. In fact, this is exactly what a benchmark from Upsolver, a platform specialized in optimizing data lake ingestion, shows: after optimizing for storage concerns, Athena performs roughly on par with BigQuery for typical SQL queries.

Given this, I think it's safe to treat BigQuery as a proxy for best-case Athena performance, so by reading Fivetran's benchmark you can implicitly compare purpose-built warehouses with so-called "lakehouses" like Athena, Presto, etc.

I really hope this helps! If you have further general questions about data warehouses, please find me on the dbt Slack =)

mike-weinberg mentioned this issue on Jul 16, 2021 in "Expand databases benchmarked" #3 (Closed)
