[Improvement] Difference in performance of STS & Kyuubi thrift server #6345

prathit06 opened this issue Apr 29, 2024 · 1 comment

prathit06 commented Apr 29, 2024

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

While trying to use Kyuubi with Tableau through the thrift server exposed by Kyuubi, we have noticed that everything works fine for small datasets. For larger datasets, however, when the collect operation is called, the data transfer fails because the driver goes OOM. For this reason Kyuubi exposes the kyuubi.operation.incremental.collect flag, which can be set to true to collect results incrementally. While this looks like an ideal solution (and it is), there are performance bottlenecks when using this flag.
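For context, here is a rough sketch in plain Spark of the trade-off between a full collect and an incremental, partition-at-a-time collect. This is only an illustration of the behavior; whether kyuubi.operation.incremental.collect maps exactly onto toLocalIterator internally is an assumption on my part.

import org.apache.spark.sql.SparkSession

object CollectModesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("collect-modes-sketch").getOrCreate()
    // Stand-in for the real ~5 GB result set
    val df = spark.range(0L, 100000000L).toDF("id")

    // Full collect: every row is materialized on the driver at once,
    // which is what causes the driver OOM for large result sets.
    // val allRows = df.collect()

    // Incremental-style collect: rows are pulled one partition at a time,
    // so driver memory stays bounded, but each partition becomes its own
    // round trip, which is usually much slower end to end.
    val it = df.toLocalIterator()
    var count = 0L
    while (it.hasNext) {
      it.next()
      count += 1
    }
    println(s"Streamed $count rows without holding them all on the driver")

    spark.stop()
  }
}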

I ran the same query through STS and the Kyuubi thrift server.

Query data size:

  • Data size: ~5 GB
  • Record count: 32854241

Results with spark.sql.shuffle.partitions = 50:

  • STS took 12-15 minutes to run the job and transfer the data to Tableau.
  • The Kyuubi thrift server was left running for 2+ hours, and the transferred data size was only ~600 MB.

As can be seen, there is a significant performance difference between the two.

Kyuubi version: 1.9
Spark version: 3.1.2
Running Kyuubi on AWS EMR (version 6.5.0), on the primary node only

kyuubi-defaults.conf config

kyuubi.ha.addresses ..compute.internal
kyuubi.operation.incremental.collect true
spark.submit.deployMode cluster # ( have tried with client as well )
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.scheduler.mode FAIR
spark.rdd.compress true
spark.shuffle.service.enabled true
spark.sql.hive.convertMetastoreParquet false
spark.sql.catalogImplementation hive
spark.sql.shuffle.partitions 50
spark.kryoserializer.buffer.max 1g
spark.driver.maxResultSize 25g
spark.driver.memory 35g
spark.executor.memory 25g
spark.driver.memoryOverhead 4g
spark.executor.memoryOverhead 3g
spark.cleaner.periodicGC.interval 10min

How should we improve?

Upon looking at the STS and Kyuubi code, I could see a lot of similarities but also differences here and there. One major point I noticed is that some logs present in STS are missing from Kyuubi, and those logs would help show what is happening inside Kyuubi during data transfer.

For example, the logs below were printed by STS but not by Kyuubi:

24/04/24 06:49:51 INFO SparkExecuteStatementOperation: Received getNextRowSet request order=FETCH_NEXT and maxRowsL=10000 with a1d07d3a-d6bb-4706-99c0-728fa8115816
24/04/24 06:49:52 INFO SparkExecuteStatementOperation: Returning result set with 10000 rows from offsets [1320000, 1330000) with a1d07d3a-d6bb-4706-99c0-728fa8115816
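
For reference, the maxRows value in these FETCH_NEXT requests is essentially the fetch size the client asks for per round trip. A minimal JDBC client sketch of that fetch loop (host, port, and table name are placeholders, not taken from this setup):

import java.sql.DriverManager

object FetchSizeSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder endpoint; requires the Hive/Kyuubi JDBC driver on the classpath
    val conn = DriverManager.getConnection("jdbc:hive2://kyuubi-host:10009/default", "user", "")
    val stmt = conn.createStatement()
    stmt.setFetchSize(10000) // rows requested per FetchResults round trip

    val rs = stmt.executeQuery("SELECT * FROM some_large_table")
    var rows = 0L
    while (rs.next()) {
      rows += 1 // rows arrive batch by batch as the server streams them
    }
    println(s"Fetched $rows rows")

    rs.close(); stmt.close(); conn.close()
  }
}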

To investigate further, adding more logs is probably a good starting point to see what exactly is happening and where the bottleneck is.
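
A minimal sketch of the kind of logging that could be added on the Kyuubi side; the class and method names below are placeholders for illustration, not actual Kyuubi internals:

import org.slf4j.LoggerFactory

// Hypothetical fetch path; shows where per-request and per-batch timing logs could go.
class FetchLoggingSketch {
  private val logger = LoggerFactory.getLogger(getClass)

  // Imagined hook called once per FetchResults request from the Thrift client
  def onGetNextRowSet(maxRows: Int, operationId: String): Unit = {
    logger.info(s"Received getNextRowSet request maxRows=$maxRows for operation $operationId")
    val start = System.currentTimeMillis()
    // ... fetch up to maxRows rows from the result iterator here ...
    val rowsReturned = 0 // placeholder for the actual batch size
    logger.info(s"Returning $rowsReturned rows for operation $operationId in ${System.currentTimeMillis() - start} ms")
  }
}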

Please feel free to suggest or ask for any additional information if needed.

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to improve.
  • No. I cannot submit a PR at this time.

Hello @prathit06,
Thanks for finding the time to report the issue!
We really appreciate the community's efforts to improve Apache Kyuubi.
