Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Iceberg split manager threads configurable #22754

Merged
merged 1 commit into from
May 21, 2024

Conversation

wypb
Copy link
Contributor

@wypb wypb commented May 15, 2024

Description

We have an Iceberg table with a lot of metadata. Recently, we found that every time we query this table, the Coordinator's CPU usage will be very high. We checked and found that it is because when calling Iceberg's tableScan#planFiles() API, Iceberg's tableScan will use all cores available in the system (Runtime.getRuntime().availableProcessors()) for calculations. This PR allows us to control the number of threads used by Iceberg's tableScan on the Presto side.

BTW, this PR Cherry-pick from trinodb/trino@57f0081

The JMX metrics for executor used for Iceberg tableScan can be tracked at com.facebook.presto.iceberg:name=iceberg,type=icebergsplitmanager

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

Iceberg Connector Changes
* Add configuration of Iceberg split manager threads using the ``iceberg.split-manager-threads`` configuration property :pr:`22754`

CC: @tdcmeehan @yingsu00 @ChunxuTang @hantangwangd

@wypb wypb requested review from hantangwangd and a team as code owners May 15, 2024 08:16
@wypb wypb requested a review from presto-oss May 15, 2024 08:16
@hantangwangd
Copy link
Member

Thanks for the change. I notice that Iceberg use java system property iceberg.worker.num-threads or environment variable ICEBERG_WORKER_NUM_THREADS to control the worker pool for files planning. But use a Presto side configuration parameter seems better, so the change looks good to me. Can you please add the configuration parameter to our iceberg docs?

@wypb wypb requested a review from steveburnett as a code owner May 15, 2024 10:55
@wypb wypb force-pushed the iceberg_split_manager_threads branch from 03762df to 6bdf1c8 Compare May 15, 2024 10:56
@wypb
Copy link
Contributor Author

wypb commented May 15, 2024

@hantangwangd Thank you for your review. I have added this configuration parameter to the document.

@wypb wypb force-pushed the iceberg_split_manager_threads branch from 6bdf1c8 to 5560bba Compare May 15, 2024 13:05
@tdcmeehan
Copy link
Contributor

So that others can more easily know that they need to tune this value, can you please also export the metrics from this executor?

See an example in HivePartitionManager#getExecutor.

@tdcmeehan tdcmeehan self-assigned this May 15, 2024
steveburnett
steveburnett previously approved these changes May 15, 2024
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull branch, local docs build, looks good. Thanks!

@steveburnett
Copy link
Contributor

Suggest revising the release note entry based on the Order of changes and Phrasing of the Release Notes Guidelines:

== RELEASE NOTES ==

Iceberg Connector Changes
* Add configuration of Iceberg split manager threads using the ``iceberg.split-manager-threads`` configuration property :pr:`22754`

@wypb
Copy link
Contributor Author

wypb commented May 15, 2024

Hi @tdcmeehan Thank you for your review. I followed your suggestion and added the Executor information to JMX metrics.

presto:iceberg_velox_test> show tables in jmx.current like '%iceberg%';
                                       Table
-----------------------------------------------------------------------------------
 com.facebook.presto.hive.metastore.thrift:name=iceberg,type=thrifthivemetastore
 com.facebook.presto.hive.metastore:name=iceberg,type=inmemorycachinghivemetastore
 com.facebook.presto.hive.metastore:name=iceberg,type=metastorecachestats
 com.facebook.presto.hive.s3:name=iceberg,type=prestos3filesystem
 com.facebook.presto.iceberg:name=iceberg,type=icebergsplitmanager
 com.facebook.presto.iceberg:name=icebergfilewriterfactory
(6 rows)

Query 20240515_143707_00005_gazur, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
[Latency: client-side: 0:01, server-side: 0:01] [6 rows, 520B] [6 rows/s, 577B/s]

presto:iceberg_velox_test> select * from jmx.current."com.facebook.presto.iceberg:name=iceberg,type=icebergsplitmanager";
 executor.activecount | executor.allowcorethreadtimeout | executor.completedtaskcount | executor.corepoolsize | executor.keepalivetime | executor.largestpoolsize | executor.maximumpoolsize | executor.poolsize | executor.queuedtaskcount |       >
----------------------+---------------------------------+-----------------------------+-----------------------+------------------------+--------------------------+--------------------------+-------------------+--------------------------+------->
                    0 | false                           |                           2 |                    10 | 0.00ns                 |                        2 |                       10 |                 2 |                        0 | java.u>
(1 row)

tdcmeehan
tdcmeehan previously approved these changes May 15, 2024
Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.

Comment on lines 359 to 360

``iceberg.split-manager-threads`` Number of threads to use for generating iceberg splits. ``Number of available processors``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing, should configuration properties add here? See Configuration Properties chapter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question, and good catch! I have a question for you:

Is the iceberg.split-manager-threads property related to Manifest File Caching?

  • If yes, then I think it should stay where it is.
  • If no, then I suggest moving the iceberg.split-manager-threads property to the table in Configuration Properties that you noticed.

Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not related to Manifest File Caching, so I think it's better to be placed in Configuration Properties.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hantangwangd @steveburnett I read the documentation and found that I had indeed added the wrong place, and I have corrected it. Thanks.

hantangwangd
hantangwangd previously approved these changes May 16, 2024
Copy link
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wypb Thanks for your work! Just left a minor suggestion.

@@ -60,6 +60,7 @@ public class IcebergConfig
private long maxManifestCacheSize = IO_MANIFEST_CACHE_MAX_TOTAL_BYTES_DEFAULT;
private long manifestCacheExpireDuration = IO_MANIFEST_CACHE_EXPIRATION_INTERVAL_MS_DEFAULT;
private long manifestCacheMaxContentLength = IO_MANIFEST_CACHE_MAX_CONTENT_LENGTH_DEFAULT;
private int splitManagerThreads = Runtime.getRuntime().availableProcessors();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that they used Runtime.getRuntime().availableProcessors() * 2 as the default value, probably because each processor core commonly supports 2 threads (hyper-threading). Are there any specific reasons for your setting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This default value is to be consistent with the default value of Iceberg's iceberg.worker.num-threads attribute:https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SystemConfigs.java#L38-L43

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Thanks for the clarification.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for moving the property to the best location! One final nit just noticed today, about capitalizing Iceberg.

@@ -247,6 +247,8 @@ Property Name Description
metadata optimization is skipped.

Set to ``0`` to disable metadata optimization.

``iceberg.split-manager-threads`` Number of threads to use for generating iceberg splits. ``Number of available processors``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``iceberg.split-manager-threads`` Number of threads to use for generating iceberg splits. ``Number of available processors``
``iceberg.split-manager-threads`` Number of threads to use for generating Iceberg splits. ``Number of available processors``

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch @steveburnett, I've already fixed it.

ChunxuTang
ChunxuTang previously approved these changes May 16, 2024
@@ -60,6 +60,7 @@ public class IcebergConfig
private long maxManifestCacheSize = IO_MANIFEST_CACHE_MAX_TOTAL_BYTES_DEFAULT;
private long manifestCacheExpireDuration = IO_MANIFEST_CACHE_EXPIRATION_INTERVAL_MS_DEFAULT;
private long manifestCacheMaxContentLength = IO_MANIFEST_CACHE_MAX_CONTENT_LENGTH_DEFAULT;
private int splitManagerThreads = Runtime.getRuntime().availableProcessors();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Thanks for the clarification.

ZacBlanco
ZacBlanco previously approved these changes May 17, 2024
Copy link
Contributor

@ZacBlanco ZacBlanco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but since it is a cherry-pick from trino, I think you should credit the original author as the co-author in the commit.

@wypb wypb dismissed stale reviews from ZacBlanco, ChunxuTang, and hantangwangd via cfc5671 May 20, 2024 05:36
@wypb wypb force-pushed the iceberg_split_manager_threads branch 2 times, most recently from d9591b8 to 070f748 Compare May 20, 2024 05:49
Copy link

linux-foundation-easycla bot commented May 20, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: wypb / name: wypb (51b619b)

@wypb
Copy link
Contributor Author

wypb commented May 20, 2024

Hi @ZacBlanco Thank you for your reply, I've added the co-author to the commit log.

@wypb wypb force-pushed the iceberg_split_manager_threads branch from 070f748 to 2d0fec1 Compare May 20, 2024 05:52
@wypb wypb force-pushed the iceberg_split_manager_threads branch from 2d0fec1 to 51b619b Compare May 20, 2024 07:24
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, local build, everything looks good. Thanks!

@hantangwangd hantangwangd merged commit ddf6f39 into prestodb:master May 21, 2024
57 checks passed
@wypb wypb deleted the iceberg_split_manager_threads branch May 22, 2024 00:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants