Many small data files created for a table, unable to optimize the number of data files #21808
Labels
iceberg
Iceberg connector
Comments
I tried to reproduce the exact scenario you described and initially got the following files:
After
After trying some things, I think it might be related to the number of workers in the cluster. When I scaled down to a single worker I was able to optimize down to one file, but with 3 workers I could only get it down to 3 files.
Description
Cluster: 1 coordinator, 3 workers
Trino version: 441
Connector: iceberg
Hello! I'm running a query to create a new iceberg table from an existing iceberg table. Something like this:
The data is small after compression (< 100MB), so I was sort of expecting there to only be one data file for the table, but I'm noticing that the data is being split into multiple data files:
After this, I tried running:
as the docs mention:
However, even after running that command multiple times, the table never merges below 3 files, despite all of them being smaller than 100MB:
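For anyone following along, here is a minimal sketch of the two statements involved: the Iceberg connector's `optimize` table procedure and a query against the `$files` metadata table to see which data files remain. The exact statements from this report are not preserved here, so the catalog, schema, and table names (`iceberg`, `demo`, `events`) and the helper functions below are hypothetical, and the `100MB` threshold is simply the size mentioned above. The helpers only build SQL text; running it against a cluster is left to whatever client you use.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def optimize_sql(catalog: str, schema: str, table: str, threshold: str = "100MB") -> str:
    """Build an ALTER TABLE ... EXECUTE optimize statement.

    Only simple size literals such as '100MB' are expected for `threshold`;
    it is inserted as-is (assumption: the value is trusted input).
    """
    target = ".".join(quote_ident(part) for part in (catalog, schema, table))
    return f"ALTER TABLE {target} EXECUTE optimize(file_size_threshold => '{threshold}')"


def files_sql(catalog: str, schema: str, table: str) -> str:
    """Build a query listing the table's data files via the $files metadata table."""
    prefix = ".".join(quote_ident(part) for part in (catalog, schema))
    files_table = quote_ident(table + "$files")
    return (
        "SELECT file_path, record_count, file_size_in_bytes "
        f"FROM {prefix}.{files_table}"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(optimize_sql("iceberg", "demo", "events"))
    print(files_sql("iceberg", "demo", "events"))
```

Running the second statement before and after the first makes it easy to compare the number of rows (one per data file) and their `file_size_in_bytes`.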
Reproduction steps: