HIVE-28268: Iceberg: Retrieve row count from iceberg SnapshotSummary in case of iceberg.hive.keep.stats=false #5215
Open
zhangbutao wants to merge 3 commits into apache:master from zhangbutao:iceberg_count_optimize
asf-ci-hive added the tests pending and tests failed labels and removed the tests unstable and tests pending labels (Apr 26, 2024)
zhangbutao force-pushed the iceberg_count_optimize branch from 0c14207 to 1a953e8 (April 26, 2024 04:45)
asf-ci-hive added the tests pending and tests unstable labels and removed the tests failed and tests pending labels (Apr 26, 2024)
zhangbutao force-pushed the iceberg_count_optimize branch from 1a953e8 to 77d9a7e (April 28, 2024 09:22)
zhangbutao commented on Apr 28, 2024:

```diff
@@ -237,19 +237,19 @@ STAGE PLANS:
   alias: ice01
   filterExpr: (a = 22) (type: boolean)
   Snapshot ref: branch_test1
-  Statistics: Num rows: 3 Data size: 291 Basic stats: COMPLETE Column stats: COMPLETE
+  Statistics: Num rows: 5 Data size: 485 Basic stats: COMPLETE Column stats: COMPLETE
```
Before this PR, we always got the row count of a branch, tag, or time-travel query from the current snapshot's summary, which is not right.
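The point of the comment is that a branch, tag, or time-travel query must read the summary of the snapshot it actually targets, not the current one. A minimal Java sketch of that resolution step follows. It is an illustration under stated assumptions, not Iceberg's or Hive's API: the `SnapshotResolver` class, the `Snapshot`/`TableMeta` records, the `resolve` method, and the sample snapshot values are all hypothetical, and the time-travel branch is simplified (it picks the newest snapshot at or before the timestamp, whereas real Iceberg consults the table's snapshot history).

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model of Iceberg table metadata; NOT the Iceberg API.
final class SnapshotResolver {

    record Snapshot(long id, long timestampMillis, Map<String, String> summary) {}

    record TableMeta(long currentSnapshotId,
                     Map<Long, Snapshot> snapshots, // snapshot id -> snapshot
                     Map<String, Long> refs) {      // branch/tag name -> snapshot id
    }

    /** Returns the snapshot a query reads: a named branch/tag, a time-travel point, or the current one. */
    static Optional<Snapshot> resolve(TableMeta t, String ref, Long asOfMillis) {
        if (ref != null) {
            Long id = t.refs().get(ref);
            return id == null ? Optional.empty() : Optional.ofNullable(t.snapshots().get(id));
        }
        if (asOfMillis != null) {
            // Simplification (assumption): newest snapshot committed at or before the timestamp.
            return t.snapshots().values().stream()
                    .filter(s -> s.timestampMillis() <= asOfMillis)
                    .max(Comparator.comparingLong(Snapshot::timestampMillis));
        }
        // Default: the current snapshot, which is the only one the old code ever looked at.
        return Optional.ofNullable(t.snapshots().get(t.currentSnapshotId()));
    }

    public static void main(String[] args) {
        // Hypothetical table: main is at snapshot 2, branch_test1 still points at snapshot 1.
        Snapshot s1 = new Snapshot(1L, 1_000L, Map.of("total-records", "7"));
        Snapshot s2 = new Snapshot(2L, 2_000L, Map.of("total-records", "4"));
        TableMeta t = new TableMeta(2L, Map.of(1L, s1, 2L, s2), Map.of("branch_test1", 1L));

        // Row count for the branch comes from its own snapshot summary.
        System.out.println(resolve(t, "branch_test1", null)
                .map(s -> s.summary().get("total-records")).orElse("n/a")); // prints 7
        // Row count for a plain query comes from the current snapshot.
        System.out.println(resolve(t, null, null)
                .map(s -> s.summary().get("total-records")).orElse("n/a")); // prints 4
    }
}
```

Returning `Optional.empty()` for an unknown ref lets a caller fall back to computing the count instead of silently using the current snapshot's stats.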
zhangbutao force-pushed the iceberg_count_optimize branch from 77d9a7e to 441db00 (April 28, 2024 09:44)
asf-ci-hive added the tests failed and tests pending labels and removed the tests pending and tests failed labels (Apr 28, 2024)
zhangbutao force-pushed the iceberg_count_optimize branch from 441db00 to 9971db5 (April 29, 2024 05:15)
zhangbutao force-pushed the iceberg_count_optimize branch from 9971db5 to 0ffc9df (April 30, 2024 02:34)
asf-ci-hive added the tests pending and tests failed labels and removed the tests unstable, tests pending and tests failed labels (Apr 30, 2024)
Quality Gate passed
zhangbutao changed the title "Iceberg: Retrieve row count from iceberg SnapshotSummary in case of iceberg.hive.keep.stats=false" to "HIVE-28268: Iceberg: Retrieve row count from iceberg SnapshotSummary in case of iceberg.hive.keep.stats=false" (May 21, 2024)
What changes were proposed in this pull request?
At present, when `iceberg.hive.keep.stats=true` and `hive.compute.query.using.stats=true`, HS2 runs a fetch task that reads the Iceberg table's `numRows` property from HMS to optimize `count` queries. If `iceberg.hive.keep.stats=false`, HS2 always launches a Tez task to compute the table's row count when a `count` query is submitted. However, Iceberg table metadata already contains some statistics, so we can also run just a fetch task that retrieves the row count from Iceberg's snapshot summary when `iceberg.hive.keep.stats=false` or when no stats are stored in HMS. This avoids launching a Tez task to compute the table's row count.

Note that time travel, branches, and tags can have different stats from the current snapshot, so we need to get the snapshot ID that corresponds to the Iceberg version being queried. Otherwise, queries on a time travel, branch, or tag will return wrong stats.
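The decision described above can be sketched roughly as follows. This is a hypothetical illustration, not Hive's implementation: `CountPlanner` and `rowCountWithoutTez` are made-up names, HMS table parameters and the Iceberg snapshot summary are modelled as plain string maps, `hive.compute.query.using.stats` is assumed to gate both paths, and skipping the shortcut when `total-delete-files` is non-zero is a conservative assumption of this sketch, since `total-records` alone may not account for rows removed by delete files.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch of the count(*) shortcut; names are illustrative, not Hive's code.
final class CountPlanner {

    // Iceberg snapshot summary key for the total number of records.
    static final String TOTAL_RECORDS = "total-records";
    // Iceberg snapshot summary key for the number of delete files (used conservatively here).
    static final String TOTAL_DELETE_FILES = "total-delete-files";

    /**
     * Returns the row count if a fetch task can answer count(*), or empty if a Tez task
     * must compute it.
     *
     * @param computeUsingStats hive.compute.query.using.stats (assumed to gate both paths)
     * @param keepStats         iceberg.hive.keep.stats
     * @param hmsParams         table parameters stored in HMS (may lack numRows)
     * @param snapshotSummary   summary of the snapshot the query targets (current, branch, tag, or time travel)
     */
    static OptionalLong rowCountWithoutTez(boolean computeUsingStats, boolean keepStats,
                                           Map<String, String> hmsParams,
                                           Map<String, String> snapshotSummary) {
        if (!computeUsingStats) {
            return OptionalLong.empty();
        }
        // Existing path: stats are kept in HMS, so read numRows from there.
        String numRows = hmsParams.get("numRows");
        if (keepStats && numRows != null) {
            return OptionalLong.of(Long.parseLong(numRows));
        }
        // New path: no usable HMS stats, so fall back to the Iceberg snapshot summary.
        String totalRecords = snapshotSummary.get(TOTAL_RECORDS);
        long deleteFiles = Long.parseLong(snapshotSummary.getOrDefault(TOTAL_DELETE_FILES, "0"));
        if (totalRecords != null && deleteFiles == 0) {
            return OptionalLong.of(Long.parseLong(totalRecords));
        }
        // No trustworthy stats: a Tez task has to scan the table.
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        // Hypothetical inputs: stats not kept in HMS; the summary reports 42 rows and no delete files.
        Map<String, String> summary = Map.of(TOTAL_RECORDS, "42", TOTAL_DELETE_FILES, "0");
        System.out.println(rowCountWithoutTez(true, false, Map.of(), summary).getAsLong()); // prints 42
    }
}
```

The summary passed in should be the one for the snapshot the query actually reads; passing the current snapshot's summary for a branch or tag query would reproduce the wrong-stats problem described in the note above.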
Why are the changes needed?
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
Qtest