ColumnChunk statistics for zone mapping #2611
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #2611 +/- ##
==========================================
+ Coverage 90.01% 93.36% +3.34%
==========================================
Files 1190 1073 -117
Lines 42956 40471 -2485
==========================================
- Hits 38668 37786 -882
+ Misses 4288 2685 -1603
Can we add tests for MANY_TO_MANY rels?
I've added some very simple many-to-many tests (identical to the one-to-one tests). They are failing, along with a bunch of other rel tests. I don't think the statistics are working properly for those columns, which breaks how they are compressed and produces nonsense values.
I went a little in circles on this, trying to reduce the mess of switch cases, and settled on macro-based ones. I'm not overly fond of macros, but they get the job done at no runtime cost. I think it might be possible to create something equivalent using templates, but it's tricky.
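For reference, here is a minimal sketch of what a template-based alternative to the macro switch could look like. It is not the code in this PR: `PhysicalType`, `TypeTag`, `dispatchOnType` and `sizeOfType` are hypothetical names made up for illustration, and the real set of types is larger. The idea is to write the switch once and pass the selected type to a generic lambda as a tag object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical type enum for illustration; not the enum used in the PR.
enum class PhysicalType { INT32, INT64, UINT8 };

// Empty tag that carries a C++ type through a generic lambda parameter.
template<typename T>
struct TypeTag {
    using type = T;
};

// The only switch: maps the runtime enum to a compile-time type and
// calls func with the matching tag. Every call must return the same type.
template<typename Func>
auto dispatchOnType(PhysicalType type, Func&& func) {
    switch (type) {
    case PhysicalType::INT32:
        return func(TypeTag<int32_t>{});
    case PhysicalType::INT64:
        return func(TypeTag<int64_t>{});
    case PhysicalType::UINT8:
        return func(TypeTag<uint8_t>{});
    }
    throw std::invalid_argument("unsupported physical type");
}

// Example caller: the lambda body is instantiated once per type,
// so there is no per-value runtime cost beyond the single switch.
inline std::size_t sizeOfType(PhysicalType type) {
    return dispatchOnType(type, [](auto tag) -> std::size_t {
        using T = typename decltype(tag)::type;
        return sizeof(T);
    });
}
```

The trade-off is that every lambda passed to `dispatchOnType` has to compile for all listed types and return one common type, which is likely part of what makes the template route tricky here.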
This replaces the compression-specific data in CompressionMetadata with minimum and maximum statistics. I considered moving the statistics out of there, but I think I prefer to keep the data separate as much as possible, since the minimum, maximum and compression type are all that the compression-related functions need at the moment.
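To illustrate why min/max statistics are useful for zone mapping, here is a minimal sketch of the usual check: a chunk can be skipped when the queried range cannot overlap its [min, max] range. `ChunkStats`, `mayOverlap` and `mayContain` are hypothetical names for illustration only, and the sketch assumes signed 64-bit values rather than the full set of column types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-chunk statistics; stands in for the min/max
// stored alongside the compression type. Not the struct from the PR.
struct ChunkStats {
    int64_t min;
    int64_t max;
};

// Returns true if the chunk may hold a value in [lo, hi].
// A false result means the chunk can be skipped without reading it.
inline bool mayOverlap(const ChunkStats& stats, int64_t lo, int64_t hi) {
    assert(lo <= hi);
    return lo <= stats.max && hi >= stats.min;
}

// Point lookup is the degenerate range [value, value].
inline bool mayContain(const ChunkStats& stats, int64_t value) {
    return mayOverlap(stats, value, value);
}
```

A `true` result is only a "maybe": the chunk still has to be scanned, since min/max say nothing about which values in between are actually present.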
More tests are needed, including tests of updating and deleting nodes, and rel versions of the tests. Updates when writing don't seem to be correct, as some tests are failing.
It might be useful to profile and see how much of an impact calculating the bitpacking info from the min/max has. It should be pretty fast, but it is now being done for every decompressFromPage call.
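As a rough idea of the per-call work involved, here is a minimal sketch of deriving bitpacking parameters from min/max, assuming a frame-of-reference scheme where each value is stored as an offset from the minimum. The actual scheme in this PR may differ; `BitpackInfo` and `bitpackInfoFromMinMax` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type; not the struct used in the PR.
struct BitpackInfo {
    uint8_t bitWidth; // bits needed per value after subtracting offset
    int64_t offset;   // value subtracted before packing (here: the min)
};

// Assumes frame-of-reference packing: store (value - min) in as few
// bits as the range max - min requires.
inline BitpackInfo bitpackInfoFromMinMax(int64_t min, int64_t max) {
    assert(min <= max);
    // Subtract in unsigned arithmetic so the full int64 range
    // does not overflow.
    uint64_t range = static_cast<uint64_t>(max) - static_cast<uint64_t>(min);
    uint8_t bits = 0;
    while (range != 0) {
        ++bits;
        range >>= 1;
    }
    return BitpackInfo{bits, min};
}
```

Under these assumptions the cost is one subtraction and a loop of at most 64 shifts per call, so it should be cheap, but profiling would confirm whether it shows up when done on every page read.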