ColumnChunk statistics for zone mapping #2611

Open
wants to merge 1 commit into base: master

Conversation

benjaminwinger
Collaborator

I went in circles a bit on this trying to reduce the mess of switch cases, and settled on macro-based ones. I'm not overly fond of the macros, but they get the job done at no runtime cost. I think it might be possible to create something equivalent using templates, but it's tricky.
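For illustration, a template-based equivalent of a macro-generated switch might look roughly like the sketch below. Everything in it is an assumption made for the example: `PhysicalType`, `TypeTag`, `dispatchOnType` and `valueSizeInBytes` are hypothetical names, not code from this PR. The idea is to keep a single switch that turns the runtime type into a compile-time tag and hands it to a generic lambda.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the engine's physical type enum.
enum class PhysicalType { INT64, INT32, INT16, INT8 };

// Empty tag that carries a C++ type into a generic lambda.
template<typename T>
struct TypeTag {
    using type = T;
};

// The only switch: maps the runtime type to a compile-time tag and forwards
// it to `func`. Each branch is a separate instantiation, so no virtual call
// or type erasure is involved.
template<typename Func>
auto dispatchOnType(PhysicalType type, Func&& func) {
    switch (type) {
    case PhysicalType::INT64:
        return func(TypeTag<int64_t>{});
    case PhysicalType::INT32:
        return func(TypeTag<int32_t>{});
    case PhysicalType::INT16:
        return func(TypeTag<int16_t>{});
    case PhysicalType::INT8:
        return func(TypeTag<int8_t>{});
    }
    throw std::invalid_argument("unsupported physical type");
}

// Example caller: the per-type logic is written once as a generic lambda.
inline std::size_t valueSizeInBytes(PhysicalType type) {
    return dispatchOnType(type, [](auto tag) -> std::size_t {
        using T = typename decltype(tag)::type;
        return sizeof(T);
    });
}
```

One source of the trickiness is visible here: because the return type is deduced, every branch has to return the same type, so operations whose result type depends on the value type need an extra wrapper or an out-parameter.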

This replaces the compression-specific data in CompressionMetadata with minimum and maximum statistics. I considered moving the statistics out of CompressionMetadata, but I think I prefer to keep the data separate as much as possible, since the minimum, maximum and compression type are all that the compression-related functions need at the moment.
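As a rough sketch of how per-chunk min/max statistics can double as a zone map, consider the example below. The names `ChunkMetadata`, `computeMetadata` and `canSkipChunk` and the two enum values are hypothetical and restricted to `int64_t` for brevity; the actual CompressionMetadata layout in this PR may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names; the PR's actual definitions may differ.
enum class CompressionType : uint8_t { UNCOMPRESSED, INTEGER_BITPACKING };

// The three fields described above: minimum, maximum and compression type.
struct ChunkMetadata {
    int64_t min;
    int64_t max;
    CompressionType compression;
};

// Computes min/max in a single pass over a non-empty chunk.
inline ChunkMetadata computeMetadata(const std::vector<int64_t>& values,
    CompressionType compression) {
    assert(!values.empty());
    auto minMax = std::minmax_element(values.begin(), values.end());
    return ChunkMetadata{*minMax.first, *minMax.second, compression};
}

// Zone-map check: a scan for values in [queryLow, queryHigh] can skip the
// chunk when that interval does not overlap [min, max].
inline bool canSkipChunk(const ChunkMetadata& meta, int64_t queryLow, int64_t queryHigh) {
    return queryHigh < meta.min || queryLow > meta.max;
}
```

For a chunk holding 5, -3 and 12, a predicate such as `x >= 13` can skip the chunk without reading it, while `x BETWEEN 0 AND 1` cannot, even though no stored value matches: a zone map only rules chunks out, it never confirms a match.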

More tests are needed, including tests that update and delete nodes, and rel versions of the tests. Updates when writing don't seem to be correct, as some tests are failing.
It might be useful to profile and see how much of an impact calculating the bitpacking info from the min/max has. It should be pretty fast, but it is now done on every decompressFromPage call.
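To give a sense of the work involved, here is a minimal sketch of deriving bitpacking parameters from min/max. It assumes a frame-of-reference scheme (subtract the minimum, then pack the remainder in as few bits as possible), which may not match the scheme this PR uses; `BitpackingInfo` and `bitpackingInfoFromStats` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: how many bits each packed value needs and the
// offset that is subtracted before packing and added back when unpacking.
struct BitpackingInfo {
    uint8_t bitWidth;
    int64_t offset;
};

inline BitpackingInfo bitpackingInfoFromStats(int64_t min, int64_t max) {
    assert(min <= max);
    // Unsigned subtraction wraps modulo 2^64, so the range is still correct
    // when max - min would overflow int64_t.
    uint64_t range = static_cast<uint64_t>(max) - static_cast<uint64_t>(min);
    // Count the bits needed to represent the largest offset-adjusted value.
    uint8_t bitWidth = 0;
    while (range != 0) {
        ++bitWidth;
        range >>= 1;
    }
    return BitpackingInfo{bitWidth, min};
}
```

Under these assumptions the per-call cost is one subtraction and at most 64 shift iterations, and the loop could be swapped for a count-leading-zeros intrinsic if profiling shows it matters.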

@benjaminwinger benjaminwinger force-pushed the chunk-stats branch 2 times, most recently from 58573db to 0e78482 Compare January 9, 2024 19:19

codecov bot commented Jan 9, 2024

Codecov Report

Attention: Patch coverage is 86.63594%, with 29 lines in your changes missing coverage. Please review.

Project coverage is 93.36%. Comparing base (2518d65) to head (4cf7675).
Report is 16 commits behind head on master.

Current head 4cf7675 differs from pull request most recent head c83a515

Please upload reports for the commit c83a515 to get more accurate results.

Files                                              Patch %   Lines
src/storage/compression/compression.cpp            83.33%    16 Missing ⚠️
src/include/storage/compression/compression.h      83.33%    5 Missing ⚠️
src/include/storage/store/string_column.h          0.00%     3 Missing ⚠️
src/storage/store/string_column.cpp                84.21%    3 Missing ⚠️
src/function/table_functions/call_functions.cpp    88.88%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2611      +/-   ##
==========================================
+ Coverage   90.01%   93.36%   +3.34%     
==========================================
  Files        1190     1073     -117     
  Lines       42956    40471    -2485     
==========================================
- Hits        38668    37786     -882     
+ Misses       4288     2685    -1603     


src/function/table_functions/call_functions.cpp (outdated, resolved)
test/test_files/update_node/stats.test (outdated, resolved)
test/test_files/update_node/stats.test (resolved)
Contributor

Can we add tests for MANY_TO_MANY rels?

Collaborator Author

I've added some very simple many-to-many tests (identical to the one-to-one tests). They are failing, along with a bunch of other rel tests. I don't think the statistics are working properly for those columns, which breaks how they are compressed and produces nonsense values.

src/include/storage/store/string_column.h (outdated, resolved)
src/function/table_functions/call_functions.cpp (outdated, resolved)
@benjaminwinger benjaminwinger force-pushed the chunk-stats branch 3 times, most recently from 0a2e51e to 65bc5b8 Compare February 14, 2024 22:33
@benjaminwinger benjaminwinger force-pushed the chunk-stats branch 3 times, most recently from b0ac414 to 12411e0 Compare May 27, 2024 20:08