Benchmark regression alp #7

Draft
wants to merge 62 commits into
base: master
from

Conversation

seb711
Contributor

@seb711 seb711 commented Apr 12, 2024

This is a very bulky PR, so it is only a draft for now. I think it needs a few adjustments before it can be merged. Here is a list of thoughts and things that still need to be done:

  • For testing and benchmarking I changed decompression-speed and csvtobtr directly => new files should be created to hold the modified versions.
  • ALP and PFOR (for int64) were added and checked against the PBI dataset. I also included a file to benchmark/test the implementation (tools/regression-benchmark/column_benchmark.cpp) => it tests compression/decompression, checks the contents after decompression, and outputs PerfEvent measurements.
  • The INT64 schemes and architecture are the work of Maximilian Kuschewski; it is not clear whether more could be done to template or refactor the schemes.
  • For completeness I also included the AlpRD implementation. In my tests AlpRD threw a SEGFAULT on the dataset Bimbo/1/Bimbo_1/5_Dev_proxima.double, so I disabled it for now (I didn't have time to look into it).
  • The FastPFOR int64 implementation is very sparse; the goal was just to get it working.
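To illustrate what "checks the contents after decompression" can mean for a double column, here is a minimal sketch. It is not taken from column_benchmark.cpp; the function name `first_mismatch` is hypothetical. It compares bit patterns rather than using `==`, on the assumption that a lossless scheme must also preserve NaN payloads and the sign of zero.

```cpp
#include <algorithm>
#include <cassert>  // only needed for the usage shown below the block
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical helper (not from the PR): returns the index of the first value
// whose bit pattern differs between the original and the decompressed column,
// or std::nullopt if the two columns are bit-for-bit identical.
inline std::optional<std::size_t> first_mismatch(
    const std::vector<double>& original,
    const std::vector<double>& decompressed) {
    const std::size_t common = std::min(original.size(), decompressed.size());
    for (std::size_t i = 0; i < common; ++i) {
        std::uint64_t a = 0;
        std::uint64_t b = 0;
        // memcpy is the portable way to read the raw bits of a double.
        std::memcpy(&a, &original[i], sizeof a);
        std::memcpy(&b, &decompressed[i], sizeof b);
        if (a != b) return i;
    }
    // A length difference counts as a mismatch at the first missing value.
    if (original.size() != decompressed.size()) return common;
    return std::nullopt;
}
```

A roundtrip test would then compress, decompress, and check something like `assert(!first_mismatch(input, output).has_value());`.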

Otherwise this PR works: you can now use int64 values and compress your data with the novel ALP approach. My tests show considerably better compression factors for double values.
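For readers unfamiliar with ALP, the core idea can be sketched as follows: a double that is really a short decimal is stored as the integer round(v * 10^e), provided decoding reproduces the value bit for bit; everything else is kept as an exception. This is a deliberately simplified illustration, not the code in this PR. The names `AlpEncoded`, `alp_encode`, and `alp_decode` are hypothetical, the exponent is chosen per value here purely for clarity, and bit-packing and exception storage are left out entirely.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <optional>

// Powers of ten up to 10^15 are exactly representable as doubles.
constexpr double kPow10[] = {1e0, 1e1, 1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
                             1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15};
constexpr int kMaxExponent = 15;

// Hypothetical encoded form: value == digits / 10^exponent.
struct AlpEncoded {
    std::int64_t digits;
    int exponent;
};

inline bool same_bits(double a, double b) {
    return std::memcmp(&a, &b, sizeof(double)) == 0;
}

inline double alp_decode(AlpEncoded enc) {
    assert(enc.exponent >= 0 && enc.exponent <= kMaxExponent);
    return static_cast<double>(enc.digits) / kPow10[enc.exponent];
}

// Tries the smallest exponent that roundtrips exactly. Returns std::nullopt
// for values that must be stored as exceptions (NaN, infinity, -0.0, or
// doubles that are not short decimals).
inline std::optional<AlpEncoded> alp_encode(double v) {
    for (int e = 0; e <= kMaxExponent; ++e) {
        const double scaled = v * kPow10[e];
        // Stop once the scaled value no longer fits an exact integer in a
        // double; this comparison is also false for NaN and infinity.
        if (!(std::fabs(scaled) < 9.0e15)) break;
        const AlpEncoded enc{static_cast<std::int64_t>(std::llround(scaled)), e};
        if (same_bits(alp_decode(enc), v)) return enc;
    }
    return std::nullopt;
}
```

With this sketch, 1.23 is stored as the integer 123 with exponent 2. Small integers like these need far fewer bits than a raw 64-bit double once they are bit-packed, which is where the compression gain comes from.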

Short note on the ALP implementation:

  • Initially I compressed the whole block of 2^16 values at once. For performance reasons I changed the logic so that compression happens batchwise, in batches of 2^12 values. I also experimented a bit with SIMD (see commit history), but on the one hand I couldn't really see a performance improvement in decompression, and on the other hand I introduced bugs when casting vectors of ints to doubles. Therefore there is no SIMD for now.
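The batchwise structure described above can be sketched roughly like this. Only the sizes 2^16 and 2^12 come from the description; `for_each_batch` and `ints_to_doubles` are hypothetical names, and the conversion loop is the plain scalar version, which compilers are free to auto-vectorize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBlockSize = std::size_t{1} << 16;  // values per block
constexpr std::size_t kBatchSize = std::size_t{1} << 12;  // values per batch

// Hypothetical helper: calls fn(offset, length) once per batch. The last
// batch may be shorter if value_count is not a multiple of batch_size.
template <typename Fn>
void for_each_batch(std::size_t value_count, std::size_t batch_size, Fn&& fn) {
    assert(batch_size > 0);
    for (std::size_t offset = 0; offset < value_count; offset += batch_size) {
        fn(offset, std::min(batch_size, value_count - offset));
    }
}

// Scalar int64 -> double conversion for one batch, with no SIMD intrinsics.
inline void ints_to_doubles(const std::int64_t* in, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<double>(in[i]);
    }
}
```

A full block is then processed as 16 batches of 4096 values, e.g. `for_each_batch(kBlockSize, kBatchSize, [&](std::size_t off, std::size_t len) { ints_to_doubles(in + off, out + off, len); });`, so any per-batch state stays small.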
