Benchmark regression alp #7

seb711 · 2024-04-12T11:12:58Z

This is a very large PR, so it is only a draft for now. I think it needs a few adjustments before it can be merged. Below is a list of thoughts and open items:

For testing and benchmarking, I changed decompression-speed and csvtobtr; new files should be created with their content.
ALP and PFOR (for int64) were added and checked with the PBI dataset. I also included a file to benchmark and test the implementation (tools/regression-benchmark/column_benchmark.cpp); it tests compression and decompression, checks the contents after decompression, and outputs PerfEvent measurements.
The INT64 schemes and architecture are the work of Maximilian Kuschewski; I am not sure whether more could be done to template or refactor the schemes.
For completeness, I also included the AlpRD implementation. In my tests, AlpRD threw a SEGFAULT on the dataset Bimbo/1/Bimbo_1/5_Dev_proxima.double, so I disabled it for now (I didn't have time to look into it).
Also, the FastPFOR int64 implementation is very sparse and was written only to get it working.

Otherwise this PR works: you can now use 64-bit integer values and compress your data with the novel ALP approach. Tests show that you get much better compression factors for double values.
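To illustrate the kind of check the benchmark file performs (compress, decompress, compare contents, report timings), here is a minimal sketch. It is not the code in column_benchmark.cpp: the names `checkRoundTrip`, `RoundTripResult`, and `Bytes` are hypothetical, and `std::chrono` stands in for the PerfEvent measurements. The comparison is bytewise so that NaN and -0.0 in double columns are also verified.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper types, not taken from the PR.
using Bytes = std::vector<std::uint8_t>;

struct RoundTripResult {
  bool contents_match;           // decompressed column is bit-identical to the input
  std::size_t compressed_bytes;  // size of the compressed representation
  double compress_ms;            // wall-clock time spent compressing
  double decompress_ms;          // wall-clock time spent decompressing
};

// Runs compress -> decompress once and compares the result bytewise.
// `compress` must map std::vector<T> to Bytes, `decompress` the reverse.
template <typename T, typename Compress, typename Decompress>
RoundTripResult checkRoundTrip(const std::vector<T>& column,
                               Compress&& compress,
                               Decompress&& decompress) {
  using Clock = std::chrono::steady_clock;
  auto to_ms = [](Clock::duration d) {
    return std::chrono::duration<double, std::milli>(d).count();
  };

  const auto t0 = Clock::now();
  const Bytes packed = compress(column);
  const auto t1 = Clock::now();
  const std::vector<T> restored = decompress(packed);
  const auto t2 = Clock::now();

  const bool same =
      restored.size() == column.size() &&
      (column.empty() ||
       std::memcmp(restored.data(), column.data(),
                   column.size() * sizeof(T)) == 0);

  return {same, packed.size(), to_ms(t1 - t0), to_ms(t2 - t1)};
}
```

Any scheme can be plugged in by wrapping its compress and decompress calls in two lambdas; a failed `contents_match` points at a lossy or buggy path for that column.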

Short note on the ALP implementation:

Initially I compressed the whole block of 2^16 values at once. For performance reasons I changed the logic so that compression happens batchwise, in batches of 2^12 values. I also experimented with SIMD (see commit history), but I could not see a real performance improvement in decompression, and I introduced bugs when casting vectors of ints to doubles. Therefore there is no SIMD for now.
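The batchwise layout can be sketched as follows. This is a simplified illustration, not the PR's code. It assumes a single fixed decimal exponent for the whole block, whereas ALP itself chooses its encoding parameters by sampling. It also decodes by division for clarity. All names (`encodeBatch`, `decodeBatch`, `encodeBlock`, `EncodedBatch`, `kBatchSize`) are hypothetical. Values that do not round-trip bit-exactly are stored as exceptions and patched back after decoding.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kBatchSize = std::size_t{1} << 12;  // 2^12 values per batch

struct EncodedBatch {
  std::vector<std::int64_t> digits;    // one encoded integer per value
  std::vector<std::uint16_t> exc_pos;  // positions that failed to round-trip
  std::vector<double> exc_val;         // original doubles at those positions
};

// 10^e by repeated multiplication; exact in double for 0 <= e <= 22.
inline double pow10Exact(int e) {
  double s = 1.0;
  for (int i = 0; i < e; ++i) s *= 10.0;
  return s;
}

inline bool sameBits(double a, double b) {
  return std::memcmp(&a, &b, sizeof(double)) == 0;
}

// Encode up to kBatchSize doubles as round(v * 10^exponent).
EncodedBatch encodeBatch(const double* in, std::size_t n, int exponent) {
  assert(n <= kBatchSize);
  const double scale = pow10Exact(exponent);
  EncodedBatch out;
  out.digits.assign(n, 0);
  for (std::size_t i = 0; i < n; ++i) {
    const double scaled = in[i] * scale;
    // Reject NaN, infinities, and magnitudes that would not convert safely.
    const bool in_range = std::isfinite(scaled) && std::fabs(scaled) < 9.0e15;
    const std::int64_t d = in_range ? std::llround(scaled) : 0;
    if (in_range && sameBits(static_cast<double>(d) / scale, in[i])) {
      out.digits[i] = d;
    } else {
      out.exc_pos.push_back(static_cast<std::uint16_t>(i));
      out.exc_val.push_back(in[i]);
    }
  }
  return out;
}

// Decode a batch into out (must hold digits.size() doubles), then patch exceptions.
void decodeBatch(const EncodedBatch& b, int exponent, double* out) {
  const double scale = pow10Exact(exponent);
  for (std::size_t i = 0; i < b.digits.size(); ++i)
    out[i] = static_cast<double>(b.digits[i]) / scale;
  for (std::size_t k = 0; k < b.exc_pos.size(); ++k)
    out[b.exc_pos[k]] = b.exc_val[k];
}

// Split a block (e.g. 2^16 values) into batches and encode each one separately.
std::vector<EncodedBatch> encodeBlock(const std::vector<double>& block, int exponent) {
  std::vector<EncodedBatch> batches;
  for (std::size_t off = 0; off < block.size(); off += kBatchSize) {
    const std::size_t n = std::min(kBatchSize, block.size() - off);
    batches.push_back(encodeBatch(block.data() + off, n, exponent));
  }
  return batches;
}
```

In this sketch a block of 2^16 values yields 16 independent batches. The integer output of each batch would then go to an integer scheme such as the int64 FastPFOR mentioned above.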

Alp

seb711 and others added 30 commits March 12, 2024 15:53

added a script to get the latest regression results

eefceff

SC2115 – ShellCheck

c6e0e5c

added alp and int64

b642e9b

added alp and int64

937d052

simded alp decompression and a bit of refactoring

c26b62d

Merge pull request #2 from seb711/alp

6d46b66

Alp

add alp to default schemes

fb7242c

removed unnecessary cout

0d13c74

removed unnecessary cout

69be115

fixed simd bug

c747ee4

fixed decompression-speed.cpp

7816c09

old tbb version fix

00cf8c0

old tbb version fix

3c8bfa3

fixed decompression script

eb9459c

fixed getDecompressedDataSize function

cfe5105

fixed csvtobtr

4a05359

fixed decompression-speed

516d629

fixed csvtobtr

d71ce72

include alp

2745a45

alp fixes

f90aaa7

enhance with some SIMD love

1b5c599

enhance with some SIMD love

4d4c766

enhance with some SIMD love

42dc9f6

revert changes

70074b4

revert changes

60f6d00

revert changes

f6999ab

revert changes

b9e4f5d

instance was too small

072bd91

yea simd not working right now

fe5a424

bugfixing

3a9edf9

seb711 added 30 commits March 25, 2024 22:47

remove stdout

8dcf79e

updated fastpfor

c2f3516

updated fastpfor

2ac649a

updated fastpfor

010bad0

mem leak

998e7a0

mem leak

e6d9561

address sanitiyer

3bde7af

address sanitiyer

1e5a193

alp fix

4ed8c7d

alp fix

dbe8ec8

debug

57c4096

remove debug logs

3367656

refactored to meet btrblocks code style

0324a42

wip added 64bit fastpfor to btrblocks to enhance alp performance

36e5221

added a script to benchmark double columns

bdd26af

suppress aws cli prints

2dfc7a8

fixed buffer overflow bug

d4b7d33

modified fastpfor blocklength

8f29b46

fixed pbp

8bd8b8c

calculate correct compressed size

7ad1cf4

reset old parameters

4bb1f84

fixed pointer related bug

1c1815e

fix fastpfor related "bugs"

892e7b4

added minor changes to pbp

f3618d4

added column_benchmark real runtime and alp paper columns

de46698

SIMD alp loading and decoding values

243bcc7

remove SIMD alp loading and decoding values due to bugs

777853f

add runtime to decompression

5e7bf6c

add ALPRD but disable it

8cdbec0

added memory import

e7ace34
