
60x compression! #250

Open
rsignell-usgs opened this issue Dec 1, 2021 · 16 comments

@rsignell-usgs

https://www.nature.com/articles/s43588-021-00156-2.pdf
Implemented in Julia, but could be cool for NCO in the future.

@czender
Member

czender commented Dec 1, 2021

Yes, Rich, these are amazing results. FYI, Milan Klower is the invited speaker to our upcoming EGU session on lossy compression...
ESSI2.7: Lossy and Lossless Compression for Greener Geoscientific Computing and Data Storage
Conveners: Charlie Zender (UC Irvine), Edward Hartnett (NOAA), Bryan N. Lawrence (NCAS, U. Reading), V. Balaji (Princeton)
https://meetingorganizer.copernicus.org/EGU22/session/42046
Lots of new lossy features making their way into NCO/CCR/netCDF these days :)

@rsignell-usgs
Author

Awesome, good to know you are tracking this, Charlie!

@pnorton-usgs

Are there any plans to implement the bit information and the bit rounding routines in a future NCO release?

@czender
Member

czender commented Dec 10, 2021

Unfortunately we do not have the resources to implement these features ourselves, though we would certainly welcome patches. Klower's methods are a departure from the existing lossy compression algorithms in NCO, where the user specifies the acceptable lossiness. The new method would require a substantial amount of code to ascertain the number of keep bits based on the information content of a variable. Only the final IEEE rounding step is already implemented in NCO (and is used in BitGroom and Granular BitGroom).

@rsignell-usgs
Author

rsignell-usgs commented Dec 10, 2021

@czender, I ran a modified version of the @milankl Julia bit information code example and determined that for my variable T2, for 1% information loss I should keep 10 mantissa bits.
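
Roughly, the keepbits calculation went along these lines (a simplified sketch rather than the exact notebook code; it assumes Float32 data, where bit 1 is the sign, bits 2-9 the exponent, and bits 10-32 the mantissa):

using BitInformation

A = rand(Float32, 100, 100, 100)     # stand-in for the real T2 field
bitinf = bitinformation(A)           # real information content per bit position (length 32)
mantissa_inf = bitinf[10:end]        # mantissa bits only
cdf = cumsum(mantissa_inf) / sum(mantissa_inf)
keepbits = findfirst(>=(0.99), cdf)  # smallest keepbits preserving 99% of the mantissa information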

If I then wanted to use NCO to apply this to my data, what value would I specify to the --ppc bit-grooming option?

From your bitgrooming paper:

To guarantee preserving 1–7 digits of precision, Bit Grooming must retain 5,8,11,15,18,21, and 25 explicit mantissa bits, respectively

BitInformation mantissa keepbits    NCO --ppc (NSD)
              5                           1
              8                           2
             11                           3
             15                           4
             18                           5
             21                           6
             25                           7

So if I need to keep 10 mantissa bits, I would want a value of 3, which actually keeps 11, right?

ncks  --ppc T2=3 data.nc data_groomed.nc 

Is this right?

@czender
Member

czender commented Dec 10, 2021

Indeed, this is the correct table for the original BitGroom. However, the latest NCO defaults to Granular BitGroom (GBG), which is more aggressive and guarantees preserving the same number of significant digits with fewer bits than BG. GBG documentation is still lacking, sorry. In debugging mode GBG prints the number of keep bits it retains as the pbxr value ('precision binary explicit required'), e.g.,

zender@sastrugi:~$ ncks -O -7 -C -D 6 -v one_dmn_rec_var_flt --ppc default=3 ~/nco/data/in.nc ~/foo.nc 2>/dev/null | grep pbxr
ncks: 1 = 0.5 * 2^1, dgt_nbr = 0, qnt_pwr = -10, pbxr = 11, bxnz = 12, qnt_val = 1, qnt_prx = 0
ncks: 2 = 0.5 * 2^2, dgt_nbr = 1, qnt_pwr = -7, pbxr = 9, bxnz = 14, qnt_val = 2, qnt_prx = 0
ncks: 3 = 0.75 * 2^2, dgt_nbr = 1, qnt_pwr = -7, pbxr = 8, bxnz = 15, qnt_val = 3, qnt_prx = 0
ncks: 4 = 0.5 * 2^3, dgt_nbr = 1, qnt_pwr = -7, pbxr = 10, bxnz = 13, qnt_val = 4, qnt_prx = 0
ncks: 5 = 0.625 * 2^3, dgt_nbr = 1, qnt_pwr = -7, pbxr = 9, bxnz = 14, qnt_val = 5, qnt_prx = 0
ncks: 6 = 0.75 * 2^3, dgt_nbr = 1, qnt_pwr = -7, pbxr = 9, bxnz = 14, qnt_val = 6, qnt_prx = 0
ncks: 7 = 0.875 * 2^3, dgt_nbr = 1, qnt_pwr = -7, pbxr = 9, bxnz = 14, qnt_val = 7, qnt_prx = 0
ncks: 8 = 0.5 * 2^4, dgt_nbr = 1, qnt_pwr = -7, pbxr = 11, bxnz = 12, qnt_val = 8, qnt_prx = 0
ncks: 9 = 0.5625 * 2^4, dgt_nbr = 1, qnt_pwr = -7, pbxr = 10, bxnz = 13, qnt_val = 9, qnt_prx = 0
ncks: 10 = 0.625 * 2^4, dgt_nbr = 1, qnt_pwr = -7, pbxr = 10, bxnz = 13, qnt_val = 10, qnt_prx = 0

You can see that NSD=3 keeps 11 mantissa bits for two values, 10 for three values, 9 for four values and 8 for one value. This is why it is called "Granular"---the bitmask changes for every value. NSD=4 keeps too many bits for most numbers:

zender@sastrugi:~$ ncks -O -7 -C -D 6 -v one_dmn_rec_var_flt --ppc default=4 ~/nco/data/in.nc ~/foo.nc 2>/dev/null | grep pbxr
ncks: 1 = 0.5 * 2^1, dgt_nbr = 0, qnt_pwr = -14, pbxr = 15, bxnz = 8, qnt_val = 1, qnt_prx = 0
ncks: 2 = 0.5 * 2^2, dgt_nbr = 1, qnt_pwr = -10, pbxr = 12, bxnz = 11, qnt_val = 2, qnt_prx = 0
ncks: 3 = 0.75 * 2^2, dgt_nbr = 1, qnt_pwr = -10, pbxr = 11, bxnz = 12, qnt_val = 3, qnt_prx = 0
ncks: 4 = 0.5 * 2^3, dgt_nbr = 1, qnt_pwr = -10, pbxr = 13, bxnz = 10, qnt_val = 4, qnt_prx = 0
ncks: 5 = 0.625 * 2^3, dgt_nbr = 1, qnt_pwr = -10, pbxr = 12, bxnz = 11, qnt_val = 5, qnt_prx = 0
ncks: 6 = 0.75 * 2^3, dgt_nbr = 1, qnt_pwr = -10, pbxr = 12, bxnz = 11, qnt_val = 6, qnt_prx = 0
ncks: 7 = 0.875 * 2^3, dgt_nbr = 1, qnt_pwr = -10, pbxr = 12, bxnz = 11, qnt_val = 7, qnt_prx = 0
ncks: 8 = 0.5 * 2^4, dgt_nbr = 1, qnt_pwr = -10, pbxr = 14, bxnz = 9, qnt_val = 8, qnt_prx = 0
ncks: 9 = 0.5625 * 2^4, dgt_nbr = 1, qnt_pwr = -10, pbxr = 13, bxnz = 10, qnt_val = 9, qnt_prx = 0
ncks: 10 = 0.625 * 2^4, dgt_nbr = 1, qnt_pwr = -10, pbxr = 13, bxnz = 10, qnt_val = 10, qnt_prx = 0

If you want to shave the same number of bits from every value, as Klower does, you can add the --baa=5 ("bit-adjustment-algorithm") option, which is a hybrid of the BitGroom number of bits (from your table) and the IEEE rounding algorithm used (I think) by Klower:

ncks -O -7 -C --baa=5 -v one_dmn_rec_var_flt --ppc default=3 ~/nco/data/in.nc ~/foo.nc

However, that mode does not provide the same debugging information as Granular BG. Hope this helps.

@milankl

milankl commented Dec 10, 2021

@rsignell-usgs The number of significant bits nsb is obtained from the number of significant digits nsd as nsb = ceil(log2(10)*nsd), whereby the ceil is just because rounding down could mean that the significant bits aren't enough to preserve the precision of nsd digits.
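
A quick check of this formula (note that the BitGroom table quoted above retains one explicit mantissa bit more than ceil(log2(10)*nsd) at every nsd):

nsb(nsd) = ceil(Int, log2(10) * nsd)
nsb.(1:7)   # [4, 7, 10, 14, 17, 20, 24]; the table above lists 5, 8, 11, 15, 18, 21, 25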

@milankl

milankl commented Dec 10, 2021

You can see that NSD=3 keeps 11 mantissa bits for two values, 10 for three values, 9 for four values and 8 for one value. This is why it is called "Granular"---the bitmask changes for every value. NSD=4 keeps too many bits for most numbers:

Sorry, Charlie, I'd be interested to know what the motivation behind this is. Is it basically that NCO offers the user an adjustable level of precision via the NSD option, which is just too coarse if one wants less than, say, NSD = 4 but more than NSD = 3, or is there another reason (why not allow NSD to be a real number, e.g. NSD = 3.5)? What impact does that have on the lossless compression? I'd reckon that an irregular number of trailing zero bits can't be compressed as well.

@czender
Member

czender commented Dec 10, 2021

Yes, partly the motivation is that many people think in terms of integral numbers of decimal digits of precision. A filter that accepts a real instead of an integer NSD, or alternatively an integer number of significant bits (NSB), and then straightforwardly rounds to that number of bits would be a great feature, though it would be distinct from GBG. I have not tested the interesting question of whether the lossless compressors do worse on an irregular number of trailing bits. My hunch is that bitstream (1d) compressors would do better, not worse, assuming they keep a table somewhere of multiple bit patterns to substitute for. For a given NSD we're only talking about a small range (up to 4) in the number of keep bits, so four patterns to substitute for instead of one does not seem too difficult. This is not my strong suit, and I'd welcome the results of an investigation...

@czender
Member

czender commented Dec 10, 2021

I will try to implement a straightforward bitround algorithm in the January timeframe. This would take as input the integer number of keepbits (for example as output by the 99% information content algorithms of @milankl) and then round as per IEEE. This could be a bridge or first step in a fuller implementation of the information content approach. All of the necessary ingredients are already in nco_ppc.c. The NCO API accepts per-variable NSDs, so after crunching the 99% level keepbits elsewhere (in Julia, for now), an entire dataset could be rounded in one ncks invocation.
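
For the curious, the core rounding step fits in a few lines. Here is an illustrative Julia sketch for Float32 (round-to-nearest, ties-to-even); a sketch only, not NCO's eventual C implementation, and it assumes 1 <= keepbits <= 22 and finite inputs:

function bitround(x::Float32, keepbits::Integer)
    ui = reinterpret(UInt32, x)
    shift = 23 - keepbits                          # number of mantissa bits to discard
    half = (UInt32(1) << (shift - 1)) - UInt32(1)  # just below half an ulp at the kept precision
    ui += half + ((ui >> shift) & UInt32(1))       # round to nearest, ties to even
    ui &= ~((UInt32(1) << shift) - UInt32(1))      # zero the discarded bits
    return reinterpret(Float32, ui)
end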

@milankl

milankl commented Dec 11, 2021

For a given NSD we're only talking about a small range (up to 4) in the number of keep bits, so four patterns to substitute for instead of one does not seem too difficult. This is not my strong suit, and I'd welcome the results of an investigation...

Just did that

using BitInformation, Random
using TranscodingStreams, CodecZstd

ZstdCompressorL10 = ZstdCompressor(level=10)
TranscodingStreams.initialize(ZstdCompressorL10)

function ar1process(r::T,N) where T
    
    ar1 = Array{T}(undef,N) # preallocate
    ar1[1] = randn(T)       # initial conditions
    s = sqrt(1-r^2)         # noise magnitude
    
    for i in 2:N
        ar1[i] = r*ar1[i-1] + s*randn(T)
    end
    
    return ar1
end

N = 3_000_000
autocorrelation = 0.9
keepbits = 7
A = ar1process(autocorrelation,N)

Ar1 = copy(A)
Ar2 = copy(A)

# either always round every value to keepbits mantissa bits (i.e. not granular)
round!(Ar1,keepbits)

# or round random thirds of all values to keepbits-1, keepbits, or keepbits+1 (i.e. granular)
shuffled_indices = shuffle(1:N)
round!(view(Ar2,shuffled_indices[1:N÷3]),keepbits-1)
round!(view(Ar2,shuffled_indices[N÷3+1:2N÷3]),keepbits)
round!(view(Ar2,shuffled_indices[2N÷3+1:N]),keepbits+1)

# compress with Zstd level 10
Ar1_as_UInt8 = copy(reinterpret(UInt8,Ar1))
cf1 = sizeof(Ar1)/sizeof(transcode(ZstdCompressorL10,Ar1_as_UInt8))

Ar2_as_UInt8 = copy(reinterpret(UInt8,Ar2))
cf2 = sizeof(Ar2)/sizeof(transcode(ZstdCompressorL10,Ar2_as_UInt8))

Then always rounding to the same number of mantissa bits (= not granular), instead of sometimes one bit fewer and sometimes one bit more in an irregular manner (= granular), has a compression factor that's about 4% higher:

julia> cf1/cf2
1.0385679964190777

This seems to be largely independent of the array size N, and of the autocorrelation within 0.0...0.99, but it strongly increases (meaning not granular is better) for more autocorrelation in the data >0.99, which makes sense as the irregularity of the keepbits becomes more and more the bottleneck for the lossless compression algorithm to remove redundancies.

Furthermore, reducing keepbits also makes not granular better than granular, probably for the same "contribution of irregularity" reason as with autocorrelation. E.g. for keepbits = 5 and autocorrelation = 0.99999 (I think this is realistic for some high-resolution temperature data) the compression factors become obviously better, reaching 16x without granularity but only 11x with.

@czender maybe you could see whether you can confirm this with some real data and your actual GBG algorithm? If so, whether sacrificing a few percent of compression factor for the granularity is worth it, I don't know.

@czender
Member

czender commented Dec 12, 2021

Thank you for that analysis, @milankl. A surprising (to me) result. If it holds for multiple lossless codecs, then it provides a way to gain a few more percent of compression. It will be easier for me to test after I (or a volunteer) add a barebones NSB algorithm with bitround to NCO in the January timeframe. Once implemented, this will probably be accessible via --baa=8. It could become the default if more testing shows it to perform better for typical cases. Always happy to improve the defaults!

@milankl

milankl commented Dec 12, 2021

While I don't know how many lossless algorithms work in detail, I usually think of them as "removing redundancies". If all bits m to n of every float in an array are 0, storing these bits more than once is redundant (as the entropy of these bits is 0). Now if, due to granular bitgrooming/rounding, for roughly half the floats only bits m+1 to n are 0, but bit m is 1 for 25% of the floats (1/2 due to granularity times 1/2 assuming full entropy where it's not rounded to zero), then bit m still contributes ~0.8 bit of entropy (-1/4*log2(1/4) - 3/4*log2(3/4)). This also explains why for a smaller number of keepbits the granular rounding yields lower compression factors: the smaller the number of bits with low mutual information (as in our recent paper) but high unconditional entropy in rounded floating-point numbers, the larger the contribution of the m-th bit's ~0.8 bit of entropy to the total entropy. (I hope this makes sense?)
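
The ~0.8 bit figure is just the binary entropy of a bit that is 1 with probability 1/4, e.g. in Julia:

H(p) = -p*log2(p) - (1-p)*log2(1-p)   # binary entropy in bits
H(1/4)                                # ≈ 0.811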

Following this logic, I'd say it makes sense to either keep the mth bit (which will yield a compression factor cf_keep) or to throw it away (compression factor cf_nokeep), with cf_keep < cf_nokeep. The granular approach will likely yield cf_granular such that cf_keep ≈ cf_granular < cf_nokeep, so one may as well keep the mth bit in the first place?

@czender
Member

czender commented Jan 5, 2022

@rsignell-usgs The BitRound quantization algorithm is now in the current snapshot of NCO as baa=8. If all goes well the then-current snapshot will be released as NCO 5.0.5 this Friday, January 7. Following is the usage for anyone inclined to test it.

First, BitRound is not the default quantization method (that is currently GranularBR) so one must manually select baa=8. Second, unlike most other NCO quantization algorithms, BitRound takes the Number of Significant Bits (NSB, or "keepbits") as the argument, not NSD. The example above would thus be specified as:

ncks  --baa=8 --ppc T2=10 data.nc data_bitround.nc # BitRound T2, keep 10 bits
ncks  --baa=8 --ppc default=10 data.nc data_bitround.nc # BitRound all floats (except coordinates), keep 10 bits

Documentation (which needs improvement) is at http://nco.sf.net/nco.html#BitRound. Note that I have changed the external name of the Granular BitGroom (GBG) algorithm to Granular BitRound (GBR), to more clearly convey that it applies BitRound with the keepbits determined optimally for each value, rather than using the same number of keepbits for all values in a variable. Feedback welcome!

@rsignell-usgs
Author

rsignell-usgs commented Jan 5, 2022

Super exciting news Charlie!! Hope all goes well with the release plan, and I'll be trying this out soon, as I've got my keepbits dictionary (e.g. keepbits['U10']=8) from BitInformation all ready to go!
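
For anyone gluing the two steps together, here's a sketch in Julia (it assumes NCO's '#' separator for multiple --ppc specifications; the dictionary holds per-variable keepbits, e.g. from the 99% information criterion):

keepbits = Dict("U10" => 8, "T2" => 10)   # per-variable keepbits from BitInformation
ppc = join(["$var=$nsb" for (var, nsb) in keepbits], "#")
run(`ncks --baa=8 --ppc $ppc data.nc data_bitround.nc`)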

@czender
Member

czender commented Jan 24, 2022

There was a two-week holdup for unrelated reasons, and now NCO 5.0.5 with BitRound is finally released.
