60x compression! #250
Yes, Rich, these are amazing results. FYI, Milan Klower is the invited speaker to our upcoming EGU session on lossy compression...
Awesome, good to know you are tracking this Charlie!
Are there any plans to implement the bit information and the bit rounding routines in a future NCO release?
Unfortunately we do not have the resources to implement these features ourselves, though we would certainly welcome patches. Klower's methods are a departure from the existing lossy compression algorithms in NCO, where the user specifies the acceptable lossiness. The new method would require a substantial amount of code to ascertain the number of keep bits from the information content of a variable. Only the final IEEE rounding step is already implemented in NCO (it is used in BitGroom and Granular BitGroom).
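To give a flavor of what such keepbits-determination code involves, here is a toy sketch of the idea behind the bitwise information content: for each float32 bit position, measure the mutual information between that bit in successive values, then keep the mantissa bits that account for 99% of the total. This is Python for illustration only; all function names are invented, and the real BitInformation.jl does considerably more (e.g. filtering statistically insignificant information):

```python
import math
import struct

def float32_bits(x: float) -> int:
    """Raw 32-bit pattern of x viewed as a float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def mutual_information(pairs) -> float:
    """Mutual information (in bits) of a sequence of (a, b) bit pairs."""
    pairs = list(pairs)
    n = len(pairs)
    joint = {}
    for p in pairs:
        joint[p] = joint.get(p, 0) + 1
    pa = {0: 0, 1: 0}
    pb = {0: 0, 1: 0}
    for (a, b), c in joint.items():
        pa[a] += c
        pb[b] += c
    mi = 0.0
    for (a, b), c in joint.items():
        pj = c / n
        mi += pj * math.log2(pj / ((pa[a] / n) * (pb[b] / n)))
    return mi

def bitinformation(values):
    """Information content of each float32 bit position, estimated as the
    mutual information between that bit in successive values."""
    ints = [float32_bits(v) for v in values]
    info = []
    for k in range(31, -1, -1):      # bit 31 = sign ... bit 0 = last mantissa bit
        seq = [(u >> k) & 1 for u in ints]
        info.append(mutual_information(zip(seq, seq[1:])))
    return info

def keepbits99(values, threshold=0.99):
    """Smallest number of mantissa bits holding `threshold` of the
    total mantissa information."""
    mantissa = bitinformation(values)[9:]   # skip 1 sign + 8 exponent bits
    total = sum(mantissa)
    cum = 0.0
    for i, b in enumerate(mantissa):
        cum += b
        if total > 0 and cum >= threshold * total:
            return i + 1
    return len(mantissa)
```

A perfectly predictable bit carries its full entropy as information, while an independent noise bit carries none; the 99% cutoff then separates real bits from false precision.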
@czender, I ran a modified version of @milankl's Julia bit-information code example and determined the number of keep bits for my variable. If I then wanted to use NCO to apply this to my data, what value would I specify for the number of significant digits? From your bitgrooming paper:

So if I need to keep 10 mantissa bits, I would want a value of 3, which actually keeps 11, right?
Is this right?
Indeed, this is the correct table for the original BitGroom. However, the latest NCO defaults to Granular BitGroom (GBG), which is more aggressive and guarantees preserving the same number of significant digits with fewer bits than BG. GBG documentation is still lacking, sorry. In debugging mode, GBG prints the number of keep bits it retains for each value:
You can see that NSD=3 keeps 11 mantissa bits for two values, 10 for three values, 9 for four values, and 8 for one value. This is why it is called "Granular": the bitmask changes from value to value. NSD=4 keeps too many bits for most numbers:
If you want to shave the same number of bits from every value, as Klower does, you can add the appropriate option.
However, that mode does not provide the same debugging information as Granular BG. Hope this helps.
@rsignell-usgs The number of significant bits
Sorry, Charlie, I'd be interested to know what the motivation behind this is. Is it basically because NCO provides the user an adjustable level of precision via the NSD option?
Yes, partly the motivation is that many people think in terms of integral numbers of decimal digits of precision. A filter that accepts a real instead of an integer NSD, or alternatively an integer number of significant bits (NSB), and then straightforwardly rounds to that number of bits would be a great feature, though it would be distinct from GBG. I have not tested the interesting question of whether the lossless compressors do worse on an irregular number of trailing bits. My hunch is that bitstream (1d) compressors would do better, not worse, assuming they keep a table somewhere of multiple bit patterns to substitute for. For a given NSD we are only talking about a small range (up to 4) in the number of keep bits, so four patterns to substitute for instead of one does not seem too difficult. This is not my strong suit, and I'd welcome the results of an investigation...
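To make the decimal-digits-to-bits relation concrete, here is the textbook conversion (only the log2(10) ≈ 3.32 bits-per-digit rule of thumb; the function name is hypothetical and this is not NCO's exact GranularBR logic):

```python
import math

def keepbits_for_nsd(nsd: int) -> int:
    """Mantissa bits needed for `nsd` significant decimal digits,
    via the rule of thumb log2(10) ~ 3.32 bits per decimal digit."""
    return math.ceil(nsd * math.log2(10))

print(keepbits_for_nsd(3))  # → 10
```

By this rule NSD=3 needs 10 keep bits, consistent with the observation earlier in the thread that BitGroom's NSD=3 actually keeps 11, i.e. one bit more than this minimum as a safety margin.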
I will try to implement a straightforward bitround algorithm in the January timeframe. It would take as input the integer number of keepbits (for example, as output by the 99%-information-content algorithms of @milankl) and then round as per IEEE. This could be a bridge, or first step, toward a fuller implementation of the information content approach. All of the necessary ingredients are already in NCO.
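A minimal sketch of such a bitround on float32 values, using IEEE 754 round-to-nearest, ties-to-even on the discarded mantissa bits (Python for illustration; the function name is invented and this is not NCO's implementation):

```python
import struct

def bitround(x: float, keepbits: int) -> float:
    """Round x, viewed as a float32, to `keepbits` mantissa bits using
    IEEE 754 round-to-nearest, ties-to-even on the discarded bits."""
    drop = 23 - keepbits                       # float32 has 23 explicit mantissa bits
    if drop <= 0:
        return x                               # nothing to discard
    ui = struct.unpack("<I", struct.pack("<f", x))[0]
    half = 1 << (drop - 1)
    ui += half - 1 + ((ui >> drop) & 1)        # nearest; exact ties go to even
    ui &= ~((1 << drop) - 1) & 0xFFFFFFFF      # zero the discarded bits
    return struct.unpack("<f", struct.pack("<I", ui))[0]

print(bitround(3.14159265, 7))  # → 3.140625 (only 7 mantissa bits survive)
```

The integer trick works because adding `half - 1` plus the last kept bit, then masking, implements round-to-nearest with ties broken toward the even (zero) last kept bit; a carry out of the mantissa correctly bumps the exponent.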
Just did that:

```julia
using BitInformation, Random
using TranscodingStreams, CodecZstd

ZstdCompressorL10 = ZstdCompressor(level=10)
TranscodingStreams.initialize(ZstdCompressorL10)

function ar1process(r::T,N) where T
    ar1 = Array{T}(undef,N)     # preallocate
    ar1[1] = randn(T)           # initial conditions
    s = sqrt(1-r^2)             # noise magnitude
    for i in 2:N
        ar1[i] = r*ar1[i-1] + s*randn(T)
    end
    return ar1
end

N = 3_000_000
autocorrelation = 0.9
keepbits = 10

A = ar1process(autocorrelation,N)
Ar1 = copy(A)
Ar2 = copy(A)

# either always round to 10 mantissa bits (i.e. not granular)
round!(Ar1,keepbits)

# or round random thirds of all values to 9, 10 or 11 bits (i.e. granular)
shuffled_indices = shuffle(1:N)
round!(view(Ar2,shuffled_indices[1:N÷3]),keepbits-1)
round!(view(Ar2,shuffled_indices[N÷3+1:2N÷3]),keepbits)
round!(view(Ar2,shuffled_indices[2N÷3+1:N]),keepbits+1)

# compress with Zstd level 10
Ar1_as_UInt8 = copy(reinterpret(UInt8,Ar1))
cf1 = sizeof(Ar1)/sizeof(transcode(ZstdCompressorL10,Ar1_as_UInt8))
Ar2_as_UInt8 = copy(reinterpret(UInt8,Ar2))
cf2 = sizeof(Ar2)/sizeof(transcode(ZstdCompressorL10,Ar2_as_UInt8))
```

Always rounding to 10 mantissa bits (= not granular), instead of sometimes 9, sometimes 10, and sometimes 11 in an irregular manner (= granular), gives a compression factor that is about 4% higher:

```julia
julia> cf1/cf2
1.0385679964190777
```

This seems to be largely independent of the array size and of the number of keepbits. @czender, maybe you could see whether you can confirm this with some real data and your actual GBG algorithm? If it holds, I don't know whether sacrificing a few percent of compression factor for the granularity is worth it.
Thank you for that analysis, @milankl. A surprising (to me) result. If it is true for multiple lossless codecs, then it provides a way to gain a few more percent of compression. It will be easier for me to test after I (or a volunteer) add a barebones NSB algorithm with bitround to NCO in the January timeframe. Once implemented, this will probably be accessible via --baa=8. It could become the default if more testing shows it performs better for typical cases. Always happy to improve the defaults!
While I don't know how many lossless algorithms work in detail, I usually think of them as "removing redundancies". If all values share the same trailing bits, that redundancy is easy to remove. Following this logic, I'd say it makes sense to either keep the information in a given bit position for all values or remove it for all values.
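That intuition can be checked with a toy example, using Python's zlib to stand in for any lossless codec (the variable names are invented; this is not Zstd or NCO):

```python
import random
import zlib

random.seed(1)
n = 50_000

# fully random bytes: no redundancy for the codec to remove
noisy = bytes(random.getrandbits(8) for _ in range(n))

# same length, but every other byte zeroed, mimicking the identical
# trailing zero bits that uniform (non-granular) bitrounding produces
redundant = bytes(b if i % 2 else 0 for i, b in enumerate(noisy))

cf_noisy = n / len(zlib.compress(noisy, 9))
cf_redundant = n / len(zlib.compress(redundant, 9))
```

The regularly zeroed bytes compress well while the fully random stream does not, which is the same mechanism that makes uniformly rounded trailing bits cheap for a codec to encode.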
@rsignell-usgs The BitRound quantization algorithm is now in the current snapshot of NCO. First, note that BitRound is not the default quantization method (that is currently GranularBR), so one must select it manually.
Documentation (which needs improvement) is at http://nco.sf.net/nco.html#BitRound. Note that I have changed the external name of the Granular BitGroom (GBG) algorithm to Granular BitRound (GBR), to more clearly convey that it applies BitRound with the keepbits determined optimally for each value, rather than using the same number of keepbits for all values in a variable. Feedback welcome!
Super exciting news, Charlie!! Hope all goes well with the release plan, and I'll be trying this out soon, as I've got my keepbits dictionary ready to go.
There was a two-week holdup for unrelated reasons, and now NCO 5.0.5 with BitRound is finally released. |
https://www.nature.com/articles/s43588-021-00156-2.pdf
Implemented in Julia, but could be cool for NCO in the future.