
Shared Compressor fails in multithreaded code #31

Open
LexaGV opened this issue Apr 12, 2024 · 6 comments
LexaGV commented Apr 12, 2024

I compiled the sources of ZstdNet for .NET Framework 4.8. Plain usage:

var compressor = new Compressor();
var data = File.ReadAllBytes(f.Filename);
var comprData = compressor.Wrap(data);

...and it immediately fails:

System.AccessViolationException
HResult=0x80004003
Message=Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
Source=
StackTrace:
ZStdLibFW.dll!ZstdNet.Compressor.Wrap(System.ReadOnlySpan src, System.Span dst) Line 90 C#
ZStdLibFW.dll!ZstdNet.Compressor.Wrap(System.ReadOnlySpan src) Line 56 C#
ZStdLibFW.dll!ZstdNet.Compressor.Wrap(byte[] src) Line 43 C#

It happens at Compressor.cs, line 90:

var dstSize = Options.AdvancedParams != null
			    ? ExternMethods.ZSTD_compress2(cctx, dst, (size_t)dst.Length, src, (size_t)src.Length)
			    : Options.Cdict == IntPtr.Zero
				    ? ExternMethods.ZSTD_compressCCtx(cctx, dst, (size_t)dst.Length, src, (size_t)src.Length, Options.CompressionLevel)
				    : ExternMethods.ZSTD_compress_usingCDict(cctx, dst, (size_t)dst.Length, src, (size_t)src.Length, Options.Cdict);

Any idea why memory management fails?

dscheg (Member) commented Apr 21, 2024

@LexaGV Could you please provide a more complete minimal example that reproduces the problem? Is the problem still reproducible with the version from NuGet? Is the problem still reproducible with another file (e.g. an empty one)?

Also the following information may be useful:

  • Sources commit revision
  • What changes are made
  • Build Target
  • Debug/Release
  • x86/x64
  • Libzstd version

In general, an AccessViolationException may indicate that the Compressor/CompressionOptions was disposed before use (but that doesn't seem to be the case for the snippet above).

LexaGV (Author) commented Apr 22, 2024

Nope, it's not the NuGet package - I compiled my own .NET Framework 4.8 version. I think I've found the problem. Full example:

var compressor = new Compressor();
Parallel.ForEach(Disk.EnumFiles(@"C:\DEV\", ".vs", ".git", ".hg", "bin", "obj", "bin_Debug", "bin_Release"), ff => {
    if (!ff.EndsWith(".cs")) return;
    var data = File.ReadAllBytes(ff);
    var comprData = compressor.Wrap(data);
    Log.Debug("Compr len=" + comprData.Length);
});

THIS example fails with the memory problem described above. But if you move var compressor = new Compressor(); inside the loop, just before var comprData = compressor.Wrap(data);, everything works smoothly. Which makes me think a Compressor cannot be shared across threads and is not re-entrant. I know it's not a huge problem to create a compressor each time it's needed, but I thought that on a HUGE list of files a shared Compressor could improve performance.

dscheg (Member) commented Apr 22, 2024

Instances of the Compressor class are not thread-safe (as stated in the README). If you want to optimize bulk processing performance, you can use, for example:

  • Single Compressor instance with lock
  • ThreadStatic or ThreadLocal<T> instances
  • ConcurrentBag of instances with SemaphoreSlim
  • Object pooling
  • etc.
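To illustrate one of those options, here is a sketch using ThreadLocal&lt;T&gt; so that each worker thread gets its own Compressor (and thus its own unmanaged context). It assumes the ZstdNet Compressor/Wrap API from this repo; the directory path is illustrative only:

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using ZstdNet;

class BulkCompress
{
    static void Main()
    {
        // trackAllValues: true lets us enumerate and dispose
        // every per-thread instance once the work is done
        var perThread = new ThreadLocal<Compressor>(() => new Compressor(), trackAllValues: true);

        Parallel.ForEach(Directory.GetFiles(@"C:\DEV", "*.cs", SearchOption.AllDirectories), file =>
        {
            var data = File.ReadAllBytes(file);
            var compressed = perThread.Value.Wrap(data); // each thread uses its own instance
            Console.WriteLine($"{file}: {data.Length} -> {compressed.Length}");
        });

        foreach (var compressor in perThread.Values)
            compressor.Dispose();
        perThread.Dispose();
    }
}
```

Unlike the pool below, this never contends on a shared structure, at the cost of keeping one context alive per thread for the lifetime of the ThreadLocal.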

Here is an example with ObjectPool&lt;T&gt;:

// Read the docs: https://learn.microsoft.com/en-us/aspnet/core/performance/objectpool?view=aspnetcore-8.0
// Requires the Microsoft.Extensions.ObjectPool NuGet package
using Microsoft.Extensions.ObjectPool;

int parallelism = 4;
var pool = new DefaultObjectPool<Compressor>(new DefaultPooledObjectPolicy<Compressor>(), parallelism);
Parallel.ForEach(Directory.GetFiles(AppDomain.CurrentDomain.BaseDirectory, "*.cs"), new ParallelOptions {MaxDegreeOfParallelism = parallelism}, file =>
{
    var data = File.ReadAllBytes(file);
    var compressor = pool.Get();
    try
    {
        var compressed = compressor.Wrap(data);
        Console.WriteLine(compressed.Length);
    }
    finally
    {
        pool.Return(compressor);
    }
});

LexaGV (Author) commented Apr 23, 2024

Dmitry, thanks for such a descriptive and helpful answer! The one thing I don't like is the "dancing around" the class. Is it possible to make Compressor itself reentrant? I'm not sure what state you need to keep once you've compressed the data. All necessary variables could be declared locally, so Compressor could be a static class and all you'd need is to call "SqueezeMyData()"! :)

dscheg (Member) commented Apr 24, 2024

Adding another simplified API is a possible option. But it would not be as efficient, because it would require creating/freeing an unmanaged context on each Wrap call. Instances of the Compressor class allow reusing an unmanaged context. See also the native zstd docs:

When compressing many times, it is recommended to allocate a context just once, and reuse it for each successive compression operation. This will make workload friendlier for system's memory.

Note : re-using context is just a speed / resource optimization, it doesn't change the compression ratio, which remains identical.

Note 2 : In multi-threaded environments, use one different context per thread for parallel execution.
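For illustration, a minimal sketch of what such a simplified static API could look like (this SimpleZstd wrapper is hypothetical, not part of ZstdNet). It is thread-safe because every call owns its own Compressor, but it pays exactly the per-call context create/free cost the zstd docs above recommend avoiding:

```csharp
using ZstdNet;

// Hypothetical convenience wrapper, not part of ZstdNet.
// Safe to call from any thread: each call creates, uses, and
// disposes its own Compressor (and its unmanaged context).
public static class SimpleZstd
{
    public static byte[] Compress(byte[] src)
    {
        using (var compressor = new Compressor())
        {
            return compressor.Wrap(src);
        }
    }
}
```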

LexaGV (Author) commented Apr 24, 2024

Dmitriy, thanks for the explanation! I'm glad to chat with such a qualified engineer with such deep knowledge. Your solution is ideal, which is rare for FOSS. :) For my peace of mind I ran a few tests, compressing the *.cs sources of the Framework. The sources were on a RAM drive (PF = Parallel.ForEach):

Default compression:
Pool+PF: Spent=00:00:00.627 on 18296 files, Avg ratio = 0.1898
PF:      Spent=00:00:01.425 on 18296 files, Avg ratio = 0.1898
foreach: Spent=00:00:02.755 on 18296 files, Avg ratio = 0.1898

Max compression:
Pool+PF: Spent=00:00:30.078 on 18296 files, Avg ratio = 0.159
PF:      Spent=00:00:30.275 on 18295 files, Avg ratio = 0.159
foreach: Spent=00:02:12.764 on 18296 files, Avg ratio = 0.159

(Looks like the pool gains nothing when each thread works for long enough. And max compression's ratio is almost identical to the default, but it takes far more resources!)

Perfect result! This issue can be closed. And I highly recommend including the Pool sample in the README - it's a huge help for novices! Thanks for all your work and time!

@LexaGV LexaGV changed the title Compressor doesn't work at all Shared Compressor fails in multithreaded code Apr 24, 2024