Please add possibility to call Dictionary creation from stream #28

Open
TomArrow opened this issue Sep 18, 2021 · 0 comments
Comments

TomArrow commented Sep 18, 2021

I would like to train dictionaries on datasets that are many gigabytes in size and sometimes consist of millions of files. I keep them in 7z archives and decompress them sequentially on the fly using SharpCompress. It would be awesome to be able to feed that data straight into dictionary creation through a custom stream or something similar.

As it stands, the function that does the training copies an array into a stream, which wastes memory for large datasets when the needed stream could be supplied directly. The function that accepts the stream is in a class marked as internal, so I can't call it directly.

Ideally, I'd love to be able to load a 7z file with hundreds of gigabytes of data and stream it into dictionary creation without running out of RAM. I'd rather not keep that many files uncompressed on my hard drive, where they waste space and system resources.
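
To make the request concrete, here is a rough, language-agnostic sketch in Python of the shape I have in mind. It is not this library's API: `iter_samples` and `feed_samples` are hypothetical names, and I am assuming the trainer ultimately wants the concatenated sample bytes plus a list of per-sample sizes. The point is only that samples are pulled one at a time from a lazy source and written straight into a caller-supplied stream, so no intermediate array of all samples is built.

```python
import io
from typing import BinaryIO, Iterable, Iterator, List


def iter_samples(blobs: Iterable[bytes]) -> Iterator[bytes]:
    """Hypothetical stand-in for an archive reader that yields one file at a time."""
    for blob in blobs:
        yield blob


def feed_samples(samples: Iterable[bytes], sink: BinaryIO) -> List[int]:
    """Write each sample to `sink` as it arrives and record its size.

    Only one sample is held in memory at a time (assumption: the trainer
    needs the concatenated bytes and the per-sample sizes).
    """
    sizes: List[int] = []
    for sample in samples:
        sink.write(sample)
        sizes.append(len(sample))
    return sizes


# An in-memory sink is used here only to keep the example self-contained.
sink = io.BytesIO()
sizes = feed_samples(iter_samples([b"alpha", b"be", b"gamma!"]), sink)
```

In real use the sink would be whatever stream the trainer reads from (or a temporary file), and the sample source would be the archive reader.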
