read/readall dumps the decompressed files to memory, instead of streaming them #579
Labels
enhancement (New feature or request)
for extraction (Issue on extraction, decompression or decryption)
help wanted (Extra attention is needed)
There is a problem with reading large files whose decompressed form exceeds the available RAM:
The library (namely the read/readall methods) first decompresses the whole file into memory using BytesIO, and then returns that BytesIO object. That may work well for small files, but for larger ones it fails for lack of memory.
It would be better if the library streamed the files, just like standard file I/O.
To Reproduce
Expected behavior
The library should allocate only as much memory as is actually needed for the data requested, and should allow files to be streamed even when their decompressed form exceeds the available memory and disk space.
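To illustrate the kind of behavior meant here, below is a minimal sketch of chunked, streaming decompression. It is not this library's API: the standard-library `lzma` module stands in for the archive decompressor, and the helper name `iter_decompressed` and the 64 KiB chunk size are assumptions made for the example only.

```python
import io
import lzma


def iter_decompressed(fileobj, chunk_size=64 * 1024):
    """Yield decompressed data in pieces of at most chunk_size bytes.

    Hypothetical helper for illustration: only one chunk of decompressed
    output is held by the caller at a time, instead of one BytesIO with
    the whole file in it.
    """
    with lzma.open(fileobj, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            yield chunk


# Small self-contained demo: compress some data in memory, then stream it back.
payload = b"wikipedia " * 100_000
compressed = io.BytesIO(lzma.compress(payload))

total = 0    # total number of decompressed bytes seen
largest = 0  # size of the largest single chunk handed to the caller
for chunk in iter_decompressed(compressed):
    total += len(chunk)
    largest = max(largest, len(chunk))
```

With an interface along these lines, the caller's memory use is bounded by the chunk size rather than by the decompressed size, so each chunk can be parsed or forwarded elsewhere and then discarded.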
Environment:
(the Wikipedia dump file used as an example is 246.6 MB in compressed form, and 36 GB when decompressed)