Improve performance for multi-threaded access to encrypted zip files #97

Open
mxmlnkn opened this issue Nov 13, 2022 · 4 comments
Labels
enhancement New feature or request performance Something is slower than it could be

Comments

@mxmlnkn
Owner

mxmlnkn commented Nov 13, 2022

#96 (comment)

BTW, one thing I discovered when trying to integrate libarchive is that Python's zipfile has inefficiencies similar to those of the tarfile module:
if two threads try to access the same member, each one will decompress (and decrypt, if password-protected) the member from the beginning.
The situation is better than with .tar.gz, where decompression has to start from the beginning of the whole archive, but it is still problematic.
Something like the SQLindexedTar class would need to be developed to checkpoint decompression and decryption states.
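To make the behavior described above concrete, here is a minimal sketch using only the standard library. The helper names `make_archive` and `read_range` and the member name `numbers.txt` are made up for illustration, and the archive is unencrypted because `zipfile` cannot write encrypted members. Each thread opens its own stream, and seeking to an offset means inflating everything before it again.

```python
import io
import threading
import zipfile


def make_archive(member: str, payload: bytes) -> bytes:
    """Build a small in-memory zip archive with one deflate-compressed member."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member, payload)
    return buffer.getvalue()


def read_range(archive_bytes: bytes, member: str, offset: int, size: int) -> bytes:
    """Read `size` bytes at `offset` of a member through a private stream."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        with archive.open(member) as stream:
            # There is no index, so a forward seek reads and discards all
            # decompressed data from the member start up to `offset`.
            stream.seek(offset)
            return stream.read(size)


payload = b"".join(b"%d\n" % i for i in range(100_000))
archive_bytes = make_archive("numbers.txt", payload)
results = {}


def worker(name: int, offset: int) -> None:
    results[name] = read_range(archive_bytes, "numbers.txt", offset, 16)


# Both threads want the same range; each repeats the full decompression work.
threads = [threading.Thread(target=worker, args=(i, 500_000)) for i in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Both threads return the same bytes, but neither benefits from the work the other already did.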

This might need yet another backend like indexed_bzip2 that works with zip files. So... a lot of work.
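As a rough idea of what checkpointing could look like for a deflate-compressed member, the sketch below snapshots the decompressor with `Decompress.copy()` from the standard `zlib` module. The function names, the checkpoint tuple layout, and the spacing values are assumptions for illustration, not an existing ratarmount or indexed_bzip2 API. A real backend would also have to store the decryption state for encrypted members, which is not shown here.

```python
import zlib
from bisect import bisect_right


def build_checkpoints(compressed: bytes, chunk_size: int = 4096, spacing: int = 64 * 1024):
    """Decompress once and snapshot the state about every `spacing` output bytes."""
    decompressor = zlib.decompressobj(-zlib.MAX_WBITS)  # raw deflate, as used inside zip
    # Each checkpoint: (uncompressed offset, compressed offset, decompressor snapshot)
    checkpoints = [(0, 0, decompressor.copy())]
    out_pos = 0
    for in_pos in range(0, len(compressed), chunk_size):
        chunk = compressed[in_pos:in_pos + chunk_size]
        out_pos += len(decompressor.decompress(chunk))
        if decompressor.eof:
            break
        if out_pos - checkpoints[-1][0] >= spacing:
            checkpoints.append((out_pos, in_pos + len(chunk), decompressor.copy()))
    return checkpoints


def read_from_checkpoint(compressed: bytes, checkpoints, offset: int, size: int) -> bytes:
    """Resume from the nearest checkpoint at or before `offset`."""
    starts = [checkpoint[0] for checkpoint in checkpoints]
    out_pos, in_pos, state = checkpoints[bisect_right(starts, offset) - 1]
    decompressor = state.copy()  # leave the stored snapshot untouched for other readers
    skip = offset - out_pos
    data = decompressor.decompress(compressed[in_pos:], skip + size)
    return data[skip:skip + size]


# Demo data: compress with raw deflate to mimic a zip member's payload.
data = b"".join(b"%d\n" % i for i in range(200_000))
compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
compressed = compressor.compress(data) + compressor.flush()

checkpoints = build_checkpoints(compressed)
tail = read_from_checkpoint(compressed, checkpoints, 1_000_000, 100)
```

With checkpoints in place, a reader only has to inflate at most roughly `spacing` bytes before reaching its target offset, at the cost of keeping the snapshots in memory.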

@mxmlnkn mxmlnkn added enhancement New feature or request performance Something is slower than it could be help wanted Extra attention is needed and removed help wanted Extra attention is needed labels Nov 13, 2022
@mxmlnkn
Owner Author

mxmlnkn commented Nov 22, 2022

@Vadiml1024 Could it be that you are running into this issue instead:

Decryption is extremely slow as it is implemented in native Python rather than C.

With #98 also observing performance issues, I get the feeling that a better zip module must be available :/. Maybe libarchive? But we already tried that, and concurrency support in the libarchive Python bindings was still a work in progress. czipfile exists, but it seems to be Python 2 only and dead.

So, I guess another self-written backend.

@Vadiml1024

Vadiml1024 commented Nov 22, 2022 via email

Somebody already ported czipfile to Python 3:
https://github.com/ziyuang/czipfile

@mxmlnkn
Owner Author

mxmlnkn commented Nov 22, 2022

Ah nice. I didn't see it on PyPI.

Cython as opposed to Python is also said to be faster: https://stackoverflow.com/a/72513075/2191065

And there is this: https://github.com/TkTech/fasterzip But it seems to be missing some features, such as setting a password.
