Why lzma for data compression? #559
Comments
Hi @Yomguithereal, I didn't know that Python could come without LZMA. I thought it was a standard package, and I used it because it compresses text better. I could switch to bz2, for example. Do you know of a list of supported platforms, so that we don't run into the same problem again?
@adbar I think you would have the same problem with |
I checked again: usually all the packages in the stdlib are available. In some cases compression libraries are missing when Python is compiled from source, but this is inconsistent across systems; see the pyenv wiki, on Mac OS |
The only option I see here would be to conditionally support a pure-Python implementation of the lzma decompression scheme (using this, for instance: https://github.com/Rogdham/python-xz). So your code would import |
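To make the conditional-import idea concrete, here is a minimal sketch. It is not trafilatura's actual code: the names `CANDIDATES`, `available_codecs` and `decompress` are hypothetical, and it only probes standard-library modules rather than using python-xz. It assumes that any of these modules may be missing on a Python built from source, and turns a missing one into an explicit error instead of a bare `ImportError` at import time.

```python
import importlib

# Hypothetical preference order. Each of these stdlib modules wraps a
# system library, so any of them may be absent on a source-built Python.
CANDIDATES = ("lzma", "bz2", "gzip")


def available_codecs(candidates=CANDIDATES):
    """Return {name: module} for every candidate that can be imported."""
    found = {}
    for name in candidates:
        try:
            found[name] = importlib.import_module(name)
        except ImportError:
            # This interpreter was built without the module; skip it.
            continue
    return found


def decompress(data: bytes, codec: str) -> bytes:
    """Decompress data with the named codec, or fail with a clear message."""
    codecs = available_codecs()
    if codec not in codecs:
        raise RuntimeError(
            f"This Python installation has no usable '{codec}' module. "
            "Install the matching system library and rebuild Python."
        )
    # lzma, bz2 and gzip all expose a one-shot decompress(bytes) function.
    return codecs[codec].decompress(data)
```

A caller could check `"lzma" in available_codecs()` once at start-up and decide whether to load the data or to point the user at the installation docs.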
It would also be difficult to test on GitHub Actions (the current CI/CD). We could also explain how to fix the problem in the docs. Let's leave the issue open for now.
Hello @adbar,
Sorry to bother you, but can I ask why the library's model data is compressed using `lzma`? I am asking because I have found that a lot of people are using versions of Python that were compiled or installed without `lzma` support. Using trafilatura therefore breaks, and they often struggle to fix the problem because they don't always know how to reinstall Python after installing the proper dependencies (usually through yum or apt). Wouldn't gzip or another compression scheme be more widespread and avoid this issue?
Have a good day,