Make unpacking archives optional #2215

Open
wants to merge 7 commits into
base: master


anjackson

We are using MrJob to process WARC files, in a similar manner to this example given in the Writing Jobs guide.

For our use case, it is crucial that the .gz compressed file is not automatically decompressed before use.

This PR proposes a new setting that allows this to be controlled via an unpack_archives option passed to the MrJob runner. The option defaults to True, which preserves the existing behaviour, and can be set to False when the archive must be left compressed. We have tested this locally and it seems to work fine.
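To illustrate the intended behaviour, here is a minimal standalone sketch of how a boolean flag like unpack_archives could gate decompression when a file is staged for a job. This is not the code from this PR and not mrjob's API: `stage_file` and `demo` are hypothetical names, and the sketch assumes that "unpacking" means gunzipping a single .gz file.

```python
import gzip
import os
import shutil
import tempfile


def stage_file(src_path, dest_dir, unpack_archives=True):
    """Copy src_path into dest_dir, gunzipping it first if allowed.

    Hypothetical helper for illustration only; not part of mrjob.
    """
    name = os.path.basename(src_path)
    if unpack_archives and name.endswith(".gz"):
        # Default behaviour: hand the job the decompressed content.
        dest_path = os.path.join(dest_dir, name[:-len(".gz")])
        with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
            shutil.copyfileobj(src, dest)
    else:
        # unpack_archives=False: pass the compressed file through untouched.
        dest_path = os.path.join(dest_dir, name)
        shutil.copyfile(src_path, dest_path)
    return dest_path


def demo():
    """Stage the same .gz file with the flag on and off."""
    work = tempfile.mkdtemp()
    src = os.path.join(work, "sample.warc.gz")
    with gzip.open(src, "wb") as f:
        f.write(b"WARC/1.0\r\n")

    unpacked_dir = os.path.join(work, "unpacked")
    raw_dir = os.path.join(work, "raw")
    os.mkdir(unpacked_dir)
    os.mkdir(raw_dir)

    unpacked = stage_file(src, unpacked_dir)  # default: True
    raw = stage_file(src, raw_dir, unpack_archives=False)

    with open(unpacked, "rb") as f:
        unpacked_bytes = f.read()
    with open(raw, "rb") as f:
        raw_magic = f.read(2)  # first two bytes of the staged file

    shutil.rmtree(work)
    return os.path.basename(unpacked), unpacked_bytes, os.path.basename(raw), raw_magic
```

With the flag left at its default, the staged file is the decompressed sample.warc. With unpack_archives=False, the staged file keeps its .gz name and still starts with the gzip magic bytes, so the job can read the compressed stream itself.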

I've attempted to document this new option, as per the contributing guidelines, but I'm not sure I've covered everything. Is there any other documentation I should add?
