Support extraction of 'large' (>= ~2 GiB) files #38
When extracting Tar archives containing 'large' files (i.e. files larger than ~2 GiB), I noticed that extraction either didn't work at all (macOS) or produced corrupt, too-small files (Linux). While debugging, I found that there is a limit on the amount of data a single `syswrite` call writes: on Linux it is (2**31 - 4096) bytes, while on macOS it appears to be (2**31 - 1) bytes. In addition, `syswrite` on Linux doesn't return an error when asked to write more than that; it just returns the amount of data actually written (which is well within the specified behavior of `write(2)`). On macOS it returns an error instead, which explains the behavior I observed. There's also an older bug report on rt.cpan.org which seems to describe the same issue on Windows.

This PR changes the code which writes extracted files to (1) write in smaller chunks (1 GiB) and (2) keep writing until all data has actually been written. This fixed the problem in my tests with the file for which extraction failed before. I'm not sure if this is the best solution, but at least on Linux the current implementation seems problematic, since it may produce incomplete files without any warning or error.
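
For reference, here is a minimal sketch of the chunked write loop described above. It is not the actual diff of this PR: the helper name `write_all`, its parameters, and the demo at the bottom are hypothetical and only illustrate the two points (cap each `syswrite` at a chunk size, and loop until everything has been written).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of Archive::Tar): write the whole buffer
# referenced by $data_ref to $fh, at most $chunk bytes per syswrite call.
sub write_all {
    my ($fh, $data_ref, $chunk) = @_;
    $chunk ||= 1024 * 1024 * 1024;    # default chunk size: 1 GiB

    my $total  = length $$data_ref;
    my $offset = 0;

    while ($offset < $total) {
        my $want = $total - $offset;
        $want = $chunk if $want > $chunk;

        # syswrite may write fewer bytes than requested; it returns the
        # number of bytes written, or undef on error.
        my $written = syswrite($fh, $$data_ref, $want, $offset);
        die "syswrite failed: $!"       unless defined $written;
        die "syswrite made no progress" unless $written > 0;

        $offset += $written;
    }
    return $offset;
}

# Demo with a deliberately tiny chunk size so the loop runs several times.
my ($demo_fh, $demo_path) = tempfile(UNLINK => 1);
binmode $demo_fh;
my $payload = 'x' x 10_000;
my $count   = write_all($demo_fh, \$payload, 4096);
close $demo_fh or die "close failed: $!";
print "wrote $count bytes\n";
```

Checking the return value on every iteration is what prevents the silent truncation seen on Linux, and keeping each request below the per-call limit avoids the error seen on macOS. The sketch treats every failure as fatal; it does not retry interrupted calls.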