Adds method to allow for extracting data at a given offset. #275

bitwisejb

Fixes #274

Changes proposed in this PR

  • Adds a method that allows the user to extract data given an entry, size, and offset.

Tests performed

  • Added unit tests to verify the change against a standard and a zip64 compressed archive.
  • Verified existing tests work.
  • Ran SwiftLint and saw no issues.

@weichsel
Owner

weichsel commented May 9, 2023

Hi bitwisejb,
Thanks for providing this PR.
Can you explain your use case for this addition? It seems like you want to achieve random access into an archive file based on an entry starting position and some arbitrary offset.
The focus of ZIP Foundation is to provide a structured way to access the content of an archive on a per-entry basis - abstracting away the internals (offsets, lengths, compression, ...) of ZIP files.
While your addition makes use of some metadata (e.g. the offset at which the entry data begins), it mainly performs low-level seek/file access that could be achieved without using ZIP Foundation. API users that call into your new extract method would get back a blob of data without any context. Reading chunks of compressed entries that way wouldn't make much sense since they'd be impossible to decompress at the call site.

Would it help for your usage scenario to expose e.g. Entry.dataOffset?

@bitwisejb
Author

I believe making Entry.dataOffset public would work for my use case. The archive I am working with has map imagery stored in a folder structure designating levels and tile positions. One entry in the archive is an index for locating tile image data for a given xyz. We use xyz to determine the entry and offset for the image to extract. Byte count is known to us based on information in the index entry.
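The access pattern described here (entry start, plus an offset from the index, plus a known byte count) can be sketched with Foundation alone. This is only an illustration under stated assumptions, not code from the PR: the entry must be stored without compression, and `entryDataOffset` is a hypothetical input standing in for the value a public `Entry.dataOffset` would report. The function name `readStoredBytes` is likewise made up for this example.

```swift
import Foundation

/// Reads `count` bytes starting `offsetInEntry` bytes past the start of an entry's data.
/// `entryDataOffset` is a hypothetical input: the absolute position in the archive file
/// where the entry's data begins. Only meaningful for entries stored uncompressed.
func readStoredBytes(archiveURL: URL,
                     entryDataOffset: UInt64,
                     offsetInEntry: UInt64,
                     count: Int) throws -> Data {
    let handle = try FileHandle(forReadingFrom: archiveURL)
    defer { try? handle.close() }
    // Jump straight to the requested position; nothing before it is read.
    try handle.seek(toOffset: entryDataOffset + offsetInEntry)
    // read(upToCount:) returns nil at end of file; treat that as empty data.
    return try handle.read(upToCount: count) ?? Data()
}
```

A caller would look up the tile's offset and byte count in the index entry and pass them in. Fewer bytes than requested come back if the file ends early, so the result's `count` is worth checking.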

@bitwisejb
Author

bitwisejb commented May 13, 2023

@weichsel It looks like we would need the fileHandle for the archive. Would exposing Archive.archiveFile be an option as well?

@bitwisejb
Author

@weichsel We need to be able to extract a single entry from a Zip file with a compression level of 0 without extracting the entire archive. The method that was originally put in place enables this functionality. We appreciate the thought and design that you have put in place that hides the lower-level details. You had mentioned above that there may be a way to accomplish this without this change. What would be a good approach for accomplishing this, or is there a change you would recommend that could introduce this functionality?

@bitwisejb bitwisejb closed this Nov 9, 2023
@bitwisejb bitwisejb reopened this Nov 9, 2023
@weichsel
Owner

weichsel commented Nov 9, 2023

@bitwisejb

We need to be able to extract a single entry from a Zip file with a compression level of 0 without extracting the entire archive.

You can subscript into an archive via path: https://github.com/weichsel/ZIPFoundation#accessing-individual-entries.
This will provide you access to an entry without having to extract the whole archive first.

You had mentioned above that there may be a way to accomplish this without this change. What would be a good approach for accomplishing this, ...

After retrieving the entry, you can use the closure-based Archive.extract method: https://github.com/weichsel/ZIPFoundation#closure-based-reading-and-writing
This will allow you to perform chunk-wise reads on the contents of your entry. The sample code in the README uses the basic version of this method. Please refer to the docs for more info; for example, there's a bufferSize parameter that allows you to control the size of the data chunks passed into the closure.
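Because the closure receives the entry's contents as sequential chunks, a byte window can be picked out by counting bytes as they arrive. The sketch below shows only that bookkeeping, using Foundation and no ZIP Foundation calls. `RangeCollector` and `RangeComplete` are hypothetical names introduced for illustration. Stopping early by throwing assumes that an error thrown inside the consumer closure propagates out of the extract call.

```swift
import Foundation

/// Thrown once the requested window is filled, so the caller can stop early.
struct RangeComplete: Error {}

/// Collects bytes [offset, offset + size) from chunks delivered in order.
final class RangeCollector {
    private let lower: Int
    private let upper: Int
    private var position = 0              // bytes of the entry seen so far
    private(set) var collected = Data()

    init(offset: Int, size: Int) {
        lower = offset
        upper = offset + size
    }

    /// Feed the next sequential chunk; throws `RangeComplete` when the window is done.
    func consume(_ chunk: Data) throws {
        let chunkStart = position
        let chunkEnd = position + chunk.count
        position = chunkEnd
        // Overlap between this chunk and the requested window, in entry coordinates.
        let from = max(lower, chunkStart)
        let to = min(upper, chunkEnd)
        if from < to {
            // Data slices keep their parent's indices, so offset from startIndex.
            let base = chunk.startIndex
            collected.append(chunk[(base + from - chunkStart)..<(base + to - chunkStart)])
        }
        if chunkEnd >= upper { throw RangeComplete() }
    }
}
```

With ZIP Foundation, `collector.consume` could be passed as the consumer closure of `Archive.extract`, with `bufferSize` controlling the chunk size. This approach still reads every byte of the entry that precedes the window.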

@weichsel weichsel closed this Nov 15, 2023
@bitwisejb
Author

@weichsel We have investigated this API in the past, but it is inefficient for extracting a known set of bytes from a zip file that may be several gigabytes. Our use case requires high-volume random access to well-known locations (offset and size) within the zip file without additional overhead. Perhaps there is a more performant API that I am not aware of.

We have been using a fork of this repo with the included functions for some time with great success. We wish to contribute this back to this repo and change to using this repo so that we may benefit from any future contributions.

Please advise on what we can do to move this change forward. Otherwise, we will be left working with our fork.

Successfully merging this pull request may close these issues.

Ability to read data from archive given byte offset and size