Add "cli callback" for streaming output #8

Open: gizlu wants to merge 5 commits into base: develop

@gizlu gizlu commented Jun 20, 2022

Add a -c cmd [cmd_arg ...] ; option which, for each extracted file, spawns the supplied program, feeds the decompressed file to its stdin, and saves its stdout in place of the original file.
Add a -j n option for specifying the number of concurrent instances of spawned callbacks.

This allows downloading, unzipping, and processing files without storing the originals on disk.
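
The per-file flow described above (spawn the program, feed it the decompressed data, keep its stdout) can be sketched roughly as follows. This is a minimal illustration and not the code from the patch: the helper name filter_to_file and its signature are assumptions made for this example, and it handles one in-memory buffer rather than a stream.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): run argv[0] with `data` on its
   stdin and its stdout redirected to the file `out_path`.
   Returns the raw waitpid() status, or -1 on a setup error. */
static int filter_to_file(char *const argv[], const void *data, size_t len,
                          const char *out_path)
{
    assert(argv != NULL && argv[0] != NULL);
    int in_pipe[2];
    if (pipe(in_pipe) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(in_pipe[0]);
        close(in_pipe[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: stdin comes from the pipe, stdout goes to the output file. */
        int out = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (out == -1 || dup2(in_pipe[0], STDIN_FILENO) == -1 ||
            dup2(out, STDOUT_FILENO) == -1)
            _exit(127);
        close(in_pipe[0]);
        close(in_pipe[1]);
        close(out);
        execvp(argv[0], argv);
        _exit(127);                 /* only reached if execvp failed */
    }
    /* Parent: write the data, then close the pipe so the child sees EOF.
       A real implementation would also ignore SIGPIPE and retry on EINTR. */
    close(in_pipe[0]);
    const char *p = data;
    while (len > 0) {
        ssize_t n = write(in_pipe[1], p, len);
        if (n <= 0)
            break;
        p += n;
        len -= (size_t)n;
    }
    close(in_pipe[1]);
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```

Because the child writes straight to the output file, the parent only has to drive one pipe, which avoids the deadlock that can occur when a single process both feeds and drains a filter.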

For example:

  • Download, unzip, and convert pictures from an archive to WebP (non-picture files are "destroyed"): curl "https://dd.b.pvp.net/latest/set1-lite-en_us.zip" | sunzip -j 4 -c cwebp -quiet -m 5 -o - -- - \;
  • Same, but more elaborate
  • Compute the checksum of every file in an archive: sunzip -c md5sum - \; -j 2 < archive.zip (very likely useless, but nice for testing)

Problems with this patch:

  • sunzip keeps running even if the program returns a non-zero exit code. The rationale is that callbacks often fail with a non-zero status when supplied an unrecognized file. Maybe I should add a flag for not ignoring the child's exit code, for cases where that is not a problem?
  • It is Unix-specific. It won't even compile on Windows (did it before?). It can be made an optional feature that is disabled on non-POSIX systems (easy), or ported.
  • Cleanup of children when sunzip fails is done in a rather wishful manner: we assume that a single SIGTERM will simply kill them. It should be easy to fix; see the comments in the bye() implementation.
  • We don't handle any timeouts. Should we?
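
One possible way to make the child cleanup less wishful is to send SIGTERM, wait for a short grace period, and escalate to SIGKILL if the child is still alive. The sketch below is only an illustration of that idea; stop_child and the grace_ms parameter are hypothetical names, and the actual fix outlined in the bye() comments may look different.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not from the patch): ask a child to terminate and
   escalate to SIGKILL if it is still alive after roughly grace_ms ms.
   Returns 0 if the child went away after SIGTERM, 1 if SIGKILL was
   needed, -1 on error. */
static int stop_child(pid_t pid, int grace_ms)
{
    assert(pid > 0 && grace_ms >= 0);
    int status;
    if (kill(pid, SIGTERM) == -1)
        return -1;
    /* Poll in 10 ms steps until the child is reaped or the grace period ends. */
    for (int waited = 0; waited <= grace_ms; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return 0;
        if (r == -1)
            return -1;
        struct timespec ts = {0, 10 * 1000 * 1000};
        nanosleep(&ts, NULL);
    }
    /* Still running: SIGKILL cannot be caught or ignored. */
    if (kill(pid, SIGKILL) == -1)
        return -1;
    return waitpid(pid, &status, 0) == pid ? 1 : -1;
}
```

The same polling loop could serve as a basis for per-callback timeouts, although whether sunzip should have them is left open above.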

I did this for fun and don't have any particular use case for it (actually, I had one, but it was another useless project done for fun), so I don't have much motivation to polish it, but I can try if you would be willing to merge it.

gizlu and others added 5 commits June 19, 2022 20:02
The program specified by the user (currently hardcoded) is invoked,
the contents of the file being extracted are supplied to its stdin,
and its stdout is saved instead of the original file. It is something like
`--to-command` from GNU tar.

I think it is far more universal than madler#4,
which is kind of useless when you have more than one file in the archive
(and care about output filenames).

Unfortunately prog's exit code is currently ignored: sunzip will keep running
even if prog fails. This kind of sucks, but it hugely simplifies integration
with programs that panic on unrecognized input.

TODO:
- Add CLI (prog is currently hard-set to `tac` for testing purposes)
- Improve child cleanup
- Add an option to enable exit code checking, or at least report on which files an error occurred
- Add support for non-POSIX systems, or at least make it an optional feature
  (currently the build on non-POSIX systems will probably fail)
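
The exit code checking mentioned in the TODO could be built on the status value that waitpid() returns. The following is a rough sketch under assumptions: child_exit_code, check_child, and the strict flag are hypothetical names for this example, and the 128 + signal encoding is borrowed from the common shell convention rather than from sunzip.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map a waitpid() status to a single number.
   Exit code for a normal exit, 128 + signal number if the child was
   killed by a signal (shell convention), -1 for anything else. */
static int child_exit_code(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}

/* Hypothetical helper: report a failed callback for archive entry `entry`.
   Returns 1 if the caller should abort (strict mode and the child failed),
   0 otherwise. */
static int check_child(const char *entry, int status, int strict)
{
    assert(entry != NULL);
    int code = child_exit_code(status);
    if (code == 0)
        return 0;
    fprintf(stderr, "callback failed on %s (code %d)\n", entry, code);
    return strict ? 1 : 0;
}
```

With strict set to 0 this keeps the current behavior of continuing after a failure while still telling the user which file was affected.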

Add the following flags, and docs about them:
-c cmd [cmd_arg ...] ;
   Pipe each extracted file to the specified program and save its stdout on disk.
   All arguments between prog_name and the semicolon are used as its args.
   Unfortunately prog's exit status is ignored: sunzip will keep running
   even if prog fails (yeah, that sucks). Note: the semicolon might need to be
   escaped with '\' to protect it from being intercepted by the shell.
-j n: limit the number of concurrent jobs spawned by -c. Default: 1
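
A limit like the one -j describes is commonly enforced by reaping a finished child before starting a new one whenever the limit has been reached. The sketch below shows only that bookkeeping; wait_for_slot is a hypothetical name, and the patch may track its children differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: block until fewer than max_jobs children are
   running. `running` is the caller's count of children not yet reaped.
   Returns the updated count, or -1 on error. */
static int wait_for_slot(int running, int max_jobs)
{
    assert(running >= 0 && max_jobs >= 1);
    while (running >= max_jobs) {
        int status;
        if (waitpid(-1, &status, 0) == -1) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }
        running--;                  /* one child finished, one slot is free */
    }
    return running;
}
```

The caller would invoke this before each fork() and increment its own counter after a successful spawn.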

Sorry for the rather big diff:
- I rewrote the arg parsing logic, because that was easier than
integrating the `-c` flag into the old one. I hope I didn't break compat doing that
- I changed the help text format a bit

TODO (maybe): handle -j without a space, like -j5
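
For illustration, parsing of the two flags could look roughly like the sketch below, which also accepts the -j5 form from the TODO. It is not the parser from this commit: struct cb_opts and parse_cb_opts are hypothetical names, and every other sunzip argument is simply skipped here.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
struct cb_opts {
    char **cmd;   /* NULL-terminated argv for the callback, or NULL */
    int jobs;     /* value of -j, default 1 */
};

/* Hypothetical parser: handles only `-c cmd [arg ...] ;` and `-j n` / `-jn`.
   The ";" entry of argv is overwritten with NULL so that o->cmd can be
   handed to execvp() directly. Returns 0 on success, -1 on a usage error. */
static int parse_cb_opts(int argc, char **argv, struct cb_opts *o)
{
    assert(argv != NULL && o != NULL);
    o->cmd = NULL;
    o->jobs = 1;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-c") == 0) {
            int start = ++i;
            while (i < argc && strcmp(argv[i], ";") != 0)
                i++;
            if (i >= argc || i == start)
                return -1;          /* missing ";" or empty command */
            argv[i] = NULL;
            o->cmd = &argv[start];
        } else if (strncmp(argv[i], "-j", 2) == 0) {
            const char *num;
            if (argv[i][2] != '\0')
                num = argv[i] + 2;  /* attached form: -j5 */
            else if (i + 1 < argc)
                num = argv[++i];    /* separate form: -j 5 */
            else
                return -1;
            char *end;
            long n = strtol(num, &end, 10);
            if (end == num || *end != '\0' || n < 1 || n > INT_MAX)
                return -1;
            o->jobs = (int)n;
        }
        /* any other argument is ignored in this sketch */
    }
    return 0;
}
```

Scanning up to the semicolon before looking at anything else means that flags intended for the callback, such as the -m and -o passed to cwebp in the example, are never mistaken for sunzip's own options.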

The bug was caused by waiting for the termination of a child that had not
been supplied any data yet