Stdin/Stdout support for recompression, Stdout support for precompression #140

Open; 2 commits to merge into base: master


@nicolas-comerci nicolas-comerci commented Aug 19, 2022

This was surprisingly easy to accomplish, much easier than I (and apparently others, see #55) originally suspected.

The changes on this MR are pretty much just:

  • If stdin and/or stdout is used (selected by passing "stdin"/"stdout" as the input/output name), set that stream to binary mode and set fin=stdin and/or fout=stdout
  • printf and cout also write to stdout, so their output would get mixed into the actual data. To avoid this I added a print_to_console function that uses putch on Windows and writes directly to the tty on Unix/Mac
  • The header was being read twice in some cases, which involved seeking back to the start of the file on fin and caused problems.
    I added a guard to prevent reading it twice and removed the seeking, as the header is now only read when we start reading from the stream
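For readers who want to picture the first and third points, here is a minimal sketch, not the code from this MR. The helper names (set_binary_mode, open_input, open_output, InputStream, read_header_once) and the 3-byte header size are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

// Hypothetical helper: put a standard stream into binary mode.
// Only Windows distinguishes text and binary mode; elsewhere this is a no-op.
static void set_binary_mode(std::FILE* stream) {
#ifdef _WIN32
    _setmode(_fileno(stream), _O_BINARY);
#else
    (void)stream;
#endif
}

// Returns stdin when the name is literally "stdin", otherwise opens the file.
std::FILE* open_input(const std::string& name) {
    if (name == "stdin") {
        set_binary_mode(stdin);
        return stdin;
    }
    return std::fopen(name.c_str(), "rb");
}

// Returns stdout when the name is literally "stdout", otherwise opens the file.
std::FILE* open_output(const std::string& name) {
    if (name == "stdout") {
        set_binary_mode(stdout);
        return stdout;
    }
    return std::fopen(name.c_str(), "wb");
}

// Hypothetical input wrapper with a "header already read" guard.
struct InputStream {
    std::FILE* file = nullptr;
    bool header_read = false;
    unsigned char header[3] = {};  // assumed header size, for illustration
};

// Reads the header on the first call only. It never seeks back,
// because a pipe such as stdin cannot be rewound.
bool read_header_once(InputStream& in) {
    if (in.header_read) return true;
    assert(in.file != nullptr);
    if (std::fread(in.header, 1, sizeof in.header, in.file) != sizeof in.header) {
        return false;
    }
    in.header_read = true;
    return true;
}
```

The guard matters because the second call returns without touching the stream, so the read position stays right after the header.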

The MR looks large because of all the printf => print_to_console changes, but other than that it is actually pretty small.
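As a rough illustration of the print_to_console idea, the sketch below formats a message like printf and then writes it to the terminal instead of stdout. It is not the MR's implementation: format_message is a hypothetical helper, and opening /dev/tty on every call is a simplification.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <conio.h>
#endif

// Hypothetical helper: formats like printf but returns the text
// instead of writing it to stdout.
std::string format_message(const char* fmt, ...) {
    assert(fmt != nullptr);
    va_list args;
    va_start(args, fmt);
    va_list args_copy;
    va_copy(args_copy, args);
    // First pass only measures the required length.
    int needed = std::vsnprintf(nullptr, 0, fmt, args_copy);
    va_end(args_copy);
    std::string text;
    if (needed > 0) {
        text.resize(static_cast<std::size_t>(needed) + 1);  // room for '\0'
        std::vsnprintf(&text[0], text.size(), fmt, args);
        text.resize(static_cast<std::size_t>(needed));      // drop the '\0'
    }
    va_end(args);
    return text;
}

// Writes text to the console without touching stdout, so that piped
// data on stdout stays clean.
void print_to_console(const std::string& text) {
#ifdef _WIN32
    for (char c : text) _putch(c);
#else
    // Simplification: a real implementation would likely keep the tty open.
    std::FILE* tty = std::fopen("/dev/tty", "w");
    if (tty == nullptr) return;  // no controlling terminal available
    std::fputs(text.c_str(), tty);
    std::fclose(tty);
#endif
}
```

A call such as print_to_console(format_message("%d streams found\n", count)) would then replace the corresponding printf.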

I tested this on Windows 11, Ubuntu 22 and macOS Monterey (though I had to make some changes to the CMake file and to an lzma header to get it to compile there, not included here) and it has worked beautifully.

Tested:

  • I would like to check that this didn't break dlltest or precompf, but I haven't gotten them to compile yet
  • Most of my testing has been with -cn (no compression), so I'm still testing some files to ensure compression still works
    EDIT: tested with -cl and -cb and fixed some minor issues; precompressing to stdout and decompressing from stdin (fed with cat) to stdout now works fine with both lzma and bzip2
    EDIT2: got dlltest and precompf to compile on Linux and Mac, tested them with some files and they work okay

Please give any feedback on any improvements needed on this MR.
I think this feature will greatly improve precomp, as it makes it possible to use precomp without having to deal with huge intermediate files and gives more flexibility when chaining/piping precomp with external compressors/tools.

@M-Gonzalo

Hi! This is great @nicolas-comerci !!

A little feedback: Could you write stats and such to stderr instead? It will be useful for scripts like this one where I give precomp a little UI
[Image: Screenshot_20220823_110626]

I love your new version, but I can't use it in my scripts as it is.

@nicolas-comerci
Author

Mhhh didn't think of that use case, should be easy enough to make print_to_console print to stderr...
Maybe even a switch so we can decide if we want prints to go directly to console, to stderr or disable them completely.
I'll think about some options and see what I can do about it.

Also, that script looks cool, is it publicly available?

@M-Gonzalo

Mhhh didn't think of that use case, should be easy enough to make print_to_console print to stderr... Maybe even a switch so we can decide if we want prints to go directly to console, to stderr or disable them completely. I'll think about some options and see what I can do about it.

Yes, that would be ideal: a switch to disable output altogether, or even to specify a file for the log info to go to, maybe {file_name}.pcf.log or something like that by default.
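A switch like the one discussed here could look roughly like the sketch below. Nothing in it comes from precomp itself: LogTarget, Logger and default_log_name are hypothetical names, and the direct-to-console target is left out to keep the example short.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical set of destinations for stats and progress messages.
enum class LogTarget { Stderr, File, Disabled };

// Hypothetical logger that routes messages according to the chosen target.
struct Logger {
    LogTarget target = LogTarget::Stderr;
    std::FILE* log_file = nullptr;  // used only when target == File

    // Returns true if the message was written somewhere.
    bool log(const std::string& text) const {
        switch (target) {
            case LogTarget::Stderr:
                return std::fputs(text.c_str(), stderr) >= 0;
            case LogTarget::File:
                return log_file != nullptr &&
                       std::fputs(text.c_str(), log_file) >= 0;
            case LogTarget::Disabled:
                return false;
        }
        return false;
    }
};

// Default log name following the "{file_name}.pcf.log" suggestion.
std::string default_log_name(const std::string& file_name) {
    assert(!file_name.empty());
    return file_name + ".pcf.log";
}
```

With stderr as the default, a script can still capture or hide the stats with an ordinary 2> redirection while stdout carries only the data.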

Also, that script looks cool, is it publicly available?

It wasn't, but not for any particular reason. I uploaded it here: https://github.com/M-Gonzalo/bash-stuff/blob/main/fancyPrecompArchiver.sh

Keep in mind that it is a Linux script, and is heavy on dependencies. You're going to need:

  • precomp
  • pv
  • bc
  • bat
  • wimlib
  • WINE
  • fazip
  • srep
  • fxz

If there are some of them you don't know about or can't get, let me know and I'll give the script a proper README with everything needed

The part that's not about precomp is built to be as fast as possible while staying at a 7z-like ratio (usually better, and at least a couple of times faster).
You'll see the speed-up and the ratio improvements better on machines with a high CPU count.

@M-Gonzalo

@nicolas-comerci I think there's a problem with the way precomp handles file writing.
I'm getting sudden crashes when I try to decompress using pipes, even though there are no corrupted files (everything decompresses correctly when done one by one).
I get out of memory messages, even when most of my memory is free.
But it only happens with really big .pcf files (several GB). I'm just guessing here, but couldn't a fixed-size buffer on precomp's side help? It's as if it tries to allocate memory for the whole thing, but there aren't any free chunks that big.
Nothing of the sort happens with other tools in the pipe, even with archives bigger than my available memory. It only happens with precomp.
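To make the fixed-size buffer suggestion concrete, here is a minimal sketch of copying between two streams through one reusable buffer, so memory use does not grow with the size of the data. It only illustrates the idea and is not a diagnosis of the crash; copy_stream and the 64 KiB default are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: copies everything from `in` to `out` using a single
// fixed-size buffer. Returns the number of bytes copied, or -1 on an I/O error.
long long copy_stream(std::FILE* in, std::FILE* out,
                      std::size_t buffer_size = 64 * 1024) {
    assert(buffer_size > 0);
    std::vector<unsigned char> buffer(buffer_size);  // allocated once
    long long total = 0;
    for (;;) {
        std::size_t got = std::fread(buffer.data(), 1, buffer.size(), in);
        if (got > 0) {
            if (std::fwrite(buffer.data(), 1, got, out) != got) return -1;
            total += static_cast<long long>(got);
        }
        if (got < buffer.size()) {
            // Short read: either end of input or a read error.
            return std::ferror(in) ? -1 : total;
        }
    }
}
```

Because only one buffer is ever allocated, the same call works for a few bytes or for a multi-gigabyte pipe.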

@nicolas-comerci
Author

I've tried some really huge files (including one 19.5 GB file) and didn't run into problems, though most of my testing was on Windows.
I will try that one and some other huge files on Linux and see if I can reproduce the issue.
