Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input file is removed without being compressed with redirecting STDOUT together with --rm #1193

Open
nobodyinperson opened this issue Nov 25, 2022 · 6 comments
Milestone

Comments

@nobodyinperson
Copy link

Describe the bug

Input file is removed without being compressed when --rm is given and stdout is redirected.

Expected behavior

Input file stays untouched and is not removed.

To Reproduce

Steps to reproduce the behavior:

  1. lz4 --rm inputfile >/dev/null
  2. Warning appears: Warning : using stdout as default output. Do not rely on this behavior: use explicit '-c' instead !
  3. inputfile was deleted (check with ls)

grafik

System (please complete the following information):

❯ inxi
CPU: quad core Intel Core i5-6500 (-MCP-) speed/min/max: 3477/800/3600 MHz
Kernel: 5.15.78-1-MANJARO x86_64 Up: 8d 20h 7m Mem: 5281.8/7804.1 MiB (67.7%)
Storage: 1.14 TiB (57.6% used) Procs: 332 Shell: fish inxi: 3.3.23
❯ lz4 --version
*** LZ4 command line interface 64-bits v1.9.4, by Yann Collet ***

Additional context
Add any other context about the problem here.

Noted this while developing a compression algorithm benchmarker: https://gitlab.com/nobodyinperson/compression-algorithm-benchmark

https://fosstodon.org/@nobodyinperson/109404289543643442

@t-mat
Copy link
Contributor

t-mat commented Nov 25, 2022

Hi @nobodyinperson,
Could you try to avoid to use --rm in this case?
I think lz4, gzip and xz are able to work in similar manner with the following commands:

# lz4
cd
mkdir lz4-issue-1193
cd lz4-issue-1193/
touch inputfile
lz4 -1 -c inputfile > /dev/null
ls
# inputfile
# gzip
cd
mkdir lz4-issue-1193-gz
cd lz4-issue-1193-gz/
touch inputfile
gzip -1 -c inputfile > /dev/null
ls
# inputfile
# xz
cd
mkdir lz4-issue-1193-xz
cd lz4-issue-1193-xz/
touch inputfile
xz -z -0 -c inputfile > /dev/null
ls
# inputfile

@nobodyinperson
Copy link
Author

Sure I can and now that I know my data is being removed otherwise I definitely will 😉

Still, I think preventing data loss should be top priority so an accidental misuse of options and redirection should absolutely not delete data.

In the above case, a user that just doesn't want to see the output of their lz4 --rm filename command will then append the common >/dev/null that works for pretty much every other command and then find that their data was deleted without a warning or confirmation.

@nobodyinperson
Copy link
Author

gzip, zstd and xz all behave as expected (still saving the compressed output file before deleting the source file when redirecting STDOUT), lz4 behaves differently here:

❯ touch bla
❯ gzip bla > /dev/null
❯ ls
 bla.gz 
❯ rm bla.gz 
❯ touch bla
❯ zstd bla --rm > /dev/null
bla                  :  (     0   B =>     13   B, bla.zst)                    
❯ ls
 bla.zst
❯ rm bla.zst 
❯ touch bla
❯ xz bla >/dev/null
❯ ls
 bla.xz
❯ touch bla
❯ lz4 bla --rm > /dev/null
Warning : using stdout as default output. Do not rely on this behavior: use explicit `-c` instead !  
❯ ls
❯ # lz4 removes input file without writing compressed output file when redirecting STDOUT, gzip, xz and zstd don't

@nobodyinperson
Copy link
Author

Commenting this block results in the intuitive behaviour (not writing compressed data to STDOUT even if STDOUT is not a TTY):

lz4/programs/lz4cli.c

Lines 680 to 688 in 8a31e64

if (!IS_CONSOLE(stdout) && mode != om_list) {
/* Default to stdout whenever stdout is not the console.
* Note : this policy may change in the future, therefore don't rely on it !
* To ensure `stdout` is explicitly selected, use `-c` command flag.
* Conversely, to ensure output will not become `stdout`, use `-m` command flag */
DISPLAYLEVEL(1, "Warning : using stdout as default output. Do not rely on this behavior: use explicit `-c` instead ! \n");
output_filename = stdoutmark;
break;
}

According to the warning this apparently seems to be an unstable default. If it's undecided anyway, how about making it behave the same way as other compression clis?

@Cyan4973
Copy link
Member

Cyan4973 commented Jan 2, 2024

Looking back at this issue,
I think the problem was that the CLI was doing exactly what it was being told,
aka it removes the input file (--rm) and sends the compressed output to stdout (> /dev/null),
but what was being told wasn't correctly understood, because lz4 uses a syntax which is not identical to gzip.

Historically, lz4 did not initially behave the same as gzip at all. Over time though, the behavior and capability set have been progressively fleshed out to mimic gzip whenever it's not directly incompatible.
An important example of incompatibility is lz4 file1 file2, which actually means "compress file1 and write the compressed result into file2". Another difference is the implicit stdout policy.

To address these last incompatibilities, and make lz4 behave a lot closer to gzip, the official work-around is to use lz4 -m. Now lz4 -m file1 file2 means "compress file1 and file2 into file1.lz4 and file2.lz4", like it would with gzip. And now, when applied to this current issue:

❯ touch bla
❯ lz4 -m --rm bla > /dev/null
❯ ls
bla.lz4

lz4 -m exists specifically to make lz4 behave closer to gzip command line. So I think that's the solution to this issue.

On a longer term perspective, we can still consider evolutions to lz4, but impact on the ecosystem must be carefully considered.
One topic that has been already discussed is to make lz4 -m the default behavior.
But I'm concerned about the risk of breaking compatibility with existing lz4 users and scripts.

Another one is to ditch the "implicit stdout" policy even when invoking lz4 without -m.
That one has been considered for a while, and this is why there is currently a warning message:

Warning : using stdout as default output. Do not rely on this behavior: use explicit `-c` instead !  

Since this warning has been set in 2019, almost 5 years ago, we could consider that enough time has passed to warn most lz4 users about this impending change, and we could now proceed on the second step of the plan, which is to remove the "implicit stdout" policy from lz4, even when -m is not specified.
I would still expect some usages to be altered by this change, but the vast majority should no longer rely on this behavior, so that might be preferable to the risk of new users inadvertently triggering this behavior because it's different from gzip.

@t-mat t-mat added this to the v1.9.5 milestone Feb 20, 2024
@t-mat
Copy link
Contributor

t-mat commented Feb 20, 2024

Since it seems this should be triaged, I set a milestone v.1.9.5 as a mark of it. We may leave it as is and will revisit in future release(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants