Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

About the compression mechanism #400

Closed
chainsawriot opened this issue Apr 26, 2024 · 3 comments
Closed

About the compression mechanism #400

chainsawriot opened this issue Apr 26, 2024 · 3 comments

Comments

@chainsawriot
Copy link
Collaborator

As there are many bugs (#395 #396 #399) after really testing this bunch of code (#354)

Other than the issues of compression detection, this

rio/R/compression.R

Lines 25 to 44 in 2fb1373

filename <- normalizePath(filename)
tmp <- tempfile()
dir.create(tmp)
on.exit(unlink(tmp, recursive = TRUE), add = TRUE)
file.copy(from = filename, to = file.path(tmp, basename(filename)), overwrite = TRUE)
wd <- getwd()
on.exit(setwd(wd), add = TRUE)
setwd(tmp)
if (type == "zip") {
o <- utils::zip(cfile2, files = basename(filename))
} else {
if (type == "tar") {
type <- "none"
}
o <- utils::tar(cfile2, files = basename(filename), compression = type)
}
setwd(wd)
if (o != 0) {
stop(sprintf("File compression failed for %s!", cfile))
}

is super not robust (e.g. it won't return a file and o is still 0). Instead of using this, the plan is to use the battle-tested R.utils compression. Well, even data.table is using it and that's the reason for introducing R.utils in the first place #362 .

https://github.com/Rdatatable/data.table/blob/2487c61656335764980e478c323f7e6ce4e6d4ca/R/fread.R#L123

But the problem is that R.utils doesn't support xz. (But I think it should be okay to cut xz. No one complains about xz not functioning, although it is not functioning for so many years.)

@chainsawriot chainsawriot changed the title Completely rewrite of the compression mechanism Completely rewrite the compression mechanism Apr 26, 2024
@chainsawriot
Copy link
Collaborator Author

chainsawriot commented Apr 27, 2024

Despite the code has been added since 2016 e980077 , the (current) doc never says that .{format}.xz, .{format}.bzip, and .{format}.gz (except ".csv.gz") are supported.

rio/R/export.R

Line 11 in 144ebca

#' The output file can be to a compressed directory, simply by adding an appropriate additional extensiont to the `file` argument, such as: \dQuote{mtcars.csv.tar}, \dQuote{mtcars.csv.zip}, or \dQuote{mtcars.csv.gz}.

("compressed directory" is weird, probably means "compressed file.")

I think another solution is to clarify only ".{format}.zip", ".{formart}.tar", ".{format}.tar.gz", ".csv.gz" (via data.table::fwrite()) and ".csv.bz" (via data.table::fwrite()) are supported, because they are tested. Probably this is not a breaking change, because gzip, bzip2 and xz never work.

@schochastics Thoughts?

@chainsawriot chainsawriot changed the title Completely rewrite the compression mechanism About the compression mechanism Apr 27, 2024
@schochastics
Copy link
Member

If we properly document what is supported and what not we can take the "not-support-xz" road and use R.utils to fix the remaining compression

@chainsawriot
Copy link
Collaborator Author

chainsawriot commented May 2, 2024

Cool. Then let's do it.

chainsawriot added a commit that referenced this issue May 12, 2024
chainsawriot added a commit that referenced this issue May 13, 2024
* Adjust checking in parse_archive

to prepare for the retoration of `tar.gz` support

* Add basic `tar.gz` support

Ref #400

* doc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants