Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CVE-2024-27322 and this package #406

Open
chainsawriot opened this issue May 1, 2024 · 15 comments
Open

CVE-2024-27322 and this package #406

chainsawriot opened this issue May 1, 2024 · 15 comments

Comments

@chainsawriot
Copy link
Collaborator

CVE-2024-27322 is partially fixed in R 4.4.0. But the attack surface is still there. First, this package supports R > 3.6 therefore the partial fix in 4.4.0 is not applied in many supported versions. Second, even with 4.4.0 deserialization of .Rdata and .RDS in some cases can still invoke arbitrary code execution. See this message by gws.

We can't be too nanny but should we rethink the "any R object" policy of .Rdata

rio/R/import.R

Line 31 in c529994

#' \item Saved R objects (.RData,.rda), using [base::load()] for single-object .Rdata files. Use `which` to specify an object name for multi-object .Rdata files. This can be any R object (not just a data frame).

.RDS

rio/R/import.R

Line 32 in c529994

#' \item Serialized R objects (.rds), using [base::readRDS()]. This can be any R object (not just a data frame).

and qs

rio/R/import.R

Lines 33 to 35 in c529994

#' \item Serialized R objects (.qs), using [qs::qread()], which is
#' significantly faster than .rds. This can be any R
#' object (not just a data frame).

There are several options:

  1. Warn about non data frame object
  2. Completely forbid non data frame object
  3. Only forbid Promise
  4. Just warn in the doc
  5. Don't be nanny
@chainsawriot
Copy link
Collaborator Author

@schochastics @jsonbecker thoughts?

@jsonbecker
Copy link
Collaborator

In my opinion, rio's intent is to load data.frame like objects, not restore R sessions. Support for RDS is support for one way someone might save those kind of objects. I'd be comfortable allow-listing certain classes of objects on both write and load and specify that, possibly with a message. If the intent is full session storage, this is not the right method/package.

I mention allow list because I think we'd want to include things like data.table and tibble objects.

This would be a breaking change, potentially, and may warrant a major version number update if we wanted to be closer to semver.

@hrbrmstr
Copy link

hrbrmstr commented May 1, 2024

I just made https://github.com/hrbrmstr/rdaradar and I think doing some quarantining of R data files in the rio load process to then pick out "data frame-like objects" might be the way to go

@jsonbecker
Copy link
Collaborator

I just made hrbrmstr/rdaradar and I think doing some quarantining of R data files in the rio load process to then pick out "data frame-like objects" might be the way to go

So in this case, we'd:

  1. Load into a new.env()
  2. Filter the list of objects by class against an allow list of classes.
  3. Load the objects with the correct classes.
  4. Remove the quarantine environment in cleanup.

Are there any concerns about how R makes it... incredibly simple to just declare you're one class (along with others)? Put another way, should we restrict certain classes explicitly as well, regardless of whether they also declare they are a data.frame-alike?

@chainsawriot
Copy link
Collaborator Author

chainsawriot commented May 1, 2024

Thank you all for the feedback.

Specifically, this is the line that makes exceptions for the serialization formats.

rio/R/import.R

Lines 146 to 148 in c529994

if (inherits(file, c("rio_rdata", "rio_rds", "rio_json", "rio_qs")) && !inherits(x, "data.frame")) {
return(x)
}

And for other formats, rio will force the output to be data.frame. So anything that cannot be converted to data frame would produce an error.

SEE BELOW

rio/R/set_class.R

Lines 43 to 52 in c529994

.ensure_data_frame <- function(x) {
out <- structure(x, class = "data.frame")
if (nrow(out) == 0) {
return(out)
}
if (!length(rownames(out))) {
rownames(out) <- as.character(seq_len(length(out[, 1L, drop = TRUE])))
}
return(out)
}

We do not have tests for that, I think for format such as R (dput(), dget()) and dump, it's possible.

rio/R/import_methods.R

Lines 126 to 145 in c529994

#' @export
.import.rio_r <- function(file, which = 1, ...) {
.docall(dget, ..., args = list(file = file))
}
#' @export
.import.rio_dump <- function(file, which = 1, envir = new.env(), ...) {
source(file = file, local = envir)
if (missing(which)) {
if (length(ls(envir)) > 1) {
warning("Dump file contains multiple objects. Returning first object.")
}
which <- 1
}
if (is.numeric(which)) {
get(ls(envir)[which], envir)
} else {
get(ls(envir)[grep(which, ls(envir))[1]], envir)
}
}

Let me give it a test.

@chainsawriot
Copy link
Collaborator Author

I was wrong. Data will be forced to data frame anyway.

tmp <- tempfile(fileext = ".R")
dput(LETTERS, tmp)
rio::import(tmp)
#> NULL
#> <0 rows> (or 0-length row.names)

tmp <- tempfile(fileext = ".R")
dput(iris[1:2,], tmp)
rio::import(tmp)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa

tmp <- tempfile(fileext = ".dump")
dump(list = as.character(substitute(LETTERS)), file = tmp)
rio::import(tmp)
#> NULL
#> <0 rows> (or 0-length row.names)

Created on 2024-05-01 with reprex v2.1.0

@chainsawriot
Copy link
Collaborator Author

And of course, I have to pwn myself.

tmp <- tempfile(fileext = ".Rdata")
delayedAssign("x", system("gnome-calculator"))
save(x, eval.promises = FALSE, file = tmp)
rio::import(tmp)

@chainsawriot
Copy link
Collaborator Author

@hrbrmstr @jsonbecker Maybe I've missed something, but it doesn't appear to be safe.

tmp <- tempfile(fileext = ".Rdata")
delayedAssign("x", system("gnome-calculator"))
save(x, eval.promises = FALSE, file = tmp)

quarantine <- new.env(parent = emptyenv())
load(file = tmp,  envir = quarantine) ## not pwned yet

## load(file = tmp,  envir = quarantine); print(ls.str(quarantine, all.names = TRUE), max.level = 999, give.attr = TRUE) ## pwned
## load(file = tmp,  envir = quarantine); inherits(quarantine[["x"]], "data.frame") ## pwned
## load(file = tmp,  envir = quarantine); body(quarantine[["x"]]) ## pwned
## load(file = tmp,  envir = quarantine); class(quarantine[["x"]]) ## pwned

@chainsawriot
Copy link
Collaborator Author

traversc/qs#93 patched upstream.

@traversc
Copy link

traversc commented May 3, 2024

If you want to ensure that a loaded object contains no PROMSXPs I think the best way would be to recursively inspect it using the C API. There may be a pure R way to do it but using C is more straightforward.

@chainsawriot
Copy link
Collaborator Author

@traversc Yeah, I think this is probably the only viable solution. But I also think that it is not a thing this package should do. Maybe it should be the task of R itself (as I said in HenrikBengtsson/Wishlist-for-R#162) or a new package with a promise-free version of load().

@jsonbecker
Copy link
Collaborator

@chainsawriot I think that’s right.

Another thought that may be “worthwhile” — maybe loading an RDS requires a new flag called something like “trust” — that defaults to FALSE and provides a message like “Deserializing RDS may lead to arbitrary code execution. You should only load RDS files you produced or trust. Set trust=TRUE to successfully load the data.”

@chainsawriot
Copy link
Collaborator Author

@jsonbecker That's a nice idea to nudge users into not using those serialization formats (even non-binary ones like .R and .dump) for I/O-ing data frames. At the same time, it's a breaking change. So, we should consider this in rio 2.x.

Python's pickle has this warning in the doc. We could recommend users using other formats. It circles back to the discussion of adding an open (but at the same time, safe) binary format as the default for storing data frame #315 . Now the "default" tier of our supported formats is in the sorry state of either proprietary (excel spss stata) or unsafe serialization formats.

@jsonbecker
Copy link
Collaborator

jsonbecker commented May 6, 2024

@jsonbecker That's a nice idea to nudge users into not using those serialization formats (even non-binary ones like .R and .dump) for I/O-ing data frames. At the same time, it's a breaking change. So, we should consider this in rio 2.x.

Python's pickle has this warning in the doc. We could recommend users using other formats. It circles back to the discussion of adding an open (but at the same time, safe) binary format as the default for storing data frame #315 . Now the "default" tier of our supported formats is in the sorry state of either proprietary (excel spss stata) or unsafe serialization formats.

Yup, I thought about that from a semver perspective. Another two choices here are:

  1. Deprecate this now, and commit to the functionality going away in 2.0. OR
  2. Implement the trust flag with a default to true right now, with a message whenever trust is a missing argument that rio 2.x will change this to default to false and encourage that if you are continuing to use this feature in your code, you specify the trust flag to ensure future compatibility.

@chainsawriot
Copy link
Collaborator Author

chainsawriot commented May 16, 2024

Even with this d63ccb5 one can still be pwned with import_list(). The trust mechanism is not implemented in import_list().

tmp <- tempfile(fileext = ".Rdata")
delayedAssign("x", system("gnome-calculator"))
save(x, eval.promises = FALSE, file = tmp)

import_list(tmp, trust = FALSE)

rio/R/import_list.R

Lines 82 to 90 in ca019c9

.read_file_as_list <- function(file, which, setclass, rbind, rbind_label, ...) {
if (R.utils::isUrl(file)) {
file <- remote_to_local(file)
}
if (get_info(file)$format == "rdata") {
e <- new.env()
load(file, envir = e)
return(as.list(e))
}

chainsawriot added a commit that referenced this issue May 16, 2024
chainsawriot added a commit that referenced this issue May 16, 2024
* Add `trust` also for `import_list()` ref #406

* Add tests

* Update doc

* Use .import.rio_rdata
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants