New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try to cache file hashes #133
base: master
Are you sure you want to change the base?
Conversation
Within a session, this should avoid recomputing file hashes where the file size and mtime have not changed. This exploits remake's internal caching, which is not tested in this commit (and this entirely lacks any sort of integration test). For #110
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks.
expect_equal(st$db$get(filename)$hash, h) | ||
|
||
## Does not call back to the underlying hash function: | ||
with_mock("hash_files" = lava, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to set "remake::hash_files" so that tests work in R CMD check. (Don't know why.)
@@ -17,3 +17,66 @@ test_that("file store", { | |||
st$del(file) | |||
expect_false(st$exists(file)) | |||
}) | |||
|
|||
test_that("caching file store", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about splitting the test so that each test_that() call tests only one aspect of the behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that would be better style. It can get a bit tedious though as there's so much side effect heavy bits here (remake is basically all side effect)
## Then, we could change the file *size* and force rehashing. | ||
writeBin(bytes[-1], filename) | ||
h2 <- digest::digest(bytes[-1], serialize = FALSE) | ||
with_mock("hash_files" = lava, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above.
} | ||
info$hash <- hash_files(filename, named = FALSE) | ||
self$db$set(filename, info) | ||
info$hash | ||
} else { | ||
stop(sprintf("file %s not found in file store", filename)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling abnormal conditions first saves one indent, but that's a matter of style.
)) | ||
|
||
file_info <- function(filename) { | ||
info <- file.info(filename, extra_cols = FALSE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this function support len(filename) != 1? How about renaming to file_mtime_and_size() or similar?
} | ||
} | ||
info$hash <- hash_files(filename, named = FALSE) | ||
self$db$set(filename, info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would inserting the whole object at once provide better encapsulation?
self$db$set(filename, list(info = info))
Within a session, this should avoid recomputing file hashes where the file size and mtime have not changed. This exploits remake's internal caching.
Fixes #110.