Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to cache file hashes #133

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Try to cache file hashes #133

wants to merge 3 commits into from

Conversation

krlmlr
Copy link
Collaborator

@krlmlr krlmlr commented Oct 20, 2016

Within a session, this should avoid recomputing file hashes where the file size and mtime have not changed. This exploits remake's internal caching.

Fixes #110.

Within a session, this should avoid recomputing file hashes where
the file size and mtime have not changed.  This exploits remake's
internal caching, which is not tested in this commit (and this
entirely lacks any sort of integration test).

For #110
Copy link
Collaborator Author

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks.

expect_equal(st$db$get(filename)$hash, h)

## Does not call back to the underlying hash function:
with_mock("hash_files" = lava,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to set "remake::hash_files" so that tests work in R CMD check. (Don't know why.)

@@ -17,3 +17,66 @@ test_that("file store", {
st$del(file)
expect_false(st$exists(file))
})

test_that("caching file store", {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about splitting the test so that each test_that() call tests only one aspect of the behavior?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that would be better style. It can get a bit tedious though as there's so much side effect heavy bits here (remake is basically all side effect)

## Then, we could change the file *size* and force rehashing.
writeBin(bytes[-1], filename)
h2 <- digest::digest(bytes[-1], serialize = FALSE)
with_mock("hash_files" = lava,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

}
info$hash <- hash_files(filename, named = FALSE)
self$db$set(filename, info)
info$hash
} else {
stop(sprintf("file %s not found in file store", filename))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling abnormal conditions first saves one indent, but that's a matter of style.

))

file_info <- function(filename) {
info <- file.info(filename, extra_cols = FALSE)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this function support len(filename) != 1? How about renaming to file_mtime_and_size() or similar?

}
}
info$hash <- hash_files(filename, named = FALSE)
self$db$set(filename, info)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would inserting the whole object at once provide better encapsulation?

self$db$set(filename, list(info = info))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants