Should pin_hash be used to track changes in the metadata as well as pin contents? #739

Open
juliasilge opened this issue May 5, 2023 · 2 comments
Labels
feature a feature request or enhancement

Comments

@juliasilge
Member

In both #727 and #735 we noticed that pin_hash only tracks pin contents, not pin metadata. If we ever use that hash to compare, we don't pick up on changes to metadata. Some elements of the metadata that folks may want to update and would not change pin_hash include title, description, tags, and user metadata.

🎯 Should we change something about the hashing strategy to also check for changes in a subset of the metadata?

Some ideas:

  • Concatenate pin contents and metadata and take one hash (not loving this idea)
  • Add a metadata_hash to the metadata to check separately
  • Hash the metadata and do hash(hashes) of pin contents hash plus metadata hash, the way we do for pinning multiple files:

pins-r/R/pin-read-write.R

Lines 261 to 262 in cc3c160

hashes <- map_chr(paths, hash_file)
hash(hashes)
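To make the second and third ideas concrete, here is a minimal sketch in base R. It is not how pins computes hashes today: `hash_strings()` and `hash_metadata()` are hypothetical helpers, MD5 via `tools::md5sum()` stands in for whatever hash function pins actually uses, and the metadata is flattened with `unlist()`, which assumes simple character-like fields.

```r
# Hypothetical helper: hash a character vector by writing it to a temp file.
# MD5 is only a stand-in for the hash function pins uses internally.
hash_strings <- function(x) {
  path <- tempfile()
  on.exit(unlink(path))
  writeLines(x, path)
  unname(tools::md5sum(path))
}

# Hypothetical helper: hash a subset of the metadata. Fields are flattened
# and sorted by name so that list order does not affect the result.
hash_metadata <- function(meta) {
  flat <- unlist(meta)
  flat <- flat[order(names(flat), method = "radix")]
  hash_strings(paste0(names(flat), "=", flat))
}

# Stand-in for the existing pin contents hash.
content_hash <- hash_strings(c("a,b", "1,2"))

meta_a <- list(
  title = "Model summary",
  tags = c("glm", "summary"),
  user = list(mode = "test")
)
# Same contents, but the user metadata has changed.
meta_b <- modifyList(meta_a, list(user = list(mode = "prod")))

# Idea 2: keep a separate metadata hash alongside the contents hash.
metadata_hash_a <- hash_metadata(meta_a)
metadata_hash_b <- hash_metadata(meta_b)

# Idea 3: hash(hashes) of the contents hash plus the metadata hash.
combined_a <- hash_strings(c(content_hash, metadata_hash_a))
combined_b <- hash_strings(c(content_hash, metadata_hash_b))

combined_a == combined_b            # FALSE: the metadata change is detected
metadata_hash_a == metadata_hash_b  # FALSE, while content_hash is untouched
```

With a separate metadata hash, the contents hash still means "same contents", and callers can choose which of the two to compare. With the combined hash, a single comparison catches both kinds of change but no longer says which one happened.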

@juliasilge juliasilge added the feature a feature request or enhancement label May 8, 2023
@dareneiri

Hi @juliasilge, just adding some user feedback on this question. We use pins frequently for production purposes (which has been a game-changer for us!).

Having flexibility to not include the metadata as part of the hashing strategy would be preferred. It's acceptable if the default option includes metadata as part of the hash checking, however.

We add our own metadata to determine if the version of the pin content is used for testing/development purposes or production purposes. We also use the hash of the content to verify if the pin content is the same or not.

For example, we may set the following metadata for a version of a specific pin, where the value is determined in the background by R_CONFIG_ACTIVE:

user:
  mode: prod

R_CONFIG_ACTIVE is defined differently in our Workbench environment and production environment.
In production, our scripts know which version of the pin to read.
In development, the mode is set to test, so we won't accidentally overwrite pins that are currently used in production until we're absolutely ready.
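A rough sketch of this workflow in base R, without calling pins itself: the `versions` list is a hypothetical stand-in for per-version pin metadata (the version names and hashes are made up), and `active_mode()` and `latest_for_mode()` are illustrative helpers. It assumes R_CONFIG_ACTIVE holds "prod" in production and that anything else means test.

```r
# Hypothetical per-version metadata, ordered oldest to newest.
versions <- list(
  list(version = "v1", pin_hash = "aaaa1111", user = list(mode = "prod")),
  list(version = "v2", pin_hash = "bbbb2222", user = list(mode = "test")),
  list(version = "v3", pin_hash = "bbbb2222", user = list(mode = "prod"))
)

# Assumption: R_CONFIG_ACTIVE is "prod" in production, anything else is test.
active_mode <- function() {
  if (identical(Sys.getenv("R_CONFIG_ACTIVE"), "prod")) "prod" else "test"
}

# Return the newest version whose user metadata matches the given mode.
latest_for_mode <- function(versions, mode) {
  keep <- Filter(function(v) identical(v$user$mode, mode), versions)
  if (length(keep) == 0) return(NULL)
  keep[[length(keep)]]
}

# Two versions hold the same contents if their contents hashes match,
# regardless of the mode recorded in their metadata.
same_content <- function(a, b) identical(a$pin_hash, b$pin_hash)

Sys.setenv(R_CONFIG_ACTIVE = "prod")
prod_version <- latest_for_mode(versions, active_mode())
test_version <- latest_for_mode(versions, "test")

prod_version$version                      # "v3"
same_content(prod_version, test_version)  # TRUE: promoted without changes
```

The `same_content()` check only works this way while the hash covers contents alone. If metadata were folded into the same hash, v2 and v3 would no longer compare as equal.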

This avoids having multiple pins on Connect with similar names like "MyGLMModelSummary-prod" and "MyGLMModelSummary-test", and having to switch back and forth in Workbench on which pin to read in when running the script. It also avoids switching between our staging server and production server to decide which board to register.

Hope this perspective is useful!

@juliasilge
Member Author

Thanks for this perspective @dareneiri! 🙌
