github: include submodules in zip archives from github #1036
Comments
Thank you for the suggestion! One thing to consider is whether we would have the rights to include all submodules' code in the archive along with the user's code. |
Regarding rights: this would not be an issue for repositories where all submodules are also on Zenodo, so this could at least be enabled for restricted cases. Including the submodules is of course essential for the code to actually be reproducible. |
As a workaround, one could use a CI script that generates releases which include the content of the relevant submodules as an asset. However, due to #1235, this currently does not work either. For example, this is the setup I use for Travis CI:
before_deploy:
  - zip -r inamo-${TRAVIS_TAG}.zip . -x out\* plots\* .git\* regRefData/.git\*
deploy:
  provider: releases
  edge: true
  api_key:
    secure: "***"
  file: inamo-${TRAVIS_TAG}.zip
  release_notes_file: README.md
  tag_name: ${TRAVIS_TAG}
  name: InaMo ${TRAVIS_TAG}
  on:
    tags: true
  draft: true |
On a different note, could something like this be a solution for the licensing issues? Or would that mean too much configuration effort in the backend? I think not including submodules and not warning users about the fact that important files might be missing could be a major issue for reproducibility. For example, I only noticed that the submodule content was missing for one of my uploads, because an especially diligent reviewer actually downloaded the Zenodo version and tried to run the simulations described in my article. |
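The "missing files" problem described above could be caught with a small check before or after uploading. The following is only a minimal sketch, assuming the extracted archive still carries the repository's `.gitmodules` file; the helper name `list_empty_submodules` is hypothetical and not part of git, GitHub, or Zenodo.

```shell
#!/usr/bin/env bash
# Sketch: report submodule paths that are missing or empty in an extracted archive.
# Assumption: the archive root contains the repository's .gitmodules file.

# list_empty_submodules is a hypothetical helper name used for illustration only.
# Usage: list_empty_submodules <extracted-archive-root>
list_empty_submodules() {
    root="$1"
    modules="$root/.gitmodules"

    # Without a .gitmodules file there is nothing to check.
    [ -f "$modules" ] || return 0

    # Extract the value of every "path = ..." entry, one per line.
    sed -n 's/^[[:space:]]*path[[:space:]]*=[[:space:]]*//p' "$modules" |
    while IFS= read -r path; do
        # A submodule counts as missing if its directory is absent or has no entries.
        if [ ! -d "$root/$path" ] || [ -z "$(ls -A "$root/$path")" ]; then
            printf '%s\n' "$path"
        fi
    done
}
```

Running `list_empty_submodules ./unzipped-release` prints one line per submodule directory that has no content, so an empty output means every declared submodule has at least some files.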
It's not reasonable for Zenodo to know or try to verify which submodules the user intends to include or has the rights to, so just let it be a config item in |
Including submodules, potentially after confirming the necessary rights with the user, seems like an important feature to me! As a workaround, I currently use the following script, which inlines all submodules; I run it to create temporary commits that become the releases on Zenodo:
#!/usr/bin/env bash
# Go to root directory of repo
cd "$(dirname "${BASH_SOURCE[0]}")"
cd "$(git rev-parse --show-toplevel)"
# Test if submodules need to be updated
if [[ -z "$(git submodule foreach echo hello)" ]]
then
echo "No git submodules found. Maybe you need to run 'git submodule update'?"
exit 1
fi
# Collect info about sub-modules
# (printf is used because echo does not expand \t consistently across shells)
data="$(git submodule foreach -q 'printf "%s\t%s\t%s\n" "$sm_path" "$sha1" "$(git config remote.origin.url)"')"
# Remove sub-modules
echo "$data" | while IFS= read -r line
do
path="$(echo "$line" | cut -f1)"
git rm -rf "$path"
done
# Commit temporary commit and record its SHA1
git commit -m "TMP: Removing git submodules."
first_sha1="$(git rev-parse HEAD)"
# Add content of old submodules back, one merge commit at the time
echo "$data" | while IFS= read -r line
do
path="$(echo "$line" | cut -f1)"
sha1="$(echo "$line" | cut -f2)"
rurl="$(echo "$line" | cut -f3)"
git subtree add --prefix "$path" "$rurl" "$sha1" --squash
done
# Amend all merge commits into the previous one
git reset --soft "$first_sha1"
git commit --amend -m "Inline sub-modules."
I am using it as follows:
|
The fact that this feature has remained unaddressed for 6.5 years does not speak well of Zenodo. As someone noted above, submodules are essential for archiving the state of a repository (note that the ZIP contains empty folders and not even a reference to the submodules' remotes and commits, which would somewhat help reproducibility). It has also been noted above that the problem could be partly circumvented by addressing #1235, but that is yet another issue that has been unaddressed for (soon to be) 6.5 years.
Our example showcases the point at which archiving a GitHub "meta-repository" with Zenodo becomes dysfunctional. By meta-repository I mean a repository whose main purpose is to group other repositories, which it includes as submodules (all part of the same organization, with the same license). This is the case, for instance, for our Annotated Corpus of Tonal Piano Music from the Long 19th Century, which is accompanied by a data paper that is currently in press and references this record's DOI. However, the archived ZIP file does not contain a single relevant data item (except the top-level metadata). We cannot work around the problem either, because Zenodo does not allow us to add files to a public record in retrospect, that is, after it was automatically created by a GitHub release. If #1235 were addressed, we could at least automatically include the Frictionless datapackages that we include with every release, but in combination these two issues are a blocker for our purposes.
That is too bad, because as a Lausanne-based institution (EPFL) we have a very strong incentive to prefer Zenodo over figshare or Dryad. It would be great to see some development effort on the Zenodo side that turns involuntary users into happy users. Investing in the technical infrastructure would also help build trust in Zenodo's long-term archival efforts. |
It would be great if the archive that is automatically created when I create a release on GitHub included any git submodules that my repository contains. I believe that e.g. this package https://github.com/Kentzo/git-archive-all makes that pretty simple.
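To illustrate what such an archive would contain, here is a rough sketch that uses plain git and tar rather than git-archive-all itself. It assumes all submodules are already checked out (for example via `git submodule update --init --recursive`) and that the output path is absolute; the function name `archive_with_submodules` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pack the tracked files of a repository together with the tracked files
# of its checked-out submodules. This is not git-archive-all, only an approximation
# of the requested behaviour using plain git and tar.

# archive_with_submodules is a hypothetical helper name used for illustration only.
# Usage: archive_with_submodules <repo-dir> <absolute-path-to-output.tar.gz>
archive_with_submodules() {
    repo="$1"
    out="$2"
    (
        cd "$repo" || exit 1
        # List tracked files of the superproject and of all checked-out submodules
        # (NUL-separated, so unusual file names survive), then pack exactly those files.
        git ls-files -z --recurse-submodules | tar -czf "$out" --null -T -
    )
}
```

Unlike the default release archive, the resulting tarball holds the submodule files themselves under their usual paths, which is the behaviour this issue asks for.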