github: include submodules in zip archives from github #1036

Open
davidanthoff opened this issue Mar 27, 2017 · 7 comments

@davidanthoff

It would be great if the archive that gets automatically created when I create a release on GitHub included any git submodules that my git repo might contain. I believe that, e.g., this package https://github.com/Kentzo/git-archive-all makes that pretty simple.

@krzysztof
Contributor

krzysztof commented Mar 30, 2017

Thank you for the suggestion!

One thing to consider is whether we would have the rights to include all the submodules' code in the archive along with the user's code.

@lnielsen changed the title from "The archives created from github should include submodules" to "github: include submodules in zip archives from github" on Apr 4, 2017
@Glignos added the GitHub label on Jul 5, 2019
@cmbant

cmbant commented Oct 31, 2019

Regarding rights: this would not be an issue for repositories where all submodules are also on Zenodo, so this could at least be enabled for restricted cases. Including the submodules is of course essential for the code to actually be reproducible.

@CSchoel

CSchoel commented Nov 26, 2020

As a workaround, one could use a CI script that generates releases which include the content of the relevant submodules as an asset. However, due to #1235, this currently does not work either.

For example, this is the setup I use for Travis CI:

before_deploy:
  - zip -r inamo-${TRAVIS_TAG}.zip . -x out\* plots\* .git\* regRefData/.git\*

deploy:
  provider: releases
  edge: true
  api_key:
    secure: "***"
  file: inamo-${TRAVIS_TAG}.zip
  release_notes_file: README.md
  tag_name: ${TRAVIS_TAG}
  name: InaMo ${TRAVIS_TAG}
  on:
    tags: true
  draft: true
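
The packaging step in `before_deploy` can be tried locally before it is wired into CI. The following is only a minimal sketch, assuming `tar` instead of `zip` and a hypothetical helper name `pack_tree`. It follows the same idea as the `zip -r ... -x` line above: archive the checked-out tree together with the submodule contents and leave out the Git metadata.

```shell
#!/usr/bin/env bash
# Pack a checked-out tree, including submodule contents, without Git metadata.
# pack_tree is a hypothetical helper name, not part of the setup above.
# Usage: pack_tree <source-dir> <output.tar.gz>
# The output file must lie outside <source-dir>.
pack_tree() {
    local src="$1" out="$2"
    # --exclude=.git skips the top-level .git directory as well as the
    # .git file or directory inside each checked-out submodule.
    tar -czf "$out" --exclude=.git -C "$src" .
}
```

Listing the result with `tar -tzf <output.tar.gz>` is a quick way to check that the submodule files actually made it into the archive.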

@CSchoel

CSchoel commented Nov 26, 2020

On a different note, could something like this be a solution for the licensing issues? Or would that mean too much configuration effort in the backend?

[Image: Zenodo_submodules_suggestion]

I think not including submodules and not warning users about the fact that important files might be missing could be a major issue for reproducibility. For example, I only noticed that the submodule content was missing for one of my uploads, because an especially diligent reviewer actually downloaded the Zenodo version and tried to run the simulations described in my article.

@ianknowles

It's not reasonable for Zenodo to know or try to verify which submodules the user intends to include or has the rights to, so just let it be a config item in .zenodo.json.

@ingomueller-net

Including submodules, potentially after confirming the necessary rights from the user, seems like an important feature to me!

As a workaround, I am currently using the following script, which inlines all submodules. I run it on temporary commits that then become the releases on Zenodo:

#!/usr/bin/env bash

# Go to root directory of repo
cd "$(dirname "${BASH_SOURCE[0]}")"
cd "$(git rev-parse --show-toplevel)"

# Test if submodules need to be updated
if [[ -z "$(git submodule foreach echo hello)" ]]
then
    echo "No git submodules found. Maybe you need to run 'git submodule update'?"
    exit 1
fi

# Collect info about sub-modules (printf, because echo does not expand \t in every sh)
data="$(git submodule foreach -q 'printf "%s\t%s\t%s\n" "$sm_path" "$sha1" "$(git config remote.origin.url)"')"

# Remove sub-modules
echo "$data" | while IFS= read -r line
do
    path="$(echo "$line" | cut -f1)"
    git rm -rf "$path"
done

# Commit temporary commit and record its SHA1
git commit -m "TMP: Removing git submodules."
first_sha1="$(git rev-parse HEAD)"

# Add content of old submodules back, one merge commit at the time
echo "$data" | while IFS= read -r line
do
    path="$(echo "$line" | cut -f1)"
    sha1="$(echo "$line" | cut -f2)"
    rurl="$(echo "$line" | cut -f3)"

    git subtree add --prefix "$path" "$rurl" "$sha1" --squash
done

# Amend all merge commits into the previous one
git reset --soft "$first_sha1"
git commit --amend -m "Inline sub-modules."

I am using it as follows:

  1. Make a fresh recursive clone of the main repository from Github.
  2. Create a temporary branch at the commit you want to release.
  3. Run inline-submodules.sh from this folder. This removes all git submodules, copies their content into their respective original paths, and creates a new commit with the now inlined files. This is necessary for including the files into the Zenodo archive, which does not automatically contain the files from submodules.
  4. Tag the new commit and push the tag to Github.
  5. Create a new release or pre-release on Github. This automatically updates the entry on Zenodo as well.
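
Steps 1 to 4 above could be scripted roughly as follows. This is only a sketch under assumptions: the function name `release_inlined`, the `release-clone` directory, the `tmp-release-<tag>` branch name, and the default script location are made up for illustration, and the `git submodule update` call after the checkout is an addition that is not part of the steps as listed. Setting `DRY_RUN=1` prints the commands instead of running them. Step 5 (creating the release) still happens on GitHub.

```shell
#!/usr/bin/env bash

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [[ -n "${DRY_RUN:-}" ]]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: release_inlined <repo-url> <commit> <tag> [<script path inside the clone>]
# The body runs in a subshell, so the caller's working directory is untouched.
release_inlined() (
    set -euo pipefail
    url="$1"; commit="$2"; tag="$3"
    script="${4:-./inline-submodules.sh}"   # assumed location of the script
    workdir="release-clone"                 # hypothetical directory name

    # 1. Fresh recursive clone, so that every submodule is checked out.
    run git clone --recurse-submodules "$url" "$workdir"
    run cd "$workdir"

    # 2. Temporary branch at the commit to be released.
    run git checkout -b "tmp-release-$tag" "$commit"
    # Assumption: bring the submodules to the state recorded in that commit.
    run git submodule update --init --recursive

    # 3. Inline the submodules (the script creates one new commit).
    run "$script"

    # 4. Tag the new commit and push only the tag.
    run git tag "$tag"
    run git push origin "$tag"
)
```

A dry run such as `DRY_RUN=1 release_inlined <repo-url> <commit> <tag>` shows the exact command sequence before anything is cloned or pushed.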

@johentsch

johentsch commented Nov 22, 2023

The fact that this feature has remained unaddressed for 6.5 years does not speak well of Zenodo. As someone noted above, submodules are essential for archiving the state of a repository (note that the ZIP contains empty folders and not even a reference to the submodules' remotes and commits, which would somewhat help reproducibility). It has also been noted above that the problem could be partly circumvented by addressing #1235, but that is yet another issue that has gone unaddressed for (soon to be) 6.5 years.

Our example shows the point at which archiving a GitHub "meta-repository" with Zenodo becomes dysfunctional. By meta-repository I mean a repository whose main purpose is to group other repositories, which it includes as submodules (all part of the same organization, with the same license). This is the case, for instance, for our Annotated Corpus of Tonal Piano Music from the Long 19th Century, which is accompanied by a data paper that is currently in press and references this record's DOI. However, the archived ZIP file does not contain a single relevant data item (except the top-level metadata). We cannot work around the problem either, because Zenodo does not allow us to add files to a public record in retrospect, that is, after it was automatically created by a GitHub release.

If #1235 were addressed, we could at least automatically include the Frictionless datapackages that we include with every release, but in combination these two issues are a blocker for our purposes. Too bad, because as a Lausanne-based institution (EPFL) we have a very strong incentive to prefer Zenodo over figshare or Dryad. It would be great to see some development effort on the Zenodo side that turns involuntary users into happy users. Investing in the technical infrastructure would also help build trust in Zenodo's long-term archival efforts.
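
Until submodules are archived, one partial mitigation for the missing commit references mentioned above might be to commit a small manifest before tagging a release. The pinned submodule commits are stored as gitlink entries in the tree rather than as files, so they do not survive into a plain archive. The following is only a sketch: the function name `status_to_manifest` and the file name `SUBMODULES.tsv` are hypothetical, and it assumes submodule paths that do not themselves contain " (".

```shell
#!/usr/bin/env bash
# Convert `git submodule status` lines on stdin into "path<TAB>commit" lines.
# status_to_manifest is a hypothetical helper name.
status_to_manifest() {
    local line sha path
    while IFS= read -r line; do
        [[ -z "$line" ]] && continue
        line="${line:1}"        # drop the one-character state prefix (' ', '-', '+' or 'U')
        sha="${line%% *}"       # commit id: everything up to the first space
        path="${line#* }"       # the rest: path plus optional " (describe)"
        path="${path% (*}"      # drop the trailing " (describe)" if present
        printf '%s\t%s\n' "$path" "$sha"
    done
}
```

It would be used as `git submodule status --recursive | status_to_manifest > SUBMODULES.tsv`, with the resulting file committed before the release tag is created. This records which commits were pinned, but it does not replace archiving the submodule contents themselves.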
