[Feature request] "Open in Binder" option for appropriate GitHub repos #1416

Open
RaoOfPhysics opened this issue Feb 6, 2018 · 21 comments

@RaoOfPhysics

RaoOfPhysics commented Feb 6, 2018

Background info


Request: Add a button in relevant Zenodo records for opening the GitHub repo in Binder for interactive notebooks etc., to encourage reproducibility in research, using GitHub ↔ Zenodo link.

  1. If source GitHub repo's README has Launch Binder badge, offer similar badge on Zenodo below the "Available in GitHub" badge.
  2. Work with Binder team so that contents of zipped repo can be launched in Binder directly from the Zenodo archive, if original GitHub repo disappears (preservation).

cc: @betatim from Binder

@betatim

betatim commented Feb 6, 2018

This would be very cool in general.

I'd be even more interested in (2) so that binder can resolve Zenodo DOIs and launch directly from those instead of git repositories. Most of the work (on the binder side) would probably be in repo2docker which is the tool we use to actually build the containers. Right now it uses git to fetch contents or uses a local directory.

In jupyterhub/binderhub#216 we discussed this idea a bit for launching from OSF.io

@choldgraf

Seconding Tim's comment - let the Binder team know if there's anything we can do to help out!

@RaoOfPhysics
Author

I think this is related to the WIP feature of previewing .tar.gz and other compressed formats: #557.

@lnielsen
Member

Agreed, it would be a very cool feature. Technically, it looks like Binder uses Repo2Docker, which as far as I can tell needs a git repository in order to work. This, I think, is the main obstacle, as Zenodo only archives a Zip-ball of the specific release. The work-around would be to simply point to the GitHub repository (because we do have the SHA of the release we archived), but then we essentially just bypass Zenodo, and there's no real added value over just having the badge on the GitHub repo.

@RaoOfPhysics
Author

RaoOfPhysics commented Feb 12, 2018

Thanks for getting back to us on this issue, @lnielsen! Some thoughts:

The work-around would be to simply point to the GitHub repository (because we do have the SHA of the release we archived) […]

Rather than point to the GitHub repository directly, it would make sense to have a "Binder" badge on Zenodo pointing to the specific commit/tag that was archived on Zenodo (since Binder can handle branches, tags or commits). This means that you're able to directly jump to the same version of the code/repo that is linked from the DOI.
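
A minimal sketch of how such a pinned link could be built, assuming the `https://mybinder.org/v2/gh/<owner>/<repo>/<ref>` launch URL form that mybinder.org uses for GitHub repositories. The function name and the example owner/repo are hypothetical.

```python
from urllib.parse import quote


def binder_url_for_github(owner: str, repo: str, ref: str) -> str:
    """Build a mybinder.org launch URL pinned to a branch, tag, or commit SHA."""
    # Percent-encode each part so that a ref containing "/" stays one path segment.
    parts = [quote(part, safe="") for part in (owner, repo, ref)]
    return "https://mybinder.org/v2/gh/" + "/".join(parts)


# Hypothetical repository and tag, for illustration only.
print(binder_url_for_github("octocat", "demo", "v1.0.0"))
```

Zenodo already knows the SHA of the archived release, so passing that SHA as `ref` would make the badge open exactly the archived version.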

[…] then we essentially just bypass Zenodo, and there's no real added value over just having the badge on the GitHub repo.

Well, if you point to the specific commit/tag, there is still value, since the badges in GitHub typically point to the latest commit in master. However, from the point of "preservation" and "persistence" that a DOI is supposed to provide, it would make sense if we could indeed bypass GitHub and render the repo directly from Zenodo, so that the content is still "interactive" even if the original GitHub repo gets taken down.


@choldgraf, @betatim: Is there a way to "fake" a Git repo from the Zip-ball? By adding an essentially purposeless* git init of some kind in the repo2docker workflow? So:

  • repo2docker unpacks Zip-ball → repo2docker runs git init → Binder points to contents/notebook(s).

  • edit
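
The unpack-then-`git init` idea above could be sketched roughly as follows. This is only an illustration using the Python standard library and the `git` command line, not how repo2docker works; both function names are hypothetical.

```python
import io
import shutil
import subprocess
import zipfile
from pathlib import Path


def unpack_zipball(zip_bytes: bytes, dest: Path) -> Path:
    """Extract a Zip-ball (given as raw bytes) into dest and return dest."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        archive.extractall(dest)
    return dest


def git_init_if_available(repo_dir: Path) -> bool:
    """Run `git init` in repo_dir; return False when git is not installed."""
    if shutil.which("git") is None:
        return False
    subprocess.run(["git", "init", "--quiet"], cwd=repo_dir, check=True)
    return True
```

After these two steps the directory looks like a fresh repository with no history, which is all the "fake" repo in the question would need to be.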

@choldgraf

@choldgraf, @betatim: Is there a way to "fake" a Git repo from the Zip-ball? By adding an essentially git init of some kind in the repo2docker workflow?

That's a great question. I think this would be feasible, probably as a buildpack for repo2docker (either within r2d, or as an "extension" that lives in a separate repository). That buildpack would insert the lines into the Dockerfile that do the unzipping, etc.

I just opened this issue to discuss within r2d: jupyterhub/repo2docker#234

@yuvipanda

This would be awesome!

I think there are two parts for this:

  1. Adding ability to read from a ZIP file at a given URL to repo2docker
  2. Adding ability to read a versioned zonodo identifier to the appropriate zip file + caching semantics to binderhub.

In the meantime, I think adding a link to the tagged version on github is the simplest thing to do!
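
As a rough illustration of the second part, here is a sketch that maps a Zenodo DOI to its record page. It assumes Zenodo DOIs follow the `10.5281/zenodo.<record id>` pattern and uses the `https://zenodo.org/record/<id>` URL form seen elsewhere in this thread; the function names are hypothetical, and locating the zip file inside the record is left out.

```python
import re
from typing import Optional

# Assumed DOI shape: the Zenodo prefix followed by a numeric record id.
ZENODO_DOI = re.compile(r"^10\.5281/zenodo\.(\d+)$")


def zenodo_record_id(doi: str) -> Optional[str]:
    """Return the numeric record id from a Zenodo DOI, or None if it is not one."""
    match = ZENODO_DOI.match(doi.strip())
    return match.group(1) if match else None


def zenodo_record_url(doi: str) -> Optional[str]:
    """Return the record page URL for a Zenodo DOI, or None if it is not one."""
    record_id = zenodo_record_id(doi)
    return f"https://zenodo.org/record/{record_id}" if record_id else None
```

Because a versioned identifier points at one fixed archive, the DOI string itself might serve as a cache key for built images.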

@RaoOfPhysics
Author

Hey, @yuvipanda. At the moment, yeah, looking for a Binder badge and then linking to the appropriate version on GitHub is an interim solution -- depending on how @lnielsen and co. prioritise this, of course! :)

Concerning:

  1. Adding ability to read a versioned zonodo [sic] identifier to the appropriate zip file + caching semantics to binderhub.

Zenodo grabs repos only when a new release is issued and I think the GitHub badge on the Zenodo entry itself points to the appropriate tree on GitHub. Does this help at all?

@lnielsen
Member

The badge would be pretty easy to add if we already knew that the GitHub repo supports Binder, but it's not easy for us to detect whether Binder is supported. What we could do is allow adding links in the "related identifiers" field, which would then render a logo, like the GitHub one, that lets you launch the repo in Binder.

@choldgraf

@lnielsen a few thoughts that come to mind:

  1. Check if a repo has a binder badge in their README
  2. Check if a repo has a tag of some kind (e.g. "binder-ready", "binder")
  3. Check if a repo has one or more of the config files and, if so, try and build it via the Binder build API...if it returns as successful, then proceed.

Just spitballing here :-)
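
The first and third checks above could look something like this sketch. It only matches badge links to mybinder.org, and the set of config file names is a partial list of files repo2docker recognises, assumed here for illustration; the function names are hypothetical.

```python
import re
from typing import Iterable, List

# Launch links of the form https://mybinder.org/v2/..., cut off at whitespace, ")" or ">".
BINDER_LINK = re.compile(r"https?://mybinder\.org/v2/[^\s)>]+")

# Partial, assumed list of repo2docker configuration file names.
CONFIG_FILES = {"environment.yml", "requirements.txt", "Dockerfile",
                "postBuild", "apt.txt", "runtime.txt"}


def find_binder_links(readme_text: str) -> List[str]:
    """Return every mybinder.org launch link found in the README text."""
    return BINDER_LINK.findall(readme_text)


def has_binder_config(paths: Iterable[str]) -> bool:
    """True if a known config file sits at the repo root or in a binder/ folder."""
    for path in paths:
        parent, _, name = path.rpartition("/")
        if parent in ("", "binder") and name in CONFIG_FILES:
            return True
    return False
```

Neither check shows that the launched environment actually works, and the regex would miss badges pointing at other BinderHub deployments.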

@betatim

betatim commented Jun 20, 2018

I think knowing whether a repo will do something useful when you launch it on a BinderHub is very hard for a computer. Many repositories will build and launch, but most of those don't work :-( So I would look for the Binder badge in the README, but that is also a crude heuristic (how would you find, at scale, repositories that have a Binder badge that points to a different instance than mybinder.org?). Making the 'binder-ready' status a human opt-in is probably the best option, and then it can be machine-readable as well.

Is there a format/file that zenodo looks at to extract extra information for a repository? Similar to a .travis.yml or some such?

@lnielsen
Member

I was trying to avoid having to parse the files in the repository :-)

@lnielsen
Member

I would say the best way would be via CodeMeta somehow - https://codemeta.github.io since we're planning to enable reading metadata from the codemeta file.

@betatim

betatim commented Jun 15, 2019

BinderHub and repo2docker now support launching from Zenodo DOIs: https://twitter.com/mybinderteam/status/1139136841792315392
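
For illustration, a launch link for a Zenodo DOI could be built as in the sketch below. The `https://mybinder.org/v2/zenodo/<DOI>/` URL form and the `10.5281/zenodo.` prefix check are assumptions, not something stated in this thread, and the DOI in the example is a placeholder.

```python
def binder_url_for_zenodo(doi: str) -> str:
    """Build a mybinder.org launch URL for a Zenodo DOI (assumed URL form)."""
    doi = doi.strip()
    # Reject anything that does not look like a Zenodo DOI.
    if not doi.startswith("10.5281/zenodo."):
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    return f"https://mybinder.org/v2/zenodo/{doi}/"


# Placeholder DOI, for illustration only.
print(binder_url_for_zenodo("10.5281/zenodo.1234567"))
```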

@Glignos Glignos added the GitHub label Jul 5, 2019
@slint slint added this to To do in Asclepias/GitHub Sprint (Q3 2019) via automation Jul 8, 2019
@slint slint moved this from To do to GitHub Triage in Asclepias/GitHub Sprint (Q3 2019) Jul 8, 2019
@slint slint removed this from GitHub Triage in Asclepias/GitHub Sprint (Q3 2019) Jul 8, 2019
@slint
Member

slint commented Sep 30, 2019

As mentioned in #1416 (comment), I think a sensible solution would be to display a Binder logo with the proper mybinder link (similar to the GitHub one), when there is a link to https://mybinder.org in the "related identifiers" (example record: https://zenodo.org/record/3402938)
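
A sketch of that check, assuming each related identifier is available as a dictionary with an `identifier` key holding the link. The key name and the function name are assumptions for illustration, not Zenodo's actual code.

```python
from typing import Iterable, Mapping, Optional
from urllib.parse import urlparse


def find_binder_link(related_identifiers: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Return the first related identifier that links to mybinder.org, if any."""
    for entry in related_identifiers:
        identifier = entry.get("identifier", "")
        # Non-URL identifiers such as bare DOIs have no hostname and are skipped.
        if urlparse(identifier).hostname == "mybinder.org":
            return identifier
    return None
```

The record page would render the Binder logo only when this returns a link, and use that link as the logo's target.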

My only concern, and probably the Binder team's (cc @betatim, @yuvipanda, @choldgraf), is that this creates much bigger exposure for the MyBinder service and could end up DoS-ing it, which would leave users following a link to a "broken" page. Users who end up on a Zenodo software record that has a Binder logo might just click it out of curiosity.

I've read the Reliability docs, and the rate-limiting mechanisms that are in-place look good, so I guess it's just a question if the MyBinder service maintainers are ok with that :)

@choldgraf

choldgraf commented Sep 30, 2019

As a general rule, spikes in traffic shouldn't be too big of a deal so long as they aren't gigantic spikes. What kind of traffic do you all imagine sending? :-)

As a reference, you can get an idea for the load and "spikiness" of repositories for the public binderhub deployment (the one at mybinder.org) here:

https://grafana.mybinder.org/d/fZWsQmnmz/pod-activity?refresh=1m&orgId=1&var-cluster=default
We've had folks launch ~100-200 Binders at once when they were using it to teach courses and such. Sometimes the launch takes longer if we need to scale up to a new node, but in general it should be OK.

The hard limit is 100 simultaneous sessions for a single repository.

@rgayler

rgayler commented Oct 1, 2019

... when there is a link to https://mybinder.org in the "related identifiers" (example record: https://zenodo.org/record/3402938)

As a related issue, would it be possible to have more specific metadata in the "related identifiers" for this use case? The metadata values associated with that URL in the "related/alternate identifiers" section are pretty uninformative ("Supplementary material" & "Other"). Would it be possible to add new metadata values like "Executes this upload" and "Live computing environment" to make it clear that the link allows the reader to execute the software? I think this will become a relatively common use case. Thanks

@nuest
Contributor

nuest commented Oct 8, 2019

👍 for the relation type. My suggestion would be "Interactive (computing) environment", as Binders are for humans to use, and not a one-time execution (which "Executes this upload" could mean).

@slint
Member

slint commented Oct 8, 2019

The available relation type vocabulary for the related identifiers is based on the DataCite v4.1 schema, so I would avoid adding a new "custom" relation type.

IMHO, the most fitting relation type would be isSourceOf (i.e. "has this upload as its source" in the upload form), in the sense that the Zenodo record is the source that Binder uses to execute it:

image

If we have a general consensus on that, I believe we can ship this in the next release :)

PS (@choldgraf): Today's silly question: copyright wise, is it ok for us to use the Binder logo from your repo?

@betatim

betatim commented Oct 8, 2019

@slint yes you can use the logo. Without us doing anything extra it is licensed like this. Which is probably not ideal for artwork.

If you are going to make a "button" for people to press there is also https://static.mybinder.org/badge_logo.svg which we recommend as the "button to launch a binder"
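
As a small illustration, the recommended badge image could be wrapped around a launch link like this. The Markdown output format and the function name are assumptions; a Zenodo record page would more likely need the equivalent HTML.

```python
BADGE_IMAGE = "https://static.mybinder.org/badge_logo.svg"


def binder_badge_markdown(launch_url: str) -> str:
    """Return a Markdown image link showing the Binder badge for launch_url."""
    return f"[![Binder]({BADGE_IMAGE})]({launch_url})"


# Hypothetical launch link, for illustration only.
print(binder_badge_markdown("https://mybinder.org/v2/gh/octocat/demo/v1.0.0"))
```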

@rgayler

rgayler commented Oct 8, 2019

@slint I hadn't realised the relation type for related/alternate identifiers was taken from the DataCite v4.1 schema. Perhaps that could be stated in the head text of the section, after the text stating the range of identifiers that are accepted.

I agree that of the available relation types, isSourceOf is the most appropriate and I have updated my Zenodo record that is being used as an example.

Is the resource type field based on resourceTypeGeneral in DataCite 4.1 (Table 7)? If so, the interactiveResource ("A resource requiring interaction from the user to be understood, executed, or experienced") seems to me to be the most appropriate value. Unfortunately, this isn't available in the drop down list, so I opted for "Other".

10 participants