Repo metadata #42

Open
mrchristian opened this issue Sep 5, 2018 · 16 comments

@mrchristian
Contributor

mrchristian commented Sep 5, 2018

The issue has been raised by @danielskatz on Twitter https://twitter.com/danielskatz/status/1036992161508667392 about the need to 'declare the metadata for the repository'.

I will review our current coverage of this issue and look at how to proceed.

I will document the issue in full below.

@mrchristian
Contributor Author

The current position on recording repository metadata has been to take a 'lite' approach. This is mainly informed by trying to keep the amount of ground covered in the instructions to a minimum.

Here is what is currently described for recording metadata:

https://github.com/OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source/blob/master/content_development/Task_2.md#getting-a-doi-

To summarise the process:

  1. Zenodo captures author names from the GitHub repository
  2. Admins of the Zenodo record can edit metadata
  3. Zenodo generates version numbers
  4. Zenodo assigns DOIs
  5. Zenodo has a variety of metadata fields that can be filled in

@Protohedgehog
Contributor

Brilliant, thanks @mrchristian. Is there a way we can use the communities function of Zenodo to make things a little easier here? I'm not sure exactly what sort of things this allows just yet https://zenodo.org/communities/open-science-mooc/?page=1&size=20

@mrchristian
Contributor Author

Communities - I think it just functions as a collection of sorts. I'll work back through Zenodo's metadata editing and generation process; I have a bunch of repos on Zenodo I can try this out on. Then I'll have a think about how best to wade through the swamp :-)

@mrchristian
Contributor Author

Looking at Zenodo's metadata representation of a deposit, it would seem best to use that as the editing site for the metadata and then, if needed, put a file back into the GitHub repository at some point in whatever format is preferred: BibLaTeX, etc.

You can also see the fields listed here http://developers.zenodo.org/#depositions

The owner of a Zenodo deposit can edit the metadata via the web interface, not sure if there is group access.

The reason I suggest using Zenodo as the key location for maintaining metadata is that Zenodo will do the job of distributing the metadata.

As an idea for later, it would be nice to use the Zenodo API to write the metadata back to your repo in whatever flavor of markup is preferred.

I'll give it a spin on a dummy repo
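
The write-back idea could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes the public endpoint `https://zenodo.org/api/records/<id>` returns JSON with a `metadata` object holding `title`, `publication_date`, `doi` and a `creators` list of `{"name": ...}` entries. The function names, the BibTeX layout and the citation key are hypothetical and not something Zenodo provides.

```python
import json
import urllib.request


def fetch_record_metadata(record_id, base_url="https://zenodo.org/api/records"):
    """Fetch a record and return its "metadata" object (needs network access)."""
    with urllib.request.urlopen(f"{base_url}/{record_id}") as response:
        return json.load(response)["metadata"]


def metadata_to_bibtex(metadata, key="zenodo_record"):
    """Render a metadata dict as a BibTeX @misc entry, skipping empty fields."""
    authors = " and ".join(c["name"] for c in metadata.get("creators", []))
    year = metadata.get("publication_date", "")[:4]
    fields = [
        ("author", authors),
        ("title", metadata.get("title", "")),
        ("year", year),
        ("doi", metadata.get("doi", "")),
        ("publisher", "Zenodo"),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields if value)
    return f"@misc{{{key},\n{body}\n}}\n"
```

Something like `metadata_to_bibtex(fetch_record_metadata(581704))` could then be written to a citation file in the repository. Other output flavors (CFF, CodeMeta) would need their own renderer in place of `metadata_to_bibtex`.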

@Protohedgehog
Contributor

OK, awesome, thanks @mrchristian! Will be interesting to see how this can ultimately feed back in either to how we index the MOOC content, or as part of the learning content.

@mrchristian
Contributor Author

Back on the case now, will get this sorted this week. First is to consult @zenodo support and get a usable representation of their metadata schema, then consult the #softwarecitation community about the dilemma of which route to take: Zenodo output, CFF, CodeMeta, BibTeX.

It's so annoying that these things are not clear and worked out already. If only all that money being wasted on research service companies' profits was actually used to fix basic plumbing problems in academia, jeez :-) The prisoner emerges from the cave.

@tosteiner
Contributor

tosteiner commented Sep 16, 2018

@mrchristian not sure if this helps, but Chris Gorgolewski has written a neat run-down on how this might work automatically:

and I guess sticking to a minimal content scheme for author names of

{
      "name": "Rabbit, Roger",
      "orcid": "0000-0002-468-1234"
}

would easily suffice, don't you think?

(I think we've had the same issue over at Open-Scholarship-Strategy/site#30 hence I'm just copying it here 😉 - sadly, my personal skills at proper metadata coding are rather limited, it was rather a copy&paste try 'n' error thing :) )
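
Since hand-written metadata tends to be trial and error, a minimal creator entry like the one above can be sanity-checked before it is committed. The sketch below is an assumption-laden illustration, not part of Zenodo: the helper names are hypothetical, and the check relies on the published ORCID iD structure (four hyphen-separated groups of four characters, with an ISO 7064 MOD 11-2 check character at the end). As written, the placeholder ORCID in the snippet above is one digit short and would be flagged.

```python
import re

# Four groups of four characters; the final character may be the check value "X".
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_is_valid(orcid):
    """Check the layout and the ISO 7064 MOD 11-2 check character of an ORCID iD."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    expected = (12 - total % 11) % 11
    return digits[-1] == ("X" if expected == 10 else str(expected))


def creator_problems(entry):
    """Return a list of human-readable problems found in one creator entry."""
    problems = []
    if not entry.get("name", "").strip():
        problems.append("missing name")
    orcid = entry.get("orcid")
    if orcid is not None and not orcid_is_valid(orcid):
        problems.append(f"malformed ORCID: {orcid}")
    return problems
```

An empty list from `creator_problems` means the entry passed both checks; it says nothing about whether the ORCID actually belongs to the named person.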

@mrchristian
Contributor Author

Hey, thank you, brilliant. Do you think this approach enables the contributor information to get incorporated into the Zenodo and DataCite records for the repository?

That's one of the goals I'm trying to achieve, as that's the information others are harvesting.

Thanks again :-)

@tosteiner
Contributor

As far as I understood it, it adds the possibility to push author info to the Zenodo repo, so yes, it's incorporated with Zenodo... and DataCite then picks that up and uses it for its own purposes :)

@mrchristian
Contributor Author

A-OK, super.

Zenodo outputs the 'deposit' metadata in a variety of formats so others can use it.

I can see that the example repo has extensive metadata. I'll try out the process on a test repo, or in Zenodo's sandbox, and see if the creator names get picked up into the system.

https://zenodo.org/record/581704/export/dcite4

@mrchristian
Contributor Author

Hi,

Glacially slow reply, must be on some low frequency packet radio system.

But I'm finally back on it and I've got it cracked. Well, at least what's going on. There is more to do to really sort out the full situation, which is a bit out of my scope, but at least I can now recommend a better solution than we started with.

So, what's the 'craic', as they say.

Zenodo picks up a file called .zenodo.json to read metadata. Of course no one makes this clear; instead it's hidden in a tab, deep in the Zenodo repository area.

JSON Export
Zenodo automatically extracts metadata about your repository from GitHub APIs. For example, the authors are determined from the repository's contributor statistics. The automatic extraction is solely a best guess. Add a .zenodo.json file the root of your repository to explicit define the metadata. The format of file is the same as for our REST API (use e.g. below JSON to get started).

The result of doing this is what @tosteiner pointed me to, thank you. But I then needed to understand what's going on.

I did a test in Zenodo's sandbox site.

https://sandbox.zenodo.org/record/246036

from repo

https://github.com/hybrid-publishing-group/book-coding/tree/master

You can actually write lots of the metadata here (see the example), but not things like UIDs.

https://github.com/hybrid-publishing-group/book-coding/blob/master/.zenodo.json

This is more like what we would need, just names, although even in this case there can be 'contributors' and 'creators', also with types such as 'editor', 'researcher', etc.
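
A names-only `.zenodo.json` of this kind could be generated with a sketch like the one below. It is a hedged illustration: the keys `title`, `upload_type`, `creators` and `contributors`, and contributor `type` values such as `Editor` and `Researcher`, are taken on the assumption that they match Zenodo's deposition metadata documentation, and the helper names and the people in the usage example are hypothetical.

```python
import json
from pathlib import Path


def build_zenodo_metadata(title, creators, contributors=()):
    """Assemble a metadata dict; contributors are left out entirely when empty."""
    metadata = {
        "title": title,
        "upload_type": "software",
        "creators": [dict(person) for person in creators],
    }
    if contributors:
        metadata["contributors"] = [dict(person) for person in contributors]
    return metadata


def write_zenodo_json(metadata, repo_root="."):
    """Write the metadata to .zenodo.json in the repository root and return the path."""
    path = Path(repo_root) / ".zenodo.json"
    text = json.dumps(metadata, indent=2, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    example = build_zenodo_metadata(
        "Example repository",  # hypothetical title
        creators=[{"name": "Rabbit, Roger"}],
        contributors=[{"name": "Hare, Harriet", "type": "Editor"}],
    )
    print(write_zenodo_json(example))
```

Keeping creators and contributors as plain dicts means extra person fields can be passed through unchanged once it is clear which ones Zenodo accepts.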

Soooooo.... In a nutshell my recommendation is as follows.

A key objective is to get rich metadata about people into the DOI information ecology and into the repository.

So using the .zenodo.json file is a vast improvement over the GitHub user name.

NEXT

I need to refine the process and workflow and give exact instructions with an example, and find out from Zenodo's API documentation and support which person fields can be added. http://developers.zenodo.org/#metadata-formats

Consult with Zenodo support and the software citation community, as I have heard that CodeMeta files can also be read; maybe others can too, like BibTeX?

My aim is a write-up for tomorrow, then consult, and then wrap it up. I'll also write a blog post on this as it needs more profile; currently I couldn't find any documentation on the process.

Cheers

Simon

@mfenner

mfenner commented Oct 4, 2018

Adding support for codemeta is on the Zenodo roadmap and should make this much easier.

@danielskatz
Contributor

I don't know if the CodeMeta part is working yet, but it certainly will be. Caltech Data can do this now, and they use the same underlying software as Zenodo. see https://twitter.com/CaltechData/status/972163704585269248

@mrchristian
Contributor Author

Thanks for the CodeMeta pointers. The CalTech example also helps make the picture clearer: it's just a choice of which file the Zenodo instance is instructed to pick up, in CalTech's case like so https://github.com/caltechlibrary/dataset/blob/master/codemeta.json
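
Until a Zenodo instance reads `codemeta.json` directly, the person information could in principle be mapped across by hand. The sketch below assumes CodeMeta `author` entries use `givenName`, `familyName`, an `@id` holding an ORCID URL, and an `affiliation` that is either a string or an object with a `name`, and that the target is a creator list with `name`, `orcid` and `affiliation` keys. The function name is hypothetical and this is not an official crosswalk.

```python
def codemeta_authors_to_creators(codemeta):
    """Map CodeMeta "author" entries to Zenodo-style creator dicts."""
    authors = codemeta.get("author", [])
    if isinstance(authors, dict):  # a single author may appear as a bare object
        authors = [authors]
    creators = []
    for author in authors:
        family = author.get("familyName", "")
        given = author.get("givenName", "")
        # Prefer "Family, Given"; fall back to a plain "name" if neither is set.
        name = ", ".join(part for part in (family, given) if part) or author.get("name", "")
        creator = {"name": name}
        identifier = author.get("@id", "")
        if "orcid.org/" in identifier:
            creator["orcid"] = identifier.rsplit("/", 1)[-1]
        affiliation = author.get("affiliation")
        if isinstance(affiliation, dict):
            affiliation = affiliation.get("name")
        if affiliation:
            creator["affiliation"] = affiliation
        creators.append(creator)
    return creators
```

The resulting list could be dropped into the `creators` field of a `.zenodo.json`, which keeps `codemeta.json` as the single place where names are edited.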

@danielskatz
Contributor

Caltech, please :)

@tosteiner
Contributor

@mrchristian sorry for nagging on about this... any news on the creation and layout for an OSMOOC-specific .zenodo.json? Or can we adapt the one you mentioned earlier, from the sandbox example?

I guess starting with the built-in option would be great to get things going, and then evolve from that to future implementations such as the CalTech / codemeta.json - would that make sense?
