Include doi as required field in meta.yaml #158

jbusecke · 2023-01-13T19:38:24Z

I am working on documentation with @rabernat that outlines what a proper citation of pangeo-forge data would look like. We noticed that the catalog page does not display the doi, which is needed to cite the original data source in a paper.

I propose adding a required field to the meta.yaml that contains the doi (or possibly a list of dois) for a given dataset. This could then be used to provide a 'copy citation' button on each catalog entry.
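As a rough sketch of what such a field might involve, the snippet below normalizes a `doi` entry (a single string or a list of strings) from an already-parsed meta.yaml mapping. The key name `doi`, the function name `normalize_dois`, and the regular expression are assumptions for illustration; none of them are part of the current meta.yaml schema.

```python
import re

# Loose pattern for modern DOIs ("10.<registrant>/<suffix>"). This is an
# assumption for illustration, not an official validation rule.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_dois(meta: dict) -> list[str]:
    """Return the DOIs stored under the hypothetical 'doi' key as a list."""
    raw = meta.get("doi")
    if raw is None:
        raise ValueError("meta.yaml is missing the required 'doi' field")
    if isinstance(raw, str):
        dois = [raw]
    elif isinstance(raw, (list, tuple)):
        dois = list(raw)
    else:
        raise ValueError("'doi' must be a string or a list of strings")
    if not dois:
        raise ValueError("'doi' must contain at least one entry")
    for doi in dois:
        if not isinstance(doi, str) or not DOI_PATTERN.match(doi):
            raise ValueError(f"not a valid DOI: {doi!r}")
    return dois


# Example mapping, as it might look after parsing meta.yaml.
meta = {"title": "GPM IMERG Late Precipitation", "doi": "10.5067/GPM/IMERGDL/DAY/06"}
print(normalize_dois(meta))
```

Accepting both a string and a list keeps the common single-source case short while still allowing datasets that combine several upstream sources.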

rabernat · 2023-01-13T19:50:25Z

More broadly, we could think about how we want recipes to be cited. Here is an example we came up with today for this dataset.

The data in this study originated from the NASA "GPM IMERG Late Precipitation L3 1 day 0.1 degree x 0.1 degree V06 (GPM_3IMERGDL)" dataset (Huffman et al., 2019)
The data were accessed via the Pangeo Forge ARCO data repository (Stern et al., 2022) on Jan. 13, 2023.
The Pangeo Forge recipe that generated the data is located at https://pangeo-forge.org/dashboard/feedstock/81

Huffman, G.J., E.F. Stocker, D.T. Bolvin, E.J. Nelkin, Jackson Tan (2019), GPM IMERG Late Precipitation L3 1 day 0.1 degree x 0.1 degree V06, Edited by Andrey Savtchenko, Greenbelt, MD, Goddard Earth Sciences Data and Information Services Center (GES DISC). https://doi.org/10.5067/GPM/IMERGDL/DAY/06

Stern, C., R. Abernathey, J. Hamman, R. Wegener, C. Lepore, S. Harkins, A. Merose.
Pangeo Forge: Crowdsourcing Analysis-Ready, Cloud Optimized Data Production.
Frontiers in Climate, 10 February 2022
https://doi.org/10.3389/fclim.2021.782909
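A 'copy citation' button could assemble text like the example above from a few metadata values. The following is only a sketch under stated assumptions: the function name `build_citation` and its parameters are hypothetical, and the access date is written in ISO format rather than the "Jan. 13, 2023" style to avoid locale-dependent formatting.

```python
from datetime import date


def build_citation(dataset_title: str, dataset_ref: str,
                   access_date: date, feedstock_url: str) -> str:
    """Assemble a three-sentence data citation (hypothetical helper)."""
    lines = [
        f'The data in this study originated from the "{dataset_title}" '
        f"dataset ({dataset_ref}).",
        "The data were accessed via the Pangeo Forge ARCO data repository "
        f"(Stern et al., 2022) on {access_date.isoformat()}.",
        f"The Pangeo Forge recipe that generated the data is located at {feedstock_url}",
    ]
    return "\n".join(lines)


print(build_citation(
    dataset_title="GPM IMERG Late Precipitation L3 1 day 0.1 degree x 0.1 degree V06 (GPM_3IMERGDL)",
    dataset_ref="Huffman et al., 2019",
    access_date=date(2023, 1, 13),
    feedstock_url="https://pangeo-forge.org/dashboard/feedstock/81",
))
```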

What's missing here is a good citation for the recipe author (in this case, @briannapagan). Brianna, I'm curious, what sort of acknowledgement of your role would make sense here?

briannapagan · 2023-01-16T17:50:30Z

I would second showing the doi or list of dois. Recipe authors have the responsibility of properly citing the original dataset. As for acknowledging the recipe authors: is adding a doi for the recipe itself overdoing it? Do we need recipe author acknowledgement at all? Using NASA as an example, the archivers and others working at the data centers do not get acknowledgement for maintaining the data collections themselves, just potentially a nod to the data center itself.

jbusecke · 2023-01-18T17:08:18Z

Thanks for that @briannapagan!

I personally think we have the opportunity to change the status quo for the better here. I would personally advocate for a doi per recipe, which I think will acknowledge the important work which will be the foundation of how climate data science might be done in the future?

We cannot assume that recipe maintainers are financially compensated for their work (as is the case at NASA?), so I think providing an easy way to acknowledge their efforts would be fair, and might create a needed incentive for a diverse group of people to contribute recipes?

A practical consideration for reproducibility: if we decide, for example, to implement a Zenodo webhook for feedstocks, we could get a doi plus a secondary archived location for the code. This would increase the chance that future researchers can actually reproduce a given dataset with a particular version of the recipe (even if it has to be run on their local computer).

briannapagan · 2023-01-18T17:15:54Z

Along the same lines, is the recipe maintainer who receives the acknowledgement also responsible for maintaining the recipe in perpetuity? I am going to sound like a broken record, but data archives are very much alive. If a reprocessing error is caught and the original data source is republished, the zarr store must be updated. Is the onus on the maintainer to always ensure the zarr store is accurate? How do we connect the upstream data providers to this?

briannapagan · 2023-01-18T17:17:29Z

> I personally think we have the opportunity to change the status quo for the better here. I would personally advocate for a doi per recipe, which I think will acknowledge the important work which will be the foundation of how climate data science might be done in the future?

Also great! +2 for doi per recipe.

jbusecke · 2023-01-18T18:58:05Z

> Is the onus on the maintainer to always ensure the zarr store is accurate? How do we connect the upstream data providers to this?

Excellent point! Naively I'd think we should aim to ~~make~~ involve them as feedstock maintainers/contributors, but I realize this might be hard.

cisaacstern · 2023-12-07T21:54:36Z

👋 all, I've moved this issue here to pangeo-forge-runner because as of #134, the schema for meta.yaml lives here.

jbusecke · 2023-12-14T18:07:10Z

I think there are several questions mixed in this discussion:

1. Should we require the doi as part of meta.yaml? (Clearly a runner issue.)
2. How do we cite the code of a recipe itself? (To me this is not related to the runner; I am not sure where it belongs, but it's more of a meta/docs question, I guess.)

Any suggestions on where to move the discussion of 2?

Moving forward here: I am strongly in favor of enforcing dois in the meta.yaml by default! Perhaps we can have some sort of opt-out option for testing, though?
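To make the "required by default, with an opt-out for testing" idea concrete, here is a minimal sketch. The function name `check_doi_requirement` and the flag `allow_missing_doi` are hypothetical and do not correspond to any existing pangeo-forge-runner option; the `doi` key is likewise assumed.

```python
def check_doi_requirement(meta: dict, allow_missing_doi: bool = False) -> bool:
    """Return True if a DOI is present; raise unless the opt-out is set.

    'allow_missing_doi' is a hypothetical opt-out intended for test runs.
    """
    has_doi = bool(meta.get("doi"))
    if not has_doi and not allow_missing_doi:
        raise ValueError(
            "meta.yaml must define 'doi'; pass allow_missing_doi=True "
            "to skip this check for testing"
        )
    return has_doi


# Default behaviour: a DOI is enforced.
print(check_doi_requirement({"doi": "10.5067/GPM/IMERGDL/DAY/06"}))
# Test runs could opt out explicitly.
print(check_doi_requirement({"title": "test recipe"}, allow_missing_doi=True))
```

Making the opt-out an explicit argument, rather than silently accepting a missing field, keeps enforcement as the default path.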

cisaacstern transferred this issue from pangeo-forge/pangeo-forge.org Dec 7, 2023
