Documenting steps for Errata tracking #28

Open
meteorologist15 opened this issue Jan 7, 2022 · 0 comments

Proposed series of steps for processing errata on an automated basis for the ESGF AWS cloud:

  1. Determine the best interval for automatically checking for, reporting on, and acting on new retractions on the ESGF network. I propose once a week, since the uploading and removal of datasets has slowed considerably in recent months.

  2. Diagnose retracted data. A prototype tool has been created that checks the status of a dataset based on its PID and an individual file's "tracking_id" value. Afterwards, retracted dataset IDs can be determined with the ESGF search API (metadata tag: retracted=true) and cross-compared with the datasets identified by the PID values found earlier.

  3. Datasets flagged as needing retraction are written to an overall "Errata Report" file. This Errata Report can then be shared with the community. The report can also note any need to retract Zarr data.

  4. Remove flagged datasets that are determined not to have a replacement version on ESGF. Include these dataset IDs (PIDs can also be used here) in the report.

  5. Replace flagged datasets that do have a replacement version on ESGF with that new version. Include these dataset IDs (or PIDs) in the report.

  6. The ESGF search API can be very useful for aggregating the datasets described in steps 4 and 5.
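
The cross-comparison in step 2 and the remove/replace split in steps 4 and 5 could be sketched roughly as follows. This is only an illustration, not the prototype tool mentioned above: the index node URL, the helper names (`build_search_url`, `split_dataset_id`, `classify`), and the choice of returned fields are assumptions, and it presumes the locally held dataset IDs have already been resolved (for example from PIDs or "tracking_id" values). It also assumes dataset IDs end in a `.vYYYYMMDD`-style version component.

```python
from urllib.parse import urlencode

# Assumption: any ESGF index node exposing the esg-search endpoint could be used.
SEARCH_ENDPOINT = "https://esgf-node.llnl.gov/esg-search/search"


def build_search_url(endpoint, **constraints):
    """Build an ESGF search API URL for dataset-level records.

    Extra keyword arguments (e.g. retracted="true", latest="true")
    are passed through as search constraints.
    """
    params = {
        "type": "Dataset",
        "format": "application/solr+json",
        "fields": "instance_id,master_id,version",
        "limit": 1000,
    }
    params.update(constraints)
    return endpoint + "?" + urlencode(params)


def split_dataset_id(instance_id):
    """Split 'a.b.c.v20190101' into ('a.b.c', 'v20190101')."""
    master_id, _, version = instance_id.rpartition(".")
    return master_id, version


def classify(local_ids, retracted_ids, latest_ids):
    """Sort locally held datasets that are retracted into two groups.

    local_ids:     dataset IDs currently held in the cloud copy
    retracted_ids: dataset IDs returned by a retracted=true search
    latest_ids:    non-retracted latest versions available on ESGF
    """
    retracted = set(retracted_ids)
    latest_by_master = {split_dataset_id(i)[0]: i for i in latest_ids}
    report = {"remove": [], "replace": []}
    for dataset_id in sorted(set(local_ids) & retracted):
        master_id, _ = split_dataset_id(dataset_id)
        replacement = latest_by_master.get(master_id)
        if replacement and replacement not in retracted:
            # Step 5: a newer, non-retracted version exists.
            report["replace"].append((dataset_id, replacement))
        else:
            # Step 4: nothing to replace it with.
            report["remove"].append(dataset_id)
    return report


if __name__ == "__main__":
    print(build_search_url(SEARCH_ENDPOINT, retracted="true"))
    example = classify(
        local_ids=["CMIP6.a.b.v1", "CMIP6.c.d.v1", "CMIP6.e.f.v2"],
        retracted_ids=["CMIP6.a.b.v1", "CMIP6.c.d.v1"],
        latest_ids=["CMIP6.a.b.v2", "CMIP6.e.f.v2"],
    )
    print(example)
```

The returned dictionary maps directly onto the Errata Report: the "remove" list covers step 4 and the "replace" pairs cover step 5. Fetching and paging through the search results is deliberately left out, since how that is done depends on the index node and the tooling already in place.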

Other note: The main errata page (errata.es-doc.org) can also be used in this process to provide a description and severity for an erratum issued against the dataset IDs (or PIDs) mentioned above.
