Reading metadata from additional file #45

StephenQuirolgico · 2019-11-14T20:44:59Z

@IanLee1521 - Can't recall if this was already requested elsewhere, but is it possible to enhance the scraper to also read metadata from an additional file in a repo? The rationale would be to allow developers to have more control over the metadata that is provided, and to provide metadata that may not be scraped by the scraper.

leebrian · 2019-11-14T22:49:58Z

I think it would be helpful to read a code.json file in the root of the repo. During the GSA calls, at least two programs said they did something similar. I would like to bring this up on a GSA call and have them put out some guidance on code.gov to help shape the implementation here.

The local process we use on top of scraper is to read a code.json and use its values to override the project settings in the combined agency code.json. It's a bit of a hack, but it lets me use the exact same schema. We do this on the openCDC repo.
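For anyone curious what such a local override step might look like, here is a minimal sketch. It assumes the combined agency file keeps its projects in a top-level "releases" list and that a project is matched on its "name" field; both are assumptions, and `apply_local_override` is a hypothetical helper, not part of scraper.

```python
import copy


def apply_local_override(agency, override):
    """Return a copy of the agency metadata with one project overridden.

    Assumption: projects live in agency["releases"] and are matched on "name".
    Top-level keys from the override replace the scraped ones wholesale.
    """
    result = copy.deepcopy(agency)
    for project in result.get("releases", []):
        if project.get("name") == override.get("name"):
            project.update(copy.deepcopy(override))
            break
    return result


# Made-up data for illustration only.
agency = {
    "agency": "EXAMPLE",
    "releases": [
        {"name": "demo", "tags": ["github"], "laborHours": 10},
        {"name": "other", "tags": ["github"], "laborHours": 5},
    ],
}
override = {"name": "demo", "tags": ["github", "curated"]}
updated = apply_local_override(agency, override)
```

Fields the per-repo file does not mention (here "laborHours") keep their scraped values, and the input dictionary is left untouched.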

IanLee1521 · 2019-11-15T17:09:50Z

Certainly doable. I believe this was last on @jcastle's plate, as there was to be a discussion in the bi-weekly calls (or other spin-off calls) to figure out the best way to implement this (and, e.g., what to name the file).

jcastle-zz · 2019-11-20T13:32:06Z

Let's add this to the metadata brainstorm. Will send out an invite for that discussion to begin next week.

IanLee1521 · 2020-01-22T20:33:28Z

I will wait for the official answer from @jcastle / Amin but I propose that we name the file .code_gov.json and that it should have the same format as the “repository” object in the metadata schema (currently called “release”).

If a repo has such a file, any fields in it that match what comes from the API will replace the scraped values. Example from gsa.gov/code.json, where all the values are explicitly in the file:

{
  "contact": {
    "URL": "https://github.com/18F",
    "email": "18f@gsa.gov"
  },
  "date": {
    "created": "2013-07-17",
    "lastModified": "2019-05-02"
  },
  "description": "A hosted, shared-service that provides an API key, analytics, and proxy solution for government web services.",
  "downloadURL": "https://api.github.com/repos/18F/api.data.gov/downloads",
  "homepageURL": "https://github.com/18F/api.data.gov",
  "laborHours": 1216,
  "languages": [
    "HTML",
    "Ruby",
    "CSS",
    "JavaScript"
  ],
  "name": "api.data.gov",
  "organization": "18F",
  "permissions": {
    "licenses": [
      {
        "name": "NOASSERTION"
      }
    ],
    "usageType": "openSource"
  },
  "repositoryURL": "https://github.com/18F/api.data.gov",
  "status": "Development",
  "tags": [
    "github"
  ],
  "vcs": "git"
}

Example where only a couple fields (tags and contact:email) are overridden:

{
  "contact": {
    "email": "jcastle@gsa.gov"
  },
  "tags": [
    "github",
    "code_gov"
  ]
}
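To make the proposed override behavior concrete, here is a rough sketch of how the partial file could be layered over the scraped values. It assumes nested objects such as "contact" are merged key by key while lists such as "tags" are replaced whole; the thread does not settle either point, and `merge_metadata` is a hypothetical name rather than an existing scraper function.

```python
import copy
import json


def merge_metadata(scraped, overrides):
    """Layer override values on top of scraped metadata.

    Assumption: dicts are merged recursively; any other value
    (including lists) from the override replaces the scraped value.
    """
    merged = copy.deepcopy(scraped)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_metadata(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# A subset of the scraped values from the full example above.
scraped = {
    "contact": {"URL": "https://github.com/18F", "email": "18f@gsa.gov"},
    "name": "api.data.gov",
    "tags": ["github"],
}

# The partial override file, parsed as it would be read from the repo.
overrides = json.loads("""
{
  "contact": {"email": "jcastle@gsa.gov"},
  "tags": ["github", "code_gov"]
}
""")

merged = merge_metadata(scraped, overrides)
```

Under these assumptions the contact URL survives, the contact email and tags come from the override, and everything else is left as scraped. A shallow merge would instead drop the contact URL, which is one design question the naming discussion would need to answer.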

What do you all think of that?

jcastle-zz · 2020-01-22T23:24:00Z

@JosephAmalfitanoSSA, @AminPIC
