This repository has been archived by the owner on Oct 2, 2019. It is now read-only.

The JSON generated by tool does not conform to the standard #20

Open
cew821 opened this issue Oct 26, 2013 · 5 comments

@cew821
Contributor

cew821 commented Oct 26, 2013

The generator outputs every field as a plain JSON string, e.g.

{ "keywords":"this, that, the other" }

This is not compliant with the standard, which has more specific requirements for how to represent the objects. For example:

{ "keywords": ["this", "that", "the other"] }

See @dwcaraway's helpful schema: https://github.com/dwcaraway/podschema/blob/master/schema/schema.json

Because the JSON generated by this tool isn't in the right format, I'm not sure it will be that useful? I guess better than nothing.

I wonder if the generator could be made to make better output? Specifically:

  • handle date conversions into the proper format
  • parse comma separated values into an array of values
  • validate URIs
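As a rough illustration of the three items above, here is a minimal Python sketch (not the tool's actual Backbone code). The helper names `to_iso_date`, `split_values`, and `is_valid_uri` are hypothetical, the list of accepted input date formats is an assumption, and treating `YYYY-MM-DD` as the target date format is also an assumption. The URI check is deliberately loose.

```python
from datetime import datetime
from urllib.parse import urlparse

# Assumed input formats a user might type; the real tool may need others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y/%m/%d")


def to_iso_date(text):
    """Return the date as YYYY-MM-DD, or raise ValueError if unrecognized."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def split_values(text):
    """Split a comma-separated string into a list of trimmed, non-empty values."""
    return [part.strip() for part in text.split(",") if part.strip()]


def is_valid_uri(text):
    """Loose check: the value must have an http(s) scheme and a host."""
    parsed = urlparse(text.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(to_iso_date("01/15/2012"))
print(split_values("this, that, the other"))
print(is_valid_uri("http://www.agency.gov/data.xml"))
```

Keeping each conversion in its own small function would let the generator apply them per field type, whichever part of the library ends up building the JSON.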

I can try to help with this, but I'm having a hard time figuring out where in the library this is done. I'm a little familiar with Backbone, but not enough to quickly identify where "the work" of processing the input into JSON is happening. Can you point me in the right direction?

@dwcaraway

@cew821 Glad the JSON schema is useful. I just issued a pull request (project-open-data/project-open-data.github.io#172) to Project Open Data to adopt the JSON schema as the common format in which we'll express the Common Core Metadata requirements.

Just an FYI: in addition to automatically validating JSON (see http://dwcaraway.github.io/podschema/validate.html), the schema can be used to generate a form automatically (see http://dwcaraway.github.io/podschema/form.html). The form can easily be hooked to a database and can pull in the latest JSON schema from project-open-data, so it's always up to date.
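To make the idea of schema-driven validation concrete, here is a toy Python sketch. It is not the validator linked above and not a full JSON Schema implementation: it only understands `type`, `properties`, and `items`. The `SCHEMA` fragment is a hypothetical example written for illustration, not copied from the podschema repository.

```python
# Hypothetical schema fragment for illustration only.
SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "keyword": {"type": "array", "items": {"type": "string"}},
    },
}

# Map JSON Schema type names to Python types.
TYPES = {"object": dict, "array": list, "string": str, "boolean": bool}


def check(value, schema, path="$"):
    """Return a list of type-mismatch messages for value against schema."""
    expected = schema.get("type")
    if expected and not isinstance(value, TYPES[expected]):
        return [f"{path}: expected {expected}, got {type(value).__name__}"]
    errors = []
    if expected == "object":
        # Only check properties that are present; required fields are ignored here.
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                errors += check(value[name], sub, f"{path}.{name}")
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            errors += check(item, schema["items"], f"{path}[{i}]")
    return errors


print(check({"title": "data 1", "keyword": "key1, key2"}, SCHEMA))
```

A comma-separated string in `keyword` is reported as a mismatch, while an array of strings passes with an empty error list.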

@gbinal
Contributor

gbinal commented Nov 24, 2013

Thanks. I'm also seeing this. I definitely think that this is a significant resource but I'm not sure if the best use of time is to fix each of these elements or focus on alternate paths like building off of Dave's schema.

@benbalter - any thoughts on this?

@gbinal
Contributor

gbinal commented Nov 24, 2013

To update, below is a sample output. It looks like the issue of parsing into arrays comes into play with 'keyword', 'theme', and 'references', but there's also a related issue of how 'distribution' should work with this. I'm not sure if the best move is to address them together or if that mixes up too much logic.

[
    {
        "title": "data 1 ",
        "description": "what it is",
        "keyword": "key1, key2",
        "modified": "2012-01-15",
        "publisher": "GSA",
        "contactPoint": "John Smith",
        "mbox": "john.smith@gsa.gov",
        "identifier": "gsa-1123",
        "accessLevel": "public",
        "accessLevelComment": "In order to access this dataset, visit 123 washington st.  ",
        "bureauCode": "011:22",
        "programCode": "011:111",
        "accessURL": "http://www.agency.gov/data.xml",
        "webService": "http://www.agency.gov/data.json",
        "format": "application/xml",
        "license": "CC-0",
        "spatial": "United States",
        "temporal": "2011",
        "theme": "energy, education",
        "dataDictionary": "http://www.agency.gov/data/data.html",
        "dataQuality": "true",
        "accrualPeriodicity": "monthly",
        "distribution": "notsurewhattoput?",
        "landingPage": "http://www.agency.gov/data_this",
        "language": "en-US",
        "PrimaryITInvestmentUII": "12-121234121",
        "references": "http://www.agency.gov/data.pdf, http://www.agency.gov/otherhub/data.doc",
        "issued": "2012-01-22",
        "systemOfRecords": "http://www.agency.gov/oira/data-record.html"
    }
]

@gbinal
Contributor

gbinal commented Nov 26, 2013

Charles, it seems to me that the only pressing problem with the file generation is the issue of comma-separated strings vs. arrays of strings for keywords, themes, and references. I have split that off as a specific issue, #21. Do you think I'm missing anything else crucial? [e.g., I think that changing the date format for an end user to a proper date format would be good but is not essential.]

@cew821
Contributor Author

cew821 commented Nov 26, 2013

There are a few additional fields that need to be in arrays of strings, not strings, regardless of how many items are in the array. These include:

  • bureauCode
  • programCode
  • keyword
  • theme
  • references
  • language

Also, dataQuality needs to be a boolean, not a string, i.e. true not "true".
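The changes listed above could be sketched as a post-processing step like the following. This is Python for illustration, not the tool's own code. The `normalize` function name is hypothetical, the field list uses the `keyword` spelling from the sample output earlier in the thread, and treating only the text "true" as `True` is an assumption.

```python
import json

# Fields that should be arrays of strings even when they hold one item.
ARRAY_FIELDS = ("bureauCode", "programCode", "keyword", "theme", "references", "language")


def normalize(record):
    """Return a copy with list fields split into arrays and dataQuality as a boolean."""
    fixed = dict(record)
    for name in ARRAY_FIELDS:
        value = fixed.get(name)
        if isinstance(value, str):
            fixed[name] = [part.strip() for part in value.split(",") if part.strip()]
    quality = fixed.get("dataQuality")
    if isinstance(quality, str):
        # Assumption: only the literal text "true" (any case) means True.
        fixed["dataQuality"] = quality.strip().lower() == "true"
    return fixed


sample = {
    "keyword": "key1, key2",
    "bureauCode": "011:22",
    "language": "en-US",
    "dataQuality": "true",
}
print(json.dumps(normalize(sample), indent=2))
```

Single values such as `bureauCode` still come out as one-element arrays, and values that are already arrays or booleans are left untouched, so the step is safe to run more than once.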

@gbinal gbinal added bug and removed bug labels Mar 26, 2014