incorporate dataportal.json guidance? #554

Open
rebeccawilliams opened this issue Mar 24, 2016 · 7 comments

Comments

@rebeccawilliams
Contributor

see: https://www.opendatasoft.com/2016/03/22/metadata-for-open-data-portals/ and https://gist.github.com/NTerpo/b81a0b195ceb99a7e53a

cc @philipashlock @JJediny

@rebeccawilliams rebeccawilliams changed the title incorporate dataportal.json guidance incorporate dataportal.json guidance? Mar 24, 2016
@rebeccawilliams
Contributor Author

@jpmckinney
Contributor

dataportal.json needs some work. https://gist.github.com/NTerpo/b81a0b195ceb99a7e53a

@waldoj

waldoj commented Mar 24, 2016

I'd love to learn more about this.

Regrettably, that blog entry doesn't explain what, specifically, data.json doesn't do that they want it to do. As I recall, data.json will validate if it includes metadata about a repository without including an inventory, so that core aspect of dataportal.json seems to be unnecessary. (And, even if I'm wrong about that, it seems like it'd be collectively better to talk about modifying the data.json spec to accommodate that use case.)

Also, dataportal.json appears to exist only in the sense that there's a Gist with an example—there's no schema, validator, generator, or documentation yet. We put together data.json over the course of months, in consultation with dozens of people at a dozen federal agencies, and then it has continued to mature in the years since. Although we kept simplicity as our guide star, the reality turned out to be that there are more edge cases than "normal" cases, because government agencies' needs vary so much.

Surely there are specific shortcomings of data.json that spurred the creation of dataportal.json, and it'd be great to know about those, but they're not listed there. Kudos to OpenDataSoft for seeing a problem and working to correct it, anyway!

@jpmckinney
Contributor

I think one possible contribution that it makes is distinguishing "portal" from "catalog", but I'm not sure how important a contribution that is. It adds some new fields that aren't in data.json, but as @waldoj points out, these could easily be added or extended from data.json instead of creating a conflicting format. I think they are somewhat naively approaching the problem as, "Why is data.json so complicated? It should be simple! Let's make it simple." without understanding why things are complicated.

@NTerpo

NTerpo commented Mar 25, 2016

Hi everyone, and thanks for your interest in dataportal.json :)

data.json is obviously the closest thing to what we imagine. dataportal.json is a suggestion and we feel like there is something missing in terms of metadata at the portal level right now, but if extending data.json is a better solution that's totally fine for us!

The first concern we have about data.json is that it's often quite heavy. Firefox almost crashes when I open NYC's data.json.
[Screenshot, 2016-03-25 11:15:12]

We like the idea of having separate light files with links between them.
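As a rough illustration of the "separate light files with links" idea, the sketch below splits one large catalog into a small index plus one document per dataset. It is not part of data.json or dataportal.json: the `datasetLinks` key, the `dataset-N.json` file names, and the `split_catalog` helper are all hypothetical, and the only assumption taken from data.json is that datasets live in a top-level `dataset` array.

```python
def split_catalog(catalog: dict, base_url: str) -> tuple:
    """Split a catalog into a light index and per-dataset documents.

    Returns (index, files), where files maps a file name to a dataset dict.
    The "datasetLinks" key and the file naming are hypothetical.
    """
    files = {}
    links = []
    for i, dataset in enumerate(catalog.get("dataset", [])):
        name = f"dataset-{i}.json"
        files[name] = dataset
        links.append({
            "title": dataset.get("title"),
            "href": f"{base_url.rstrip('/')}/{name}",
        })

    # Keep every catalog-level field, but replace the heavy dataset
    # array with a list of links to the per-dataset files.
    index = {key: value for key, value in catalog.items() if key != "dataset"}
    index["datasetLinks"] = links
    return index, files


index, files = split_catalog(
    {"conformsTo": "https://example.org/schema",
     "dataset": [{"title": "A"}, {"title": "B"}]},
    "https://example.org/catalog/",
)
```

A consumer that only needs portal-level information could then fetch the index alone and follow individual links on demand.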

But the main issue is that there are a lot of cases where there isn't any information about the portal itself. The datasets are really perfectly described, and it's easy to work with their metadata. However, the portal level is often forgotten. So it isn't that it's not possible or that it's too complicated, but it's often not done. That is why we propose a separate file.

It might be naive; we haven't been in discussions with the agencies that you're talking about, and we have never created a standard before. But we just feel there is something missing in the portals we encounter. If it's better to have it integrated into data.json, sure: we'll be glad to contribute and help. If a fresh start with a dataportal.json (plus compliant names suggested by @jpmckinney on the gist) is necessary, we'll keep pushing for it. And if dataportal.json can be an experiment before an integration into data.json, that can be done too: it's always nice to test something in a very agile way before suggesting that everybody implement something.

Anyway, I'm very glad to have your comments and I'm going to comment on the more technical suggestions on the gist right now.

@jpmckinney
Contributor

I see value in separating out the catalog/portal-level information into a smaller file. Some aggregators may only want to know the top-level information, without aggregating all the datasets.

I haven't compared the catalog fields in data.json against dataportal.json. @NTerpo Can you have a look at comparing these two to see what's already in data.json and what is unique to dataportal.json?
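One way to start that comparison is to diff the top-level field names of the two documents. The sketch below does only that; the sample inputs are made up for illustration and do not reproduce the actual contents of the gist or of any real data.json file, and `compare_fields` is a hypothetical helper.

```python
def compare_fields(data_json: dict, dataportal_json: dict) -> dict:
    """Compare the top-level field names of two catalog documents."""
    a, b = set(data_json), set(dataportal_json)
    return {
        "shared": sorted(a & b),
        "only_data_json": sorted(a - b),
        "only_dataportal_json": sorted(b - a),
    }


# Made-up inputs: these field sets are illustrative assumptions only.
data_json_example = {
    "conformsTo": "https://example.org/schema",
    "describedBy": "https://example.org/schema.json",
    "dataset": [],
}
dataportal_example = {
    "title": "Example Portal",
    "describedBy": "https://example.org/portal-schema.json",
    "language": "en",
}

report = compare_fields(data_json_example, dataportal_example)
```

Matching names alone says nothing about whether two fields mean the same thing, so the `shared` list would still need a manual check against DCAT.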

@jpmckinney
Contributor

Having read @philipashlock's comment, here's a possible way forward:

  1. Bring dataportal.json into line with DCAT and data.json, as they are now.
  2. Add more catalog-level fields to data.json from DCAT, and consider also adding some of the fields that dataportal.json has that aren't in DCAT. Then, dataportal.json could just be a data.json file without any datasets.
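Step 2 could be sketched as follows: take an existing data.json catalog and emit a copy that keeps the catalog-level fields but lists no datasets. This assumes datasets live in a top-level `dataset` array; whether an empty array passes the data.json validator is not confirmed here, and `catalog_only` is a hypothetical helper.

```python
import copy


def catalog_only(catalog: dict, dataset_key: str = "dataset") -> dict:
    """Return a copy of the catalog with its dataset list emptied.

    All other top-level (catalog-level) fields are preserved, and the
    input catalog is left unmodified.
    """
    stripped = copy.deepcopy(catalog)
    stripped[dataset_key] = []
    return stripped


full = {
    "conformsTo": "https://example.org/schema",
    "dataset": [{"title": "A"}, {"title": "B"}],
}
portal_level = catalog_only(full)
```

Keeping the `dataset` key present but empty, rather than deleting it, is a design choice made here so that tools expecting the key keep working; the thread does not settle which form is preferable.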
