Tooling for schema validation #90

Open
andrewpollock opened this issue Oct 11, 2022 · 6 comments

Comments

@andrewpollock
Contributor

As part of #761, we became aware that the Cloud Security Alliance has a schema validator.

It seems like shipping a canonical, authoritative validator tool and library with the schema would be best, rather than each ecosystem integrator needing to reinvent the wheel (and possibly less comprehensively than desired).

@marco-silva0000

I would love a canonical, mypy-checkable schema object that could be imported and used in any Python tooling, with equivalents for other languages.
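To make the request concrete, here is a minimal sketch of what such an importable, type-checkable object might look like in Python. It is not an existing OSV library: the names `OsvRecord` and `as_osv_record` are hypothetical, only a handful of top-level fields are listed, and the only fields treated as required are `id` and `modified`.

```python
from typing import Any, List, Mapping, TypedDict, cast


class _OsvRequired(TypedDict):
    # The two fields every record must carry.
    id: str
    modified: str


class OsvRecord(_OsvRequired, total=False):
    # Illustrative subset of optional top-level fields, not the full schema.
    schema_version: str
    published: str
    summary: str
    details: str
    aliases: List[str]


def as_osv_record(raw: Mapping[str, Any]) -> OsvRecord:
    """Check the required keys at runtime, then hand back a typed view."""
    for key in ("id", "modified"):
        if not isinstance(raw.get(key), str):
            raise ValueError(f"missing or non-string required field: {key}")
    return cast(OsvRecord, dict(raw))
```

With something like this, mypy can flag a misspelled key such as `record["sumary"]` at type-check time, while the runtime check catches records that lack the required fields.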

@kurtseifried
Contributor

Some comments (I'm the tool author):

We can't do this with a "pure" JSON schema because (this is the shortlist):

  1. OSV only requires id and modified, so a vuln entry could be basically blank and still be "correct". I suspect the GSD might end up with a minor fork of OSV that just has an expanded set of required fields.
  2. CVE v4 has 3 schemas, one each for REJECT, RESERVED, and PUBLIC, so you need to read the state tag to know which schema to apply. CVE v5 fixes this with a single schema. You could use JSON Schema's "oneOf", but it still might be incorrect.
  3. There are many projects using OSV that host the data as YAML rather than JSON, so either you convert the YAML to JSON or you need a programmatic validator that can read a YAML string.
  4. GSD needs to implement schema validators for each data source we ingest anyway, for one simple reason: every data source we ingest typically has at least one error (malformed CVE IDs, blank entries, etc.). We find these and have had good luck so far getting upstreams (e.g. Mozilla, Mageia) to fix their data.
  5. Long term, the plan is to have our validator also correct the data we serve, e.g. vendor names of "[Red Hat]" and "RedHat" should all be normalized to the correct "Red Hat".
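As a rough illustration of points 1, 4, and 5, the sketch below shows the kind of check a programmatic validator can do that a pure JSON schema cannot easily express. It is not the GSD tool's actual code: `STRICT_REQUIRED`, `VENDOR_ALIASES`, `normalize_vendor`, and `validate` are hypothetical names, and the expanded required-field set and alias table are assumptions made up for the example. It works on an already-parsed mapping, so the input could come from JSON or from a YAML parser.

```python
import re
from typing import Any, List, Mapping, Sequence

# CVE IDs look like CVE-YYYY-NNNN, with four or more digits in the sequence part.
CVE_ID_RE = re.compile(r"^CVE-\d{4}-\d{4,}$")

# Hypothetical expanded set of required fields (OSV itself needs only id and modified).
STRICT_REQUIRED = ("id", "modified", "summary", "affected")

# Hypothetical alias table mapping lowercased spellings to the canonical vendor name.
VENDOR_ALIASES = {
    "[red hat]": "Red Hat",
    "redhat": "Red Hat",
    "red hat": "Red Hat",
}


def normalize_vendor(name: str) -> str:
    """Return the canonical vendor name if known, else the trimmed input."""
    key = " ".join(name.split()).lower()
    return VENDOR_ALIASES.get(key, name.strip())


def validate(record: Mapping[str, Any],
             required: Sequence[str] = STRICT_REQUIRED) -> List[str]:
    """Collect human-readable problems instead of stopping at the first one."""
    errors: List[str] = []
    for field in required:
        if record.get(field) in (None, "", [], {}):
            errors.append(f"missing or blank field: {field}")
    for alias in record.get("aliases", []):
        looks_like_cve = isinstance(alias, str) and alias.upper().startswith("CVE")
        if looks_like_cve and not CVE_ID_RE.match(alias):
            errors.append(f"malformed CVE ID: {alias}")
    return errors
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a data source back to the upstream in one go.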

@kurtseifried
Contributor

Also (obviously) we'll be releasing what we build as open source, in that directory long term, so keep an eye on it. Hopefully I'll get some more time to build this soon.

@oliverchang
Contributor

+1 that a pure JSON schema is not sufficient. There are other reasons:

  • We need to validate that package names and versions are valid.
  • We can check for e.g. schema_version being required when fields from a newer schema version are used.
  • And others that may overlap with @andrewpollock's work on detecting invalid entries. Maybe some of that should be usable as a standalone tool.
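The schema_version check in the second bullet could be sketched roughly as follows. This is only an illustration: `FIELD_INTRODUCED_IN` is a hypothetical table with a placeholder field name, not the real history of the OSV schema, and `check_schema_version` is a made-up helper. It also assumes plain dotted numeric versions such as "1.2.0".

```python
from typing import Any, Dict, List, Mapping, Tuple

# Hypothetical table: top-level field -> schema version that introduced it.
# The entry is a placeholder, not a statement about the real OSV schema.
FIELD_INTRODUCED_IN: Dict[str, str] = {
    "example_new_field": "1.2.0",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "1.2.0" into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_schema_version(record: Mapping[str, Any]) -> List[str]:
    """Flag newer fields used without a sufficiently recent schema_version."""
    errors: List[str] = []
    declared = record.get("schema_version")
    for field, introduced in FIELD_INTRODUCED_IN.items():
        if field not in record:
            continue
        if declared is None:
            errors.append(
                f"{field} needs schema_version >= {introduced}, but none is set"
            )
        elif parse_version(declared) < parse_version(introduced):
            errors.append(
                f"{field} needs schema_version >= {introduced}, got {declared}"
            )
    return errors
```

Comparing parsed tuples rather than raw strings avoids the case where "1.10.0" sorts before "1.2.0" lexicographically.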

@marco-silva0000

marco-silva0000 commented Mar 29, 2023 via email

@frasertweedale

frasertweedale commented Jun 27, 2023

Related: #168 - schema.json "score" pattern too strict in metric ordering, optional metrics not recognised


5 participants