Ecosystem support for tooling that is outside of the language or os specific package management system #94

Open
westonsteimel opened this issue Nov 1, 2022 · 20 comments

Comments

@westonsteimel

westonsteimel commented Nov 1, 2022

It would be incredibly useful to have some standard way of referring to generic tooling that is part of a language or OS ecosystem, but not actually installed via that ecosystem's package registry.

An example might be cargo, which is part of the Rust ecosystem and has advisories issued from the Rust Advisory Database, but is separate from the existing crates.io OSV ecosystem (some previous discussion on this).

Or a very recent example would be some way to refer to a generic Node.js that hasn't been installed via a system package manager. It would be separate from the existing npm OSV ecosystem, but it is still quite important to have a standard way of representing security advisories for it. That would also hopefully open the door to flagging these tools within the existing GitHub Advisory Database for the languages they currently support.

@westonsteimel
Author

westonsteimel commented Nov 1, 2022

My original thought was to potentially add a new ecosystem value for these, so could be something like:

  • Rust
  • Nodejs
  • Python

Where the name would then be the name of the tool within that ecosystem. That doesn't work particularly well for Go, though, since the existing package-centric OSV namespace is already called Go, and having both a language-name ecosystem and a registry-name ecosystem might be confusing for people anyway.

@joshbressers

I'm not entirely sure ecosystem makes sense here; this could be a new way to categorize software.

Let's use Node.js and OpenSSL as an example.

The Node.js binary, node, has OpenSSL statically linked in. If I install node, it's not part of an ecosystem or package manager. So now we have a binary file, that statically links in OpenSSL, that's not installed via any traceable system.

I think the value in the ecosystem tags is knowing where to go look for more details.

In the case of this OpenSSL, you can maybe go see if Node.js published anything. But for other things we have to try to track every possible binary vendor for details about whatever binaries they are building and distributing which is not realistic.

I think we need a tag that denotes this is a known problem, like tagging an OpenSSL vulnerability with node affects details, but also makes it clear this is not something we can easily programmatically determine. We maybe need humans to get involved to add and update the data.

@westonsteimel
Author

As a note I would expect this to start out very narrowly scoped to cover existing well-known tools that are important parts of language ecosystems and are not frequently installed via package managers

@oliverchang
Contributor

oliverchang commented Nov 2, 2022

Hmm, this is certainly missing in the OSV schema, but I'm also a little wary of building something similar to CPEs, where we define our own custom registry of identifiers (as opposed to our ecosystem ones which just defer to that ecosystem).

Are there any other alternatives where we can be unambiguous and less bespoke? One possibility might be to use the source/repo path to do this. e.g.

There are probably other alternatives, but something like this would be much more deterministic and predictable as opposed to a custom dictionary.

@westonsteimel
Author

@oliverchang, yes I definitely agree on not maintaining our own registry of identifiers if we can possibly avoid it. I think the idea of using the source repo path could potentially work. What might the actual OSV entry look like in that case?

@westonsteimel
Author

Hmm, what about something specific for GitHub release artifacts? And maybe something general for source control URLs or just general URLs for published binaries?

@oliverchang
Contributor

> @oliverchang, yes I definitely agree on not maintaining our own registry of identifiers if we can possibly avoid it. I think the idea of using the source repo path could potentially work. What might the actual OSV entry look like in that case?

Maybe something like:

{
  "type": "Program",
  "name": "https://github.com/python/cpython:Programs/python.c"
}

If we go with this, there will need to be some rules around canonicalising git/repo URLs, and a few other details to figure out that I'm handwaving here.

GitHub release artifacts or general URLs could also work, but they may introduce more inconsistencies, because there can be many different correct IDs if a project is mirrored in a lot of places. Everything goes back to the source repo, so perhaps that would be more stable as an identifier.
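
To make the canonicalisation point concrete, here is a minimal sketch of what such rules might look like. The function name and the specific rules (force https, lowercase the host, drop a trailing `.git` or slash, rewrite scp-style SSH remotes) are assumptions for illustration only; nothing here is defined by the OSV schema.

```python
from urllib.parse import urlsplit


def canonical_repo_url(url: str) -> str:
    """Normalise a repository URL so equivalent spellings compare equal.

    Hypothetical rules, not part of any spec.
    """
    url = url.strip()
    # Rewrite scp-style SSH remotes (git@host:owner/repo) as https URLs.
    if url.startswith("git@") and ":" in url:
        host, _, path = url[len("git@"):].partition(":")
        url = f"https://{host}/{path}"
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Drop a trailing slash and a trailing ".git" suffix.
    path = parts.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    # Path case is preserved: not every host treats paths case-insensitively.
    return f"https://{host}{path}"


print(canonical_repo_url("git@github.com:python/cpython.git"))
print(canonical_repo_url("HTTPS://GitHub.com/python/cpython/"))
```

Both calls should print the same string, which is the property an identifier of this kind would need. Mirrors on other hosts would still produce different identifiers, so this does not address the mirroring concern above.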

@captn3m0
Contributor

captn3m0 commented Jan 5, 2023

> Or a very recent example would be some way to refer to generic nodejs that hasn't been installed via a system package manager

The PURL spec accounts for this via the generic type. Syft, for example, already uses pkg:generic/node when it detects nodejs installed outside the system package manager.

Ref: https://github.com/anchore/syft/blob/bb6fc6525c6b791999a21d014b7557075202a2e8/syft/pkg/cataloger/binary/default_classifiers.go#L74-L82

OSV should still support this use case, and maybe support a generic ecosystem. Or, given that we already have a PURL, which should be resolvable to the relevant ecosystem anyway, why do we need a separate ecosystem field?

Edit: The source references are also supported via PURLs:

  • pkg:github/python/cpython
  • pkg:generic/nodejs?download_url=https://nodejs.org/dist/v18.12.1/node-v18.12.1-linux-x64.tar.xz

@oliverchang
Contributor

The problem with generic is that it's essentially a free-for-all that does not enforce any form of consistency. i.e. is pkg:generic/node the canonical Node, or is pkg:generic/Node.js or some other variation?

I think we need something more machine readable and consistent here. One possibility that can be made to be more consistent without us maintaining a custom registry (similar to CPEs) is something like #94 (comment), but there are likely other approaches.
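
A small sketch of the consistency problem, assuming consumers compare generic PURL names as plain strings. The helper `purl_type_and_name` is hypothetical and handles only the simple forms quoted in this thread, not the full PURL grammar.

```python
def purl_type_and_name(purl: str) -> tuple[str, str]:
    """Split a simple PURL into (type, name), ignoring namespace details.

    Illustrative only; requires Python 3.9+ for str.removeprefix.
    """
    body = purl.removeprefix("pkg:")
    # Strip the subpath, qualifiers, and version, in that order.
    for sep in ("#", "?", "@"):
        body = body.split(sep, 1)[0]
    ptype, _, rest = body.partition("/")
    return ptype.lower(), rest.rsplit("/", 1)[-1]


candidates = [
    "pkg:generic/node",
    "pkg:generic/Node.js",
    "pkg:generic/nodejs?download_url=https://nodejs.org/dist/v18.12.1/node-v18.12.1-linux-x64.tar.xz",
]
names = {purl_type_and_name(p)[1] for p in candidates}
print(names)  # three distinct names for what is arguably the same software
```

Nothing in the generic type tells a consumer that these three names refer to the same project, so matching on them needs a lookup table maintained by someone.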

@oliverchang
Contributor

Thinking more about this, everything really goes back to source, and we can already encode vulns in things like language interpreters through git commit hashes and version tags. This gives us the most consistent way to describe vulns in open source software that doesn't have a canonical package ecosystem.

e.g. from https://github.com/google/oss-fuzz-vulns/blob/main/vulns/mruby/OSV-2020-744.yaml

id: OSV-2020-744
summary: Heap-double-free in mrb_default_allocf
details: ...
modified: '2022-04-13T03:04:39.780694Z'
published: '2020-07-04T00:00:01.948828Z'
references:
- type: REPORT
  url: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=23801
affected:
  ranges:
  - type: GIT
    repo: https://github.com/mruby/mruby
    events:
    - introduced: 9cdf439db52b66447b4e37c61179d54fad6c8f33
    - fixed: 97319697c8f9f6ff27b32589947e1918e3015503
  versions:
  - 2.1.2
  - 2.1.2-rc
  - 2.1.2-rc2
  ecosystem_specific:
    severity: HIGH

The entrypoint is missing here from my original proposal at #94 (comment), but that was also flawed in that entrypoints can easily move across versions as part of refactoring.
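
As a rough sketch of how a consumer might use a record like this without any package name, the function below checks a version string against `versions` and a commit against the `GIT` range events. It assumes a linear history passed in oldest first, which real repositories often violate; a real tool would ask git for ancestry instead. The name `is_affected` and the short hashes in the usage are made up for illustration.

```python
def is_affected(affected, version=None, commit=None, history=()):
    """Return True if the version or commit falls inside the affected entry.

    `history` is a sequence of commit hashes, oldest first (linear history
    is an assumption of this sketch).
    """
    if version is not None and version in affected.get("versions", []):
        return True
    history = list(history)
    if commit is None or commit not in history:
        return False
    # Every commit up to and including the one being checked.
    upto = history[: history.index(commit) + 1]
    for rng in affected.get("ranges", []):
        if rng.get("type") != "GIT":
            continue
        events = rng.get("events", [])
        introduced = {e["introduced"] for e in events if "introduced" in e}
        fixed = {e["fixed"] for e in events if "fixed" in e}
        # An introduced value of "0" means vulnerable from the first commit.
        vulnerable = "0" in introduced
        for h in upto:
            if h in introduced:
                vulnerable = True
            if h in fixed:
                vulnerable = False
        if vulnerable:
            return True
    return False


# Hypothetical short hashes standing in for real commits.
entry = {
    "versions": ["2.1.2"],
    "ranges": [{"type": "GIT", "events": [{"introduced": "b1"}, {"fixed": "d3"}]}],
}
log = ["a0", "b1", "c2", "d3"]
print([is_affected(entry, commit=c, history=log) for c in log])
```

With the events above, only the commits from the introducing one up to, but not including, the fixing one are reported as affected.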

@dfandrich
Contributor

The curl project is experimenting with publishing its security vulnerabilities in OSV (in curl/curl-www#237) and has hit this OSV limitation. Technically, OSV can't be used for this because ecosystem is mandatory and there's no appropriate value for upstream packages. Most PURL types can specify an unversioned package (e.g. pkg:deb/debian/curl) but the generic type cannot; it is tied to a download URL that must point to a specific version of a package.

Fixing PURL to allow an unambiguous, unversioned generic package type would be one way to fix this. e.g. pkg:generic/curl?package_url=https://curl.se/download/#curl

@oliverchang
Contributor

> The curl project is experimenting with publishing its security vulnerabilities in OSV (in curl/curl-www#237) and has hit this OSV limitation. Technically, OSV can't be used for this because ecosystem is mandatory and there's no appropriate value for upstream packages. Most PURL types can specify an unversioned package (e.g. pkg:deb/debian/curl) but the generic type cannot; it is tied to a download URL that must point to a specific version of a package.
>
> Fixing PURL to allow an unambiguous, unversioned generic package type would be one way to fix this. e.g. pkg:generic/curl?package_url=https://curl.se/download/#curl

That's awesome to hear! And sorry we don't have a clear story for this yet.

Would the approach of just including the source repository information (e.g. the example in #94 (comment)) without an ecosystem/package work for curl?

@oliverchang
Contributor

oliverchang commented May 2, 2023

@dfandrich
Contributor

Using a GitHub URL would work to disambiguate this curl from any others, but there's a technicality: the curl releases are almost-but-not-quite what's tagged in git, so doing so would leave the wrong impression. The git sources are the basis of a release, but then things like autoreconf are run to get automake files, man pages are prebuilt, etc. and that final result ends up as the curl release tarball. That's also why we can't really use a GitHub PURL like pkg:github/curl/curl@curl-8_0_1 to talk about a curl release since a release is more than just those tagged files. An autoconf security bug that necessitates a new curl point release could (theoretically) use exactly the same tagged git files yet not contain the security bug.

The OSS-Fuzz example is a bit unfair because OSS-Fuzz gets its own ecosystem and can do whatever it wants with it, in this case defining "curl" conveniently as our project. The OSV docs also say that the OSS-Fuzz ecosystem is only to be used for bugs related to OSS-Fuzz findings, so we can't use it. It's interesting that they're actually using the almost-useless too-generic PURL pkg:generic/curl as curl also has ended up doing due to nothing better being available.

@oliverchang
Contributor

> Using a GitHub URL would work to disambiguate this curl from any others, but there's a technicality: the curl releases are almost-but-not-quite what's tagged in git, so doing so would leave the wrong impression. The git sources are the basis of a release, but then things like autoreconf are run to get automake files, man pages are prebuilt, etc. and that final result ends up as the curl release tarball. That's also why we can't really use a GitHub PURL like pkg:github/curl/curl@curl-8_0_1 to talk about a curl release since a release is more than just those tagged files. An autoconf security bug that necessitates a new curl point release could (theoretically) use exactly the same tagged git files yet not contain the security bug.

Ah, that's a very interesting point that I don't have a good answer for.

If it's available, though, I believe the git metadata would still be useful to consumers in most cases: for people who are pulling curl by source (e.g. as a submodule to use as a library), and as a fallback identification mechanism. This enables them to make use of this vulnerability feed in an automated way just by looking at their git hashes.

> The OSS-Fuzz example is a bit unfair because OSS-Fuzz gets its own ecosystem and can do whatever it wants with it, in this case defining "curl" conveniently as our project. The OSV docs also say that the OSS-Fuzz ecosystem is only to be used for bugs related to OSS-Fuzz findings, so we can't use it. It's interesting that they're actually using the almost-useless too-generic PURL pkg:generic/curl as curl also has ended up doing due to nothing better being available.

Yeah, the PURL really is just a best-effort hint in this case. Even pkg:generic/curl?package_url=https://curl.se/download/#curl seems hard to keep consistent, both in its URL formatting and across other projects in the open source ecosystem. We've tried to avoid adding a similar "Generic" ecosystem, as such fields are hard to automate on and to keep consistent.

Instead, how about we define a "Curl" ecosystem in the OSV spec? That way we can define the naming and the version rules very precisely and remove any ambiguity.

@bagder

bagder commented May 3, 2023

> Instead, how about we define a "Curl" ecosystem in the OSV spec?

I think that would be a rather poor fix.

What if we next want to provide JSON objects for flaws from @libssh2 or @c-ares etc? Should they too get new imaginary ecosystems? These projects are not "ecosystems", they are stand-alone tools/libraries.

@bagder

bagder commented May 16, 2023

In the curl project we now provide JSON objects according to this schema for all published CVEs. 141 of them at today's count.

We can however not identify the project in the JSON objects because curl is not part of any valid "ecosystem". I assume this might be problematic for some users of this data.

@sethmlarson
Contributor

Noting here that we're running into the same problem for projects like CPython, there is no ecosystem value for OSV that matches PURL's "generic" ecosystem.

@oliverchang
Contributor

> Noting here that we're running into the same problem for projects like CPython, there is no ecosystem value for OSV that matches PURL's "generic" ecosystem.

Would something like the suggestions in #94 (comment) or #94 (comment) work for the CPython use case?

We need a well-defined namespace for describing non-package-manager ecosystems and the versions associated with them. The problem with "generic" is that it offers little consistency or automatability for consumers, which is what OSV has tried to fix.

@sethmlarson
Contributor

sethmlarson commented Jul 18, 2023

@oliverchang Thanks for the suggestions! I believe #94 (comment) would work for CPython's use case if I'm reading it correctly: essentially omitting the affected.package key altogether and using only ranges and versions (I'm also assuming that ranges of type ECOSYSTEM continue to work).

The OSV database structure I'm planning already separates OSV documents (is that the right word for them?) into separate directories depending on the project, e.g. advisories/python/CVE-YYYY-NNNN.json, so the content of the file wouldn't need an identifier marking the advisory as one for Python?

Will this structure and omission of affected.package play nicely with the OSV database?
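
For illustration, a minimal sketch of that layout: the project is recovered from the file path, and the record carries no affected.package key. The helper names and all values in `record` are placeholders, and whether the OSV database accepts such records is exactly the open question above.

```python
from pathlib import PurePosixPath


def project_from_path(path: str) -> str:
    """Return <project> from a path shaped like advisories/<project>/<id>.json."""
    parts = PurePosixPath(path).parts
    if len(parts) != 3 or parts[0] != "advisories" or not parts[2].endswith(".json"):
        raise ValueError(f"unexpected advisory path: {path!r}")
    return parts[1]


def has_package(record: dict) -> bool:
    """True if any affected entry names a package."""
    return any("package" in entry for entry in record.get("affected", []))


# Placeholder record: the id and version are not real data.
record = {
    "id": "CVE-YYYY-NNNN",
    "affected": [{"versions": ["X.Y.Z"]}],
}

print(project_from_path("advisories/python/CVE-YYYY-NNNN.json"), has_package(record))
```

The catch is that the project identity then lives outside the document, so a record copied out of its directory no longer says what it applies to.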


7 participants