Unified field for describing affected functions #127

oliverchang · 2023-03-28T04:52:45Z

Go, GitHub, and Rust all define their own way of specifying affected functions:

We should define a unified field for expressing these in a way that works for all languages and their nuances.

oliverchang · 2023-03-28T04:53:02Z

@darakian FYI! filing this issue as promised for starting discussion.

darakian · 2023-03-29T19:30:25Z

Many thanks for starting the thread! If we want this to be a single unified field then I think a first step would be to define that field as typed with possible type values of whatever language is supported.
Two reasons for that in my mind

Each language is going to have different rules ex. function overloading
Packages in the different ecosystems can contain more than one language ex. Tensorflow on pypi and commonmarker on rubygems

A second point to make is that ideally we should aim to capture more than functions. I can dig up an example if you like but I've come across a number of redos vulns which affect some variable which exists outside a function body and the fix tends to be altering that variable. That said, if you'd like to keep this issue scoped to functions only then I'm happy to do so 😄

darakian · 2023-07-27T20:52:00Z

To pop this back onto the stack would it make sense to start with a minimal, functional format where we root the affected functions/constants/code points at the entry to the package.
eg. a package foo with a function bar would be mapped as
foo.bar or foo::bar
I think this works for go as well, but please correct me if not.

The separator syntax is something I'm unsure if we can normalize or not, but perhaps for v1 we accept whatever the "norm" is for the ecosystem is.
For language I think we can either define a default language per ecosystem or create a small list which covers the default for each. I have a slight preference for the latter since we will eventually need to type the languages anyway.

so, the proposal for a functions v1 would introduce values that look like
python foo.bar.biz.baz
rust blah::fu::fun:faz
where the language type would define the path separator.
This doesn't account for language versions, but I don't know of any examples of a language version change causing a change in code paths so I think we're good on that front.

Known gaps

Does not account for anonymous functions
Does not account for function over loading

Thoughts?

oliverchang · 2023-08-10T06:23:40Z

Sorry for the delay on the response! Wanted to give a proper one.

I think for some ecosystems, e.g. Go, there are additional considerations.

e.g. with Go, the function/symbol needs to be qualified with the package within the module (https://go.dev/security/vuln/database), and there's additional qualifiers like goos, goarch.

I think we should use whatever is canonical for the ecosystem, rather than try to normalize separators and introduce potential confusion. The question here though, is how we can generalize this in a way that works for all, while giving enough flexibility to express additional ecosystem_specific details.

Looking at the current formats out there today, there's Go, GHSA's affected_functions, and Rust Advisory DB:

Go

    "ecosystem_specific": {
        "imports": [
          {
            "path": "cmd/link",
            "goos": [
              "js"
            ],
            "goarch": [
              "wasm"
            ],
            "symbols": [
              "Link.address"
            ]
          }
        ]
      }

      "ecosystem_specific": {
        "imports": [
          {
            "path": "golang.org/x/text/encoding/unicode",
            "symbols": [
              "bomOverride.Transform",
              "utf16Decoder.Transform"
            ]
          },
          {
            "path": "golang.org/x/text/transform",
            "symbols": [
              "String"
            ]
          }
        ]
      }

Rust

      "ecosystem_specific": {
        "affects": {
          "arch": [],
          "os": [],
          "functions": [
            "lzf::compress",
            "lzf::decompress"
          ]
        }
      },

GHSA

      "ecosystem_specific": {
        "affected_functions": [
          "tokio::task::JoinHandle::abort"
        ]
      },

As a first attempt to standardize this: perhaps we can define something like:

"scope": [ 
  {
    "symbol": string, # language specific function name
    "os": string # language specific OS qualifier  
    "arch": string # language specific arch 
    "additional_qualifiers": {
      <anything goes here, e.g. Go package path > 
    }
  }
]

But then, this ends up being essentially equivalent to ecosystem_specific but more awkward to use.

Alternatively should we just aim to unify the definitions of all the ecosystem_specific fields in one place in the OSV schema to provide this functionality?

darakian · 2023-08-15T18:36:00Z

I think we should use whatever is canonical for the ecosystem, rather than try to normalize separators and introduce potential confusion.

100% agree.

The question here though, is how we can generalize this in a way that works for all, while giving enough flexibility to express additional ecosystem_specific details.

Stepping back a moment. We don't necessarily need to generalize a for all path. Perhaps it's better to add language support on a case by case basis to ensure we have quality data and don't start generating bad data which will exist forever™️

With respect to the generalized format you're suggesting; how would os and/or arch specifiers interact with the broader advisory payload if the broader payload disagreed with either? Do we need to iterate all os versions and uarch types? The additional qualifiers field feels like a free form field and I doubt we'd see standardized usage of it unless we nail down what it's for.

For the generalized case my $0.02 is that if we head down that path we start small and add on as need be, so I'd suggest we trim your suggestion down to the symbol part. Indeed it won't capture os and uarch specifics, but uarch and os specific advisories tend to be pretty rare (for us at github at least). Fwiw I think the way we render rust functions is the same as how rustsec does

https://rustsec.org/advisories/RUSTSEC-2021-0072.html
https://github.com/github/advisory-database/blob/8ef032ef781e0d854fe1041744b90a8b89e8e792/advisories/github-reviewed/2021/08/GHSA-2grh-hm3w-w7hv/GHSA-2grh-hm3w-w7hv.json#L97

Edit: Thinking on the above a little more. Maybe it does make sense to include the os and arch but to make them optional and to interpret them missing as the generic case where all possible options are affected. I think we still need to deal with enumerating them, but maybe we can start the lists small.

darakian · 2023-08-16T20:01:12Z

Also just so we have them here are a few references on what some of the languages think a fully qualified function is
https://peps.python.org/pep-3155/
https://docs.oracle.com/javase/specs/jls/se11/html/jls-6.html#jls-6.7
https://www.php.net/manual/en/language.namespaces.rules.php
https://doc.rust-lang.org/reference/paths.html#qualified-paths

oliverchang · 2023-08-22T06:35:02Z

Thanks @darakian ! If we minimize the definition to just "symbols", I wonder if that will cause confusion with other similar data included in ecosystem_specific.

Even if we extend this hypothetical new field to be more flexible with ecosystem-specific fields, it becomes pretty much the same as existing ecosystem_specific field. I wonder if the best approach here is to just fully lean on ecosystem_specific to define these, because inevitably any client tooling will have to be customized based on the language/ecosystem.

i.e. we encourage ecosystems to clearly define their ecosystem_specific to capture vulnerable symbols, OS, arch etc similar to how Go already does it in https://go.dev/security/vuln/database. Then the OSV schema would link to these canonical definitions.

Many language ecosystems (e.g. Rust, Python, Haskell, R) already have canonical advisory-database repositories so potentially they can just live there.

darakian · 2023-08-22T18:16:05Z

If we minimize the definition to just "symbols", I wonder if that will cause confusion with other similar data included in ecosystem_specific.

Certainly if both paths are supported for a given ecosystem at the same time I could see that causing confusion. Perhaps it makes sense to cut over on an OSV schema version bump.

I wonder if the best approach here is to just fully lean on ecosystem_specific to define these, because inevitably any client tooling will have to be customized based on the language/ecosystem.

That's how I feel for sure. I think there's a question about what to do if languages end up being shared across ecosystems (C springs to mind), but generally I tend to want to lean on the expertise of those already involved with the specific domain. For Go specifically I have no problems leaning on their definitions.

oliverchang · 2023-10-06T01:30:43Z

I believe there's also openjs-foundation/security-collab-space#66 for the JS discussion.

oliverchang added the discussion label Mar 28, 2023

oliverchang mentioned this issue Aug 25, 2023

Standardising on a format for describing affected functions. pypa/advisory-database#149

Open

westonsteimel mentioned this issue Aug 18, 2023

🪄 Schema version 6 wishlist anchore/grype-db#108

Open

20 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unified field for describing affected functions #127

Unified field for describing affected functions #127

oliverchang commented Mar 28, 2023

oliverchang commented Mar 28, 2023

darakian commented Mar 29, 2023

darakian commented Jul 27, 2023 •

edited

oliverchang commented Aug 10, 2023

darakian commented Aug 15, 2023 •

edited

darakian commented Aug 16, 2023

oliverchang commented Aug 22, 2023

darakian commented Aug 22, 2023

oliverchang commented Oct 6, 2023

Unified field for describing affected functions #127

Unified field for describing affected functions #127

Comments

oliverchang commented Mar 28, 2023

oliverchang commented Mar 28, 2023

darakian commented Mar 29, 2023

darakian commented Jul 27, 2023 • edited

oliverchang commented Aug 10, 2023

Go

Rust

GHSA

darakian commented Aug 15, 2023 • edited

darakian commented Aug 16, 2023

oliverchang commented Aug 22, 2023

darakian commented Aug 22, 2023

oliverchang commented Oct 6, 2023

darakian commented Jul 27, 2023 •

edited

darakian commented Aug 15, 2023 •

edited