Support for evidence.licenses.confidence, methods #459

Open
prabhu opened this issue May 8, 2024 · 3 comments
Comments

@prabhu
Contributor

prabhu commented May 8, 2024

The accuracy of license IDs and expressions reported by tools might be limited based on the detection methods used. Attributes like confidence and concludedValue could help with explainability and reasoning.

@jkowalleck
Member

Confidence for evidence? I guess not. Evidence is observed behavior: there is no confidence rating for that, or is there?
💡 Confidence for concluded licenses: this might be a thing ...

@prabhu
Contributor Author

prabhu commented May 8, 2024

We currently have confidence for evidence.identity and for the methods.

In the same way, different license detection methods could have different confidence levels. For example: identifying the license by reading just the package.json (low confidence), vs. parsing the license headers and code snippets of all underlying files to identify the list of licenses (medium confidence), vs. a service like clearlydefined that uses both humans and tools to triage and identify the list (high confidence).
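To make the three tiers concrete, here is a rough Python sketch. The method names, the numeric confidence values, and the `LicenseDetection` class are all assumptions made up for illustration; nothing here is part of the current schema.

```python
from dataclasses import dataclass

# Hypothetical confidence per detection method. The numbers are
# illustrative assumptions, not values from any specification.
METHOD_CONFIDENCE = {
    "manifest-analysis": 0.3,     # reading only package.json
    "source-code-analysis": 0.6,  # license headers and snippets in all files
    "curated-service": 0.9,       # humans plus tools, e.g. clearlydefined
}


@dataclass
class LicenseDetection:
    license_id: str  # SPDX license ID reported by the method
    method: str      # key into METHOD_CONFIDENCE

    @property
    def confidence(self) -> float:
        return METHOD_CONFIDENCE[self.method]


def most_confident(detections):
    """Return the detection whose method carries the highest confidence."""
    return max(detections, key=lambda d: d.confidence)


detections = [
    LicenseDetection("MIT", "manifest-analysis"),
    LicenseDetection("Apache-2.0", "curated-service"),
]
best = most_confident(detections)
```

A consumer could use such a value to decide which reported license to trust, or to flag low-confidence results for manual review.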

@jkowalleck
Member

jkowalleck commented May 8, 2024

Here is what I've learned from a talk a lawyer gave at ORT conference:

Reading a package manifest gives you the declared license.
The declared license is the intention of the package owners. Nothing to observe at all. Nothing to have confidence about, it's a fact.

The raw license headers are evidence, because they are actually observed. Nothing to have confidence about, they are facts.

Parsing/interpreting license headers and making sense of them yields a concluded license. This value could have a "confidence" property.
There could be multiple concluded values, each from a different mechanism or from different people doing the job...
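The distinction above could be modeled roughly like this in Python. This is only a sketch: the class and field names (`ComponentLicensing`, `ConcludedLicense`, `concluded_by`) and the 0.0 to 1.0 confidence scale are hypothetical, not taken from the schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConcludedLicense:
    expression: str    # SPDX license expression
    confidence: float  # assumed scale: 0.0 (none) to 1.0 (certain)
    concluded_by: str  # tool or person that drew the conclusion


@dataclass
class ComponentLicensing:
    # Declared: the owners' intention, a fact, so no confidence field.
    declared: Optional[str] = None
    # Evidence: raw observations such as header texts, also facts.
    evidence: List[str] = field(default_factory=list)
    # Concluded: interpretations; there may be several, each with confidence.
    concluded: List[ConcludedLicense] = field(default_factory=list)

    def best_conclusion(self) -> Optional[ConcludedLicense]:
        """Return the conclusion with the highest confidence, if any."""
        return max(self.concluded, key=lambda c: c.confidence, default=None)
```

Only the concluded entries carry a confidence here, which mirrors the argument that declared and observed values are facts.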

In the end, the concluded license is the only thing that matters: it is based on observation (evidence) and intention (declared).

For example, lawyers do that: they draw a conclusion based on the other data.
Let's say the declared license in the project manifest was "MIT", and in some file headers they found license headers for "Apache-2.0"; they would conclude an SPDX license expression "(MIT AND Apache-2.0)" with high confidence.
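The MIT plus Apache-2.0 example can be sketched in a few lines of Python. The helper `conclude_expression` is hypothetical, it assumes every input is a single SPDX license ID rather than a compound expression, and always joining with AND is a simplification: a real conclusion is a legal judgment.

```python
from typing import List


def conclude_expression(declared: str, observed: List[str]) -> str:
    """Combine the declared ID and observed IDs into one AND expression."""
    ids: List[str] = []
    for license_id in [declared, *observed]:
        if license_id not in ids:  # drop duplicates, keep first-seen order
            ids.append(license_id)
    if len(ids) == 1:
        return ids[0]
    return "(" + " AND ".join(ids) + ")"


print(conclude_expression("MIT", ["Apache-2.0"]))  # prints "(MIT AND Apache-2.0)"
```

When the headers only repeat the declared license, the result is just that single ID.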
