Improvements to License data in the API #87

Open
geekygirldawn opened this issue May 14, 2024 · 3 comments

Comments

@geekygirldawn
Member

geekygirldawn commented May 14, 2024

Ideally, I would love to be able to easily get data out of the GitHub API that shows when a repository has changed its license, especially when it has changed from an open source license to a non-open source one or a more restrictive license.

As @gyehuda mentioned:

> It would be wonderful for GitHub to document when a repo that was licensed as X for at least a week gets relicensed under Y, where either X or Y is OSI-approved.

Details in this discussion: todogroup/ospology#480

Or maybe it would be cool for GitHub to surface this another way - maybe something like https://innovationgraph.github.com/? I know Innovation Graph itself is focused on metrics broken out across various economic areas, so not exactly like that, but maybe there are some other things that companies care about (e.g., licenses, dependents info, supply chain security metrics) that could be grouped together in a way that lets people explore / analyze that data more easily?


On a related note, the way that the GraphQL API handles data from the licenseInfo object seems counterintuitive to me, partly because it returns very different things from what the API returns for other auto-discovered file objects, like codeOfConduct.

Here's an example query:

query license{
  repository(owner: "chaoss", name: "wg-metrics-development"){
    licenseInfo{
      name
      url
    }
    codeOfConduct{
      url
      name
      resourcePath
    }
  }
}

And the output of the query:

{
  "data": {
    "repository": {
      "licenseInfo": {
        "name": "MIT License",
        "url": "http://choosealicense.com/licenses/mit/"
      },
      "codeOfConduct": {
        "url": "https://github.com/chaoss/.github/blob/main/CODE_OF_CONDUCT.md",
        "name": "Other",
        "resourcePath": "/chaoss/.github/blob/main/CODE_OF_CONDUCT.md"
      }
    }
  }
}

From the licenseInfo object, I can't seem to get to the actual name, url, or path of the file where the license is stored in the repository. This is unlike the codeOfConduct object, which returns the url / resourcePath, which lets me programmatically determine where I can find the file within the repository.
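To illustrate what "programmatically determine where I can find the file" could look like, here is a minimal Python sketch that splits a resourcePath value like the one in the output above into owner, repository, ref, and file path. The function and type names are hypothetical, and the sketch assumes the branch name contains no slash (true for main, not for something like release/1.0).

```python
from typing import NamedTuple


class BlobLocation(NamedTuple):
    """Where a file lives on GitHub (hypothetical helper type)."""
    owner: str
    repo: str
    ref: str
    path: str


def parse_blob_resource_path(resource_path: str) -> BlobLocation:
    """Split '/owner/repo/blob/ref/path/to/file' into its parts.

    Assumption: the ref has no '/' in it, so the fourth segment is the
    whole branch or tag name.
    """
    parts = resource_path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a blob resource path: {resource_path!r}")
    owner, repo, _blob, ref = parts[:4]
    return BlobLocation(owner, repo, ref, "/".join(parts[4:]))


# Value taken from the codeOfConduct output above.
location = parse_blob_resource_path("/chaoss/.github/blob/main/CODE_OF_CONDUCT.md")
```

Note that in this example the code of conduct resolves to the organization-level chaoss/.github repository rather than wg-metrics-development, so the owner and repo parts matter as much as the path.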

If I could derive the location / name of the license file in the repo via licenseInfo (or some other method), I could use it as the input into another query to get details about the commits for the file. In the example below, I hardcoded the name of the license file after manually looking it up in the repo, but ideally, I could get this from the GitHub API and pass it as a variable into a query that would give me commit details.

query licenseCommits{
  repository(owner: "chaoss", name: "wg-metrics-development"){
    defaultBranchRef {
      name
      target {
        ... on Commit {
          history(path: "LICENSE", first: 100) {
            nodes {
              committedDate
              url
              additions
              deletions
            }
          }
        }
      }
    }
  }
}

And the output of the query:

{
  "data": {
    "repository": {
      "defaultBranchRef": {
        "name": "main",
        "target": {
          "history": {
            "nodes": [
              {
                "committedDate": "2022-05-09T17:45:16Z",
                "url": "https://github.com/chaoss/wg-metrics-development/commit/6c4dbe9822f430ed3c809a49adcc32d619e34a31",
                "additions": 1,
                "deletions": 1
              },
              {
                "committedDate": "2021-03-29T19:28:13Z",
                "url": "https://github.com/chaoss/wg-metrics-development/commit/f872d9caf03f55bf0e884fe1d9bbfa589b54c0b5",
                "additions": 1,
                "deletions": 1
              },
              {
                "committedDate": "2019-04-18T15:31:44Z",
                "url": "https://github.com/chaoss/wg-metrics-development/commit/d5159c33c600b041ca8530ae96c14d6b87247787",
                "additions": 21,
                "deletions": 0
              }
            ]
          }
        }
      }
    }
  }
}
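As a sketch of the "pass it in as a variable" idea, the same query can be written with GraphQL variables so the license path is no longer hardcoded. The Python below only builds the request body and reads the commit nodes out of a response shaped like the output above. The helper names are hypothetical, and actually sending the body (a POST to https://api.github.com/graphql with a bearer token) is left out.

```python
import json

# Same query as above, with owner, name, and path supplied as variables.
LICENSE_HISTORY_QUERY = """
query licenseCommits($owner: String!, $name: String!, $path: String!) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      name
      target {
        ... on Commit {
          history(path: $path, first: 100) {
            nodes {
              committedDate
              url
              additions
              deletions
            }
          }
        }
      }
    }
  }
}
"""


def build_payload(owner: str, name: str, path: str) -> str:
    """Return the JSON request body for the GraphQL endpoint."""
    variables = {"owner": owner, "name": name, "path": path}
    return json.dumps({"query": LICENSE_HISTORY_QUERY, "variables": variables})


def extract_commits(response: dict) -> list:
    """Return the history nodes, or [] for a repo with no default branch."""
    ref = response["data"]["repository"]["defaultBranchRef"]
    if ref is None:
        return []
    return ref["target"]["history"]["nodes"]


payload = build_payload("chaoss", "wg-metrics-development", "LICENSE")
```

With first: 100 and no pagination, this only covers the 100 most recent commits that touched the file, which is likely enough for a license file but is still an assumption.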

Maybe there is another way to do this that I just haven't found?
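One possible route that is not discussed in this thread, and is worth checking against the current REST documentation before relying on it: the REST API has a "get the license for a repository" endpoint (GET /repos/{owner}/{repo}/license) whose response includes a path field for the detected license file. A minimal Python sketch, with hypothetical helper names and an optional token as assumptions:

```python
import json
import urllib.request


def license_request(owner: str, repo: str, token: str = "") -> urllib.request.Request:
    """Build the REST request for the repository's detected license file."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    url = f"https://api.github.com/repos/{owner}/{repo}/license"
    return urllib.request.Request(url, headers=headers)


def license_file_path(body: dict) -> str:
    """Pull the in-repo file path out of the decoded JSON response."""
    return body["path"]


def fetch_license_path(owner: str, repo: str, token: str = "") -> str:
    """Perform the request (needs network access) and return the path."""
    with urllib.request.urlopen(license_request(owner, repo, token)) as resp:
        return license_file_path(json.load(resp))
```

The returned path could then be used in place of the hardcoded "LICENSE" in the history query above.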

cc: @ahpook

@ahpook

ahpook commented May 15, 2024

Hey @geekygirldawn, thanks for filing this! You're right that the licenseInfo is inconsistent with the other "community standards" type of docs. We did a little digging internally and there is already a (private) API method that we could use to expose the filename, exactly like the resourcePath field does on codeOfConduct that you noted. Would adding that to the API be sufficient to get you going on this?

FWIW I don't expect the special case of tracking license content changes over time as a first-class API endpoint to happen; it seems like quite a niche that would have a high engineering cost. We don't in general do time-series/historical changes due to storage constraints, and, as you're proposing, with the file info it could be derived from the commit history.
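As a rough illustration of deriving this from the commit history, the Python sketch below takes history nodes shaped like the query output earlier in the thread and flags commits after the first one that changed many lines of the license file. This is only a heuristic under stated assumptions: the function name and the 10-line threshold are made up for the example, the oldest node is assumed to be the commit that created the file, and a large diff is a hint to look at the commit, not proof of a relicense.

```python
def substantial_license_changes(nodes: list, min_changed_lines: int = 10) -> list:
    """Return commits, oldest first, that rewrote a large part of the file.

    The oldest commit is skipped on the assumption that it created the file.
    ISO 8601 UTC timestamps sort correctly as plain strings.
    """
    ordered = sorted(nodes, key=lambda node: node["committedDate"])
    later = ordered[1:]
    return [
        node for node in later
        if node["additions"] + node["deletions"] >= min_changed_lines
    ]


# Line counts and dates taken from the history output earlier in the thread.
history = [
    {"committedDate": "2022-05-09T17:45:16Z", "additions": 1, "deletions": 1},
    {"committedDate": "2021-03-29T19:28:13Z", "additions": 1, "deletions": 1},
    {"committedDate": "2019-04-18T15:31:44Z", "additions": 21, "deletions": 0},
]
flagged = substantial_license_changes(history)
```

For the example repository nothing is flagged: the 21-line commit is the initial addition, and the two later commits each touch a single line.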

@ahpook

ahpook commented May 15, 2024

Oh, and: regarding the url field, I too find it a bit strange that it returns a link to choosealicense.com rather than the github.com URL to the file, but changing that would be considered a breaking API change 😢

@geekygirldawn
Member Author

> We did a little digging internally and there is already a (private) API method that we could use to expose the filename, exactly like the resourcePath field does on codeOfConduct that you noted. Would adding that to the API be sufficient to get you going on this?

That would be super helpful, thank you!

> FWIW I don't expect the special case of tracking license content changes over time as a first-class API endpoint to happen; it seems like quite a niche that would have a high engineering cost. We don't in general do time-series/historical changes due to storage constraints, and, as you're proposing, with the file info it could be derived from the commit history.

I didn't think so, but I thought it wouldn't hurt to ask :)
