akugarg/GSoC2021_Report

Google Summer of Code 2021 Final Report

Organization : Aboutcode

Project : Detect Unknown Licenses and Indirect License References in Scancode

Description -

  • The main goal of this project was to improve the detection of unknown licenses and to follow indirect license references in Scancode-TK.

    Improvement in the License Data Model Definition

  • Unknown licenses are those matched to a license rule tagged with the 'unknown' license key. Since these are 'special' licenses, reporting them with dedicated attributes makes the results clearer. Unknown licenses are now tagged with a new flag "is_unknown", so they can be identified beyond the naming convention of having "unknown" as part of their name. Rules that match at least one unknown license have a flag "has_unknown" set in the returned match results.

    PR Status
    #2548 MERGED
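As a rough illustration of the two flags described above, here is a minimal sketch. The `License` and `Rule` classes are hypothetical stand-ins written for this example; they are not ScanCode's actual classes, and only the flag names "is_unknown" and "has_unknown" come from the report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class License:
    # Hypothetical, simplified license record.
    key: str
    # Explicit flag, so callers do not need to inspect the key name.
    is_unknown: bool = False


@dataclass
class Rule:
    # Hypothetical, simplified rule record.
    identifier: str
    licenses: List[License] = field(default_factory=list)

    @property
    def has_unknown(self) -> bool:
        # True when at least one license of this rule is flagged unknown.
        return any(lic.is_unknown for lic in self.licenses)


mit = License(key="mit")
unknown_ref = License(key="unknown-license-reference", is_unknown=True)

plain_rule = Rule(identifier="mit_1.RULE", licenses=[mit])
mixed_rule = Rule(identifier="mit_or_unknown.RULE", licenses=[mit, unknown_ref])
```

Deriving "has_unknown" from the licenses, rather than storing it separately, is one possible design choice that keeps the two flags consistent; the merged PR may handle this differently.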

    Reporting known and Unknown licenses separately

  • We considered adding a separate section to the scan results to report 'unknown licenses' on their own, rather than mixing them with the main license detection results. After implementing this separate section, however, it did not seem like a good idea to adopt for now.

    PR Status
    #2578 CLOSED

    Follow License References to another file

  • Some license references, e.g. "see license in file LICENSE.txt", point to another file for the license details. These are reported as unknown license references; we could instead follow the reference and report what was detected in the referenced file. The approach was to use the referenced_filenames attribute already present in license RULE data files. Since following references is a process_codebase step in a scan plugin, our API function needed to return referenced_filenames to keep track of these files for each detected license. This was tracked in -

    PR Status
    #2632 MERGED
  • The process_codebase step is tracked in -

    PR Status
    #2616 MERGED
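The idea of following a reference to another file can be sketched as a post-scan pass over per-file results. This is only an illustration under stated assumptions: the `follow_references` function, the shape of the `scan_results` dictionary, and the rule of resolving a referenced name relative to the referencing file's directory are all hypothetical and are not taken from the ScanCode plugin code.

```python
import posixpath


def follow_references(scan_results):
    """
    Return a mapping of path -> license keys, where licenses found in
    referenced files are appended to those of the referencing file.

    `scan_results` is a hypothetical structure:
        {path: {"licenses": [...], "referenced_filenames": [...]}}
    """
    resolved = {}
    for path, result in scan_results.items():
        licenses = list(result["licenses"])
        directory = posixpath.dirname(path)
        for name in result.get("referenced_filenames", []):
            # Assumption: the referenced file sits next to the current file.
            target = posixpath.join(directory, name)
            referenced = scan_results.get(target)
            if referenced is None:
                # The referenced file was not scanned; nothing to follow.
                continue
            for lic in referenced["licenses"]:
                if lic not in licenses:
                    licenses.append(lic)
        resolved[path] = licenses
    return resolved


scan_results = {
    "pkg/setup.py": {
        "licenses": ["unknown-license-reference"],
        "referenced_filenames": ["LICENSE.txt"],
    },
    "pkg/LICENSE.txt": {"licenses": ["mit"], "referenced_filenames": []},
}
resolved = follow_references(scan_results)
```

This pass needs the results for every file before it can run, which is consistent with the report placing the work in a codebase-level step rather than in per-file detection.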

Improve license detection of Unknown Licenses

  • The approach was to use an index of n-grams to detect unknowns, in addition to the existing detection based on "unknown" license rules. First, the matches from the normal license detection procedure are filtered out; the remaining spans are then run through an automaton index containing n-grams from all regular license texts and rules. This is tracked in -

    PR Status
    #2592 OPEN
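A simplified sketch of the n-gram idea follows. It uses a plain Python set in place of the automaton index mentioned above, so it shows the lookup logic only, not the data structure used in the PR. The function names, the whitespace tokenization, and the choice of n=3 are assumptions made for this example.

```python
def ngrams(tokens, n):
    # All contiguous runs of n tokens, as hashable tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_index(license_texts, n=3):
    # Collect the n-grams of every known license text into one set.
    index = set()
    for text in license_texts:
        index.update(ngrams(text.lower().split(), n))
    return index


def unknown_candidates(tokens, matched_positions, index, n=3):
    """
    Return the start positions of n-grams that are present in `index`
    and lie entirely outside the already matched token positions.
    """
    hits = []
    for start in range(len(tokens) - n + 1):
        window = range(start, start + n)
        if any(pos in matched_positions for pos in window):
            # Overlaps a span already explained by a regular match.
            continue
        if tuple(tokens[start:start + n]) in index:
            hits.append(start)
    return hits


index = build_index(["permission is hereby granted free of charge"])
tokens = "this software permission is hereby granted to you".split()
hits_unmatched = unknown_candidates(tokens, set(), index)
hits_partly_matched = unknown_candidates(tokens, {2}, index)
```

Runs of license-like n-grams that remain after regular matches are removed would then be candidates for reporting as unknown license text.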

Addition of some new Licenses

  • Some licenses were not yet present in Scancode-toolkit. They have now been added.

    PR Status
    #2625 OPEN
  • I’ve had a wonderful summer during this 10-week journey and have learned plenty of things. I am thankful to Google and Aboutcode for giving me the opportunity to work with such an amazing community. I am fortunate to have had Philippe Ombredanne and Ayan Sinha Mahapatra as mentors, who helped me a lot throughout my GSoC project and provided constant support.
