Packages with multiple LICENSES files not shown in --with-license-file #71

Open
johnthagen opened this issue Jun 25, 2020 · 15 comments

@johnthagen
Contributor

johnthagen commented Jun 25, 2020

$ pip install -U pip pip-licenses
$ pip install opencv-python-headless==4.2.0.34
$ pip-licenses --from=mixed --with-license-file

The opencv-python-headless wheels include two license files, LICENSE.txt and LICENSE-3RD-PARTY.txt, but only LICENSE-3RD-PARTY.txt is displayed. Could support be added to display both license files, since some packages like opencv have multiple files covering different parts of the library?
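
To make the request concrete, here is a minimal sketch of collecting every license-like file from a package's file list instead of stopping at a single match. The name pattern and the `find_license_files` helper are assumptions for illustration, not pip-licenses' actual implementation, and the file list is a hand-written stand-in for the wheel's contents.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern: file names that commonly hold license text.
LICENSE_NAME = re.compile(r"^(LICEN[CS]E|COPYING|NOTICE)", re.IGNORECASE)


def find_license_files(paths):
    """Return every path whose file name looks like a license file, in order."""
    return [p for p in paths if LICENSE_NAME.match(PurePosixPath(p).name)]


# Illustrative file list (assumed layout, not read from a real wheel).
wheel_files = [
    "opencv_python_headless-4.2.0.34.dist-info/LICENSE.txt",
    "opencv_python_headless-4.2.0.34.dist-info/LICENSE-3RD-PARTY.txt",
    "opencv_python_headless-4.2.0.34.dist-info/METADATA",
    "cv2/__init__.py",
]

for path in find_license_files(wheel_files):
    print(path)
```

Returning a list rather than the first hit is the point: the caller then decides how to present several files in one row.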


Format               | LicensePath  | LicenseText
Plain/Plain Vertical | \n           | \n\n
JSON                 | JSON List [] | JSON List []
CSV                  | ;            |
HTML                 |              |
@raimon49
Owner

@johnthagen Yes, I am aware of the problem.

This issue needs careful consideration. The value type may change in structured formats like JSON.

For example, I have the idea of introducing a new option --with-multiple-license-files to separate the display from the existing --with-license-file.

Do you have any other ideas?

@johnthagen
Contributor Author

Another idea could be to put multiple license files in the same column and separate them with some kind of delimiter. That way there would still be a single row for each package, but LicenseFile and LicenseText could just contain concatenated information.

The issue I see with --with-multiple-license-files is that really getting all the license files needs to be the default state, or else people will invariably miss license files. For something like licensing, I personally feel it's important that the defaults be as all-encompassing as possible. Missing a license file could technically mean you are violating a license and wouldn't know if you didn't know the internals of pip-licenses.

Maybe #60 and this together could be part of a 3.0 release that tries to make the default experience for pip-licenses a little more "correct"?

@raimon49
Owner

Certainly, I think changing the behavior in the 3.0 release is desirable for many users.
Joining the files with a delimiter and making that the default output seems like a good idea.

@johnthagen
Contributor Author

One easy idea for the license file contents themselves could be to concatenate them together with a newline in-between.

@johnthagen
Contributor Author

johnthagen commented Aug 19, 2020

Another popular package with multiple license files is Matplotlib: https://github.com/matplotlib/matplotlib/tree/master/LICENSE

(Note that matplotlib currently doesn't bundle its LICENSE files correctly (matplotlib/matplotlib#18296), but once that is corrected, it would be another example that pip-licenses could support if updated.)

@raimon49
Owner

@johnthagen Thanks for all the ideas!

I have started work today on the release of version 3.0.0.

This release allows for breaking changes.

If you have ideas for changes to the --with-license-file option, please make a Pull Request for the release-3.0.0 branch.

Version 3.0.0 is expected to ship around the time Python 3.5 becomes EOL. That's 2020-09-13, according to the Python team's announcement.

@johnthagen
Contributor Author

@raimon49 After thinking about this issue, the only way I could think of to make this work while maintaining "one row per package" in the table layout would be to concatenate the LicenseFile and LicenseText results together with \n.

This would look something like:

my/license/path/LICENSE1
my/license/path/LICENSE2

and

MIT License
...
End of license

Second License
...
End of license 2

What do you think of this technique?
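
A minimal sketch of that technique, assuming a hypothetical `join_license_columns` helper (not part of pip-licenses): paths are joined with a single newline, and texts with a blank line between them, matching the example above.

```python
def join_license_columns(paths, texts):
    """Collapse several license files into one cell per column.

    Paths are joined with a single newline; texts are joined with a blank
    line so the boundary between two licenses stays visible.
    """
    license_file = "\n".join(paths)
    license_text = "\n\n".join(text.strip("\n") for text in texts)
    return license_file, license_text


paths = ["my/license/path/LICENSE1", "my/license/path/LICENSE2"]
texts = [
    "MIT License\n...\nEnd of license\n",
    "Second License\n...\nEnd of license 2\n",
]

license_file, license_text = join_license_columns(paths, texts)
print(license_file)
print()
print(license_text)
```

Stripping the trailing newline of each file before joining keeps the separator at exactly one blank line regardless of how each license file ends.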

@raimon49
Owner

@johnthagen I agree with you about the Plain or Plain Vertical format.

The difficulty is with formats where \n is meaningful or the output is structured, such as the following:

  • JSON
  • HTML
  • CSV

We have to support users who use these formats.

@johnthagen
Contributor Author

@raimon49 With --with-license-file, the LICENSE files already contain lots of \n, so appending more LICENSE files together wouldn't change the status quo, would it?

@johnthagen
Contributor Author

We could also consider the --with-multiple-license-files option you suggested, but the problem is that the end user really has no way of knowing ahead of time whether there will be multiple LICENSE files. If they miss LICENSE files, they could inadvertently violate the license requirements by not including all of the license files.

@johnthagen
Contributor Author

johnthagen commented Aug 26, 2020

For HTML, JSON, and CSV, could we use a different delimiter, or just use a raw \n rather than the actual ASCII newline character?

[
  {
    "licenses": ["BSD"],
    "name": "Django",
    "version": "2.0.2",
    "LicensePath": "/path/to/LICENSE1\n/path/to/LICENSE2"
  }
]

Then users can split on the delimiter. We could also use another character, such as ;, for formats other than Plain/Plain Vertical?

[
  {
    "licenses": ["BSD"],
    "name": "Django",
    "version": "2.0.2",
    "LicensePath": "/path/to/LICENSE1;/path/to/LICENSE2"
  }
]
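
As a rough sketch of how a consumer might read the delimiter-based output: Python's `json` encoder already writes a newline inside a string as the two-character escape `\n`, so decoding and then splitting restores the individual paths. The record below is illustrative only, and splitting is safe only if the chosen delimiter can never appear inside a path.

```python
import json

record = {
    "name": "Django",
    "LicensePath": "\n".join(["/path/to/LICENSE1", "/path/to/LICENSE2"]),
}

# json.dumps writes the newline as the escape sequence \n, so the encoded
# value contains no literal line break.
encoded = json.dumps(record)
print(encoded)

# A consumer decodes the JSON, then splits on the agreed delimiter.
decoded = json.loads(encoded)
license_paths = decoded["LicensePath"].split("\n")
print(license_paths)
```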

@raimon49
Owner

JSON can be represented as data in a list.

[
  {
    "licenses": ["BSD"],
    "name": "Django",
    "version": "2.0.2",
    "LicensePath": ["/path/to/LICENSE1", "/path/to/LICENSE2"],
    "LicenseText": ["LICENSE Text 1", "LICENSE Text 2"]
  }
]

Users will find it useful to be able to pass the output to the | jq command.
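
A small sketch of what consuming the list-based layout could look like, assuming the key names from the example above; nothing here reflects a released pip-licenses output format.

```python
import json

packages = [
    {
        "licenses": ["BSD"],
        "name": "Django",
        "version": "2.0.2",
        "LicensePath": ["/path/to/LICENSE1", "/path/to/LICENSE2"],
        "LicenseText": ["LICENSE Text 1", "LICENSE Text 2"],
    }
]

output = json.dumps(packages, indent=2)

# No delimiter to agree on: each path lines up with its text by position.
pairs = []
for package in json.loads(output):
    for path, text in zip(package["LicensePath"], package["LicenseText"]):
        pairs.append((path, text))

print(pairs)
```

With this shape, a jq filter such as `.[].LicensePath[]` would print one path per line without any string splitting.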

Some kind of delimiter is required in CSV. But for example ; also appears in the Apache License 2.0.
https://choosealicense.com/licenses/apache-2.0/

It's a very vexing issue...

@johnthagen
Contributor Author

> JSON can be represented as data in a list.

Ah yes, that is even better for JSON, nice. Perhaps we should try to enumerate a list of workable solutions as we go. I will start a list in the first comment to track this.

@johnthagen
Contributor Author

It seems hard to imagine someone could use --with-license-file with CSV output effectively. The LICENSE files will have lots of \n that will destroy the formatting. Same could be said for Markdown, reST, and Confluence.

For example, when I first tried to use pip-licenses I wanted to use RST -> Sphinx. I quickly realized that RST with --with-license-file never actually worked, as the tables were a complete mess. What we did instead (and it works really nicely) is:

$ pip-licenses --from=mixed --format=rst --output-file summary.rst
$ pip-licenses --from=mixed --format=plain-vertical --with-license-file --no-license-path --output-file with_license.txt

And then the RST and plaintext are ingested by Sphinx: the RST is rendered nicely, and the plain-vertical is just shown as monospace text. The end result is nice.

All that to say, maybe we should consider specifically not supporting certain formats with --with-license-file: basically any format that breaks on multi-line LICENSE files. I think those formats are effectively broken with --with-license-file today anyway.

@johnthagen
Contributor Author

johnthagen commented Aug 30, 2020

I propose --with-license-file be replaced with --with-license-files so that it is more accurate, and --with-license-files changed to only support formats that can support license files (they need to be able to handle \n without breaking the table/output).

Formats supported with --with-license-files:

  • HTML
<table>
    <tr>
        <th>Name</th>
        <th>Version</th>
        <th>License</th>
        <th>LicenseFiles</th>
    </tr>
    <tr>
        <td>Django</td>
        <td>2.0.2</td>
        <td>BSD</td>
        <td>
          <ul>
            <li>License File Contents ...</li>
            <li>License File 2 Contents ...</li>
          </ul>
        </td>
    </tr>
</table>
  • JSON
[
  {
    "licenses": ["BSD"],
    "name": "Django",
    "version": "2.0.2",
    "LicensePath": ["/path/to/LICENSE1", "/path/to/LICENSE2"],
    "LicenseText": ["LICENSE Text 1", "LICENSE Text 2"]
  }
]
  • JSON LicenseFinder: this format does not currently take --with-license-file into account. Perhaps a warning or error should be raised in this case?
  • Plain Vertical
pytest
5.3.4
MIT LICENSE
<LICENSE File 1 Contents>

<LICENSE File 2 Contents>

Other formats will result in a CLI error if combined with --with-license-files.
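
One way the proposed check could look, as a sketch only: the `check_format` helper, the format identifiers other than `plain-vertical` and `rst`, and the error wording are all assumptions, and `--with-license-files` is the option proposed in this comment, not an existing flag.

```python
# Hypothetical set of formats assumed to cope with multi-line license text.
SUPPORTED_WITH_LICENSE_FILES = {"html", "json", "plain-vertical"}


def check_format(output_format, with_license_files):
    """Raise ValueError when the format cannot carry license file contents."""
    if with_license_files and output_format not in SUPPORTED_WITH_LICENSE_FILES:
        supported = ", ".join(sorted(SUPPORTED_WITH_LICENSE_FILES))
        raise ValueError(
            f"--with-license-files does not support --format={output_format}; "
            f"use one of: {supported}"
        )


check_format("plain-vertical", with_license_files=True)  # accepted
check_format("rst", with_license_files=False)            # accepted: option not used

try:
    check_format("rst", with_license_files=True)
except ValueError as error:
    print(f"error: {error}")
```

Failing early keeps users from producing a table that silently drops or mangles license text.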
