.egg directories not considered in get_pkg_included_file #97

Open
philipaxer opened this issue Mar 24, 2021 · 7 comments

Comments

@philipaxer

Hi All,

I noticed that some license files are not identified correctly. This seems to happen because get_pkg_included_file only looks in .dist-info directories and never tries .egg directories.

Specifically, the following code assumes that the metadata always resides in .dist-info, which is not always true:

pkg_dirname = "{}-{}.dist-info".format(
    pkg.project_name.replace("-", "_"), pkg.version)

In my .venv I have numpy-1.20.1-py3.9-win-amd64.egg, which is not detected and is therefore skipped. The same happens for other packages.
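
To illustrate the mismatch, here is a minimal sketch. The helper name `dist_info_dirname` is hypothetical; it only wraps the format string quoted above and compares its result with the egg directory name from my .venv.

```python
def dist_info_dirname(project_name, version):
    # Same expression as the snippet quoted above, wrapped in a
    # hypothetical helper so it can be called directly.
    return "{}-{}.dist-info".format(project_name.replace("-", "_"), version)


expected = dist_info_dirname("numpy", "1.20.1")
actual = "numpy-1.20.1-py3.9-win-amd64.egg"

print(expected)
# The two names differ, so a lookup under `expected` never reaches the egg.
print(expected == actual)
```

Because the lookup path is built from the computed name alone, a package installed as an .egg directory is never inspected.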

regards
Philip

@philipaxer
Author

Changing the function as follows fixes the issue (no time to create a pull request, sorry).

    import glob
    import os

    # LICENSE_UNKNOWN is defined elsewhere in the module.

    def get_pkg_included_file(pkg, file_names):
        """
        Attempt to find the package's included file on disk and return the
        tuple (included_file_path, included_file_contents).
        """
        included_file = LICENSE_UNKNOWN
        included_text = LICENSE_UNKNOWN
        pkg_dirname = "{}-{}.dist-info".format(
            pkg.project_name.replace("-", "_"), pkg.version)

        # Collect candidates from the .dist-info directory first, then from
        # the EGG-INFO directory that .egg installs keep their metadata in.
        candidates = []
        for metadata_dir in (pkg_dirname, 'EGG-INFO'):
            for f in file_names:
                candidates.extend(sorted(glob.glob(
                    os.path.join(pkg.location, metadata_dir, f))))

        # Return the first candidate that exists on disk.
        for test_file in candidates:
            if os.path.exists(test_file):
                included_file = test_file
                with open(test_file, encoding='utf-8',
                          errors='backslashreplace') as included_file_handle:
                    included_text = included_file_handle.read()
                break
        return (included_file, included_text)

@raimon49
Owner

@philipaxer Thanks for the report. This issue will be resolved in the next patch version release.

@raimon49
Owner

The egg format is legacy and I haven't used it much.

According to the specification, there are two variants:

There are two basic formats currently implemented for Python eggs:

  1. .egg format: a directory or zipfile containing the project’s code and resources, along with an EGG-INFO subdirectory that contains the project’s metadata
  2. .egg-info format: a file or directory placed adjacent to the project’s code and resources, that directly contains the project’s metadata.
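
As a rough illustration of where metadata would live for each layout, here is a minimal sketch. The function name `classify_metadata_layout` and the entry name `example-1.0.egg-info` are hypothetical; only the suffixes and the EGG-INFO subdirectory come from the description above.

```python
def classify_metadata_layout(entry_name):
    """Map a site-packages entry name to (format, metadata location).

    The location is relative to the entry itself: "." means the entry
    directly contains the metadata.
    """
    if entry_name.endswith(".dist-info"):
        return ("dist-info", ".")
    if entry_name.endswith(".egg-info"):
        return ("egg-info", ".")
    if entry_name.endswith(".egg"):
        # .egg directories keep their metadata in an EGG-INFO subdirectory.
        return ("egg", "EGG-INFO")
    return (None, None)


for name in (
    "numpy-1.20.1-py3.9-win-amd64.egg",
    "example-1.0.egg-info",  # hypothetical entry, for illustration only
    "numpy-1.20.1.dist-info",
):
    print(name, classify_metadata_layout(name))
```

This sketch only inspects names; it does not handle zipped .egg files or .egg-info entries that are single files rather than directories.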

@philipaxer Please provide the full path of the egg package in which you expect the license file to be found.

Of course, you don't need to include any private information from your machine that you'd rather not share.

@philipaxer
Author

philipaxer commented Mar 25, 2021

This is interesting: I recreated the venv and reinstalled the packages, and now numpy shows up as a dist-info package.
Any idea under what conditions a package is installed with EGG-INFO instead?

Going through my native site-packages, I can pick out some examples. Below are the paths that contain LICENSE* files (see the note below):
site-packages\lxml-4.6.2-py3.9-win-amd64.egg\EGG-INFO

and
site-packages\pefile-2019.4.18-py3.9.egg-info

Interestingly, I cannot find any py3.9.egg-info directory that contains a LICENSE file. The pefile directory, for example, contains only the following files:

$ ls -lha
total 109K
drwxr-xr-x 1 XYZ 1049089    0 Mar 21 19:11 ./
drwxr-xr-x 1 XYZ 1049089    0 Mar 25 10:26 ../
-rw-r--r-- 1 XYZ 1049089    1 Mar 21 19:11 dependency_links.txt
-rw-r--r-- 1 XYZ 1049089  404 Mar 21 19:11 installed-files.txt
-rw-r--r-- 1 XYZ 1049089 1.5K Mar 21 19:11 PKG-INFO
-rw-r--r-- 1 XYZ 1049089    7 Mar 21 19:11 requires.txt
-rw-r--r-- 1 XYZ 1049089  291 Mar 21 19:11 SOURCES.txt
-rw-r--r-- 1 XYZ 1049089   25 Mar 21 19:11 top_level.txt

XYZ@XYZ MINGW64 /c/Python39/Lib/site-packages/pefile-2019.4.18-py3.9.egg-info
$

Perhaps only option 1 from your list (the .egg format) ships the LICENSE as an explicit file.
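
The observation can be reproduced with a minimal sketch. The helper `find_license_files`, its default patterns, and the file name `LICENSE.txt` are assumptions for illustration; the directory names mirror the two examples above, recreated in a temporary directory rather than read from a real installation.

```python
import glob
import os
import tempfile


def find_license_files(metadata_dir, patterns=("LICENSE*", "LICENCE*", "COPYING*")):
    """Return license-like files in metadata_dir (patterns are assumptions)."""
    matches = []
    for pattern in patterns:
        matches.extend(sorted(glob.glob(os.path.join(metadata_dir, pattern))))
    return matches


with tempfile.TemporaryDirectory() as tmp:
    # .egg-info layout: metadata files only, as in the listing above.
    egg_info = os.path.join(tmp, "pefile-2019.4.18-py3.9.egg-info")
    os.mkdir(egg_info)
    for name in ("PKG-INFO", "SOURCES.txt", "top_level.txt"):
        open(os.path.join(egg_info, name), "w").close()
    egg_info_matches = find_license_files(egg_info)

    # .egg layout: EGG-INFO subdirectory with a (hypothetical) LICENSE.txt.
    egg_meta = os.path.join(tmp, "lxml-4.6.2-py3.9-win-amd64.egg", "EGG-INFO")
    os.makedirs(egg_meta)
    open(os.path.join(egg_meta, "LICENSE.txt"), "w").close()
    egg_matches = [os.path.basename(p) for p in find_license_files(egg_meta)]

print(egg_info_matches)
print(egg_matches)
```

With this setup the .egg-info directory yields no matches, while the EGG-INFO directory yields the license file.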

@raimon49
Owner

OK, thanks for your information.

@raimon49
Owner

@philipaxer Hi, I have attempted a fix for the issue you reported.

I don't have any egg packages installed in my environment, so I have published the fix as a release candidate.

Could you please report back whether this version works in your environment?

# Install the release candidate in your environment
$ pip install 'pip-licenses==3.3.2rc1'  

If it doesn't work, please create a pull request against the following branch. You are always welcome to do so.
https://github.com/raimon49/pip-licenses/tree/release-3.3.2

@cdce8p
Contributor

cdce8p commented Mar 30, 2021

Thought I would add some background information.

By default, no version of setuptools includes license files inside the .egg-info folder. It's up to the individual developer to do so. I would recommend installing wheel first. Each newly installed package will then get a dist-info folder, which includes the license files.

As for the .egg format: AFAIK it is deprecated and has been replaced by wheel.
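
To see which installed packages still use an egg layout, and so might be worth reinstalling once wheel is available, something like the following minimal sketch could help. The function name `legacy_egg_entries` is hypothetical, and the example runs against a temporary directory populated with names taken from this thread rather than a real site-packages.

```python
import os
import tempfile


def legacy_egg_entries(site_packages):
    """Return entries that use the .egg or .egg-info layout, sorted by name."""
    return sorted(
        entry for entry in os.listdir(site_packages)
        if entry.endswith((".egg", ".egg-info"))
    )


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a site-packages directory.
    for entry in (
        "lxml-4.6.2-py3.9-win-amd64.egg",
        "pefile-2019.4.18-py3.9.egg-info",
        "numpy-1.20.1.dist-info",
    ):
        os.mkdir(os.path.join(tmp, entry))
    legacy = legacy_egg_entries(tmp)

print(legacy)
```

Passing a real site-packages path instead of the temporary directory would list the actual candidates.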
