
Package and Environment Introspection Library #247

Open
ghost opened this issue Mar 2, 2019 · 33 comments

@ghost

ghost commented Mar 2, 2019

There should be an official package examination library.

Basic tasks I suggest:

  1. get package name,
  2. get package metadata (METADATA, pyproject.toml, ...),
  3. list package dependencies,
  4. possibly more basics in the future.

This could go into pep517, or something else - where exactly is something I propose should be decided on.

I already implemented these here: pypa/pyproject-hooks#44. Feel free to ignore this if you don't like this concrete implementation; I'm just throwing one idea out there. Example usage:

# Getting package name from pip reference:

from pep517.metadata import get_package_name
print(get_package_name("pillow"))
# Outputs: "Pillow" (note the spelling!)


# Getting package dependencies:

from pep517.metadata import get_package_dependencies
print(get_package_dependencies("pep517"))
# Outputs: "['pytoml']"


# Get package name from arbitrary package source:

from pep517.metadata import get_package_name
print(get_package_name("/some/local/project/folder/"))
# Outputs package name

The open questions are probably where such a library/functionality could live (e.g. as part of pep517 or not), and how it can come into existence.

Rationale

The initial reaction (which I can relate to) to my implementation suggestion here pypa/pyproject-hooks#44 was roughly "pep517 only deals with local source trees, this also deals with remote packages and it's not an ideal fit".

So I realized, as already pointed out in #224, that there is no official library which offers the absolute basics above for an unbuilt/not-installed package (which may in some cases not even be locally available).

Why this is IMHO vital/important:

  1. Cleanliness, to not pollute a system: people shouldn't have to install packages, or figure out how to avoid installing them, just to get absolutely basic info
  2. For cross-compilation environments like python-for-android, where a package might not install everywhere
  3. Of course everyone can just implement this somehow manually with venv/virtualenv and pip, but can they do it correctly for corner cases?
  4. "Yet another unofficial lib" is kinda useless, since this is what makes dealing with such basics so infuriating in the first place: there are plenty of unofficial tools, many old, many bad, many not aware of non-setuptools backends... and the only way that works is using pip, which is not an option as a library

Therefore, I propose that such an official library should exist, and I offer my implementation above as a starting point for what could be in it (feel free to discard it if you don't like it!)

@pfmoore
Member

pfmoore commented Mar 2, 2019

One specific issue I have is that it's using pip to download packages. There's a lot of work going on in the packaging ecosystem to move away from having a single implementation of any functionality. And so while it's true that currently pip is the primary build frontend (in PEP 517 terms) I'm uncomfortable with baking the use of pip into the pep517 library (or any foundational library). Having a pluggable means of getting packages (from PyPI, local indexes, files, ...) is not an easy problem, and not one that anyone has really tried to address yet, but it's why building foundational libraries in this area is hard.

You may want to investigate distlib, which is another approach to this whole question.

BTW, to head off any accusations of my position being inconsistent ;-) I know that pep517 currently uses pip in its environment builder. I'd like to see that change so the caller supplies an installer, that way the functionality could be used by other frontends.

"Yet another unofficial lib" is kinda useless

Note that every existing official tool started off as an "unofficial personal library". The PyPA doesn't have the resources to develop "official solutions" from nothing, the process is very much one of taking working, popular solutions, and recommending them as the best practice. So while I agree that yet another contender in the already crowded space of projects trying to build on Python's base packaging infrastructure is likely to struggle to get noticed, I firmly believe that getting noticed by being a really good solution is a far better answer than getting noticed because it's been arbitrarily put under the PyPA banner.
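
For anyone unfamiliar with distlib, a minimal sketch of what its locator API can already answer for a remote package might look like this (an assumption-laden sketch: it uses the default PyPI-backed locator, needs network access, and the attributes available depend on what the index exposes):

# Sketch: inspect a remote package via distlib's locators, without installing it
from distlib.locators import locate

dist = locate("pillow")  # returns a Distribution, or None if nothing matched
if dist is not None:
    print(dist.name)          # canonical project name, e.g. "Pillow"
    print(dist.version)       # latest version found on the index
    print(dist.run_requires)  # set of PEP 508 requirement strings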

@ghost
Author

ghost commented Mar 2, 2019

@pfmoore Edit: I overreacted, but let me sum up why I'm frustrated:

This is such a basic problem that a solution should have been put out there years ago, and it hasn't happened, which seems to indicate that all the packaging experts are busy with things other than implementing such basics.

What you are suggesting (organically growing) may happen, but it hasn't yet, so how do you think this will ever happen now?

@ghost
Author

ghost commented Mar 2, 2019

Edit: removed for now, was unnecessarily confrontational. I apologize (see edit history if you're curious)

@ghost
Author

ghost commented Mar 2, 2019

Edit: removed for now, worded better below (see edit history if you're curious)

@ghost
Author

ghost commented Mar 2, 2019

I now clarified, by editing my initial post, that I am just hoping this will happen and pointing out how much I missed this as an end user, and that my implementation is just one suggestion of how it could start out. I put this poorly the first time; I'm really not trying to take this project over, although I am interested in helping out (I'm also willing to maintain that code and look into any bugs that pop up. I just can't provide you with the perfect non-pip expert solution which you seem to hope for, or lead an entire library around this; I'm really the wrong person for that)

@ghost
Author

ghost commented Mar 2, 2019

@pfmoore I just looked into distlib; it seems to be for analyzing existing sdist packages, is that right? This could replace my METADATA code and make it a little shorter, although I would argue it's not particularly long, so this seems nice-to-have but not like a huge pain point to me.

As for the pip use, however, your main objection: this is just for downloading, and everywhere else I use pep517 or pep517.envbuild. I am currently not seeing a module for downloading in distlib (although I might be missing it). Were you suggesting it to help with that, and if not, any other suggestions to help if the pip use is the main problem for you? I'm really not familiar with the expert packaging libraries available that could replace pip here... I know very little in this area 😬

But maybe there should first be some decision on whether such a library will even exist officially or not

@pfmoore
Member

pfmoore commented Mar 2, 2019

@Jonast I apologise if my comments left you feeling frustrated. Let me make it clear, these are just my views, and in particular I'm not in charge of the pep517 project, so it's not me you have to persuade (you don't even need to take any notice of my comments, and I mean that sincerely).

However, my frustration here is also showing through, and that's meant I've been unnecessarily negative. In particular, I am frustrated at how much is expected (by the "general community", not by you) of the people who work on Python's packaging (in their free time). Everyone seems to believe that their use case is immensely important, and expect us to care about it as much as they do. As a result, I have to be very careful to be positive and encouraging when dealing with new contributors - and I wasn't with you, for which I apologise.

Your question:

> What you are suggesting (organically growing) may happen, but it hasn't yet, so how do you think this will ever happen now?

is very relevant. I genuinely don't know the answer here. We need more contributors, is about all I can say - and I've obviously not helped in this case, as I've given you a pretty bad experience when you tried to help :-(

Hopefully someone will be able to help you move your proposal forward. Please don't be discouraged by what I've said.

@ghost
Author

ghost commented Mar 2, 2019

> Please don't be discouraged by what I've said

I won't be; I certainly overreacted. I was just frustrated at the prospect of founding yet another packaging helper library (which I never planned to create); this feels so basic that surely it needs to go into something existing! But I suppose that may be brought forward if more people answer here.

> In particular, I am frustrated at how much is expected

I can imagine 😬 I have also been a little overly pushy from over here. I'll try to sit back and wait more. Maybe someone comes up with an idea where to include this that is more productive than another separate project?

I just feel like python packaging is already splintered up enough! I hoped to make it easier to deal with the chaos with my contribution for inexperienced users, and not add more to it (although I suppose every addition means more entropy and chaos... a vicious cycle)

@pradyunsg
Member

pep517 is definitely not the correct package for this. distlib likely does implement a lot of what you're describing here.

FWIW, this is what a separated pkg_resources is supposed to become IIUC. It's probably best to aid whatever the current effort is in this area. See pypa/setuptools#863 and pypa/setuptools#1664 (comment).
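
For context, pkg_resources today answers these questions only for distributions that are already installed, which is exactly the gap being discussed here; a quick sketch of that existing API:

# Sketch: what pkg_resources can already do - but only for installed packages
import pkg_resources

dist = pkg_resources.get_distribution("pep517")
print(dist.project_name, dist.version)
print([str(req) for req in dist.requires()])  # declared dependencies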


> I am frustrated at how much is expected (by the "general community", not by you) of the people who work on Python's packaging (in their free time). Everyone seems to believe that their use case is immensely important, and expect us to care about it as much as they do.

Gee, Yes. Ditto for me.

(not pointing at this thread specifically) A lot of the comments that I come across daily are unnecessarily corrosive. I've even resorted to actually going in and editing/hiding such comments so that I don't have to read them again -- taking cues from Brett Cannon.

@pradyunsg pradyunsg changed the title There should be an official package examination lib to: 1. get package name, 2. get package metadata (METADATA, pytoml.yml, ...) 3. list package dependencies, 4. possibly more basics in the future Package and Environment Introspection Library Mar 2, 2019
@ghost
Author

ghost commented Mar 2, 2019

@pradyunsg I apologize for my impatience 😬

As for where to put this sort of functionality:

It is hard for me to tell (since I'm quite clueless), but from a peek at the docs, distlib appears, like most existing libraries, to be mostly focused on installed packages or packaged-up sdists. I'm aiming for unpackaged local source folders of any kind, and any other sort of uninstalled/unpackaged/remote package reference, which seems to sit at an earlier point than that.

As for pkg_resources that sounds interesting, I wonder if @jaraco has any input on this?

@gaborbernat

Let me know if I'm wrong, but here's what I think/know.

At the moment, as far as I'm aware, the only kind of PEP-specified metadata we can extract from is the wheel format. distlib can also extract source distribution information (I assume it uses the egg-info), but this only works in a non-PEP-517 world. The egg-info in source packages is no longer needed under PEP 517 (e.g. poetry/flit may decide to just no longer generate it). This leaves us with no official way to get metadata for both source trees and source distributions without building them into a wheel first.

Given PEP 517's split between build backend and frontend, the side that could answer this information is the build backend. The frontend or any utility may be able to acquire the package, but only the build backend actually knows where to look for this metadata. As such, if anything, we need to extend PEP 517 (but more likely write a new PEP) to require build backends to provide this information without building a wheel. As for the level and kind of information the backend should provide, it's a safe bet to go with what the wheel specification already mandates.

Finally, one can write some tool (which with time we could mark as official) to provide this information for remote packages (this would end up more or less as a build frontend that acquires the package, provisions the backend, and invokes it). First and foremost, though, we should agree that backends SHOULD provide this data. Thoughts? @pfmoore @Jonast @pradyunsg
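
To make that split concrete, here is a minimal sketch of asking a backend for metadata through the pep517 library's hook caller. It assumes the project's build requirements are already installed in the current environment (a real frontend would provision them in isolation first), and the paths are hypothetical:

# Sketch: query the build backend via the PEP 517 hooks, no full wheel build
import os
import toml
from pep517.wrappers import Pep517HookCaller

source_dir = "/some/local/project/folder/"
pyproject = toml.load(os.path.join(source_dir, "pyproject.toml"))
backend = pyproject["build-system"]["build-backend"]

hooks = Pep517HookCaller(source_dir, backend)
print(hooks.get_requires_for_build_wheel())  # dynamic build requirements
# Writes a .dist-info directory into the given folder and returns its name:
print(hooks.prepare_metadata_for_build_wheel("/tmp/metadata"))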

@ghost
Author

ghost commented Mar 2, 2019

Well, I personally think that at least name, description, and dependencies (including all the version pins, conditionals, and all of that) should be provided without building anything. Even with PEP 517 I'll need to install setup_requires to get these, which can take a while (even without actually building the wheel afterwards, which is apparently not necessary for setuptools, and which would also process install_requires). So my vote would definitely be on making these basics as immediately queryable as possible, with as little installation needed as possible.

Then again I only half understood what you wrote so don't give my vote too much weight 😆

@gaborbernat

It is build requires, not setup requires (setup requires is setuptools specific).

> Even with PEP 517 I'll need to install setup_requires to get these, which can take a while.

No way to get around this, though. Only the right build backend can tell you the right answer. There is no guarantee that the frontend's Python environment has the right backend. At the minimum, one MUST install the build backend so that it can inspect the source tree/source distribution and give you the answer.
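
As an illustration of that provisioning step, the pep517 library's own environment builder can install the declared build requirements into a temporary environment before any hook runs. A sketch, roughly mirroring what pep517.envbuild does internally (note it currently shells out to pip, the very coupling discussed above):

# Sketch: provision the declared backend first, then ask it for metadata
import os
import toml
from pep517.envbuild import BuildEnvironment
from pep517.wrappers import Pep517HookCaller

source_dir = "/some/local/project/folder/"  # hypothetical path
build_sys = toml.load(os.path.join(source_dir, "pyproject.toml"))["build-system"]
hooks = Pep517HookCaller(source_dir, build_sys["build-backend"])

with BuildEnvironment() as env:
    env.pip_install(build_sys["requires"])                 # e.g. setuptools, wheel
    env.pip_install(hooks.get_requires_for_build_wheel())  # backend's dynamic extras
    hooks.prepare_metadata_for_build_wheel("/tmp/metadata")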

@ghost
Author

ghost commented Mar 2, 2019

> No way to get around this though.

But that seems really weird to me for setuptools, because the setup.py can be run, and it will spit out all the data via the setup() call in any case (before any of setup_requires is installed! After all, it needs to run through to return that in the first place).

So it seems like there is some problem with how things are abstracted here? Maybe this makes sense for other build tools, but for setuptools I just don't understand it

> At the minimum, one MUST install the build backend

But wouldn't this just be the build requirements in pyproject.toml? That's not the same as setuptools' setup_requires, which I am pretty sure will be installed for pep517's prepare_metadata_for_build_wheel right now, and that can take a long time depending on what it is.

@gaborbernat

setup_requires is deprecated and has been for a while, so we can drop discussing it, I think.

> But that seems really weird to me for setuptools, because the setup.py can be run, and it will spit out all the data via the setup() call in any case

It will spit out some data; no guarantees it's the right data, though. I mean, if the package requires version 40.8.0 of setuptools and you have 38.0.0, there is no guarantee that blindly running it without provisioning the right version first will get you the correct data. The packaging itself might just fail.

@ghost
Author

ghost commented Mar 2, 2019

Ah, I suppose that makes sense. In that case temporarily installing setup_requires might indeed be inevitable. Thanks for the explanation 👍

@ghost
Author

ghost commented Mar 2, 2019

As we just discussed in pypa/pyproject-hooks#44 (comment), the whole idea behind this ticket's request / the Package and Environment Introspection Library, for me, would pretty much be to have the basics of what one might consider a frontend library somewhere, or a pip lib if you want (although I personally only need these basic inspection features for now, not to actually install or do whatever else).

So basically, the idea behind my code was something that doesn't need another layer or tool on top to be reasonably usable, and that works for all types of package references, including remote ones. I just thought I'd leave this thought here since it seems relevant to the discussion.

Edit: I really don't want to start a new project, though, so I am very actively looking for existing ones where this might fit as a contribution. It's not a huge amount of code, and I don't see how splitting it off and establishing it as yet another separate packaging tool would be very fruitful.

@pfmoore
Member

pfmoore commented Mar 2, 2019

> the whole idea behind this ticket's request / the Package and Environment Introspection Library, for me, would pretty much be to have the basics of what one might consider a frontend library somewhere, or a pip lib if you want

Yes, but... No such library exists at the moment. The only PEP 517 frontend that currently exists is pip, and we deliberately don't provide a library interface - precisely because maintaining such a library is a big task, way beyond the level of resources we have. The pep517 library isn't a frontend, it's an interface library, designed to simplify the job of linking frontends to backends.

As you state it here, I support your goal - it would be good to have an alternative build frontend to pip, and it would be even more useful to have it expose a programming API, rather than just being a command line tool. But just wanting such a thing isn't enough - someone has to do the work of creating it, and at the moment, nobody has. OK, maybe such a thing does exist somewhere, but when you've got multiple people from the packaging community saying that they don't know of one, I think it's reasonable to assume they are probably right.

The nearest that exists, to my knowledge, is distlib. And that's the work of a single individual, working in his spare time, so as a result it's fallen behind recent developments in the packaging ecosystem, such as PEP 517. And as you found out, it doesn't precisely fit your use case. So it would be a lot more work than just contributing your code as it stands, to get this feature into distlib.

I'm sorry you feel frustrated that things aren't in a more complete state, but this is the reality, and we need to appreciate that.

@ghost
Author

ghost commented Mar 3, 2019

Right, in that case feel free to leave this issue open or close it, whatever you prefer. I'll do what I need to do to continue ahead with what I originally tried to achieve, and just integrate the code downstream, as misplaced as it is. Thanks for the discussion and the detailed component explanations, they were very much appreciated.

@pradyunsg
Member

> As such, if anything, we need to extend PEP 517 (but more likely write a new PEP) to require build backends to provide this information without building a wheel.

Yes. My understanding is that it'll need a new PEP with well-justified reasons for the change. Nicer dependency resolution is one, which would be more compelling once there's a resolver. ;)

I do plan on coming around to this, on the mailing list / Discourse, eventually. Right now, though, I've not even had the time to cover the stuff I have to do as 19.0's RM. 😞

@gaborbernat

I thought this over, and maybe all we need is to parse the prepared wheel metadata, no? That is already part of PEP 517.
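
Parsing the prepared metadata is indeed the easy part: METADATA uses RFC 822-style headers, so the standard library can read it. A sketch, where dist_info_dir stands for whatever directory prepare_metadata_for_build_wheel produced:

# Sketch: read core metadata out of a prepared .dist-info directory
import os
from email.parser import Parser

def read_metadata(dist_info_dir):
    with open(os.path.join(dist_info_dir, "METADATA"), encoding="utf-8") as f:
        msg = Parser().parse(f)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires": msg.get_all("Requires-Dist") or [],
    }

print(read_metadata("/tmp/metadata/example-1.0.dist-info"))  # hypothetical path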

@pfmoore
Member

pfmoore commented Mar 3, 2019

As @takluyver said on one of the other threads this whole discussion has spawned, the problem is that projects may generate different metadata depending on things like build options, or the target platform. So there's actually no stable concept of "the metadata". In particular, older packages may generate different dependencies for different Python versions, rather than using Requires-Python. In an extreme case, a project using setuptools could have a project name of foo if you built it on a Tuesday, and bar if you built it any other day! Obviously that's silly, but it's standard-conforming 😞

So at a minimum, I think a PEP is needed to specify what metadata projects are required to keep stable regardless of the environment in which it's generated. It may be that the answer is "all of it", and projects that don't will simply not be conforming to the new standard, but that's fine. A library like this would have to decide how it wants to handle projects that don't conform to the relevant standards (a lot of projects don't yet conform to PEP 517/518, to give an obvious example) and this would just be another case of that.
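
For contrast, the stable, declarative way to express environment-dependent requirements is PEP 508 markers, which a tool can evaluate on the querying platform without re-running any build code; a small sketch using the packaging library:

# Sketch: evaluate a PEP 508 environment marker statically
from packaging.requirements import Requirement

req = Requirement('enum34; python_version < "3.4"')
print(req.name)               # "enum34"
print(req.marker.evaluate())  # False on any Python >= 3.4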

@gaborbernat

I don't believe we want to solve the problem for non-standard-compliant packages. Those packages are invited to fix their configuration if they want to participate in features built on new standards. As far as metadata stability goes, we should always generate the metadata on the platform requesting the information. Packages that do random things, such as different dependencies based on datetime (or any other unstable input), can be considered in violation of the standards and won't work with this system 🤔 I would like to have a rough solution that works for 90 percent, and iterate on that towards 100 percent.

@pfmoore
Member

pfmoore commented Mar 3, 2019

Agreed - I'm not trying to claim that anything less than perfection is useless. Just that being clear what's in scope and what is out of scope (by means of defining clear standards) has worked well for us, and we should continue that approach.

@takluyver
Member

takluyver commented Mar 3, 2019 via email

@ncoghlan
Member

ncoghlan commented Mar 5, 2019

Tagging @uranusjr into this thread, as what @Jonast is describing here feels like a subset of what he's been attempting to extract from pipenv as an independent dependency resolver.

As for why such a shared library hasn't happened previously: building an internal API to solve the metadata retrieval problems of one specific tool is already difficult, and the potential audience for a shared API is likely small (probably dozens of projects at most).

While a shared library should reduce the maintenance costs for affected projects in the long run, in the near term it increases them due to the internal refactoring required, and it also increases the risk of regressions in the releases that switch to the new implementation. And if existing projects decide not to adopt the extracted library, then we end up with a case of https://xkcd.com/927/

Accordingly, such refactoring efforts tend to get tied to specific user-facing features, as that enhancement then provides a more concrete pay-off for doing the work than a speculative potential reduction in future collective maintenance effort.

@uranusjr
Member

uranusjr commented Mar 6, 2019

It does indeed. I'm currently using a mix of pep517, distlib, and pip internals to get package metadata, and it's reasonable to refactor that into a separate library and try to improve it gradually. But… I guess this falls into the same category as most Python packaging problems: "yeah, that's a fantastic idea, I'm totally doing it if I have time" 😞

@ghost
Author

ghost commented Mar 6, 2019

> then we end up with a case of https://xkcd.com/927/

Aren't we already there? Apparently @uranusjr implemented this, and so did I separately, and probably other people too - not that it necessarily changes anything, but just to point this out. 🤷‍♀️

Edit: for what it's worth, I managed to factor out the code part I had that depended on the import location, and I'm currently working on tests. Since it's a separate standalone file, my implementation can now trivially be moved out into a separate lib if anyone wants it.

@uranusjr
Member

uranusjr commented Mar 6, 2019

One of the most important "other people" is, of course, pip. Home-baked solutions like yours and mine are fine for solving our own problems, but pip has tremendous backward-compatibility baggage, and any "official" API would need to compete with it and would suffer from the standards problem. The only way to have a blessed library without falling into that trap is to refactor the logic out of pip, but unfortunately that is a much more involved task.

@pfmoore
Member

pfmoore commented Mar 6, 2019

> The only way to have a blessed library without falling into that trap is to refactor the logic out of pip, but unfortunately that is a much more involved task.

This is sadly true. However, it's also why a "third party" solution is potentially a practical way forward. The third party library can implement the logic, and as it gains users those users will (inevitably) flag up "this doesn't work like pip" issues. These can be addressed, either by fixing the library, or by agreeing that pip's behaviour is wrong and filing an issue against pip. Over time, both the library's and pip's behaviours will converge to the point where replacing pip's logic with the external library (and at the same time blessing the external library as the "official" implementation of the behaviour) becomes possible.

The third party library route is overall going to be more work (probably a lot more work), and is going to involve a non-trivial additional support burden for the people maintaining the new library. But the advantage is that it avoids the problem of everything getting blocked on pip being the bottleneck.

@brainwane
Contributor

Regarding pip's dependency resolver logic:

The Python Software Foundation's Packaging Working Group has secured funding to help finish the new dependency resolver, and is seeking two contract developers to aid the existing maintainers for several months. People in this thread: Please take a look at the request for proposals and, if you're interested, apply by 22 November 2019. And please spread the word to freelance developers and consulting firms.

@merwok

merwok commented Nov 12, 2019

I thought there were already a few existing projects that inspect the environment.
Off the top of my head: https://pypi.org/project/pkginfo/

@uranusjr
Member

@merwok There are a few layers to OP's problem. Tools like pkginfo (and distlib, which provides more than that) offer an API to inspect project metadata, but then you need to actually have that metadata available, which is why pep517 is mentioned (to build that metadata from the user's declarations, à la setup.py). The proposal here is not to have code getting the information (we already do), but to collect the code into a coherent API that wraps all the required legwork, instead of having to pull in like five packages to handle slightly different cases.
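
To illustrate what such a coherent wrapper might look like, here is a rough, hypothetical dispatching sketch under the assumptions above (pkginfo for built artifacts, the PEP 517 hooks for source trees; the entry point name is made up):

# Sketch: one hypothetical entry point that dispatches on the reference kind
import os
from pkginfo import SDist, Wheel

def inspect(ref):
    if ref.endswith(".whl"):
        return Wheel(ref)                  # wheel: metadata is authoritative
    if ref.endswith((".tar.gz", ".zip")):
        return SDist(ref)                  # sdist: reads PKG-INFO, if present
    if os.path.isdir(ref):
        # source tree: provision the backend and run the PEP 517 hooks,
        # as in the sketches earlier in this thread
        raise NotImplementedError("run prepare_metadata_for_build_wheel")
    raise ValueError("don't know how to inspect %r" % (ref,))

info = inspect("Pillow-5.4.1-py3-none-any.whl")  # hypothetical filename
print(info.name, info.version, info.requires_dist)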
