Extract reverse dependencies #571

Open
2 tasks done
jamesmyatt opened this issue Dec 18, 2023 · 3 comments

Comments

@jamesmyatt
Contributor

Checklist

  • I added a descriptive title
  • I searched open requests and couldn't find a duplicate

What is the idea?

There should be an easy way to extract the reverse dependencies. The conda-lock files store the forward dependencies, e.g. "pandas depends on numpy". However the reverse dependencies are also interesting and important, e.g. "numpy is required for pandas" or "pandas is required for environment.yml". I'd like a way to access this information.

Why is this needed?

Reverse dependencies help you to understand why a particular package was included in the conda-lock file, e.g. when tracking down conflicts, when trying to slim down an environment or debugging rogue packages.

What should happen?

I don't really mind how the information is accessed as long as there's an easy way to present the information.

Two main options (IMO):

  1. An option (disabled by default) to include reverse dependencies as comments in the "env" and "explicit" export "kinds", similar to pip-tools.
  2. A new export "kind" that is focused on the dependency graph (e.g. one of the formats listed at https://networkx.org/documentation/stable/reference/readwrite/index.html). This might be the best option.

Implementing both eventually might be an option.

Additional Context

No response

@maresb
Contributor

maresb commented Dec 18, 2023

Thanks for the proposal! I agree that this is interesting info.

My inclination would be to support only the unified lockfile format, i.e. kind="lock". The reason is that "env" and "explicit" aren't really geared towards supporting extra metadata, so you have to shove it into comments. Are you interested in these formats because that's what you currently use? If so, then why do you use them as opposed to the unified lockfile format? Also, could you please provide a more specific reference to the feature in pip-tools? I don't really understand your networkx idea either.

How exactly do you think we should represent a reverse dependency? Like which data structures should be used? I see two cases: a reverse dependency can be a dependency from another package, or it can come from a dependency specification in a source file e.g. environment.yml.
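As one possible answer to the data-structure question, here is a minimal sketch of the two cases as small Python dataclasses. The names `FromPackage`, `FromSource`, `Origin` and `describe` are hypothetical and are not part of conda-lock.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class FromPackage:
    """The requirement comes from another locked package."""
    name: str


@dataclass(frozen=True)
class FromSource:
    """The requirement comes from a source file such as environment.yml."""
    path: str


# A reverse dependency is one of the two cases above (hypothetical type alias).
Origin = Union[FromPackage, FromSource]


def describe(package: str, origin: Origin) -> str:
    """Return a human-readable sentence for one reverse dependency."""
    if isinstance(origin, FromPackage):
        return f"{package} is required by {origin.name}"
    return f"{package} is requested in {origin.path}"


print(describe("numpy", FromPackage("pandas")))
print(describe("pandas", FromSource("environment.yml")))
```

Keeping the two cases as separate types means a renderer can decide per case how to display the origin, without guessing from the string whether it is a package name or a file path.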

@jamesmyatt
Contributor Author

jamesmyatt commented Dec 19, 2023

I'm not sure that adding more information to the "lock" files would be helpful, since they're already too large to read and the information is basically already all in there, just spread out. This is why I thought that modifying one of the "render" targets or adding a new one might work best. Personally, if I want to understand what's in a "lock" file, I render it to an "env" one first, unless I'm just looking for specific packages. I don't currently use the "explicit" one.

pip-tools compiles a pip "requirements.txt" file that looks like this (it's just the default output of pip-compile, equivalent to conda-lock lock) and I think this would be OK for the "env" or "explicit" files, especially as an option.

# ... header ...
asgiref==3.6.0
    # via django
django==4.1.7
    # via -r requirements.in
sqlparse==0.4.3
    # via django
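A rough sketch of how a renderer could emit this style of annotation, assuming the pinned versions and the reverse dependencies are already available as plain dictionaries. The function `render_with_via` and both input mappings are hypothetical, and joining several origins with commas is a simplification rather than a claim about pip-tools' exact layout.

```python
def render_with_via(versions, required_by):
    """Render pinned packages, each followed by an indented '# via' comment.

    versions:    package name -> pinned version (assumed input)
    required_by: package name -> list of origins (assumed input)
    """
    lines = []
    for name in sorted(versions):
        lines.append(f"{name}=={versions[name]}")
        origins = sorted(required_by.get(name, []))
        if origins:
            # Simplification: several origins are joined on a single line.
            lines.append("    # via " + ", ".join(origins))
    return "\n".join(lines)


versions = {"asgiref": "3.6.0", "django": "4.1.7", "sqlparse": "0.4.3"}
required_by = {
    "asgiref": ["django"],
    "django": ["-r requirements.in"],
    "sqlparse": ["django"],
}
print(render_with_via(versions, required_by))
```

With the sample data this reproduces the body of the listing above, minus the header line.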

The networkx idea was about creating a new render target that just contains the dependency graph in a format that networkx can parse for further dependency analysis. It's probably overkill unless the other ideas don't work.
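To make the graph-export idea concrete, here is a minimal sketch that writes the forward dependencies as a plain edge list, one "package dependency" pair per line. The function `to_edge_list` and the `depends_on` mapping are assumptions for illustration, not an existing conda-lock render target.

```python
def to_edge_list(depends_on):
    """Return one 'package dependency' line per forward dependency edge.

    depends_on: package name -> list of package names it depends on
    (an assumed, simplified view of what a lock file records).
    """
    lines = []
    for package in sorted(depends_on):
        for dependency in sorted(depends_on[package]):
            lines.append(f"{package} {dependency}")
    return lines


depends_on = {"pandas": ["numpy", "python"], "numpy": ["python"], "python": []}
for line in to_edge_list(depends_on):
    print(line)
```

Lines in this shape could then be loaded with something like `networkx.parse_edgelist(lines, create_using=networkx.DiGraph)`. One limitation is that a package with no edges at all would not appear in an edge list, so an adjacency-list format may be a better fit if isolated packages matter.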

I suspect that the main thing that's missing is tracking which environment file the base requirements came from. I haven't looked at how conda-lock stores or builds the dependency tree, but I assume it's not too hard to reverse. It's probably not necessary to use a dedicated package like networkx to do this.
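The reversal itself can be sketched with the standard library alone. This assumes the forward dependencies are available as a name-to-names mapping and that the packages requested by each source file are tracked separately; `reverse_dependencies`, `depends_on` and `requested` are hypothetical names, and the second input is exactly the piece that is described above as missing.

```python
from collections import defaultdict


def reverse_dependencies(depends_on, requested):
    """Invert forward dependencies into 'required by' lists.

    depends_on: package name -> names of packages it depends on (assumed)
    requested:  source file  -> names of packages it asks for (assumed)
    """
    required_by = defaultdict(set)
    # Every forward edge package -> dependency becomes dependency -> package.
    for package, dependencies in depends_on.items():
        for dependency in dependencies:
            required_by[dependency].add(package)
    # Source files act as origins for the packages they request directly.
    for source_file, packages in requested.items():
        for package in packages:
            required_by[package].add(source_file)
    return {name: sorted(origins) for name, origins in required_by.items()}


depends_on = {"pandas": ["numpy", "python"], "numpy": ["python"], "python": []}
requested = {"environment.yml": ["pandas", "python"]}
for name, origins in sorted(reverse_dependencies(depends_on, requested).items()):
    print(f"{name}: {', '.join(origins)}")
```

A single pass over the edges is enough here, which supports the point that a dedicated graph package is probably not required for this step.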

@maresb
Contributor

maresb commented Dec 19, 2023

Ah, I think I understand now, thanks a lot for the more detailed explanation!

I really like the idea of being able to render various representations of the data. I agree that a rendering approach is much more practical than adding fields to the lockfiles.

Tracking the sources of a dependency will not be easy due to the cruftiness of the current implementation. Some of the code here desperately needs to be cleaned up or redone. But this might be an opportunity to improve the code quality.
