Prefer $EBPYTHONPREFIXES over $PYTHONPATH #4496

Draft: wants to merge 17 commits into base 5.0.x


Micket (Contributor) commented Apr 3, 2024

  1. I'm moving the abstraction point of update_paths to an internal-only _update_paths. This lets prepend_paths and append_paths simply call a safe update_paths directly, without duplicating the _filter_paths check. It also gives me a single place to add special rules for PYTHONPATH/EBPYTHONPREFIX in a safe manner, without having to duplicate them.

  2. I will add a conditional check that rewrites recognized PYTHONPATH entries to EBPYTHONPREFIX paths (work in progress).
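A minimal sketch of the layering described in point 1, using a toy class. ModuleGeneratorSketch and its Lmod-like output strings are hypothetical and not EasyBuild's ModuleGenerator API; only the method names update_paths, _update_paths, _filter_paths, prepend_paths and append_paths come from this PR, and the real update_paths takes more parameters (allow_abs, expand_relpaths).

```python
from typing import Dict, List, Set, Union


class ModuleGeneratorSketch:
    """Hypothetical, heavily simplified stand-in for a module generator."""

    def __init__(self) -> None:
        # env var name -> paths already emitted for it
        self._added_paths: Dict[str, Set[str]] = {}

    def _filter_paths(self, key: str, paths: List[str]) -> List[str]:
        """Drop paths already added for this key, preserving order."""
        seen = self._added_paths.setdefault(key, set())
        result = []
        for path in paths:
            if path not in seen:
                seen.add(path)
                result.append(path)
        return result

    def _update_paths(self, key: str, paths: List[str], prepend: bool) -> str:
        """Internal: render statements for already-filtered paths (Lmod-like syntax)."""
        func = 'prepend_path' if prepend else 'append_path'
        return ''.join(f'{func}("{key}", "{p}")\n' for p in paths)

    def update_paths(self, key: str, paths: Union[str, List[str]], prepend: bool = True) -> str:
        """Public entry point: normalise input, filter once, then delegate."""
        if isinstance(paths, str):
            paths = [paths]
        paths = self._filter_paths(key, paths)
        if not paths:
            return ''
        # Key-specific rules (e.g. for PYTHONPATH) would live here, in one place.
        return self._update_paths(key, paths, prepend)

    def prepend_paths(self, key: str, paths: Union[str, List[str]]) -> str:
        return self.update_paths(key, paths, prepend=True)

    def append_paths(self, key: str, paths: Union[str, List[str]]) -> str:
        return self.update_paths(key, paths, prepend=False)
```

Because filtering happens only inside update_paths, prepend_paths and append_paths cannot bypass it, and a PYTHONPATH-specific rule only has to be added in one spot.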

Micket (Contributor, Author) commented Apr 4, 2024

Remains to add:

  1. New option to make this a conditional change.
  2. tests

Micket (Contributor, Author) commented Apr 4, 2024

I think I added the option correctly; I still need to actually try using it, which I haven't gotten to yet :)

And write tests.

@Micket Micket changed the title Make new safe module abstraction for update_paths Prefer EBPYTHONPREFIX over PYTHONPATH Apr 4, 2024
@Micket Micket force-pushed the pythonprefix branch 2 times, most recently from e4979d6 to 2f6b631 Compare April 9, 2024 23:45
Flamefire (Contributor)

Looks good to me so far. I'm just confused by the title of commit 2f6b631: the code does not emit a warning but raises an error saying this must not occur. You might want to change either the commit title or the implementation so that the two match and the intention stays clear in the future.

@Micket Micket force-pushed the pythonprefix branch 2 times, most recently from ddaa144 to 38e19f1 Compare April 11, 2024 15:38
Micket (Contributor, Author) commented Apr 11, 2024

I've finally gotten around to actually testing the code (and, to little surprise, there were a bunch more things to fix).

I was asked to move the duplicate-PYTHONPATH check out of the rewrite code, as it still applies.

My biggest question is the command line option. It's long, and it will become even longer now that I realized I should have called it "ebpythonprefixes", since we use the plural here.

I'm also not sure how we ought to deal with default-true store_true flags, as there is no way to override these from the command line. We currently have the following such flags:
  • cleanup-builddir
  • cleanup-tmpdir
  • lib-lib64-symlink
  • lib64-fallback-sanity-check
  • lib64-lib-symlink
  • modules-tool-version-check
  • mpi-tests
  • pre-create-installdir
  • show-progress-bar
  • trace

@Micket Micket marked this pull request as ready for review April 11, 2024 15:41
Flamefire (Contributor) left a comment

> I'm also not sure how we ought to be dealing with default-true store_true type flags, as there is no way to override these from the command line.
> We currently do this with the following flags:
> cleanup-builddir

We already have a way for that: --disable-cleanup-builddir
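For reference, here is the same pattern expressed with plain argparse: one destination with a True default that a --disable- variant switches off. This is only an illustration; EasyBuild's own option parser generates the --disable- variants itself, and add_default_true_flag is a hypothetical helper, not part of EasyBuild.

```python
import argparse


def add_default_true_flag(parser: argparse.ArgumentParser, name: str, help_text: str) -> None:
    """Register --<name> and --disable-<name>, both writing to the same destination."""
    dest = name.replace('-', '_')
    parser.add_argument(f'--{name}', dest=dest, action='store_true', help=help_text)
    parser.add_argument(f'--disable-{name}', dest=dest, action='store_false',
                        help=f'Disable: {help_text}')
    # Parser-level defaults override the per-argument defaults of store_true/store_false.
    parser.set_defaults(**{dest: True})
```

Usage: add_default_true_flag(parser, 'cleanup-builddir', 'Clean up build dir') makes cleanup_builddir default to True, while --disable-cleanup-builddir sets it to False.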

    if not isinstance(paths, list):
        paths = list(paths)
    filtered_paths = [x for x in paths if x not in added_paths and not added_paths.add(x)]
    # Coerce any iterable/generator into a list
Contributor

I'm a bit worried about the change here. Why change _filter_paths, and even remove the ability to pass a string (which is a breaking change that leads to surprising behavior)?

Contributor Author

The rest of the code (_update_paths) already converted the str case to a list. I simply moved that conversion earlier, which also removes the need for _filter_paths to duplicate the logic, since the value was always converted to a list immediately afterwards anyway.

You can still pass a str as paths to append_paths/prepend_paths/update_paths.

Contributor

Yes, I see that our usages in this file basically make this part dead code. However, someone or something else might already use this code, so I wonder whether we should be a bit more careful here, e.g. at least error out when a string is passed instead of an iterable of strings.

On the other hand, calling this method from outside would have been wrong (it is "private"), and this PR is targeting 5.x, so a breaking change for simpler logic is fine.
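The risk with a bare string is that it is itself iterable, so the deduplication comprehension silently treats each character as a path. A minimal sketch of erroring out instead, assuming a free-standing function; filter_paths here is hypothetical and not EasyBuild's _filter_paths.

```python
from typing import List, Set


def filter_paths(paths: List[str], added_paths: Set[str]) -> List[str]:
    """Deduplicate paths with the same idiom as the diff, but reject a bare string.

    The comprehension relies on set.add() returning None, so
    `not added_paths.add(x)` is always True and only records x as seen.
    """
    if isinstance(paths, str):
        # Without this guard, 'bin' would be iterated as 'b', 'i', 'n'.
        raise TypeError(f"expected a list of paths, got the string {paths!r}")
    return [x for x in paths if x not in added_paths and not added_paths.add(x)]
```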

easybuild/tools/module_generator.py (resolved)
easybuild/tools/module_generator.py (outdated, resolved)
This will ensure duplicate paths are filtered even when EBPYTHONPREFIXES
rewrite is in place.
        paths = [paths]

    if key == 'PYTHONPATH':
        python_paths = [path for path in paths if re.match(r'lib/python\d+\.\d+/site-packages', path)]
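A small sketch of the partitioning step above, assuming relative paths under the install directory; split_pythonpath is a hypothetical helper. It uses re.fullmatch rather than the re.match in the diff: re.match only anchors at the start, so an entry such as lib/python3.11/site-packages/extra would also be treated as a site-packages directory.

```python
import re
from typing import List, Tuple

# Relative site-packages dir, optionally with a trailing slash.
SITE_PACKAGES_RE = re.compile(r'lib/python\d+\.\d+/site-packages/?')


def split_pythonpath(paths: List[str]) -> Tuple[List[str], List[str]]:
    """Split PYTHONPATH entries into (site-packages dirs, everything else)."""
    site_dirs = [p for p in paths if SITE_PACKAGES_RE.fullmatch(p)]
    others = [p for p in paths if not SITE_PACKAGES_RE.fullmatch(p)]
    return site_dirs, others
```

Only the first list is a candidate for rewriting; the second list would still have to be exported via PYTHONPATH.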
Contributor

While checking the logic in _filter_paths, I noticed an issue here: you iterate paths multiple times, here and at https://github.com/easybuilders/easybuild-framework/pull/4496/files#diff-4af72e9777d353325a29df400ab4229e14defd653e413db51786dc2172d3baefR275, but inside _update_paths you call _filter_paths, which "coerces a generator into a list" if it isn't a list yet.

This means that at this point paths might be a generator, which can only be iterated once (otherwise the coercion in _filter_paths wouldn't be needed).

Similar to my previous reasoning about accepting a breaking change in favor of cleaner code: I would simply disallow passing anything but a string or a list to update_paths, and assert that it is a list in _filter_paths, instead of trying to convert a potentially exhausted generator to a list here or there.

Quick example:

    paths = (i for i in range(10))
    [i for i in paths if i == 2]  # [2]
    [i for i in paths if i == 2]  # []
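A sketch of the suggested contract, assuming the check happens once at the public entry point: a single str is wrapped in a list, a list is passed through, and anything else (such as a generator) is rejected before it can be half-consumed. normalize_paths is a hypothetical name.

```python
from typing import List, Union


def normalize_paths(paths: Union[str, List[str]]) -> List[str]:
    """Accept only a single path (str) or a list of paths."""
    if isinstance(paths, str):
        return [paths]
    if not isinstance(paths, list):
        raise TypeError(f"paths must be a str or a list, got {type(paths).__name__}")
    return paths


# A generator can only be consumed once:
gen = (p for p in ['bin', 'lib'])
first_pass = [p for p in gen if p == 'bin']   # ['bin']
second_pass = [p for p in gen if p == 'bin']  # [] (generator already exhausted)
```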

    @@ -965,6 +981,10 @@ def update_paths(self, key, paths, prepend=True, allow_abs=False, expand_relpath
            :param allow_abs: allow providing of absolute paths
            :param expand_relpaths: expand relative paths into absolute paths (by prefixing install dir)
            """
            paths = self._filter_paths(key, paths)
            if paths is None:
Contributor

I was wondering how the filtered paths could be None. This is inside _filter_paths:

    if not filtered_paths:
        filtered_paths = None

I would remove both checks: neither is required, and we are effectively checking the same thing twice. Having _filter_paths always return a list, even an empty one, is cleaner interface-wise; otherwise you would need to document it as "returns a non-empty list of paths to add or None" instead of "returns a list of paths to add".

If we really want to exit early here, then it is better to remove the above check in _filter_paths and change this to:

Suggested change:

    - if paths is None:
    + if not paths:

That way the emptiness is at least checked only once.

easybuild/tools/module_generator.py (outdated, resolved)
easybuild/tools/module_generator.py (outdated, resolved)
@branfosj branfosj added the EasyBuild-5.0 EasyBuild 5.0 label Apr 19, 2024
    @@ -480,6 +480,8 @@ def override_options(self):
                                    'int', 'store', None),
            'parallel-extensions-install': ("Install list of extensions in parallel (if supported)",
                                            None, 'store_true', False),
            'prefer-ebpythonprefix-over-pythonpath': ("Replaces PYTHONPATH with EBPYTHONPREFIX when possible",
Member

--prefer-ebpythonprefixes is specific enough, I think

Contributor Author

We bounced around a bunch of different names, so I'm currently thinking --replace-pythonpath may be good enough. The EBPYTHONPREFIXES name is essentially an internal implementation detail of EasyBuild that wouldn't mean much to users. The help text and documentation could go into more depth to explain it and how it connects to the sitecustomize script that our Python installations include.

As for the fact that it doesn't always replace PYTHONPATH: the exceptions are very rare, and this wouldn't be the first option that does what it claims only when that is supported.
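To illustrate how EBPYTHONPREFIXES connects to the sitecustomize script mentioned above, here is a simplified sketch of what such a script can do. It is not the script that EasyBuild-installed Python actually ships (add_ebpythonprefixes is a hypothetical name), and it assumes each entry is an installation prefix containing lib/pythonX.Y/site-packages. Note that only the running interpreter's X.Y directory is picked up, and a Python without such a script ignores the variable entirely.

```python
import os
import site
import sys
from typing import List


def add_ebpythonprefixes(env_var: str = 'EBPYTHONPREFIXES') -> List[str]:
    """Add <prefix>/lib/pythonX.Y/site-packages for every prefix listed in env_var."""
    pyver = f'python{sys.version_info.major}.{sys.version_info.minor}'
    added = []
    for prefix in os.environ.get(env_var, '').split(os.pathsep):
        if not prefix:
            continue
        sitedir = os.path.join(prefix, 'lib', pyver, 'site-packages')
        if os.path.isdir(sitedir):
            # addsitedir appends to sys.path and also processes any .pth files there
            site.addsitedir(sitedir)
            added.append(sitedir)
    return added
```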

Flamefire (Contributor) left a comment

I was thinking about typing in EasyBuild just the other day as it makes the code so much clearer, so thanks for starting it! 🎉
I added some suggestions about using List[str] instead of the bare list type.

easybuild/tools/module_generator.py (outdated, resolved)
easybuild/tools/module_generator.py (outdated, resolved)
easybuild/tools/module_generator.py (outdated, resolved)
easybuild/tools/module_generator.py (outdated, resolved)
Micket and others added 4 commits April 21, 2024 19:53
Co-authored-by: Alexander Grund <Flamefire@users.noreply.github.com>
Co-authored-by: Alexander Grund <Flamefire@users.noreply.github.com>
Co-authored-by: Alexander Grund <Flamefire@users.noreply.github.com>
Co-authored-by: Alexander Grund <Flamefire@users.noreply.github.com>
Flamefire (Contributor) left a comment

LGTM except for that potential enhancement @boegel mentioned.

BTW: you can apply all suggestions in a single commit from the code view (there is a second button, something like "add to batch").

easybuild/tools/module_generator.py (outdated, resolved)
Co-authored-by: Alexander Grund <Flamefire@users.noreply.github.com>
Flamefire (Contributor) left a comment

This has a serious issue!!!

While checking, I noticed that in our hook we use:

    def module_write_hook(self, filepath, module_txt):
        if 'Python' in get_dep_names(self.cfg):
            return _replace_pythonpath(module_txt)

That condition is crucial! We must not use EBPYTHONPREFIXES when Python is not a dependency. Given the possibility of transitive dependencies this gets quite hard, so a trade-off would be to disable this for SYSTEM-level easyconfigs.

If we don't, then e.g. the EasyBuild module installed with EB won't work anymore.
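A sketch of the safety condition discussed above, assuming the decision is made from the direct dependency names plus the toolchain name. use_ebpythonprefixes is hypothetical, 'system' is assumed to be the SYSTEM toolchain's name, and transitive dependencies are deliberately not inspected (the trade-off mentioned above).

```python
from typing import Iterable


def use_ebpythonprefixes(dep_names: Iterable[str], toolchain_name: str) -> bool:
    """Decide whether PYTHONPATH may be rewritten to EBPYTHONPREFIXES."""
    if toolchain_name.lower() == 'system':
        # SYSTEM-level installs may run under a Python without the sitecustomize hook.
        return False
    # Only a direct Python dependency guarantees the hook is present at runtime.
    return 'Python' in dep_names
```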

Micket (Contributor, Author) commented Apr 29, 2024

Yeah, I was actually just testing that with Conda packages as well (I got stuck because I couldn't get any of the Conda easyconfigs to install at all). Thoughts:

  1. Adding an option for easyconfigs/easyblocks to opt out of this transformation. This is presumably needed for Conda packages and such, since only a Python that understands the sitecustomize.py mechanism can work with these.
  2. Initially I wanted to make PythonBundle and PythonPackage generate EBPYTHONPREFIXES directly rather than relying on this transformation. But I was asked to make it a configurable parameter, and many easyconfigs specify PYTHONPATH via modextrapaths (which really complicated things), so I went in this direction instead. Maybe it's worth revisiting that idea.
  • Maybe easyconfigs should just leave the PYTHONPATH variable alone, the same way we don't specify LD_LIBRARY_PATH manually either. Detecting that this path is in use and adding it automatically seems appropriate. That code could then be made EBPYTHONPREFIXES-aware, as well as compatible with multi-deps. In fact, maybe this already happens???
  • If so, there is no need for a rewrite hook here; instead it's simply up to the easyblocks to pick which path they want.

@Micket Micket marked this pull request as draft April 29, 2024 13:20
@boegel boegel changed the title Prefer EBPYTHONPREFIX over PYTHONPATH Prefer EBPYTHONPREFIXES over PYTHONPATH May 22, 2024
@boegel boegel changed the title Prefer EBPYTHONPREFIXES over PYTHONPATH Prefer $EBPYTHONPREFIXES over $PYTHONPATH May 22, 2024