New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NEW: Better package splitting for multi-output recipes with negative glob in outputs/files #5216
base: main
Are you sure you want to change the base?
Conversation
as keys in `files`
We require contributors to sign our Contributor License Agreement and we don't have one on file for @carterbox. In order for us to review and merge your code, please e-sign the Contributor License Agreement PDF. We then need to manually verify your signature, merge the PR (conda/infrastructure#891), and ping the bot to refresh the PR. |
CodSpeed Performance ReportMerging #5216 will not alter performanceComparing Summary
|
I'm not sure that the failure for linux 3.8 23.5.0 serial is related to this PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this a lot. The PR is super clear and the documentation is on-point. I only added a couple comments to clarify my own doubts. The new positional argument could be considered API breaking, so let's see how the rest of the team feels about it. I don't have strong feelings, just applying an abundance of caution Just In Case ✨ .
Once we have discussed that, I'll be super happy to approve. 👍
conda_build/build.py
Outdated
metadata: MetaData, | ||
env, | ||
stats, | ||
new_prefix_files: set[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is adding a new positional argument to a (let's assume) public signature, what about making it a None
kwarg. This way downstream users won't run into number of arguments errors:
new_prefix_files: set[str], | |
new_prefix_files: set[str] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point; it is API breaking assuming these functions are part of the public API. I need to mark the new argument as typing.Optional
and/or set appropriate defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
conda_build/build.py
Outdated
metadata: MetaData, | ||
env, | ||
stats, | ||
new_prefix_files: set[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here about the API breakage. Not sure how prominent the usage of wheel outputs is, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1299,9 +1299,10 @@ You can specify files to be included in the package in 1 of | |||
Explicit file lists are relative paths from the root of the | |||
build prefix. Explicit file lists support glob expressions. | |||
Directory names are also supported, and they recursively include | |||
contents. | |||
contents. Files installed to the prefix by host dependencies will | |||
be matched by glob expressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would maybe even highlight this further with an admonition or something similar. I wasn't aware of that design issue!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with an admonition or something similar.
I don't know what you mean. You want stronger language? For example: "WARNING! Files installed to the prefix by host dependencies also will be matched by glob expressions."
I wasn't aware of that design issue!
Yeah, I'm not sure whether this is a bug/oversight or a feature. It's definitely unexpected to me. The most active contributor in this section is @msarahan; maybe they know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant an admonition in the RST sense. Triple-backtick + warning
kind of syntax. Equivalent to this:
Warning
Files installed to the prefix by host dependencies also will be matched by glob expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 642906b
files: | ||
- subpackage_file1 | ||
- somedir | ||
- "*.ext" | ||
include: | ||
- subpackage_file1 | ||
- somedir | ||
- "*.ext" | ||
exclude: | ||
- "*3.ext" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we replace the old test with a new one instead of adding a new case? Maybe I would add a new output in this meta.yaml and then test the different behaviors differently, if possible (e.g. how the list-of-str behaviour does copy anything in PREFIX, but include/exclude doesn't?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the test recipe to include multiple outputs: one uses the old file matching and one uses the new file matching.
Files can be excluded by specifying `files` as a dictionary separating | ||
files to `include` from those to `exclude`. Files installed to the prefix | ||
by host dependencies are automatically excluded when the include/exclude | ||
syntax is used: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to mention the precedence here, i.e., exclude
entries have higher priority as per the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mentions the new behavior, i.e., "Files [...] excluded when the include/exclude syntax is used".
But since this is a deviation from the previous "include" logic (which would be more of a force-include
since it allows packaging pre-existing files.), I think we should make the difference more prevalent/explain the behavior for the old non-dict case more explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to mention the precedence here, i.e.,
exclude
entries have higher priority as per the implementation.
exclude
has a higher priority than which operation? It's unclear to me what ambiguity we would be removing by mentioning priority.
This mentions the new behavior, i.e., "Files [...] excluded when the include/exclude syntax is used". But since this is a deviation from the previous "include" logic (which would be more of a
force-include
since it allows packaging pre-existing files.), I think we should make the difference more prevalent/explain the behavior for the old non-dict case more explicitly.
Files can be excluded by specifying `files` as a dictionary separating | |
files to `include` from those to `exclude`. Files installed to the prefix | |
by host dependencies are automatically excluded when the include/exclude | |
syntax is used: | |
When defining `outputs/files` as a list, any file in the prefix (including those | |
installed by host dependencies) matching one of the glob expressions is | |
included in the output. Greater control over file matching may be | |
achieved by defining `files` as a dictionary separating files to | |
`include` from those to `exclude`. | |
When using include/exclude, only files installed by | |
the current recipe are considered. i.e. files in the prefix installed | |
by host dependencies are not matched. include/exclude may not be used | |
simultaneously with glob expressions listed directly in `outputs/files`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to mention the precedence here, i.e.,
exclude
entries have higher priority as per the implementation.
exclude
has a higher priority than which operation?
The kind of precedence where if you do
files:
include:
- lib/libfoo.so
exclude:
- lib/libfoo.so
the exclusion wins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is now explained that when files match both exclude and include, they are excluded. d1b12a1
conda_build/build.py
Outdated
include = files.get("include", []) | ||
exclude = files.get("exclude", []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This syntax will be affected by the "key is set to an empty block, which means None in YAML" kind of issue. We should change it to:
include = files.get("include", []) | |
exclude = files.get("exclude", []) | |
include = files.get("include") or [] | |
exclude = files.get("exclude") or [] |
It might also be a good opportunity to sneak @jakirkham's changes added by #4971.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See line 1818:
- files = output.get("files", [])
+ files = output.get("files") or []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that files.get("include")
may return None
, but include
must be a list. So we use the falsiness of None
to replace it with []
. Equivalent to:
include = files['include'] if ('include' in files and files['include'] != None) else []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might have
- output: something
files:
include:
- path
exclude:
- not-this-path # [linux]
and then you'll have files["exclude"] = None
on macOS, for example (linux is false). Not sure at what point we are ensuring that include
is a list.
Under that circumstance files.get("exclude", [])
will be None
instead of the expected []
. Is that what you mean? I think we are saying the same 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We are saying the same thing. I'm just trying to figure out if we should be doing something stronger than converting Falsy values into the empty list. Perhaps convert any non-list into a list? Something like:
include = files.get("include") if isinstance(files.get("include"), list) else []
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might silence errors when users populate it with a dict or something accidentally. I think we can just play it simple with falsy promotions.
conda_build/build.py
Outdated
@@ -2817,7 +2848,7 @@ def build( | |||
# This is wrong, files has not been expanded at this time and could contain | |||
# wildcards. Also well, I just do not understand this, because when this | |||
# does contain wildcards, the files in to_remove will slip back in. | |||
if "files" in output_d: | |||
if "files" in output_d and not isinstance(output_d["files"], dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if "files" in output_d and not isinstance(output_d["files"], dict): | |
if not isinstance(output_d.get("files") or (), dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated this condition to:
if (
"files" in output_d
and output_d["files"] is not None
and not isinstance(output_d["files"], dict)
):
news/files-exclude.rst
Outdated
Enhancements: | ||
------------- | ||
|
||
* implemented new include/exclude sections for glob expressions in multi-output `outputs/files`. (#4196 via #5216) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the following the template file for news items. It should be markdown. Can you update it, please? 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. https://github.com/conda/conda-build/blob/main/news/5222-deprecating-conda_interface
The headers should have ###, and the filename is extension-less.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after fixing the little MD/RST nits.
Co-authored-by: jaimergp <jaimergp@users.noreply.github.com>
:( |
What is the difference between the parallel and serial tests? Only the macos parallel tests are failing. It looks like it only changes parameters related to pytest and codecov, not conda itself? |
Serial jobs run tests that can't be run concurrently. Parallel jobs take rhe rest and run them with two processes wt the same time. The tests need to be marked as serial with a decorator to be considered as such, so the default is parallel. |
Package collisions cause errors.
I think the errors are coming from the fact that the subpackage tests use the same artifacts for both outputs, so the subpackages are colliding when they are installed to the same environment? |
@@ -18,6 +18,7 @@ | |||
|
|||
|
|||
@pytest.mark.slow | |||
@pytest.mark.serial |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pytest.mark.serial |
Try reenabling parallel tests once tests are fixed.
Needs approval to run the refactored tests. |
Description
Ressurects #4197. Closes #4196.
The purpose of this PR is to make it easier to split packages into multiple outputs using glob expressions. This is accomplished in two ways:
lib/**/libfoo*
, but notlib/**/*.a
include/**/*
without pulling in the headers of other packages. It is already the behavior of a single-output recipe to ignore files added to the prefix by host dependencies, so I'm not sure why this feature didn't make it into multi-output recipes.These new behaviors are only used if the include/exclude keywords are used under the files key. The previous behavior is retained.
Checklist - did you ...
news
directory (using the template) for the next release's release notes?