Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NEW: Better package splitting for multi-output recipes with negative glob in outputs/files #5216

Open
wants to merge 38 commits into
base: main
Choose a base branch
from

Conversation

carterbox
Copy link

@carterbox carterbox commented Mar 6, 2024

Description

Ressurects #4197. Closes #4196.

The purpose of this PR is to make it easier to split packages into multiple outputs using glob expressions. This is accomplished in two ways:

  1. A negative pattern match is provided. This makes it easier to use fewer glob expressions if you need to include an entire directory tree except for a single file (type). For example, you might want to glob lib/**/libfoo*, but not lib/**/*.a
  2. Only the files installed to the PREFIX in the top-level build are considered. This removes the need to craft your glob expressions to avoid the artifacts installed to the PREFIX by host dependencies. For example, you can now glob include/**/* without pulling in the headers of other packages. It is already the behavior of a single-output recipe to ignore files added to the prefix by host dependencies, so I'm not sure why this feature didn't make it into multi-output recipes.

These new behaviors are only used if the include/exclude keywords are used under the files key. The previous behavior is retained.

outputs:
  - name: foo
    files:
      - bin/*  # old behavior; includes artifacts from other packages
  - name: bar
    files:
      include:
        - bin/*  # new behavior; only matches artifacts from this recipe
      exclude:  # optional
        - bin/*.exe # new behavior
  - name: zee
    script: install.py  # old behavior

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

@conda-bot
Copy link
Contributor

We require contributors to sign our Contributor License Agreement and we don't have one on file for @carterbox.

In order for us to review and merge your code, please e-sign the Contributor License Agreement PDF. We then need to manually verify your signature, merge the PR (conda/infrastructure#891), and ping the bot to refresh the PR.

@carterbox carterbox marked this pull request as ready for review March 8, 2024 19:21
@carterbox carterbox requested a review from a team as a code owner March 8, 2024 19:21
@carterbox carterbox changed the title NEW: Negative matching 'files' for outputs NEW: Better package splitting for multi-output recipes with negative glob in outputs/files Mar 8, 2024
@conda-bot conda-bot added the cla-signed [bot] added once the contributor has signed the CLA label Mar 20, 2024
Copy link

codspeed-hq bot commented Mar 20, 2024

CodSpeed Performance Report

Merging #5216 will not alter performance

Comparing carterbox:files-exclude (4f988f9) with main (3f9346f)

Summary

✅ 3 untouched benchmarks

@carterbox
Copy link
Author

I'm not sure that the failure for linux 3.8 23.5.0 serial is related to this PR.

----------------------------- Captured stderr call -----------------------------
Traceback (most recent call last):
  File "/usr/share/miniconda/envs/test/lib/python3.8/site-packages/conda/exception_handler.py", line 16, in __call__
  File "/usr/share/miniconda/envs/test/lib/python3.8/site-packages/conda/cli/main.py", line 66, in main_subshell
  File "/usr/share/miniconda/envs/test/lib/python3.8/site-packages/conda/cli/conda_argparse.py", line 31, in <module>
  File "/usr/share/miniconda/envs/test/lib/python3.8/site-packages/conda/base/context.py", line 24, in <module>
ModuleNotFoundError: No module named 'conda._vendor.appdirs'

Copy link
Contributor

@jaimergp jaimergp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot. The PR is super clear and the documentation is on-point. I only added a couple comments to clarify my own doubts. The new positional argument could be considered API breaking, so let's see how the rest of the team feels about it. I don't have strong feelings, just applying an abundance of caution Just In Case ✨ .

Once we have discussed that, I'll be super happy to approve. 👍

metadata: MetaData,
env,
stats,
new_prefix_files: set[str],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is adding a new positional argument to a (let's assume) public signature, what about making it a None kwarg. This way downstream users won't run into number of arguments errors:

Suggested change
new_prefix_files: set[str],
new_prefix_files: set[str] = None,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good point; it is API breaking assuming these functions are part of the public API. I need to mark the new argument as typing.Optional and/or set appropriate defaults.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conda_build/build.py Show resolved Hide resolved
metadata: MetaData,
env,
stats,
new_prefix_files: set[str],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about the API breakage. Not sure how prominent the usage of wheel outputs is, though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1299,9 +1299,10 @@ You can specify files to be included in the package in 1 of
Explicit file lists are relative paths from the root of the
build prefix. Explicit file lists support glob expressions.
Directory names are also supported, and they recursively include
contents.
contents. Files installed to the prefix by host dependencies will
be matched by glob expressions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would maybe even highlight this further with an admonition or something similar. I wasn't aware of that design issue!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with an admonition or something similar.

I don't know what you mean. You want stronger language? For example: "WARNING! Files installed to the prefix by host dependencies also will be matched by glob expressions."

I wasn't aware of that design issue!

Yeah, I'm not sure whether this is a bug/oversight or a feature. It's definitely unexpected to me. The most active contributor in this section is @msarahan; maybe they know?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant an admonition in the RST sense. Triple-backtick + warning kind of syntax. Equivalent to this:

Warning

Files installed to the prefix by host dependencies also will be matched by glob expressions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 642906b

news/files-exclude.rst Outdated Show resolved Hide resolved
Comment on lines 11 to 17
files:
- subpackage_file1
- somedir
- "*.ext"
include:
- subpackage_file1
- somedir
- "*.ext"
exclude:
- "*3.ext"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we replace the old test with a new one instead of adding a new case? Maybe I would add a new output in this meta.yaml and then test the different behaviors differently, if possible (e.g. how the list-of-str behaviour does copy anything in PREFIX, but include/exclude doesn't?).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the test recipe to include multiple outputs: one uses the old file matching and one uses the new file matching.

Comment on lines 1315 to 1318
Files can be excluded by specifying `files` as a dictionary separating
files to `include` from those to `exclude`. Files installed to the prefix
by host dependencies are automatically excluded when the include/exclude
syntax is used:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to mention the precedence here, i.e., exclude entries have higher priority as per the implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mentions the new behavior, i.e., "Files [...] excluded when the include/exclude syntax is used".
But since this is a deviation from the previous "include" logic (which would be more of a force-include since it allows packaging pre-existing files.), I think we should make the difference more prevalent/explain the behavior for the old non-dict case more explicitly.

Copy link
Author

@carterbox carterbox Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to mention the precedence here, i.e., exclude entries have higher priority as per the implementation.

exclude has a higher priority than which operation? It's unclear to me what ambiguity we would be removing by mentioning priority.

This mentions the new behavior, i.e., "Files [...] excluded when the include/exclude syntax is used". But since this is a deviation from the previous "include" logic (which would be more of a force-include since it allows packaging pre-existing files.), I think we should make the difference more prevalent/explain the behavior for the old non-dict case more explicitly.

Suggested change
Files can be excluded by specifying `files` as a dictionary separating
files to `include` from those to `exclude`. Files installed to the prefix
by host dependencies are automatically excluded when the include/exclude
syntax is used:
When defining `outputs/files` as a list, any file in the prefix (including those
installed by host dependencies) matching one of the glob expressions is
included in the output. Greater control over file matching may be
achieved by defining `files` as a dictionary separating files to
`include` from those to `exclude`.
When using include/exclude, only files installed by
the current recipe are considered. i.e. files in the prefix installed
by host dependencies are not matched. include/exclude may not be used
simultaneously with glob expressions listed directly in `outputs/files`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to mention the precedence here, i.e., exclude entries have higher priority as per the implementation.

exclude has a higher priority than which operation?

The kind of precedence where if you do

  files:
    include:
      - lib/libfoo.so
    exclude:
      - lib/libfoo.so

the exclusion wins.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now explained that when files match both exclude and include, they are excluded. d1b12a1

Comment on lines 1931 to 1932
include = files.get("include", [])
exclude = files.get("exclude", [])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This syntax will be affected by the "key is set to an empty block, which means None in YAML" kind of issue. We should change it to:

Suggested change
include = files.get("include", [])
exclude = files.get("exclude", [])
include = files.get("include") or []
exclude = files.get("exclude") or []

It might also be a good opportunity to sneak @jakirkham's changes added by #4971.

Copy link
Contributor

@jaimergp jaimergp Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See line 1818:

- files = output.get("files", [])
+ files = output.get("files") or []

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that files.get("include") may return None, but include must be a list. So we use the falsiness of None to replace it with []. Equivalent to:

include = files['include'] if ('include' in files and files['include'] != None) else []

Copy link
Contributor

@jaimergp jaimergp Mar 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might have

- output: something
  files:
    include:
      - path
    exclude:
      - not-this-path  # [linux]

and then you'll have files["exclude"] = None on macOS, for example (linux is false). Not sure at what point we are ensuring that include is a list.

Under that circumstance files.get("exclude", []) will be None instead of the expected []. Is that what you mean? I think we are saying the same 😬

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. We are saying the same thing. I'm just trying to figure out if we should be doing something stronger than converting Falsy values into the empty list. Perhaps convert any non-list into a list? Something like:

include = files.get("include") if isinstance(files.get("include"), list) else []

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might silence errors when users populate it with a dict or something accidentally. I think we can just play it simple with falsy promotions.

@@ -2817,7 +2848,7 @@ def build(
# This is wrong, files has not been expanded at this time and could contain
# wildcards. Also well, I just do not understand this, because when this
# does contain wildcards, the files in to_remove will slip back in.
if "files" in output_d:
if "files" in output_d and not isinstance(output_d["files"], dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if "files" in output_d and not isinstance(output_d["files"], dict):
if not isinstance(output_d.get("files") or (), dict):

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated this condition to:

                    if (
                        "files" in output_d
                        and output_d["files"] is not None
                        and not isinstance(output_d["files"], dict)
                    ):

Enhancements:
-------------

* implemented new include/exclude sections for glob expressions in multi-output `outputs/files`. (#4196 via #5216)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the following the template file for news items. It should be markdown. Can you update it, please? 🙏

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. https://github.com/conda/conda-build/blob/main/news/5222-deprecating-conda_interface

The headers should have ###, and the filename is extension-less.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

jaimergp
jaimergp previously approved these changes Apr 10, 2024
Copy link
Contributor

@jaimergp jaimergp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after fixing the little MD/RST nits.

jaimergp
jaimergp previously approved these changes Apr 18, 2024
@jaimergp
Copy link
Contributor

Traceback (most recent call last):
  File "/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/popen-gw0/test_subpackage_recipes_copyin0/split_packages_file_list_1713435205299/test_tmp/run_test.py", line 51, in <module>
    assert os.path.isfile(filename)
AssertionError
WARNING: Tests failed for my_script_subpackage_files-1.0-0.tar.bz2 - moving package to /private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/pytest-of-runner/pytest-0/popen-gw0/test_subpackage_recipes_copyin0/broken

:(

@carterbox
Copy link
Author

What is the difference between the parallel and serial tests? Only the macos parallel tests are failing. It looks like it only changes parameters related to pytest and codecov, not conda itself?

@jaimergp
Copy link
Contributor

Serial jobs run tests that can't be run concurrently. Parallel jobs take rhe rest and run them with two processes wt the same time. The tests need to be marked as serial with a decorator to be considered as such, so the default is parallel.

@carterbox
Copy link
Author

I think the errors are coming from the fact that the subpackage tests use the same artifacts for both outputs, so the subpackages are colliding when they are installed to the same environment?

@@ -18,6 +18,7 @@


@pytest.mark.slow
@pytest.mark.serial
Copy link
Author

@carterbox carterbox Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@pytest.mark.serial

Try reenabling parallel tests once tests are fixed.

@carterbox
Copy link
Author

Needs approval to run the refactored tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed [bot] added once the contributor has signed the CLA
Projects
Status: Approved
Development

Successfully merging this pull request may close these issues.

Negative matching 'files' for outputs
6 participants