Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg_files(strip_prefix) behavior is confusing w.r.t. flattening a tree of files #354

Open
aiuto opened this issue May 13, 2021 · 13 comments · May be fixed by #656
Open

pkg_files(strip_prefix) behavior is confusing w.r.t. flattening a tree of files #354

aiuto opened this issue May 13, 2021 · 13 comments · May be fixed by #656
Assignees
Labels
breaking change P1 An issue that must be resolved. Must have an assignee
Milestone

Comments

@aiuto
Copy link
Collaborator

aiuto commented May 13, 2021

Example file tree

BUILD
docs/BUILD
docs/index.md
docs/reference/index.md

docs/BUILD

filegroup(name = "docs", srcs = glob(["**/*"]))

BUILD

pkg_files(
    name = "share_doc",
    srcs = [
        "//docs",
    ],
    strip_prefix = strip_prefix.from_pkg(),
    prefix = "usr/share/doc/foo",
)

This gives the output I expect - that the directory tree is preserved.

usr/share/doc/foo/index.md
usr/share/doc/foo/reference/
usr/share/doc/foo/reference/index.md

But.... if I do not have strip_prefix, pkg_files flattens the tree and I get the error. Error in fail: After renames, multiple sources (at least docs/index.md, docs/reference/index.md) map to the same destination. Consider adjusting strip_prefix and/or renames
This might be a carry over from legacy behavior, but it is strange and confusing.

Proposal

  • pkg_files adds an attribute squash_tree bool, default=False which controls squashing the tree.
  • strip prefix and any name filtering are applied first, then the results are squashed if squash_tree=False.
@aiuto aiuto assigned nacl and aiuto May 13, 2021
@nacl
Copy link
Collaborator

nacl commented May 17, 2021

This is indeed derived from legacy behavior. See strip_prefix in #36. pkg_tar.

I'm not fundamentally opposed to this idea, but my (selfish) counterpoint to this is that many source trees do have any reasonable mapping between where they are located in the source tree and where they end up being installed/packaged. As proposed, this would require the propagation of new attributes in this (relatively common) case.

In any case, I can see why this would be confusing and impede initial use of the framework. I'd call it a quirk more than anything else, but that could just be me having written and watched the framework used internally for a while. So far, nobody has complained about this, although, that isn't necessarily a sign that it's the best option.

To your points:

pkg_files adds an attribute squash_tree bool, default=False which controls squashing the tree.

I'd rather call this attribute flatten_tree or something involving the word "flatten" (flatten_files?)

strip prefix and any name filtering are applied first, then the results are squashed if squash_tree=False.

I would instead treat strip_prefix as mutually exclusive with any flatten_files. The behavior with no explicit strip_prefix and no flatten_files would be to emit files relative to the current bazel package. strip_prefix would no longer support files_only.

Thoughts?

@aiuto
Copy link
Collaborator Author

aiuto commented May 27, 2021

This is indeed derived from legacy behavior. See strip_prefix in #36. pkg_tar.

I'm not fundamentally opposed to this idea, but my (selfish) counterpoint to this is that many source trees do have any reasonable mapping between where they are located in the source tree and where they end up being installed/packaged. As proposed, this would require the propagation of new attributes in this (relatively common) case.

Which way to do you mean that? "many source trees to have a reasonable" or "do not have any reasonable". I think the former based on the rest of the text. I'm sort of surprised. We always have paths like "//websearch/some_product/iphone/.../the/app/structure". For distributing the app, we almost always use strip_prefix from the root through the BUILD file.

OR... are we talking about different things? That you mean the layout of files under the top level of the application and I am talking about the path to the application?

In any case, I can see why this would be confusing and impede initial use of the framework. I'd call it a quirk more than anything else, but that could just be me having written and watched the framework used internally for a while. So far, nobody has complained about this, although, that isn't necessarily a sign that it's the best option.

To your points:

pkg_files adds an attribute squash_tree bool, default=False which controls squashing the tree.

I'd rather call this attribute flatten_tree or something involving the word "flatten" (flatten_files?)

That sounds good to me. I may send a PR to add just that soon.

strip prefix and any name filtering are applied first, then the results are squashed if squash_tree=False.

I would instead treat strip_prefix as mutually exclusive with any flatten_files. The behavior with no explicit strip_prefix and no flatten_files would be to emit files relative to the current bazel package. strip_prefix would no longer support files_only.

Why should they be mutually exclusive? They operate on different pieces of the path. If you had
//a/b/product/stuff/.../.html, then strip_prefix.from_root("a/b") + flatten_files=True, would be able to produce "product/.html

@nacl
Copy link
Collaborator

nacl commented May 27, 2021

Which way to do you mean that? "many source trees to have a reasonable" or "do not have any reasonable". I think the former based on the rest of the text. I'm sort of surprised. We always have paths like "//websearch/some_product/iphone/.../the/app/structure". For distributing the app, we almost always use strip_prefix from the root through the BUILD file.

OR... are we talking about different things? That you mean the layout of files under the top level of the application and I am talking about the path to the application?

I meant to say "do not have any reasonable mapping". Bah.

I guess I'm referring to the layout of files under the top level of the application, although both can be a problem at times.

The particular application I am supporting in our monorepo installs various binaries to (mostly) self-contained locations. While many of the files used in the applications are present in under a particular prefix in the repository (e.g. PREFIX = //foo/bar/$APP), some are not, and there is no consistent mapping between where they are found and where they end up being installed. For example $PREFIX/db/foo.sql may be installed to /usr/local/$app/etc/database/foo.sql, or even /etc/database/foo.sql depending on the application.

Because of this, we often strip everything and name the precise locations where everything ends up living. I can imagine some open source projects having a similar problem -- directory structures in the source tree can be somewhat arbitrary there as well.

Why should they be mutually exclusive? They operate on different pieces of the path. If you had
//a/b/product/stuff/.../.html, then strip_prefix.from_root("a/b") + flatten_files=True, would be able to produce "product/.html

I see. I wasn't sure where the squashing/flattening was intended to begin; I assumed it was relative to the root.

With all that in mind, I feel that the precise behavior needs to be specified a little more. In particular, how will flatten_files (or whatever it's to be called) interact with the various modes of strip_prefix? It looks like it's only compatible with strip_prefix.from_root() at a glance.

@aiuto
Copy link
Collaborator Author

aiuto commented Jun 8, 2021

From offline discussion today, I think we agreed on

  • strip_prefix.from_root(x) is not needed. Paths should always strip from root to the package.
  • strip_prefix.from_pkg(x) is useful.
  • flatten and strip_prefix don't make sense together.

@aiuto aiuto added the P1 An issue that must be resolved. Must have an assignee label Jun 9, 2021
@aiuto aiuto added this to the 1.0 milestone Jun 9, 2021
@nacl
Copy link
Collaborator

nacl commented Jul 14, 2021

After some time pondering this: some use cases for strip_prefix.from_root() popped up:

  1. Creation of source archives from repository checkouts.
  2. Exporting files from a tree, preserving directory structure.

In these cases, the repository-relative path is helpful, otherwise, the repository paths paths need to be encoded in the BUILD files as prefixes, which can be inconvenient.

@nacl
Copy link
Collaborator

nacl commented Dec 2, 2021

Even if this is breaking behavior, we could theoretically mitigate this by providing a script that does the transformation for users.

If, for example, the default is simply switched from flattening to preserving from-package prefix roots, one could probably use buildozer or libcst to swap introduce and/or remove the attributes as needed.

@mzeren-vmw
Copy link
Contributor

I think preserving hierarchy without strip_prefix = strip_prefix.from_pkg() is the right default. We should consider taking a breaking change with buildifier assisted migration.

@aiuto
Copy link
Collaborator Author

aiuto commented Dec 3, 2021 via email

@nacl
Copy link
Collaborator

nacl commented Dec 3, 2021

Hm. Interesting.

Immediate thoughts:

  • files always have the path from root to package stripped.

Agreed. At the very least, this should be the default to reduce user surprise.

  • if you don't want that, there is the prefix attribute. To make it easy
    to bring back the package name, we could use a magic
    constant prefix.this_package().

I like the idea, but we would need a way to manipulate the repository path/package paths in a clean way.

This has the potential for making modifications from buildozer harder than necessary. I suppose I already broke this assumption in the origins of the pkg_filegroup framework, so I don't feel super strongly about this. Arguably, this could be considered a defect in buildozer too.

I guess another thought is that I like the idea of strip_prefix affecting the repository-local prefix, and prefix affecting the installation prefix -- there's a clear dichotomy, even though the names are confusing. We should strive to have something similar here. If we simply choose the path of a different default, we should also consider trying to find a better name than strip_prefix

  • then strip_prefix is always from the current directory down, so generally
    you won't use it.

I'm not fundamentally opposed to this idea (the "current directory" being the current package), but we should be as consistent as possible.

  • and we could have flatten as another attribute (to bring back the old
    behavior)
    OR strip_prefix could be strip_prefix.flatten().

I don't think there is any case for stripping the prefix from the current
path and flattening together. Flatten always wins.

strip_prefix.flatten() seems better to me. Less interactions to have to worry about.

@nacl
Copy link
Collaborator

nacl commented Dec 3, 2021

Maybe strip_prefix should be strip_local_prefix (like cc_library's strip_include_prefix)? Would that be more clear?

@nacl
Copy link
Collaborator

nacl commented Dec 6, 2021

We discussed this in our public meeting today. To summarize:

  • Default behavior change: preserve paths in srcs (breaking change but it makes the behavior the one of least surprise)
  • srcs_strip_prefix: replaces strip_prefix. Name is clearer in that it only strips from the srcs, not the rest
  • strip_prefix.flatten(): replaces strip_prefix.files_only(): will reduce glob(["**/*.txt"]) to a single level

Side note: we don't need the ability to manipulate the package path (strip_prefix.from_root()) because you can do it with package_name(). replace. E.g.

genrule(
   name = "a",
   cmd = "echo %s >$@" % package_name().replace('/src/', '/source/'),
   outs = ["x"],
)

I will likely have time to address the core parts of this issue (the non-from_root() stuff) later this month (December, 2021).

@nacl
Copy link
Collaborator

nacl commented Dec 6, 2021

Deliverables:

  • Code changes
  • Migration scriptage (probably using libcst)

@sitaktif
Copy link
Contributor

sitaktif commented Jan 5, 2024

I just opened #805 to document a surprising behavior whereby strip_prefix.files_only() and strip_prefix.from_pkg() have opposite effects in pkg_files and pkg_tar.

It feels like it might be in scope for this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change P1 An issue that must be resolved. Must have an assignee
Projects
None yet
4 participants