symlinking of crates causes issues with macos sandbox #482

Open
j-baker opened this issue Dec 14, 2023 · 2 comments · May be fixed by #489

Comments

@j-baker
Contributor

j-baker commented Dec 14, 2023

Hi!

I build on MacOS with the Nix sandbox enabled. This is because I run a MacOS build worker which pushes into a company-shared cache; I want to isolate builds so as to make it as hard as possible for one malicious user to poison the cache.

The MacOS sandbox definition that Nix uses lists every store path, and it has a relatively low maximum size; in practice the limit appears to be somewhere around 500-800 Nix store paths.

I have a project with around 700 crate dependencies. Because the cargo vendor process symlinks each crate, any cargo build derivation references more store paths than there are dependency crates, which places a limit on the number of crates one can depend on.

I'm wondering if this project would consider copying crates instead of symlinking? Happy to make an MR which makes the change.

The downside would be slightly greater disk usage, the benefit would be that bigger projects can be built on Mac with sandboxing!

@j-baker j-baker changed the title from "symlinking of crates causes issues with macos sandbox!" to "symlinking of crates causes issues with macos sandbox" Dec 14, 2023
@ipetkov
Owner

ipetkov commented Dec 16, 2023

Hi @j-baker thanks for the report!

As of today, downloading and unpacking crates happens in two derivations, which is undoubtedly contributing to the increase in total derivation count. Sadly, we cannot fold these into a single step: unpacking the tarball would produce a different hash from the one recorded in Cargo.lock, so we would not know the hash up front.

One thing we could do is change vendorCargoDeps to take some kind of deferUnpack argument, which would only download the tarball from the registry and build up a manifest mapping each source to its crate name/version. Then we could do the expansion in the configureCargoVendoredDepsHook, if present!


Possible workaround ideas in the meantime:

  • Use cargo vendor to download all dependency sources, then set cargoVendorDir = ./vendor; on the derivation. The downside is that you'll need to remember to rerun this step whenever Cargo.lock changes.
  • Alternatively, you could use a fixed-output derivation (similar to what rustPlatform.buildRustPackage does) which itself calls cargo vendor, so you don't have to commit/stage the results. The downside is that you'll need to manually update its hash any time Cargo.lock changes.

@j-baker
Contributor Author

j-baker commented Dec 16, 2023

Hi, thanks for the reply. I realised I was a little unclear in my previous message. Here is my understanding.

In step 1, Crane downloads crates. Each crate gets a derivation. The store paths for this derivation are approximately:

input: [] + binary dependencies
output: dep1
transitive dependencies: []

In step 2, Crane extracts these crates:

input: [ dep1 ] + binary dependencies (tar etc).
output: dep1_extracted
transitive dependencies: [] (we have extracted the tar - there are no references to nix store paths in the output).

In step 3, Crane 'vendors' these crates per registry.

input: [dep1, dep2, ..., depN]
output: [ln -s dep1 crates/dep1, ln -s dep2 crates/dep2, ..., ln -s depN crates/depN] (as registry1)
transitive dependencies: [dep1, dep2, ..., depN] (we have symlinks to those paths, Nix picks up on this).

In step 4, Crane builds the inputs cargo dir.

input: [registry1, registry2, ..., registryN]
output: [ditto step 3]
transitive dependencies: [registry1, registry2, ..., registryN, dep1, dep2, ..., depN, etc]

And from this point on, actual cargo commands run.

When the sandbox for each build is built, it is granted access to all transitive dependencies, as these are the totality of what might be depended on. On Linux this means bind-mounting the paths into the sandbox.

I don't believe the two-phase download and extract contributes to the problem, because the extracted output passes on no transitive dependencies.

The problem I believe I'm facing is that, with enough input crates from a registry, the number of store paths that the output depends on becomes much too large for MacOS.
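To make the counting concrete, here is a toy model of steps 2 to 4 for a single registry. It is only a sketch: the path names (`dep0_extracted`, `registry1`, `vendor_dir`) and both helper functions are hypothetical, and real Nix closures also include build tools and other references that are ignored here.

```python
def closure(store, root):
    """Return every store path reachable from `root` (root itself excluded)."""
    seen = set()
    stack = [root]
    while stack:
        path = stack.pop()
        for ref in store.get(path, ()):
            if ref not in seen:
                seen.add(ref)
                stack.append(ref)
    return seen


def build_store(num_crates, symlink=True):
    """Toy reference graph for one registry (all names are hypothetical).

    symlink=True:  the registry output links to each extracted crate, so it
                   keeps a reference to every one of them.
    symlink=False: the crates are copied into the registry output, so no
                   references to the per-crate paths remain.
    """
    crates = [f"dep{i}_extracted" for i in range(num_crates)]
    store = {crate: [] for crate in crates}
    store["registry1"] = list(crates) if symlink else []
    store["vendor_dir"] = ["registry1"]
    return store


symlinked = closure(build_store(700, symlink=True), "vendor_dir")
copied = closure(build_store(700, symlink=False), "vendor_dir")
print(len(symlinked), len(copied))
```

Under these assumptions, the symlinked layout makes every downstream build depend on one path per crate plus the registry, while the copied layout leaves only the registry path in the closure.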

One brute-force 'fix' to this problem is j-baker@2087e8b. This is not cost free: it converts symlinking of directories into a directory traversal. It is a one-liner, though, so it has that going for it. This only works because, while my total sandbox size is too large, the sandbox size contributed by any single registry is not over the size limit for me, right now.

There are many levels of sophistication one could apply to reduce the likelihood of hitting this problem without adding runtime cost; however, many of them likely lead to unnecessary complexity.

My sense however is that one workable solution (that'd be probably a few lines of code on top of what currently exists) is:

  1. Partition crates into some fixed number of components (e.g. 256). Using a hash function for the partition would ensure that e.g. updating a single crate would only change a single bucket, but this is not a hard requirement - the main point is to bound the number of crates.
  2. Extract crates together into a shared output directory, extracting and not symlinking. Symlink from there.

This would effectively combine the extraction and vendoring steps, and it would reduce the number of inputs to any one derivation.
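A minimal sketch of the partitioning idea, assuming SHA-256 over the crate name as the hash function (the proposal does not name one) and using hypothetical helper names `shard_for` and `partition`:

```python
import hashlib


def shard_for(crate_name, num_shards=256):
    """Map a crate name to a stable shard index in [0, num_shards)."""
    digest = hashlib.sha256(crate_name.encode("utf-8")).digest()
    # The first four bytes of the digest are plenty for a bucket index.
    return int.from_bytes(digest[:4], "big") % num_shards


def partition(crate_names, num_shards=256):
    """Group crate names by shard; empty shards are simply absent."""
    shards = {}
    for name in crate_names:
        shards.setdefault(shard_for(name, num_shards), []).append(name)
    return shards


crates = [f"crate-{i}" for i in range(700)]
shards = partition(crates)
print(len(shards), max(len(members) for members in shards.values()))
```

Because the shard depends only on the crate name, adding or updating one crate touches exactly one bucket, and a downstream derivation sees at most `num_shards` inputs regardless of how many crates there are.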

j-baker added a commit to j-baker/crane that referenced this issue Dec 29, 2023
Fixes ipetkov#482.

MacOS has trouble with derivations which (directly or transitively)
have many buildInputs.

Crane at present creates a build structure in which a given cargo
command will transitively depend on numCrates nix store paths. This
means that Crane fails to build projects with over about 600 crate
dependencies on MacOS if the sandbox is enabled.

This MR utilises a tiering approach to improve this.

Each crate in a registry is assigned to a shard based on the hash of
the crate name. If there are <32 crates in a registry there is one
shard, if <2048 there are 16 shards, otherwise 256. Crates are
extracted directly into these shard derivations rather than symlinked.

What this means is:

1. Crane will not create a vendoring derivation with many inputs
   unless a project has a truly crazy number of dependencies.
2. No downstream cargo derivation will have many inputs either.
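The tier thresholds quoted in the commit message can be written down as a small function. This is a sketch of the stated rule only, not the code from the linked pull request; `shard_count` is a hypothetical name.

```python
def shard_count(num_crates):
    """Number of shards for a registry, per the tiers in the commit message."""
    if num_crates < 32:
        return 1
    if num_crates < 2048:
        return 16
    return 256


for n in (10, 700, 5000):
    shards = shard_count(n)
    # Average crates per shard, assuming the hash spreads names evenly.
    print(n, shards, round(n / shards, 1))
```

For a project with about 700 crates this gives 16 shards, so a downstream cargo derivation would reference 16 vendored store paths instead of roughly 700.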
@j-baker j-baker linked a pull request Dec 29, 2023 that will close this issue