
Building only dependencies fragile wrt symlinking strategy #385

Open
j-baker opened this issue Sep 15, 2023 · 9 comments

Comments

@j-baker
Contributor

j-baker commented Sep 15, 2023

Hi! Recently crane changed to symlinking rlib and rmeta files when reading dependencies. For the random reader, Crane generates the dependencies in one nix build, and then uses them in a later one. Effectively, you generate your cargo /target dir in one build step, and then you 'put it into place' in the next build step.

This has many practical advantages.

Obviously, this is awesome - way less time spent copying voluminous object files. Unfortunately, it leads to a common failure mode: if a dependency is out of date somehow and cargo decides it needs to rebuild it, the rebuild will fail - cargo tries to write to the target of the symlink, and this fails because it's a read-only nix store path.
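
For concreteness, this is roughly the shape of that two-step setup in a consumer's flake (a minimal sketch using crane's buildDepsOnly/buildPackage entry points; exact attribute names can differ between crane versions):

    let
      src = craneLib.cleanCargoSource ./.;
      # First derivation: build only the dependency crates, producing a /target
      # directory that ends up in the nix store.
      cargoArtifacts = craneLib.buildDepsOnly { inherit src; };
    in
    # Second derivation: inherit those artifacts; this is where the symlink /
    # copy / untar strategy discussed here kicks in.
    craneLib.buildPackage { inherit src cargoArtifacts; }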

I've seen this caused for 2 reasons:

  1. Dodgy build.rs. I found that the onnxruntime crate emitted a 'rerun-if-changed' on its own output, thereby causing the build script to be re-run every time. This is now fixed on their develop branch. I found another crate that emits a rerun-if-changed on a file that no longer exists.
  2. A differing build environment. I've found this to occur for two reasons. Firstly, I attempted to make a dependency available in tests only (a test in my project relied on being able to shell out to procps binaries). This exploded. The more insidious case that I'm currently dealing with is that the OpenCV crate only avoids a rebuild if CMAKE_PREFIX_PATH has not changed. Somehow, it is changing between the root build and the 'clippy' build. The only difference I can see is that clippy is added to the nativeBuildInputs. I'm wondering if clippy itself causes the cmake prefix path to change. Either way, the build explodes.

This can be worked around by setting doNotLinkInheritedArtifacts = true in most cases, but not all. In particular, I struggled to get a build which used cxx to build, because cxx created absolute symlinks which were left dangling after the copy into the Nix store, meaning that cp itself then failed. This is now fixed on master of that project.
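
For reference, the workaround is just an extra attribute on whichever downstream derivation is affected (a sketch; cargoClippy here stands in for any derivation that inherits cargoArtifacts):

    craneLib.cargoClippy {
      inherit src cargoArtifacts;
      # Copy the inherited /target instead of symlinking it, so an unexpected
      # rebuild can overwrite the files.
      doNotLinkInheritedArtifacts = true;
    }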

The takeaway for me is that this change makes Crane significantly more brittle when building Rust crates, and the error which is emitted can be rather hard to debug.

A sample error would be:

error: output file /private/tmp/nix-build-retina-clippy-0.0.0.drv-0/source/target/release/deps/libopencv-6a1a900b840d12b6.rmeta is not writeable -- check its permissions

which makes sense if you know how crane works - but it's not very introspectable for a random dev joining the project, because the symlinking behaviour isn't that obvious. I'm trying to make it so that new engineers don't need to know how crane works in order to build their code - rather, they just need to know that it does.

I've got a few proposals for ways out:

The opencv issue

Here, crane is leading to a diff in CMAKE_PREFIX_PATH between building deps and the other derivations. This genuinely just seems like a bug in crane - arguably Crane should ensure that, insofar as the environment can cause cargo to rebuild, it is consistent between all builds. In my head, this is likely being caused by Crane amending the nativeBuildInputs being passed in. I'm wondering if it might be more robust to add all the relevant inputs to nativeBuildInputs every time, rather than amending only the minimal set. However, it's possible I've misdiagnosed this specific issue - the person I'm working with has not yet provided me with the CMAKE_PREFIX_PATH of the 'build deps' and 'run clippy' builds.
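
As a sketch of that mitigation, and assuming the difference really does come from crane amending nativeBuildInputs, one could pass a single shared argument set to every derivation so that environment-sensitive build scripts (like opencv's CMAKE_PREFIX_PATH check) always see the same inputs. The specific packages here are illustrative only:

    let
      commonArgs = {
        inherit src;
        # One superset of inputs used by deps-only, build, clippy, tests, etc.
        nativeBuildInputs = [ pkgs.cmake pkgs.pkg-config pkgs.clippy ];
      };
      cargoArtifacts = craneLib.buildDepsOnly commonArgs;
    in
    craneLib.cargoClippy (commonArgs // { inherit cargoArtifacts; })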

The more general issues

  1. On supported filesystems, Crane could prefer the filesystem's 'reflink' functionality. This clones the file but points at the same data blocks on disk, i.e. a copy-on-write clone of the data. Supported filesystems are xfs, btrfs and, in theory, APFS (although it does not work on my Mac machine; I'm not sure why at present - I would need to build coreutils outside of a nix environment to properly validate whether this is a Nix or GNU issue). This work would likely amount to checking whether some test file can be copied with cp --reflink=always file1 file2 between the two directories, and upon success, doing cp -r --reflink=always instead of symlinking (see the sketch after this list). A further blocker might be that Nix builds in temp directories on a different filesystem to the Nix store - I am not sure about the specifics. I believe this would likely be ok on btrfs but may prove more of an issue on xfs. In any case, this is an optimisation rather than a total fix. However, if you could make this work on Linux, it might prove better on Mac.
  2. Use unionfs-fuse to mount the old target and the new target as a union rather than as the root. This would probably be even better from a performance perspective, as it leads to less directory traversal (much of which is likely redundant). However, unionfs-fuse is presently marked as broken on macOS, so like 1, this would be a partial solution. That's not to say it can't easily be fixed, however. This would likely also want some testing wrt Linux versions, and would likely require adding /dev/fuse to the nix allowlist. So possibly not viable: https://github.com/NixOS/nix/blob/2a52ec4e928c254338a612a6b40355512298ef38/src/libstore/build/local-derivation-goal.cc#L1782
  3. Better debug information. Crane could grep for such an error in cargo's output and print specific guidance on how to fix it. Perhaps putting this in the docs would be helpful too. One other thing might be to have a sample output which just tests that rebuilds are 'safe' in this model.
  4. Modify Cargo to handle this case. Cargo could check if the file is writeable, and if not, and it's also a symlink, it could remove the symlink and overwrite that way. I will file a similar issue on Cargo, but have less faith in them being amenable to this because it's a Crane workflow. Alternatively, if the file needs rewriting, it's not quite as 'content addressable' as it should be, yes?
  5. Not do this optimisation, as 'too error prone', or at least make it not the default.
  6. It's possible that I'm also just missing something and e.g. the symlinks are somehow themselves being written as non-writeable.
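
To make proposal 1 a little more concrete, the probe-then-clone step might look like the following inside whichever hook installs the inherited artifacts. This is entirely hypothetical: the phase name and the $cargoArtifacts variable are made up for illustration, and crane does nothing like this today.

    {
      inheritCargoArtifactsPhase = ''
        mkdir -p target
        sample=$(find "$cargoArtifacts" -type f | head -n 1)
        if cp --reflink=always "$sample" target/.reflink-probe 2>/dev/null; then
          # The filesystem supports copy-on-write clones between the store and
          # the build dir: clone the whole tree instead of symlinking it.
          rm target/.reflink-probe
          cp -r --reflink=always "$cargoArtifacts"/. target/
        else
          : # fall back to the existing symlink/copy strategy
        fi
      '';
    }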

Just spitballing here - I don't have any particular agenda - I've just personally found this confusing, and as the person doing most of the 'new nix stuff' at my company, that means others find it even more confusing.

@dpc
Contributor

dpc commented Sep 15, 2023

AFAICT all the symlinking stuff is not worth pursuing, and tarring the target dir (use-zstd) is best, including being faster, especially on non-trivial projects. I haven't redone the comparison after the symlink changes, but I'm sceptical that they can overcome these problems.

My intuition tells me that if it were possible to implement target deduplication via extracting layered archives, things would really work well.

I've been responsible for CI for a large Rust project that builds > 10GB of ./target, and tarring and untarring ./target is not even noticeable amid all the actual heavy lifting (mostly cargo compiling).

@j-baker
Contributor Author

j-baker commented Sep 15, 2023

Hi! It might be that we’re compiling different kinds of Rust projects. In the ones that I work with, there is perhaps a relatively large dependency tree (think 400 crates) but relatively small amount of project code (say, 10-20k lines). Think, small crud app which has a grpc api, accesses a database, does some serde and uses some async.

What this means is that compiling the dependencies takes quite a bit of time if you’re doing it from scratch, while the incremental cost can be pretty small. Put simply, without Crane the typical dev recompile flow is measured in single-digit seconds.

My users are comparing Crane usage to raw Cargo, so the expectation on ‘I have made a small change’ is a relatively cheap build. Obviously, the normal dev cycle is for people to run ‘cargo test’ or similar, but my overall aim is that people should want to run ‘nix flake check’ as their last step before pushing, and that it should typically run quickly.

The cost of zstd-encoding and copying the output is considerable even on small projects, especially compared to the incremental cost of compilation in the typical cargo workflow.

It also leads to considerable storage bloat - consider the crane example which runs nextest, doc, build, and clippy simultaneously. With the symlinking strategy, if there are 2GB of compiled artifacts, the build can be made to use about 2GB of storage. With the untarring strategy, it will use 8GB.

This is all to say - as a comparison to recompiling, zstd and tarring is obviously better - but as a comparison to iterative cargo compilation it is perhaps weaker.

Your comment does however give me an idea - I’d wonder if a tar of symlinks might represent the best of both worlds in terms of performance? It would likely be much faster to extract than traversing the filesystem due to sequential disk access and reduced cost of forking.

@dpc
Contributor

dpc commented Sep 15, 2023

My users are comparing Crane usage to raw Cargo, so the expectation on ‘I have made a small change’ is a relatively cheap build.

Your users should keep using raw cargo in a dev shell during dev work anyway. They don't have to compare with raw cargo, because they will be using raw cargo.

Having devs run nix commands during dev work is one of the biggest mistakes one can make when using Nix flakes.

my overall aim is that people should want to run ‘nix flake check’ as their last step before pushing, and that it should typically run quickly.

That's just a waste of time and disk space. My users run just lint or just final-check, neither of which invokes nix; they only use the inputs provided by the dev shell to do what needs to be done. Since barely anyone ever touches Nix code, if things work in a dev shell, 99.9% of the time they will work in Nix.
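
The setup being described is roughly: tools come from the dev shell, plain cargo/just does the day-to-day work, and only CI invokes nix. A minimal sketch (rustToolchain stands for whatever pinned toolchain the flake already defines; the package names and just recipes are illustrative, not taken from this thread):

    devShells.default = pkgs.mkShell {
      # cargo, clippy, rustfmt come from the pinned toolchain; just runs the
      # lint / final-check recipes without ever invoking nix itself.
      packages = [ rustToolchain pkgs.just ];
    };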

With the untarring strategy, it will use 8GB.

With tarring (and most importantly zstd), 12GB of artifacts compresses to something like a 2GB target.zstd.tar. The build directory itself is discarded after the Nix derivation is built, so it largely doesn't matter how much space it uses while Rust is building. Compression & extraction are fast (I can barely notice them, and our project is large, and we do all sorts of things via Nix CI). What matters most is how much space the end artifacts take in the /nix/store, especially since downloading/uploading stuff to cachix takes time.

In addition, if incremental layers were implemented, tarring would become even better. I'm confident that layers of incremental compressed archives are the best strategy all around.

In theory symlinking could be more performant, but in practice it's an uphill battle, where cargo will break it in weird ways, because we'll be messing with its internals that are not guaranteed to be stable, and no one tests or even cares about this use case.

@j-baker
Contributor Author

j-baker commented Sep 15, 2023 via email

@dpc
Contributor

dpc commented Sep 15, 2023

For example, a typical Azure host will have 100-200MB/sec of disk write throughput, meaning that a 12GB extraction will take perhaps 1 minute - not insignificant, especially problematic if

I wonder if Nix is using some kind of a ramdisk for the build (maybe it's configurable?), so the disk IO is not the 12GB of uncompressed data but the 2GB of compressed data. Things are just flying by too fast for that much disk IO, even though your ballpark calculations are reasonable. This has its own problems (running out of memory).

Just did a quick check:

[Screenshot from 2023-09-15 16-08-17]
[Screenshot from 2023-09-15 16-08-50]

One screenshot is just the dependencies (smaller), the other one is the whole workspace extracted for the test derivation. Running on Jetbuild for Github Actions.

For one other thing - nix store optimisation will tend to ensure that rebuilds, though time consuming, don’t take up more space in the store, whereas tarring up results will.

Tarring the difference will ensure near-zero redundancy, and I'm 100% certain it can be done in a very performant way. This is currently unimplemented, and even with the redundancy, it's been working great.

I don't deny that the symlink approach would be faster, especially in theory. But as someone who has spent days debugging all sorts of issues in our Nix CI pipeline for various reasons (most of them my mistakes, but some were crane bugs, some just general complexity), and at least 8 hours recently on mysterious rebuilds in newer versions of the Rust toolchain, I value the brutal simplicity and robustness of tarring over marginal performance improvements. Not to mention that even right now the symlinking is only partial, IIRC, because reasons.

I would be happy to have a perfectly working "tar the symlinks" model, as it seems ideal. I would use it, but I guarantee it will break sometimes in mysterious ways, because cargo does not expect to have its files messed with, and crate build scripts often do crazy things. So even then I'd rather have the "incremental tars + copy" model that I advocate to fall back on, because I'm reasonably sure it is good enough and will always work, because it is invisible to cargo.

@j-baker
Contributor Author

j-baker commented Sep 16, 2023

I believe what you're observing is GNU tar's behaviour where it doesn't fsync when extracting. Effectively, this means that you have a ramdisk so long as you have free memory - but when you run out, it'll slow down to the speed of your disk. Linking back to the point in my previous comment, the workflow you are describing will work great so long as resources are good (lots of ram, fast disk) but drop either or both of those (due to e.g. being on some colo box) and the effects are outsized.

I've had another go at implementing symlinking based on my comments from last night, and it appears to work well - much faster than the present strategy and robust to crate rebuilds.

PR is at #386! Would appreciate it if you could give it a shot to see what performance is like for you! The sense I get is that it uses rustc in a way that's robust to changes - it tells rustc to emit data into another directory and overwrites symlinks, rather than relying on rustc doing the right thing.

@dpc
Contributor

dpc commented Sep 17, 2023

@j-baker I commented on your PR. It's probably an improvement, even if not in my favorite approach. :D

BTW. It occurred to me that one reason why zstd compresses so well is because there is tons of redundancy between files in ./target. These are the storage savings that symlinking will not get. It might pay for itself just in this way, at least as long as cargo itself does not compress better.

Which reminded me that in your approach you can only symlink cargo artifacts, while incremental deduplication could dedup anything. In our project we (very unfortunately) have quite a few C dependencies, and I wonder how much cargo check and cargo doc produce (vs rustc).

I believe what you're observing is GNU tar's behaviour where it doesn't fsync when extracting. Effectively, this means that you have a ramdisk so long as you have free memory - but when you run out, it'll slow down to the speed of your disk. Linking back to the point in my previous comment, the workflow you are describing will work great so long as resources are good (lots of ram, fast disk) but drop either or both of those (due to e.g. being on some colo box) and the effects are outsized.

I think you're right here, but also I don't think this is going to be an issue. The system you're building needs resources corresponding to the project size, and most of these files will be read over and over during the build, so the fact that they are warm in the memory is a plus. If you want a fast CI, you want your ./target to fit in RAM one way or another.

@dpc
Contributor

dpc commented Sep 17, 2023


the fact that they are warm in the memory is a plus. If you want a fast CI, you want your ./target to fit in RAM one way or another.

In a way, extracting zstd might have saved time over symlinking again, by just reading everything into the in-memory disk cache with one big sequential read, while doing only 20% of the IO.

@dpc
Contributor

dpc commented Sep 17, 2023

@j-baker @ipetkov

Since I tried the improved symlinking in the current revision I'd like to post some metrics I can already capture:

Sun, 17 Sep 2023 23:28:07 GMT fedimint-deps> installing target to /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> /build/tmp.7gTbCmcFrk /build/source

So copying just the dependencies takes 6s (vs 4s from the zstd version)

But most importantly, you see the wall of text in the post-installation fixup phase:

Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> post-installation fixup
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> shrinking RPATHs of ELF executables and libraries in /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/fedimintd
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/devimint

...

Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/build/zstd-sys-946f5b742f816cdf/out/zstd/lib/legacy/zstd_v03.o
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> patchelf: wrong ELF type
Sun, 17 Sep 2023 23:28:13 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/build/zstd-sys-946f5b742f816cdf/out/zstd/lib/legacy/zstd_v05.o

...

Sun, 17 Sep 2023 23:28:14 GMT fedimint-deps> patchelf: wrong ELF type
Sun, 17 Sep 2023 23:28:14 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/build/prost-build-56ba7b5010ed83ff/build_script_build-56ba7b5010ed83ff
Sun, 17 Sep 2023 23:28:14 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/build/prost-build-56ba7b5010ed83ff/build-script-build
Sun, 17 Sep 2023 23:28:14 GMT fedimint-deps> shrinking /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target/ci/build/tiny-keccak-df2eda386b4069b4/build_script_build-df2eda386b4069b4

...

Sun, 17 Sep 2023 23:28:22 GMT fedimint-deps> checking for references to /build/ in /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1...
Sun, 17 Sep 2023 23:28:22 GMT fedimint-deps> patchelf: wrong ELF type
Sun, 17 Sep 2023 23:28:22 GMT fedimint-deps> patchelf: wrong ELF type

...

Sun, 17 Sep 2023 23:28:23 GMT fedimint-deps> patchelf: wrong ELF type
Sun, 17 Sep 2023 23:28:31 GMT fedimint-deps> patching script interpreter paths in /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1

That's an extra 10s (or 18s; I'm not sure if the stuff before the final 'patching script interpreter paths' line should be counted) that did not exist at all in the zstd approach. I'm reasonably sure it can be avoided, and it should be. I reported it a while ago, but didn't dwell on it, because I'm not using use-symlink myself.
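
If that fixup pass really is unnecessary for a deps-only /target, one speculative way to avoid the cost would be to disable the relevant stdenv hooks on that derivation. These are standard mkDerivation flags, but whether skipping them is actually safe for crane's inherited artifacts is exactly the open question:

    craneLib.buildDepsOnly {
      inherit src;
      dontPatchELF = true;       # skip the "shrinking RPATHs ..." pass
      dontPatchShebangs = true;  # skip "patching script interpreter paths ..."
      dontStrip = true;          # don't strip intermediate objects either
    }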

Furthermore, the disk space use is much, much higher:

> du -cksh /nix/store/8ddgq21p04f73hjibnz43ynf1q2ljnkv-fedimint-deps-0.0.1/target
3.8G    /nix/store/8ddgq21p04f73hjibnz43ynf1q2ljnkv-fedimint-deps-0.0.1/target
3.8G    total

3.8G vs 818M with zstd. And that's only the dependencies!

Now when the workspace inherits the deps-only artifacts:

Sun, 17 Sep 2023 23:28:36 GMT fedimint-workspace-build> copying cargo artifacts from /nix/store/mxgjp66jw2iw8ang8dhi77ri6i4rx46m-fedimint-deps-0.0.1/target to target
Sun, 17 Sep 2023 23:28:39 GMT fedimint-workspace-build> configuring

3s vs 4s with zstd.

So far there's no real speedup on inheriting, slower saving of artifacts, and larger disk usage (and we're ignoring the overhead of the rustc wrapper).

Then the current version fails, but I don't think these metrics will change, so there really need to be some sweet gains in the second stage to make up for these losses in the first one.

linyinfeng added a commit to linyinfeng/ace-bot that referenced this issue Oct 1, 2023
linyinfeng added a commit to linyinfeng/oranc that referenced this issue Oct 1, 2023
bryango added a commit to bryango/system-manager that referenced this issue Oct 2, 2023
r-vdp added a commit to numtide/system-manager that referenced this issue Oct 2, 2023