Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC 0171] Default name of fetchFromGithub FOD to include revision #171

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
114 changes: 114 additions & 0 deletions rfcs/0171-fetch-from-github.md
@@ -0,0 +1,114 @@
---
feature: Default name of fetchFromGithub FOD to include revision
start-date: 2024-03-15
author: Jonathan Ringer
co-authors:
shepherd-team:
shepherd-leader:
related-issues:
---

# Summary
[summary]: #summary

Updating the hash on Fixed-Output Derivations (FODs) is a very error prone process. It's not intuitive to invalidate the existing hash, attempt to realize the FOD, then replace the hash value with the value that Nix just calculated. This RFC advocates for influencing the default derivation name of the fetchFromGithub helper with the "repo" and "rev" values to ensure that changed URLs force the FOD to be re-built.

# Motivation
[motivation]: #motivation

This will hopefully provide immediate feedback that an FOD contains a stale hash. One must either build the FOD without the previous FOD in their Nix store, or run the FOD build with `--check` to verify that the FOD is not stale. Although fetchFromGithub is one of many fetchers; it is very common, and generally has a user specify granular source information which makes differentiating between sources easy.

As a secondary effect, this will also give a more meaningful name to the FODs than "source". E.g. `/nix/store/...-source -> /nix/store/...-protobuf-v24.1-src`

# Detailed design
[design]: #detailed-design

Change the default name of fetchFromGithub fetcher from `"source"` to `lib.sanitizeDerivationName "${repo}-${rev}-src"`.
Copy link

@sdht0 sdht0 Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two suggestions:

  • How about using pname instead of repo as a sort of middle ground between stability and informative names. Of course then changing the repo url will not automaticaly verify if the hash actually matches, so a tradeoff.
  • Also how about starting with src- instead of at the end? When listing the nix store, all fetched inputs can then be sorted together.

The second suggestion is actually a part of a wish I have to prepend src- to all build inputs (tars, patch files, builders, etc), pkg- to all packages, and nixos- to the generated nixos configs, so that it is easier to tell at a glance why a particular store path is being installed, and also would be useful for tools like nvd when listing diffs. Yeah this would a much broader proposal, but one can dream someday it will happen. For now, we can start with just the FOD fetchers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using pname instead of repo as a sort of middle ground between stability and informative names. Of course then changing the repo url will not automaticaly verify if the hash actually matches, so a tradeoff.

pname is a stdenv.mkDerivation convention, this would require people to pass it separately. The goal here also is to avoid hash mismatch, so I don't think is this is enough to solve the problem this RFC is trying to remedy.

Also how about starting with src- instead of at the end? When listing the nix store, all fetched inputs can then be sorted together.

I wouldn't be opposed to it, but I'm used to <pname>-<version>-<variant> naming, source being a variant in this case.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second suggestion is actually a part of a wish I have to prepend src- to all build inputs (tars, patch files, builders, etc), pkg- to all packages, and nixos- to the generated nixos configs, so that it is easier to tell at a glance why a particular store path is being installed, and also would be useful for tools like nvd when listing diffs. Yeah this would a much broader proposal, but one can dream someday it will happen. For now, we can start with just the FOD fetchers.

It sounds like another Nix RFC.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm used to <pname>-<version>-<variant> naming, source being a variant in this case.

It's a bit strange to consider package source a "variant" of the package. Nevertheless, such name-suffixing pattern exists in Nixpkgs -- the vendor FOD assign to goModules attribute of buildGoModule has its name set as "${pname}-${version}-go-modules", and sometimes people manually set the name of in-attribute FODs this way.

On the other hand, language-specific packages (e.g. Python packages, Ocaml packages, etc.) often prefix the name with the name and version of the compiler/interpreter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not specifically tied to the suffix approach.. I'll add a comment to the initial comment


# Examples and Interactions
[examples-and-interactions]: #examples-and-interactions

```
# previous sha256 is still valid
$ nix-build -A nix-template.src --check
checking outputs of '/nix/store/ib74331l6pl6f8s2hsakf68lhg2jsl5i-nix-template-0.1.4-src.drv'...

trying https://github.com/jonringer/nix-template/archive/v0.1.4.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 130 100 130 0 0 425 0 --:--:-- --:--:-- --:--:-- 426
100 27955 0 27955 0 0 36653 0 --:--:-- --:--:-- --:--:-- 311k
unpacking source archive /build/v0.1.4.tar.gz
/nix/store/lfbgmqvpq5365q5ivv6ccck7xg88syw5-nix-template-0.1.4-src

# explicit commit hash example
$ nix-build -A artyFX.src
this derivation will be built:
/nix/store/ir4k3n5q7nmb2wh533pq1ma1cabyr8h7-openAV-ArtyFX-8c542627d9-src.drv
building '/nix/store/ir4k3n5q7nmb2wh533pq1ma1cabyr8h7-openAV-ArtyFX-8c542627d9-src.drv'...

trying https://github.com/openAVproductions/openAV-ArtyFX/archive/8c542627d936a01b1d97825e7f26a8e95633f7aa.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 173 100 173 0 0 754 0 --:--:-- --:--:-- --:--:-- 755
100 627k 0 627k 0 0 604k 0 --:--:-- 0:00:01 --:--:-- 1014k
unpacking source archive /build/8c542627d936a01b1d97825e7f26a8e95633f7aa.tar.gz
/nix/store/dkvcfm90ckrlgmv89s8sr15vcidwlxhs-openAV-ArtyFX-8c542627d9-src
```

# Drawbacks
[drawbacks]: #drawbacks

- All derivations which don't pass a "name" parameter will need to be re-realized
- This will be a download-intensive one-time cost to realize the new FOD derivations.
- NAR hash should not need to be recomputed assuming it was deterministic and not stale.
- Cache should be minimally impacted as NARs are content addressable, thus deterministic sources should not contribute to cache bloat.
- Potential for sources which are no longer available to be broken.
- These can have their name manually set to "source" to perserve previous behavior.
jonringer marked this conversation as resolved.
Show resolved Hide resolved
- Ideally source availability would be remedied with more appropriate methods. E.g. being made available.
- "Interchangeability" with other fetchers is diminished as the derivation name is different
- In practice, fetchFromGitHub is never used in this way. It is generally the only fetcher, so there is never another FOD to dedupilicate.
jonringer marked this conversation as resolved.
Show resolved Hide resolved
- Out-of-tree repositories may get hash mismatch errors
- If the cause of the mismatch is staleness, this is good and working as intended
- If the cause is non-determinism, this is frustrating.
- Some derivations assume "source" to be the name of sourceRoot
- This has been mitigated over two years within Nixpkgs
- Out-of-tree code may break if they assume "source" is the name
- Can be mitigated with release notes describing the issue
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are caused by the assumption of src.name == "source" being broken.

However, we have already deprecated such assumption as early as Nixpkgs 21.11. Nixpkgs Manual now requires sourceRoot to start with "${src.name}" instead of "source" when src is constructed with by fetchgit-based fetchers, and tree-wide conversions (NixOS/nixpkgs#245388, NixOS/nixpkgs#247977, NixOS/nixpkgs#248528, NixOS/nixpkgs#248683, NixOS/nixpkgs#294334) have been merged.

fetchFromGitHub and other output-as-a-directory fetchers can still be used interchangeably if we stick to ${src.name} instead of "source".

In my opinion, we do not provide bug-level compatibility to packages failing to follow already-stablized specifications, in-tree or out-of-tree.


# Alternatives
[alternatives]: #alternatives

- Do nothing. Retain current status quo.

- In https://github.com/NixOS/nixpkgs/pull/153386#issuecomment-1007729116, @Ericson2314 mentioned that this may be solved at the Nix tooling level. And that a year should be give to see if an implementation can be done.
- That was 2+ years ago, and an ounce of prevention today is worth ten ounces of remedy tomorrow.

# Prior art
[prior-art]: #prior-art

- https://github.com/NixOS/nixpkgs/pull/153386

# Unresolved questions
[unresolved]: #unresolved-questions

- Full commit hashes could be truncated. This sacrifices a bit of simplicity for better looking derivation names:
```
let
version = builtins.replaceStrings [ "refs/tags/" ] [ "" ] rev;
# Git commit hashes are 40 characters long, assume that very long versions are version-control labels
ref = if (builtins.stringLength rev) > 15 then builtins.substring 0 8 version else version;
in lib.sanitizeDerivationName "${repo}-${ref}-src";
Comment on lines +96 to +102
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't recommend this. Short hashes are not guaranteed to be stable for long term storage.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

40 characters are not really that long for a machine-generated derivation.

People would most likely see the store hash and (hopefully) part of the name when it flashes through their terminal. During debugging, a path like that would occupy at most a bit more than one line inside the terminal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current "source" isn't that stable either. 8 characters is still a fair amount of entropy, and likely to be different enough for most repositories.

Copy link
Contributor Author

@jonringer jonringer Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also remember, that it doesn't have to be "unique across all time". It just needs to be different than what was there before, so that the combination of name + hash are different.

```

- Similar treatment to similar fetchFromGitX helpers?

# Future work
[future]: #future-work

- Apply name change to fetchFromGitHub in PR to staging:
- Resolve stale FODs, most of these can be PRs made against master
- Resolve assumed usage of "source" as `src.name`, most of these changes can be made against master
- Revert name to "source" for removed source urls.
- Add release notes for potential breakages when assuming the name of the unpacked directory is "source"