Fetching a github repo with submodules and checksum #229

Open
markus1189 opened this issue Jun 16, 2020 · 5 comments

@markus1189

Hi,

I have the following use case, but it seems like niv (currently) does not support it.

Here is the scenario:

  • I want to add https://github.com/darktable-org/darktable to niv
    • but only adding it as type=tarball does not allow me to fetch submodules, therefore the sha256 is incorrect (differs from the one with submodules)
    • using type=git and the unstable Nix version as described in "fetchGit and submodules" (#58) would work, but as I understand it, that means that niv update is a no-op on the json and every time I import from sources.nix it will download the repo again (which takes a lot of time)

What I want is a json entry that also has the sha256 set, so that evaluating sources.nix does not download the huge git repo every time.

As far as I can see, the problem is that builtins.fetchGit does not support adding a sha256, so this would require fetchFromGitHub, which is not a builtin. I could change my sources.nix to add this, which is nice, but what I don't get is that niv update works and updates the package...

Can you confirm my observations? What would be a good way to add this behavior to the code? It seems like we could add a case on type in:

niv/src/Niv/Cli.hs

Lines 346 to 349 in f73bf8d

let cmd = case HMS.lookup "type" (unPackageSpec defaultSpec) of
      Just "git" -> gitCmd
      Just "local" -> localCmd
      _ -> githubCmd

and for example use nix-prefetch-git for a github type?
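
To make the suggestion concrete, here is a minimal Python sketch (not niv's actual Haskell code) of dispatching on the type field and building a nix-prefetch-git invocation that includes submodules. The function names, the "submodules" spec key and the "githubSubmodulesCmd" label are hypothetical and only there for illustration; --url, --rev and --fetch-submodules are existing nix-prefetch-git options.

```python
from typing import Dict, List


def select_update_cmd(spec: Dict[str, object]) -> str:
    """Mirror the `case` above, with one extra (hypothetical) branch."""
    kind = spec.get("type")
    if kind == "git":
        return "gitCmd"
    if kind == "local":
        return "localCmd"
    # Hypothetical: GitHub repos that need submodules take a different path.
    if spec.get("submodules"):
        return "githubSubmodulesCmd"
    return "githubCmd"


def prefetch_git_argv(owner: str, repo: str, rev: str) -> List[str]:
    """Build (but do not run) a nix-prefetch-git call that includes submodules."""
    url = f"https://github.com/{owner}/{repo}.git"
    return ["nix-prefetch-git", "--url", url, "--rev", rev, "--fetch-submodules"]
```

The command is only built here, not executed; running it would print JSON that includes a sha256, which is the value the json entry above is missing.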

@tfc
Contributor

tfc commented Jul 1, 2020

i just commented on a similar matter here: #214

The whole thing with submodules support would also be very interesting for me.

@nmattia
Owner

nmattia commented Jul 27, 2020

(Sorry for the late reply, I was out and still catching up on notifications)

> that means that niv update is a no-op on the json

No, it should still update the rev if I understand the setup correctly. It's treated as a Git repo which should work fine.

> it will download the repo again (which takes a lot of time)

What exactly downloads the repo?

> the problem is that builtins.fetchGit does not support adding a sha256, so this would require fetchFromGitHub, which is not a builtin.

Looks like fetchFromGitHub supports submodules (I initially thought it did not). What is the problem with it not being a built-in? What fetcher would you use to fetch submodules while still having a sha256?

@tfc
Contributor

tfc commented Jul 27, 2020

hey @nmattia, I wrote my comments a bit hastily.

It somehow seems that builtins.fetchGit does not get stuff from the store even if it's available there.
On an offline machine with everything that my repo gets from niv precached in the store (which I did by calculating the closure of a niv attrset and storing it all in an ISO file), Nix still tries to download stuff from the internet.

@markus1189
Author

markus1189 commented Jul 28, 2020

> (Sorry for the late reply, I was out and still catching up on notifications)

No worries!

> > that means that niv update is a no-op on the json
>
> No, it should still update the rev if I understand the setup correctly. It's treated as a Git repo which should work fine.

Hmm, at least as far as I remember, it did not update.

> > it will download the repo again (which takes a lot of time)
>
> What exactly downloads the repo?

Using the sources.nix attribute of the dependency (darktable in the example above) in e.g. my NixOS config.

> > the problem is that builtins.fetchGit does not support adding a sha256, so this would require fetchFromGitHub, which is not a builtin.
>
> Looks like fetchFromGitHub supports submodules (I initially thought it did not). What is the problem with it not being a built-in? What fetcher would you use to fetch submodules while still having a sha256?

I think I didn't put that in the right words :) Using fetchFromGitHub does work indeed, but then we would also need to change the type=git fetcher away from fetchgit or introduce another type?

@nmattia
Owner

nmattia commented Aug 14, 2020

Hey guys, quick update. I didn't drop the ball, but I opened a pretty big can of worms when I started working on #111 (implementation here: #258).

NOTE: ok this is longer than I thought but writing this down made it a bit clearer in my head. Feedback very much welcome.

I'll start with a quick recap of how niv works and what it does; then I'll give a quick overview of potential solutions.


There are two sides to niv: one is the "Nix evaluation" that's provided with sources.nix and the other one is the update, with niv update. The "Nix evaluation" tries to pick the best fetcher possible (for instance, fetchGit should be used for private repos because fetchFromGitHub just won't work in any practical way). The update part hits the GitHub API, pings git repos and calls nix-prefetch-url to find information about the sources, such as the default branch (if none is provided), the latest revision on the branch and potentially the sha256.


Now, let's focus on git repositories (including GitHub projects). What's the best fetcher? Well, that depends on three factors: Is the repository public on GitHub? Does the repository require (SSH-)authentication for a git clone? Does the repo have submodules? Let's have a look:

note: I'll talk about fetchgit, fetchGit and fetchzip because fetchFromGitHub uses fetchgit with submodules and fetchzip without. The fetchzip variant works by downloading a tarball from GitHub.

  • The repository is hosted on GitHub, is public, and has no submodules: Any fetcher will work (fetchgit, fetchGit, fetchzip). Both fetchgit and fetchzip are good because they are fast (fixed-output derivation) and run at build-time. fetchGit will work but (1) it will regularly ping the upstream repo to check for changes and (2) will need extra settings when run inside a restrict-eval evaluation.
  • The repository is hosted on GitHub, is public, and has submodules: fetchzip is out of the question because GitHub does not offer tarballs that include submodules. fetchgit will work fine; fetchGit will work in recent versions of Nix with the same caveats as above (regular polling + eval-time considerations).
  • The repository is private: whether on GitHub or not, fetchzip won't work (without leaking the GITHUB_TOKEN which would be a pain). The fetchgit way of cloning repos is so not user friendly that I'll just say "it doesn't work". That leaves us with fetchGit; same caveats as above (regular polling + eval-time considerations) and, in case the repo has submodules, it must use a recent version of Nix.
  • The repository is public but not hosted on GitHub: Both fetchgit and fetchGit will work, but fetchzip won't (because there's no one providing a tarball). Note the caveats mentioned above for fetchGit (regular polling + eval-time considerations + recent Nix for submodules).

So basically, if your repo is public and on GitHub, fetchzip and fetchgit are best; fetchzip is a bit cleaner (just a tarball download) but for consistency fetchgit may be better (it also works if your repo has submodules). If your repo is not on GitHub but is public, use fetchgit. If your repo is private, use fetchGit, with the caveats above (regular polling + eval-time considerations + recent Nix for submodules). Here I'd like to point out that most users just don't care about the fetchGit caveats, so maybe niv should just use fetchGit by default with an option to fall back to fetchgit for public repos.
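
The four cases above can be condensed into a small decision function. This is only an illustrative Python sketch of the reasoning in this comment, not code from niv; the Repo type and usable_fetchers name are made up, and the relative order of fetchzip and fetchgit is a judgment call.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Repo:
    on_github: bool
    public: bool
    has_submodules: bool


def usable_fetchers(repo: Repo) -> List[str]:
    """Fetchers that work for this repo, roughly preferred first."""
    if not repo.public:
        # Private repos: fetchzip would leak the token, fetchgit is impractical.
        return ["fetchGit"]
    fetchers: List[str] = []
    if repo.on_github and not repo.has_submodules:
        # GitHub tarballs exist, but they never include submodules.
        fetchers.append("fetchzip")
    fetchers.append("fetchgit")
    # fetchGit always works, with the polling and eval-time caveats.
    fetchers.append("fetchGit")
    return fetchers
```

For example, usable_fetchers(Repo(on_github=True, public=True, has_submodules=True)) leaves fetchgit and fetchGit, which matches the darktable case that started this issue.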


Ok, now let's figure out how the update part of niv can figure out the latest rev and default branch. There are basically three ways: for one, you can query the GitHub API. Alternatively, you can use git ls-remote or git clone. Using git ls-remote is always preferable to git clone because it returns the info we need (latest revision and/or default branch) without copying anything else (cloning is super slow for big repos). So the situation is like this:

  • If the repository is a public GitHub project, then both the GitHub API and git ls-remote will work fine; however GitHub does some rate limiting if you're not authenticated (i.e. no GITHUB_TOKEN) so let's just say that git ls-remote is better.
  • If the repository is a private GitHub project, then the GitHub API will work, but you'll need to be authenticated (i.e. have a GITHUB_TOKEN). git ls-remote will work, so... git ls-remote is preferable.
  • If the repository is not on GitHub, then... use git ls-remote.

In some cases where you need to clone the repo anyway (see below) then you might as well use git clone directly, but that's just complicating an already complicated story. So instead, let's just say niv should always fetch the latest revision and default branch with git ls-remote.
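
As a rough illustration of the git ls-remote approach (a Python sketch, not what niv does internally), the default branch and latest revision can be read from the output of git ls-remote --symref <url> HEAD. The helper names below are hypothetical.

```python
import subprocess
from typing import Optional, Tuple

HEADS_PREFIX = "ref: refs/heads/"


def parse_ls_remote(output: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (default_branch, head_rev) from `git ls-remote --symref <url> HEAD`."""
    branch: Optional[str] = None
    rev: Optional[str] = None
    for line in output.splitlines():
        left, _, right = line.partition("\t")
        if right != "HEAD":
            continue
        if left.startswith(HEADS_PREFIX):
            # Symbolic ref line, e.g. "ref: refs/heads/master<TAB>HEAD".
            branch = left[len(HEADS_PREFIX):]
        elif not left.startswith("ref: "):
            # Plain line, e.g. "<40-char sha><TAB>HEAD".
            rev = left
    return branch, rev


def latest_rev(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Ask the remote without cloning it (needs git and network access)."""
    out = subprocess.run(
        ["git", "ls-remote", "--symref", url, "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ls_remote(out)
```

Only parse_ls_remote is pure; latest_rev shells out to git and relies on whatever authentication (e.g. SSH keys) the user already has configured, which is why this route also covers private repos.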


Finally, in the "fetcher" section above I said that fetchgit needs a sha256. So the question is: how does one get the sha of a repo? This also relates to #111 because whenever we get the sha of a repo, we can always get the last commit date.

  • Repository is public and on GitHub and doesn't have submodules: Two solutions, using the GitHub API (downloading tarball for sha256 and querying commit info for the date) or performing a git clone. A clone can take a long, long time (try getting a shallow clone of https://github.com/torvalds/linux) so the GitHub API is preferable.
  • Repository is private and on GitHub and doesn't have submodules: Same as above, although the user needs to be authenticated (GITHUB_TOKEN) for using the GitHub API. So here the best way is probably to start the git clone and instruct the user to set a GITHUB_TOKEN if it's taking too long.
  • Repository is not on GitHub: a clone is needed here.

Basically: use the GitHub API as much as possible, but fall back to git clone when you can't. When the clone is taking too long, tell the user they could use the GitHub API instead by setting a GITHUB_TOKEN (or, if only the date is needed, then do something like niv update --no-date).
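
A small Python sketch of that fallback rule, under a few assumptions: the function and parameter names are hypothetical, the submodules case is inferred from the fetcher discussion above (GitHub tarballs do not include submodules, so a clone is needed), and the archive URL pattern plus nix-prefetch-url --unpack is one plausible way to hash a tarball rather than a statement of what niv runs.

```python
from typing import List


def hash_strategy(*, on_github: bool, public: bool,
                  has_submodules: bool, has_token: bool) -> str:
    """Pick how to obtain the sha256: "github-api" when possible, else "git-clone"."""
    if not on_github or has_submodules:
        # No tarball available, or the tarball would miss the submodules.
        return "git-clone"
    if public or has_token:
        return "github-api"
    # Private GitHub repo without GITHUB_TOKEN: clone, and hint at the token.
    return "git-clone"


def tarball_prefetch_argv(owner: str, repo: str, rev: str) -> List[str]:
    """Build (but do not run) a tarball prefetch for the "github-api" case."""
    url = f"https://github.com/{owner}/{repo}/archive/{rev}.tar.gz"
    return ["nix-prefetch-url", "--unpack", url]
```

For a large public repo without submodules this picks "github-api", which avoids the long clone mentioned above.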


There are a few other details (how does niv figure out if a repo is on GitHub, private, has submodules?) but that's more of a niv add question (then we can just store the info in sources.json). I should still spend some time thinking about it.
