Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sparse Checkout #6169

Closed
wants to merge 0 commits into from
Closed

Conversation

YuKitsune
Copy link

@YuKitsune YuKitsune commented Jan 12, 2022

This PR aims to implement basic sparse checkout functionality into LibGit2 building on @jochenhz's original PR.

What does it do?

  • Status / Diff takes GIT_INDEX_ENTRY_SKIP_WORKTREE into account
  • sparse-checkout file is treated as an attr_file, hence following gitignore rules
  • Skips sparse files on checkout
  • Sparse checkout can be enabled with core.sparseCheckout
  • support init, set, add, reapply and disable commands all supported
  • Expose an interface function git_sparse_check_path to query whether or not a file is excluded from checkout
  • Set GIT_INDEX_ENTRY_SKIP_WORKTREE flag on git_index_read_tree based on the sparse checkout ruleset.

What does it not do?

  • Smart trickery to determine whether or not a full tree is sparse (would be a very welcome optimization)
  • Support core.sparseCheckoutCone
  • No sparse checkout option on clone (but this can be achieved with GIT_CHECKOUT_NONE and git_index_read_tree)

Note

  • I've got some tests around worktrees, they seem to work
  • Given how complex sparse checkout can be, I'm almost certain I haven't gotten this implementation 100% correct, though feel free to prove me wrong 馃憤

src/sparse.c Outdated
Comment on lines 119 to 68
if ((error = git_str_joinpath(&infopath, repo->gitdir, "info")) < 0) {
if (error != GIT_ENOTFOUND)
goto done;
error = 0;
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to have used something like git_repository__item_path here, but it would always get the info path for the root repository, not the worktrees. Keen to know if there's a nicer way to do this.

@YuKitsune YuKitsune marked this pull request as ready for review January 12, 2022 01:35
/** Skip files in the diff that are excluded by the `sparse-checkout` file.
* Set to 1 to skip sparse files, 0 otherwise
*/
int skip_sparse_files;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one might require some discussion on how flags are handled.
There is a flags field, but it is almost full. Only (1u << 27) is available.

Most of the flags are in enums, I think the reason for this is that it provides some type safety while still being passable as arguments to functions.
In this particular case the flags are never used as a parameter, the user passes the git_diff_options struct instead.
This means that it would be possible to specify the flags in bitfields as unsigned int skip_sparse_files:1;
Unfortunately we can't split out the existing enum to a regular bitfield without being incompatible but IMO that would be a good thing to do for flags passed in a larger struct like this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinging in @ethomson since he probably have given some more thought on what to do when we run out of flags.

src/sparse.h Outdated
GIT_SPARSE_NOTFOUND = -1,
GIT_SPARSE_NOCHECKOUT = 0,
GIT_SPARSE_CHECKOUT = 1,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we give this enum a typedef so that the proper type can be used instead of int?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've called it git_sparse_status for now, though I'm open to some better name suggestions.

src/util.c Outdated Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
src/checkout.c Outdated
const git_index_entry *e = git_index_get_byindex(data->index, pos);
if (e == NULL ||
(e->flags_extended & GIT_INDEX_ENTRY_SKIP_WORKTREE) == 0 ||
(data->strategy & GIT_CHECKOUT_REMOVE_SPARSE_FILES) != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be possible to add some comment on what is going on here?

src/sparse.c Outdated
scan = (char *)git_str_cstr(&data);
while (!error && *scan) {

buf = git__strtok_(&scan, "\r\n", "\n", "\0");
Copy link
Contributor

@boretrk boretrk Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't how strtok works. strtok matches any character in the string.
git__strtok(&scan, "\r\n") is what you want here.
If either '\r' or '\n' is encountered it would split on them and empty segments are skipped over.
So "\r\n" would first match on '\r' and then skip over the '\n'.
It would also take "\n\n\r" or any combination of them.

But over all it seems a bit strange to have this splitting and parsing here.
https://www.git-scm.com/docs/git-sparse-checkout says:

By default, the sparse-checkout file uses the same syntax as .gitignore files.

Shouldn't we aim to have the same function for parsing them?

I haven't really thought it through but I have this feeling that it could be possible to merge the spare checkout list and the ignore list for some checks but maybe there is something that prevents that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've replaced git__strtok_ with git__strtok, I couldn't find any "git ignore list"-like functions around, so I quickly threw this together. I'm open to some ideas on how we can make this list function a bit nicer.

I've also consolidated the ignore parsing functions, so that parse_sparse_file now uses parse_ignore_file instead. I'd like to keep the parse_sparse_file function around since cone mode should change the parsing behaviour a bit.

@YuKitsune
Copy link
Author

Bump.
Is there anything I can do to help progress this further?

@VinnyOG
Copy link

VinnyOG commented Aug 5, 2022

@YuKitsune Is this PR in a state that it's working for you? I would like to patch it into a project that I'm working on and wondering if there are any rough edges I should know about.

Thanks for your work in continuing on this!

VinnyOG pushed a commit to 8thwall/libgit2 that referenced this pull request Aug 5, 2022
take skip worktree flag into account when handling unmatched items while diffing
Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

move does_negate_rule from ignore.c to attr_file.h so that it can be used by sparse checkout code as well

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

add core.sparseCheckout to configmap cache

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

implement very basic handling of .git/info/spare-checkout file.

The sparse-checkout file is treated as a git_attr_file. A public API git_sparse_check_path reads the sparse-checkout file from the attribute file cache and checks the path against this attribute file using the gitignore ruleset

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

implement git_sparse__lookup function that can be called on initialised git_sparse object

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

when doing git_index_read_tree set skip-worktree flag for index entries that are excluded by the sparse checkout ruleset - if sparse checkout is enabled

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

fix cleanup of git_sparse structure when core.sparseCheckout is disabled

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

set skip-worktree flag when index entry is initialized and path is sparse

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

cache sparse-checkout values when iterating a tree so that it can be accessed by checkout and diff functionalities quickly.

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

skip sparse files for unmatched new and old items if requested

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

skip sparse files on checkout, add option to remove sparse files on checkout

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

add test repository for sparse unit tests

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

add tests for git_sparse_check_path and git status on sparse repositories

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

test that index operations set skip-worktree flag if needed

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

implement git sparse checkout tests

Signed-off-by: Jochen Hunz <j.hunz@anchorpoint.app>

fix: parsing sparse file does not require context for subdirs


Fix build errors form rebase

Fix build errors form rebase

Implement init and set

Fix `git_sparse_checkout_set` not enabling sparse checkout

WIP: Implementing other sparse-checkout commands

- Temporarily remove cone references, will need to revisit
- Updating references in tests

fix sparse_checkout_list impl

update tests

re-sync `git_attr__does_negate_rule`

update test coverage

make `git_sparse_checkout_init` with default patterns

fix init overwriting sparse-checkout file

docs tidy-up

Tidy-up

Update sparse checkout tests

update set.c

implement git sparse-checkout apply

hack: restore working directory on disable

Updating tests

Tidy up

Fix missing methods from rebase

move sparse diff flag to separate field

seems we've run out of space on that enum...

add version to sparse checkout init opts

fix more missing methods from rebase

Move diff ignore sparse flag

fix sparse checkout file being written to the root

Fix up assertions

add worktree tests

tidy up args and internals

Further tidy up

fix sparse-checkout file not writing to worktree gitdir

remove todo

add git sparse status typedef

remove `git__strtok_`

tidy `parse_sparse_file`

consolidate ignore parsing

add comment addressing skip_worktree and sparse checkout

update gitignore

Fix build errors from rebase

Move sparse tests
VinnyOG pushed a commit to 8thwall/libgit2 that referenced this pull request Aug 5, 2022
@YuKitsune
Copy link
Author

@VinnyOG We haven't gotten around to making use of this. Our use-case requires a new build of LibGit2Sharp, which proved to be too much of a pain.

@extrawurst
Copy link

I wish this would be getting attention

@YuKitsune
Copy link
Author

YuKitsune commented Aug 30, 2022

I've mucked up the rebase and the PR auto-closed... 馃う
Doesn't seem to let me re-open it, so I've created a new one over here #6394
Sorry for the mess 馃槄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants