
Support partial cloning with ability to commit on partial clones #11

Open · jfkw opened this issue May 25, 2016 · 3 comments

jfkw commented May 25, 2016

s3git mentions the potential for creating very large repositories.

Is it possible to clone only a part (subtree or even specific files) of those large repositories, and create new commits using that partial clone?

If this is already possible, it would be a good addition to the documented use cases. If not, please consider this a feature request.

fwessels (Collaborator) commented

Yes, s3git can create really large repos.

I think what you may be looking for is the following: it is possible to do an s3git snapshot checkout --dedupe after cloning the repo (a clone does not download all content to local disk).

It is briefly mentioned here: https://github.com/s3git/s3git/blob/master/BINARY-RELEASE-MANAGEMENT.md#deduped-format-for-updating-a-snapshot

But to quote:

  • By default, s3git snapshot checkout will reconstruct the directory structure and files in their original format (and download the content from cloud storage).
  • If all you are interested in is write access (i.e. writing/updating/renaming/removing files), then you can use the --dedupe flag for s3git snapshot checkout. This will recreate the directory and file structure with shadow content that contains binary hashes/pointers, so if you (recursively) list the snapshot it will appear identical to a full-fledged checkout (note that the shadow files are 128 bytes, or larger multiples of 64 bytes).
  • You can now copy in new data, move files around, or delete them. When you are done, you can create a new snapshot as usual with s3git snapshot create -m "Updated version" in order to push it upstream.
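
To make the workflow concrete, here is a minimal sketch. The bucket name, paths, and commit message are placeholders, and the clone step is abbreviated (see the s3git README for the exact clone flags):

  # Clone the repo; a clone does not pull down the full content
  s3git clone s3://large-repo
  cd large-repo

  # Check out the snapshot in deduped form: shadow files holding
  # hashes/pointers only, so no bulk download happens
  s3git snapshot checkout --dedupe

  # Rename, remove, or copy in new files as usual
  mv releases/v1.0 releases/v1.0-archived
  cp ~/build/v1.1/* releases/

  # Record the result as a new snapshot and push it upstream
  s3git snapshot create -m "Updated version"
  s3git push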

Is this what you are looking for? (The documentation is still a work in progress, and it would be good to highlight something like this.)


jfkw commented May 25, 2016

Yes, that sounds like it will work well, thank you.

I would not have understood that --dedupe corresponded to shadow content (hashes/pointers only); perhaps the documentation could go to some lengths to establish that association in users' minds.

For truly large repositories, by file count and/or directory nesting, even the --dedupe option may represent enough overhead that users limited by space, bandwidth, time, etc. end up using s3git differently. If a further mode/option could support cloning and operating on only a controlled subset of deduped files, that could be a real boon for small operations on large repositories.

fwessels (Collaborator) commented

So you mean something like a filter of some sort for 'sparse' checkouts, as described in http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/?

Something like this shouldn't be too difficult.
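
For reference, the 'sparse' checkout technique from that post uses plain git along these lines (the .git/info/sparse-checkout style described there; URL and paths are placeholders):

  # Clone without populating the working tree
  git clone --no-checkout https://example.com/big-repo.git
  cd big-repo

  # Enable sparse checkout and list the subtrees to materialize
  git config core.sparseCheckout true
  echo "docs/" >> .git/info/sparse-checkout

  # Populate the working tree with only the listed paths
  git checkout master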

Note that there is also the s3git snapshot ls command to just list the contents of a snapshot (without creating any files on local disk). The --presigned option will create a so-called 'presigned URL' so that a file can be fetched directly out of cloud storage; see here for more info: https://github.com/s3git/s3git/blob/master/BINARY-RELEASE-MANAGEMENT.md#grab-straight-out-of-cloud-storage
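
As a quick sketch of that (assuming snapshot ls takes the snapshot's commit hash as an argument; check the linked document for the exact syntax, and treat the URL as a placeholder):

  # List a snapshot's contents without creating any local files
  s3git snapshot ls <snapshot-hash>

  # Emit presigned URLs instead, then fetch a single file
  # straight out of cloud storage
  s3git snapshot ls <snapshot-hash> --presigned
  curl -o local-copy "<presigned-url>"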

NB: I'll make a note regarding the documentation to clarify this, as it is an important issue.
