
Support partial cloning with ability to commit on partial clones #11

Open · jfkw opened this issue May 25, 2016 · 3 comments

jfkw commented May 25, 2016

s3git mentions the potential for creating very large repositories.

Is it possible to clone only a part (subtree or even specific files) of those large repositories, and create new commits using that partial clone?

If this is already possible, it would be a good addition to the documented use cases. If not, please consider this a feature request.

fwessels (Collaborator) commented

Yes, s3git can create really large repos.

I think what you may be looking for is the following: it is possible to do an s3git snapshot checkout --dedupe after cloning the repo (a clone does not download all content to local disk).

It is briefly mentioned here: https://github.com/s3git/s3git/blob/master/BINARY-RELEASE-MANAGEMENT.md#deduped-format-for-updating-a-snapshot

But to quote:

  • By default, s3git snapshot checkout will reconstruct the directory structure and files in their original format (and download the content from cloud storage).
  • If all you are interested in is write access (i.e. writing/updating/renaming/removing files), then you can use the --dedupe flag for s3git snapshot checkout. This will recreate the directory and file structure with shadow content that contains binary hashes/pointers, so if you (recursively) list the snapshot it will appear identical to a full-fledged checkout (note that the shadow files are 128 bytes, or larger multiples of 64 bytes).
  • You can now copy in new data, move files around, or delete them. When you are done, you can create a new snapshot as usual with s3git snapshot create -m "Updated version" in order to push it upstream.
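
To make the workflow concrete, here is a minimal sketch. The bucket name, paths, and commit message are placeholders, and the clone step is abbreviated (see the s3git README for the exact clone flags):

  # Clone the repo; a clone does not pull down the full content
  s3git clone s3://large-repo
  cd large-repo

  # Check out the snapshot in deduped form: shadow files holding
  # hashes/pointers only, so no bulk download happens
  s3git snapshot checkout --dedupe

  # Rename, remove, or copy in new files as usual
  mv releases/v1.0 releases/v1.0-archived
  cp ~/build/v1.1/* releases/

  # Record the result as a new snapshot and push it upstream
  s3git snapshot create -m "Updated version"
  s3git push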

Is this what you are looking for? (The documentation is still a work in progress, and it would be good to highlight something like this.)


jfkw commented May 25, 2016

Yes, that sounds like it will work well, thank you.

I would not have understood that --dedupe corresponded to shadow content (hashes/pointers only); perhaps the documentation could go to some lengths to establish that association in users' minds.

For truly large repositories, by file count and/or directory nesting, even the --dedupe option may represent enough overhead that users limited by space, bandwidth, time, etc. end up using s3git differently. If a further mode/option could support cloning and operating on only a controlled subset of deduped files, that could be a real boon for small operations on large repositories.

fwessels (Collaborator) commented

So you mean something like a filter of some sort for 'sparse' checkouts, as described in http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/?

Something like this shouldn't be too difficult.
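
For reference, the 'sparse' checkout technique from that post uses plain git along these lines (the .git/info/sparse-checkout style described there; URL and paths are placeholders):

  # Clone without populating the working tree
  git clone --no-checkout https://example.com/big-repo.git
  cd big-repo

  # Enable sparse checkout and list the subtrees to materialize
  git config core.sparseCheckout true
  echo "docs/" >> .git/info/sparse-checkout

  # Populate the working tree with only the listed paths
  git checkout master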

Note that there is also the s3git snapshot ls command to just list the contents of a snapshot (without creating any files on local disk). The --presigned option will create a so-called 'presigned URL' so that a file can be fetched directly out of cloud storage; see here for more info: https://github.com/s3git/s3git/blob/master/BINARY-RELEASE-MANAGEMENT.md#grab-straight-out-of-cloud-storage
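
As a quick sketch of that (assuming snapshot ls takes the snapshot's commit hash as an argument; check the linked document for the exact syntax, and treat the URL as a placeholder):

  # List a snapshot's contents without creating any local files
  s3git snapshot ls <snapshot-hash>

  # Emit presigned URLs instead, then fetch a single file
  # straight out of cloud storage
  s3git snapshot ls <snapshot-hash> --presigned
  curl -o local-copy "<presigned-url>"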

NB: I'll make a note regarding the documentation to clarify this, as it is an important issue.
