New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for .kitignore, fix paths handling when packing/unpacking #212
Conversation
Add support for .kitignore files, using the same semantics as .dockerignore: * Each line is a pattern for ignoring files based on prefix * Patterns support '*' and '?' wildcards * Prepending ! negates a pattern
Fix some issues with kit pack: there are three "types" of packing we need to be able to do: * Single files (e.g. path is `somedir/myFile.txt`) * Entire directories (e.g. path is `somedir/subdir`) * The context directory itself (e.g. path is `.`) For the first case, we create a tarball with only one file in it; for the second, we compress the directory itself. This allows for the unpack process to be 1. Create base directory for layer's path (e.g. `somedir` above) 2. Unpack the layer's tarball into the directory However, treating the third case in the same way would lead to us adding a directory named `<context-dir>` to the tarball, which is inappropriate (e.g. if I have my files in /projects/mydir, you want to be able to unpack to /projects/otherdir). To avoid this issue (and having to special-case when the path is `.`), we need to switch the pack and unpack processes to working on relative paths rather than absolute paths. This updates kit pack and unpack to work as follows 1. Resolve absolute paths for e.g. Kitfile and context directory 2. Change working directory to the context directory (or target directory for unpack) 3. Pack or unpack based on relative paths within the kitfile rather than calculated absolute paths 4. If the directory to be packed is `.`, skip creating a directory entry for it. This change requires relaxing some checks for existing directories (i.e. don't require the --overwrite flag when a directory exists); we still require the --overwrite flag before overwriting files.
Avoid failing when unpacking encounters a directory that already exists. This is not necessarily a problem as multiple layers can create the same directories in some cases.
Update how we handle paths to allow layers to intersect, e.g. model: path: ./my-model/ code: path: ./my-model/config/ With this change, in addition to processing the ignore file, we also consider all other layers when packing a layer. If a path is not excluded by the ignore file, it is still excluded if 1. Some other layer path is a prefix for the current path, and 2. The other layer path is _not_ a prefix for the current layer path (more explicitly, while processing `./my-model`, we ignore any paths starting with `./my-model/config`, but when processing `./my-model/config`, we don't ignore paths starting with `./my-model` as this would exclude all paths). This commit refactors how models are handled internally, removing the Model struct and working on the Kitfile directly instead. Functionality for ignore files is moved into the filesystem package and extended to support the test above
Remove out-of-date test for the pack command; this test relied on default kitfile name behavior, which is no longer used since we explicitly check whether a kitfile exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I test this with ghcr.io/jozu-ai/modelkit-examples/tensorflow-image:latest
It fails to package files under the directories.
Clean paths passed in to Ignore.Matches() in the same way we clean up during set up to avoid false results -- e.g. ./my-dir and my-dir not matching. Otherwise, a Kitfile that uses paths with prefixed dot-slash will be ignored incorrectly.
Good catch, I wasn't cleaning paths before checking if they match, which led to |
Description
Add support for
.kitignore
files. These files use the same semantics as.dockerignore
:*
- multiple characters,?
- single character**
wildcard can be used to match any number of directories (e.g.**/*.md
will ignore all.md
files no matter where they are!
can be used to negate matching, to reinclude a pattern that is excluded previously (e.g. adding!README.md
) will include the readme even if it's ignored via**/*.md
.In addition, this PR fixes a few issues with how we were handling paths:
.
path for context dir (in pack) or target dir (in unpack) now works without strange behaviour with some relative paths (if something resolved to.
it would be placed in the working directory instead of the target directory)..
as a path in Kitfiles, allowing e.g. a simple Kitfile where the model lives in the same directory as the Kitfile (previously, everything was assumed to be a subdirectory)my-model
model will not include the subdirectorymain-dir/intersect-dir
as it is included in themy-dataset
datasetThe above also mean it's possible to write Kitfiles like
There's more detail on changes in commit messages.
Linked issues
Closes #119
Required for #85
Testing
To come...