Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for .kitignore, fix paths handling when packing/unpacking #212

Merged
merged 10 commits into from Apr 17, 2024

Conversation

amisevsk
Copy link
Contributor

Description

Add support for .kitignore files. These files use the same semantics as .dockerignore:

  • Each line is a path, files are ignored if the path is a prefix for their path
  • Wildcards are supported: * - multiple characters, ? - single character
  • The ** wildcard can be used to match any number of directories (e.g. **/*.md will ignore all .md files no matter where they are
  • The prefix ! can be used to negate matching, to reinclude a pattern that is excluded previously (e.g. adding !README.md) will include the readme even if it's ignored via **/*.md.

In addition, this PR fixes a few issues with how we were handling paths:

  • Using a non . path for context dir (in pack) or target dir (in unpack) now works without strange behaviour with some relative paths (if something resolved to . it would be placed in the working directory instead of the target directory).
  • It's now possible to use . as a path in Kitfiles, allowing e.g. a simple Kitfile where the model lives in the same directory as the Kitfile (previously, everything was assumed to be a subdirectory)
  • Kit can now handle intersecting paths without compressing files twice, e.g. in the Kitfile below, the my-model model will not include the subdirectory main-dir/intersect-dir as it is included in the my-dataset dataset
    manifestVersion: 1.0.0
    package:
      name: test-intersecting
    model:
      name: my-model
      path: main-dir
    datasets:
      - name: my-dataset
        path: main-dir/intersect-dir

The above also mean it's possible to write Kitfiles like

manifestVersion: 1.0.0
package:
  name: test-intersecting
model:
  name: my-model
  path: ./my-model
code:
  - path: .

There's more detail on changes in commit messages.

Linked issues

Closes #119
Required for #85

Testing

To come...

Add support for .kitignore files, using the same semantics as
.dockerignore:

* Each line is a pattern for ignoring files based on prefix
* Patterns support '*' and '?' wildcards
* Prepending ! negates a pattern
Fix some issues with kit pack: there are three "types" of packing we
need to be able to do:

* Single files (e.g. path is `somedir/myFile.txt`)
* Entire directories (e.g. path is `somedir/subdir`)
* The context directory itself (e.g. path is `.`)

For the first case, we create a tarball with only one file in it; for
the second, we compress the directory itself. This allows for the unpack
process to be

1. Create base directory for layer's path (e.g. `somedir` above)
2. Unpack the layer's tarball into the directory

However, treating the third case in the same way would lead to us adding
a directory named `<context-dir>` to the tarball, which is inappropriate
(e.g. if I have my files in /projects/mydir, you want to be able to
unpack to /projects/otherdir).

To avoid this issue (and having to special-case when the path is `.`),
we need to switch the pack and unpack processes to working on relative
paths rather than absolute paths. This updates kit pack and unpack to
work as follows

1. Resolve absolute paths for e.g. Kitfile and context directory
2. Change working directory to the context directory (or target
   directory for unpack)
3. Pack or unpack based on relative paths within the kitfile rather than
   calculated absolute paths
4. If the directory to be packed is `.`, skip creating a directory entry
   for it.

This change requires relaxing some checks for existing directories (i.e.
don't require the --overwrite flag when a directory exists); we still
require the --overwrite flag before overwriting files.
Avoid failing when unpacking encounters a directory that already
exists. This is not necessarily a problem as multiple layers can create
the same directories in some cases.
Update how we handle paths to allow layers to intersect, e.g.

model:
  path: ./my-model/
code:
  path: ./my-model/config/

With this change, in addition to processing the ignore file, we also
consider all other layers when packing a layer. If a path is not
excluded by the ignore file, it is still excluded if

1. Some other layer path is a prefix for the current path, and
2. The other layer path is _not_ a prefix for the current layer path

(more explicitly, while processing `./my-model`, we ignore any paths
starting with `./my-model/config`, but when processing
`./my-model/config`, we don't ignore paths starting with `./my-model` as
this would exclude all paths).

This commit refactors how models are handled internally, removing the
Model struct and working on the Kitfile directly instead. Functionality
for ignore files is moved into the filesystem package and extended to
support the test above
Remove out-of-date test for the pack command; this test relied on
default kitfile name behavior, which is no longer used since we
explicitly check whether a kitfile exists.
@amisevsk amisevsk requested a review from gorkem April 12, 2024 20:59
Copy link
Contributor

@gorkem gorkem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I test this with ghcr.io/jozu-ai/modelkit-examples/tensorflow-image:latest It fails to package files under the directories.

pkg/lib/filesystem/ignore.go Show resolved Hide resolved
Clean paths passed in to Ignore.Matches() in the same way we clean up
during set up to avoid false results -- e.g. ./my-dir and my-dir not
matching. Otherwise, a Kitfile that uses paths with prefixed dot-slash
will be ignored incorrectly.
@amisevsk
Copy link
Contributor Author

When I test this with ghcr.io/jozu-ai/modelkit-examples/tensorflow-image:latest It fails to package files under the directories.

Good catch, I wasn't cleaning paths before checking if they match, which led to ./data not matching data.

@amisevsk amisevsk merged commit cb615fc into jozu-ai:main Apr 17, 2024
1 check passed
@amisevsk amisevsk deleted the ignore-file branch April 17, 2024 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement ignore file
2 participants