New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cache and reuse intermediate blobs #4330
Conversation
e13c2d9
to
267ce45
Compare
267ce45
to
42906de
Compare
42906de
to
5deb116
Compare
5deb116
to
5d5eb1c
Compare
5d5eb1c
to
39efb30
Compare
8d807d7
to
0aba2d5
Compare
particularly useful for zipfiles and f16s
0aba2d5
to
3520c0e
Compare
|
||
"github.com/ollama/ollama/api" | ||
"github.com/ollama/ollama/convert" | ||
"github.com/ollama/ollama/llm" | ||
"github.com/ollama/ollama/types/model" | ||
) | ||
|
||
var intermediateBlobs sync.Map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To confirm, this is a map of blob digests to ggml model layer digests? Does intermediate mean f16 but not yet quantized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intermediate here means any blob that's not referenced in but used to produce something in the final manifest. here's a concrete example:
FROM /path/to/safetensors/dir
creates a zip (intermediate) converts into a f16 (intermediate) quantizes into a q4_0 (final)
when you want to create another quantization, this map tracks the relationship between the zip and f16 so it's able to skip uploading the zip and reconverting the f16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this probably deserves a comment describing its operation. I think most people aren't going to know what intermediate means here when they're going through the code.
I was trying to think os some other names, like maybe "blobCache" but then I think people will wonder why only some things are in the blobCache. Although why not shove everything into the blobCache? Why just put the "intermediate" stuff there?
Or the inverse; why not just put the intermediate stuff on desk and just use the normal machanism for pulling blobs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also a little confused about why sync.Map
here instead of map[string]bool
. I don't think there should be any contention here w/ mutexes given you're just shoving the same digest into the map.
return | ||
} | ||
|
||
if _, err := os.Stat(p); errors.Is(err, os.ErrNotExist) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should probably include a debug statement in here for the cache hit but something else removed the storage.
|
||
"github.com/ollama/ollama/api" | ||
"github.com/ollama/ollama/convert" | ||
"github.com/ollama/ollama/llm" | ||
"github.com/ollama/ollama/types/model" | ||
) | ||
|
||
var intermediateBlobs sync.Map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this probably deserves a comment describing its operation. I think most people aren't going to know what intermediate means here when they're going through the code.
I was trying to think os some other names, like maybe "blobCache" but then I think people will wonder why only some things are in the blobCache. Although why not shove everything into the blobCache? Why just put the "intermediate" stuff there?
Or the inverse; why not just put the intermediate stuff on desk and just use the normal machanism for pulling blobs?
@@ -340,7 +340,24 @@ func CreateModel(ctx context.Context, name, modelFileDir, quantization string, m | |||
return err | |||
} | |||
} else if strings.HasPrefix(c.Args, "@") { | |||
blobpath, err := GetBlobsPath(strings.TrimPrefix(c.Args, "@")) | |||
digest := strings.TrimPrefix(c.Args, "@") | |||
if ib, ok := intermediateBlobs.Load(digest); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aren't ib
and digest
the same thing? Can't you just call this as _, ok := intermediateBlobs.Load(digest); ok {
and use digest
below?
|
||
"github.com/ollama/ollama/api" | ||
"github.com/ollama/ollama/convert" | ||
"github.com/ollama/ollama/llm" | ||
"github.com/ollama/ollama/types/model" | ||
) | ||
|
||
var intermediateBlobs sync.Map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also a little confused about why sync.Map
here instead of map[string]bool
. I don't think there should be any contention here w/ mutexes given you're just shoving the same digest into the map.
particularly useful for zipfiles and f16s