Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cache and reuse intermediate blobs #4330

Merged
merged 1 commit into from May 20, 2024
Merged

Conversation

mxyng
Copy link
Contributor

@mxyng mxyng commented May 10, 2024

particularly useful for zipfiles and f16s

@mxyng mxyng force-pushed the mxyng/cache-intermediate-layers branch from e13c2d9 to 267ce45 Compare May 10, 2024 23:18
@mxyng mxyng changed the title tmp cache and reuse intermediate blobs May 10, 2024
@mxyng mxyng force-pushed the mxyng/cache-intermediate-layers branch from 267ce45 to 42906de Compare May 10, 2024 23:19
@mxyng mxyng force-pushed the mxyng/cache-intermediate-layers branch from 42906de to 5deb116 Compare May 10, 2024 23:37
@mxyng mxyng force-pushed the mxyng/cache-intermediate-layers branch from 5deb116 to 5d5eb1c Compare May 10, 2024 23:49
@mxyng mxyng force-pushed the mxyng/cache-intermediate-layers branch from 5d5eb1c to 39efb30 Compare May 16, 2024 17:54
@mxyng mxyng changed the base branch from mxyng/modelname-8 to main May 16, 2024 17:54
@mxyng mxyng force-pushed the mxyng/cache-intermediate-layers branch 2 times, most recently from 8d807d7 to 0aba2d5 Compare May 17, 2024 18:40
particularly useful for zipfiles and f16s
@mxyng mxyng force-pushed the mxyng/cache-intermediate-layers branch from 0aba2d5 to 3520c0e Compare May 20, 2024 20:25
@pdevine pdevine mentioned this pull request May 20, 2024
server/images.go Outdated Show resolved Hide resolved

"github.com/ollama/ollama/api"
"github.com/ollama/ollama/convert"
"github.com/ollama/ollama/llm"
"github.com/ollama/ollama/types/model"
)

var intermediateBlobs sync.Map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To confirm, this is a map of blob digests to ggml model layer digests? Does intermediate mean f16 but not yet quantized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intermediate here means any blob that's not referenced in but used to produce something in the final manifest. here's a concrete example:

FROM /path/to/safetensors/dir creates a zip (intermediate) converts into a f16 (intermediate) quantizes into a q4_0 (final)

when you want to create another quantization, this map tracks the relationship between the zip and f16 so it's able to skip uploading the zip and reconverting the f16

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this probably deserves a comment describing its operation. I think most people aren't going to know what intermediate means here when they're going through the code.

I was trying to think os some other names, like maybe "blobCache" but then I think people will wonder why only some things are in the blobCache. Although why not shove everything into the blobCache? Why just put the "intermediate" stuff there?

Or the inverse; why not just put the intermediate stuff on desk and just use the normal machanism for pulling blobs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also a little confused about why sync.Map here instead of map[string]bool. I don't think there should be any contention here w/ mutexes given you're just shoving the same digest into the map.

@mxyng mxyng merged commit b4dce13 into main May 20, 2024
12 checks passed
@mxyng mxyng deleted the mxyng/cache-intermediate-layers branch May 20, 2024 20:54
return
}

if _, err := os.Stat(p); errors.Is(err, os.ErrNotExist) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should probably include a debug statement in here for the cache hit but something else removed the storage.


"github.com/ollama/ollama/api"
"github.com/ollama/ollama/convert"
"github.com/ollama/ollama/llm"
"github.com/ollama/ollama/types/model"
)

var intermediateBlobs sync.Map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this probably deserves a comment describing its operation. I think most people aren't going to know what intermediate means here when they're going through the code.

I was trying to think os some other names, like maybe "blobCache" but then I think people will wonder why only some things are in the blobCache. Although why not shove everything into the blobCache? Why just put the "intermediate" stuff there?

Or the inverse; why not just put the intermediate stuff on desk and just use the normal machanism for pulling blobs?

@@ -340,7 +340,24 @@ func CreateModel(ctx context.Context, name, modelFileDir, quantization string, m
return err
}
} else if strings.HasPrefix(c.Args, "@") {
blobpath, err := GetBlobsPath(strings.TrimPrefix(c.Args, "@"))
digest := strings.TrimPrefix(c.Args, "@")
if ib, ok := intermediateBlobs.Load(digest); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aren't ib and digest the same thing? Can't you just call this as _, ok := intermediateBlobs.Load(digest); ok { and use digest below?


"github.com/ollama/ollama/api"
"github.com/ollama/ollama/convert"
"github.com/ollama/ollama/llm"
"github.com/ollama/ollama/types/model"
)

var intermediateBlobs sync.Map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also a little confused about why sync.Map here instead of map[string]bool. I don't think there should be any contention here w/ mutexes given you're just shoving the same digest into the map.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants