Ollama ps command for showing currently loaded models #4327

Merged
merged 6 commits into from May 14, 2024

Conversation

Contributor

@pdevine pdevine commented May 10, 2024

This change adds a rudimentary ps command which makes use of the new scheduler changes in the server.

The output depends on whether the model is running on the CPU, the GPU, or a hybrid of both, and looks like:

NAME            ID              SIZE    PROCESSOR        UNTIL
mistral:latest  61e88e884507    5.4 GB  100% GPU         28 seconds from now

NAME            ID              SIZE    PROCESSOR        UNTIL
mistral:latest  61e88e884507    5.4 GB  48%/52% CPU/GPU  28 seconds from now

NAME            ID              SIZE    PROCESSOR        UNTIL
mistral:latest  61e88e884507    5.4 GB  100% CPU         28 seconds from now
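One way the PROCESSOR column could be derived is from the model's total size and the portion resident in GPU memory. The following is a minimal sketch under that assumption; `processorString` and its parameters are hypothetical names, not taken from this PR's diff.

```go
package main

import (
	"fmt"
	"math"
)

// processorString describes where a loaded model lives, given its total
// size and the portion held in GPU memory (both in bytes).
// Hypothetical helper for illustration; not the code from this PR.
func processorString(size, sizeVRAM int64) string {
	switch {
	case size <= 0 || sizeVRAM < 0 || sizeVRAM > size:
		// Inconsistent numbers: do not guess at a split.
		return "Unknown"
	case sizeVRAM == 0:
		return "100% CPU"
	case sizeVRAM == size:
		return "100% GPU"
	default:
		// Round the CPU share and give the remainder to the GPU so the
		// two percentages always add up to 100.
		cpuPercent := int(math.Round(float64(size-sizeVRAM) / float64(size) * 100))
		return fmt.Sprintf("%d%%/%d%% CPU/GPU", cpuPercent, 100-cpuPercent)
	}
}

func main() {
	fmt.Println(processorString(100, 100)) // 100% GPU
	fmt.Println(processorString(100, 52))  // 48%/52% CPU/GPU
	fmt.Println(processorString(100, 0))   // 100% CPU
}
```

Computing the GPU share as the remainder avoids two independently rounded values that sum to 99 or 101.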

Additionally, there is a new --keepalive flag in the REPL that sets how long the model stays resident in memory after it has finished inference. It takes a duration string (e.g. 3m30s); we can switch this to also accept integers, similar to the API.

This also introduces a new /api/ps endpoint, which returns a response similar to the /api/tags endpoint, albeit with additional information. The size of the running model will not match the size reported by the /api/tags endpoint for a given model, since the model can take additional memory when loaded onto the GPU or as a hybrid.
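A client for the new endpoint might look like the sketch below. The response shape (a `models` array with `name`, `size`, `size_vram`, and `expires_at` fields) is an assumption for illustration, since the description above does not spell out the schema. The example runs against a local stub server rather than a real Ollama instance, and `listRunning` and `newStubServer` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// runningModel holds the fields this sketch assumes each entry contains.
type runningModel struct {
	Name      string    `json:"name"`
	Size      int64     `json:"size"`
	SizeVRAM  int64     `json:"size_vram"`
	ExpiresAt time.Time `json:"expires_at"`
}

// psResponse is the assumed top-level shape of the /api/ps response.
type psResponse struct {
	Models []runningModel `json:"models"`
}

// listRunning fetches and decodes the list of loaded models.
func listRunning(baseURL string) ([]runningModel, error) {
	resp, err := http.Get(baseURL + "/api/ps")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var out psResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Models, nil
}

// newStubServer stands in for a running server and returns canned data.
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/api/ps" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"models":[{"name":"mistral:latest","size":5400000000,"size_vram":2808000000,"expires_at":"2024-05-14T00:17:28Z"}]}`)
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	models, err := listRunning(srv.URL)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range models {
		fmt.Printf("%s size=%d size_vram=%d expires=%s\n",
			m.Name, m.Size, m.SizeVRAM, m.ExpiresAt.Format(time.RFC3339))
	}
}
```

Pointing `listRunning` at a real server only requires passing its base URL instead of the stub's.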

Partially addresses #3902
Fixes #4013
Replaces #2359

@pdevine pdevine mentioned this pull request May 11, 2024
@@ -324,6 +325,18 @@ func RunHandler(cmd *cobra.Command, args []string) error {
}
opts.Format = format

keepAlive, err := cmd.Flags().GetString("keepalive")
Member

@jmorganca jmorganca May 13, 2024

Suggested change
- keepAlive, err := cmd.Flags().GetString("keepalive")
+ keepAlive, err := cmd.Flags().GetString("keep-alive")

I think this is ok as is, but would suggest keep-alive to be consistent with the api keep_alive – although the design of ollama's cli should be a matter of taste/usability over consistency

Contributor Author

I thought about this, but we have --nowordwrap as one of the other options for run, so it felt weird going to kebab case here.

@pdevine pdevine merged commit 6845988 into main May 14, 2024
15 checks passed
@pdevine pdevine deleted the pdevine/ps branch May 14, 2024 00:17
Development

Successfully merging this pull request may close these issues.

API Endpoint for Listing Loaded Running Models