
Support GGUF #365

Open
philpax opened this issue Jul 10, 2023 · 4 comments · Fixed by #412 · May be fixed by #442
Labels
app:cli App: the `llm` CLI issue:enhancement New feature or request

Comments

@philpax
Collaborator

philpax commented Jul 10, 2023

GGUF is the new file format specification we've been designing to solve the problem of model files not being self-identifying. The specification is here: ggerganov/ggml#302

llm should be able to do the following:

  • continue supporting existing models (i.e. this change should be non-destructive)
  • load GGUF models and automatically dispatch to the correct model.
    • load_dynamic already has an interface that should support this, but loading currently only begins after the model arch is known
    • use the new information stored within the metadata to improve the UX, including automatically using the HF tokenizer if available
  • save GGUF models, especially when quantizing
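To illustrate the dispatch point above: GGUF stores the model architecture as a string under the `general.architecture` metadata key, so a loader can pick the right implementation before any tensors are read. The sketch below is hypothetical and not `llm`'s actual `load_dynamic` API; the enum and variant names are illustrative assumptions.

```rust
// Hypothetical sketch of architecture dispatch from GGUF metadata.
// The `general.architecture` key is from the GGUF spec; the enum and
// its variants are illustrative, not llm's real types.

#[derive(Debug, PartialEq)]
enum ModelArchitecture {
    Llama,
    Gpt2,
    GptNeoX,
}

fn architecture_from_metadata(arch: &str) -> Option<ModelArchitecture> {
    // GGUF stores the architecture as a string, so loading can begin
    // before the concrete model type is known.
    match arch {
        "llama" => Some(ModelArchitecture::Llama),
        "gpt2" => Some(ModelArchitecture::Gpt2),
        "gptneox" => Some(ModelArchitecture::GptNeoX),
        _ => None,
    }
}

fn main() {
    assert_eq!(
        architecture_from_metadata("llama"),
        Some(ModelArchitecture::Llama)
    );
    // Unknown architectures surface as None rather than a hard error,
    // leaving the caller free to fall back or report a useful message.
    assert_eq!(architecture_from_metadata("mystery"), None);
    println!("dispatch ok");
}
```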

llm could do the following:

  • convert old models to GGUF, prompting the user for missing metadata
  • implement the migration tool mentioned in the spec, which does autonomous conversion for users based on hashes
@philpax philpax added issue:enhancement New feature or request app:cli App: the `llm` CLI labels Jul 10, 2023
@philpax philpax mentioned this issue Aug 20, 2023
@EwoutH

EwoutH commented Sep 8, 2023

To give an update on the state of GGUF: in mid-August, GGUF was merged into llama.cpp (ggerganov/llama.cpp#2398 (comment)). Its full specification can be found here.

Recap of what GGUF is:

  • binary file format for storing models for inference
  • designed for fast loading and saving of models
  • easy to use (with a few lines of code)
  • mmap (memory mapping) compatibility: models can be loaded using mmap for fast loading and saving.
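As a concrete illustration of the binary format described above: per the spec, a GGUF file opens with the magic bytes `GGUF`, followed (in version 2 and later) by a little-endian `u32` version, a `u64` tensor count, and a `u64` metadata key-value count. The sketch below parses just that fixed-size header from a byte slice; it assumes version >= 2 field widths and is not `llm`'s real loader.

```rust
// Minimal sketch of reading a GGUF header, following the spec's layout:
// magic "GGUF", then u32 version, u64 tensor count, u64 metadata KV count.
// Assumes version >= 2 (version 1 used u32 counts); illustrative only.

use std::convert::TryInto;

#[derive(Debug, PartialEq)]
struct GgufHeader {
    version: u32,
    tensor_count: u64,
    metadata_kv_count: u64,
}

fn parse_gguf_header(bytes: &[u8]) -> Result<GgufHeader, String> {
    if bytes.len() < 24 {
        return Err("header too short".into());
    }
    if &bytes[0..4] != b"GGUF" {
        return Err("bad magic".into());
    }
    // All scalar fields are little-endian per the spec.
    let version = u32::from_le_bytes(bytes[4..8].try_into().unwrap());
    let tensor_count = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
    let metadata_kv_count = u64::from_le_bytes(bytes[16..24].try_into().unwrap());
    Ok(GgufHeader { version, tensor_count, metadata_kv_count })
}

fn main() {
    // Build a synthetic 24-byte header: version 2, 3 tensors, 5 metadata KVs.
    let mut buf = Vec::new();
    buf.extend_from_slice(b"GGUF");
    buf.extend_from_slice(&2u32.to_le_bytes());
    buf.extend_from_slice(&3u64.to_le_bytes());
    buf.extend_from_slice(&5u64.to_le_bytes());

    let header = parse_gguf_header(&buf).unwrap();
    assert_eq!(header.version, 2);
    assert_eq!(header.tensor_count, 3);
    assert_eq!(header.metadata_kv_count, 5);
    println!("header ok");
}
```

The metadata key-value pairs (including `general.architecture` and tokenizer data) follow immediately after this header in the file.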

@philpax philpax pinned this issue Sep 19, 2023
@pixelspark
Contributor

Hi all, any updates on this?

@philpax
Collaborator Author

philpax commented Oct 2, 2023

Hi - sorry about the lack of updates. I've been extremely busy for the last ~two months and haven't had much free time to work on llm. I'm hoping this will ease up soon so we can start catching up properly.

@philpax philpax linked a pull request Nov 12, 2023 that will close this issue
@Dipeshpal

Any updates on gguf?
