Develop #442

Open
wants to merge 63 commits into
base: main

philpax (Collaborator) commented Nov 12, 2023

The pending PRs were interrelated, but I didn't want to leave main in a half-working state, so I've merged all the PRs into a new develop branch. The plan is to work on this branch and leave main in maintenance mode until this is ready.

Closes #365, closes #403, closes #439, closes #77.

This integrates:

  • a GGML version upgrade
  • GGUF support
  • BERT support
  • APIs for context-shuffling
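
For readers unfamiliar with the term: context-shuffling (presumably the "context swap" functions mentioned in the commit list) generally means freeing room in a full context window by keeping a fixed prefix of tokens and discarding part of the older history. The sketch below is only an illustration of that idea, assuming the common policy of dropping the older half of the non-prefix tokens. The `context_swap` function and its `n_keep` parameter are hypothetical names, not the API added in this PR.

```rust
/// Hypothetical helper, NOT the session API introduced by this PR.
///
/// Keeps the first `n_keep` tokens (e.g. a system prompt) and the newer
/// half of the remaining tokens, discarding the older half of the
/// remainder to make room in the context window.
pub fn context_swap(tokens: &[u32], n_keep: usize) -> Vec<u32> {
    // Never try to keep more tokens than we actually have.
    let n_keep = n_keep.min(tokens.len());
    let (kept, rest) = tokens.split_at(n_keep);

    // Assumed policy: drop the older half of everything after the prefix.
    let n_discard = rest.len() / 2;

    let mut out = Vec::with_capacity(tokens.len() - n_discard);
    out.extend_from_slice(kept);
    out.extend_from_slice(&rest[n_discard..]);
    out
}
```

With eight tokens and `n_keep = 2`, the two prefix tokens survive, the three oldest of the remaining six are dropped, and the three newest are retained. A real implementation would also need to re-evaluate the retained tokens or otherwise update the model's cached state, which this sketch leaves out.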

This is the to-do list:

  • Update to the latest GGML
  • Fix CUDA inference
  • Fix OpenCL inference
  • Fix Metal inference
  • Fix the embedded tokenizer
  • Re-add quantisation
  • Modularize the model definitions (i.e. move block inference to the block struct)
  • Fix models (ensure they're uncommented in llm):
    • Fix BLOOM
    • Fix GPT-NeoX
    • Fix Falcon
    • Fix GPT-2
    • Fix GPT-J
    • Fix MPT
    • Fix BERT
  • Remove the expect() calls
  • Fix the TODOs

nerdypepper and others added 30 commits August 7, 2023 14:55
Co-authored-by: Lukas Kreussel <lukaskreussel@gmail.com>
Co-authored-by: Philpax <me@philpax.me>
* with some heavy caveats, see the PR
Build against newer GGML version
Add "context swap" functions to session and add "decoded_tokens" to snapshot read/write