
[WIP] Elixir code completion #2332

Open · wants to merge 8 commits into main
Conversation

@jonastemplestein commented Nov 9, 2023

The aim of this PR is to eventually offer Elixir inline code completion within Livebook.

The high-level design is like this (a sketch of the backend contract follows the list):

  • When users stop typing in a code cell, they see a ghost text suggesting what they might want to type next; hitting tab inserts that text
  • We will train a code model (using the Python ecosystem)
  • We will run inference in Livebook using either Bumblebee (in case of a beefy GPU with large memory) or otherwise a llama.cpp NIF (in which case we could use quantised models and CPU inference)
  • Users will be able to select which model to run, and Livebook will download it for them
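
To make this concrete, here is a minimal sketch of the backend contract such a design implies (module and callback names are hypothetical, not something this PR ships):

```elixir
# Hypothetical interface so GPT-4, llama.cpp and Bumblebee backends
# can all sit behind the same contract.
defmodule Livebook.Copilot.Backend do
  @doc """
  Returns the code most likely to follow `prefix`, optionally using
  the text after the cursor (`suffix`) as infilling context.
  """
  @callback complete(prefix :: String.t(), suffix :: String.t()) ::
              {:ok, String.t()} | {:error, term()}
end
```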

For more context on the project, see this random document with (slightly outdated) notes.

Status

This is a very minimal implementation of copilot-style code completion.

At the moment the only LLM supported is the GPT-4 API. Set the OPENAI_API_KEY env var to play around with it.

Inline completion should appear 500ms after you stop typing, or use Ctrl + Space to force it to appear.

TODO in Livebook

Frontend polish

  • Don't debounce completion when the keyboard shortcut is used
  • Implement stop-words logic like Tabby does (a sketch follows this list)
  • Deal better with line breaks (e.g. when the infilled code is meant to start with a newline because the cursor is at the end of a comment line)
  • Don't show completion in certain situations (e.g. empty editor, cursor at the beginning of a non-empty line, etc.)
  • Completion currently doesn't show when there is an intellisense suggestion. I'd try to make these independent and have [tab] always accept code completion and [return] always accept the intellisense suggestion (like the Cursor editor does it)
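
For the stop-words item, the trimming step could look roughly like this (the stop-word list below is made up for illustration; Tabby maintains per-language lists):

```elixir
defmodule Livebook.Copilot.StopWords do
  # Illustrative stop words; a real list would be tuned per language.
  @stop_words ["\n\n", "\ndef ", "\ndefmodule "]

  # Cuts the raw completion at the earliest stop word, so the ghost
  # text doesn't run past the construct the user is currently writing.
  def trim(completion) do
    @stop_words
    |> Enum.flat_map(fn stop ->
      case :binary.match(completion, stop) do
        {pos, _len} -> [pos]
        :nomatch -> []
      end
    end)
    |> case do
      [] -> completion
      positions -> binary_part(completion, 0, Enum.min(positions))
    end
  end
end
```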

Livebook plumbing

  • Allow the user to select a model and download it
  • Communicate what's going on to the user while the model is a) being downloaded and b) being loaded (and error states)
  • Give the LLM context from the whole notebook (not just the current cell)
  • Completion cache to avoid hitting the LLM unnecessarily (sketched below)
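
The completion cache could start out as something as simple as this (hypothetical module; a real version would also want size limits and expiry):

```elixir
defmodule Livebook.Copilot.Cache do
  @table :copilot_completion_cache

  def create_table do
    :ets.new(@table, [:named_table, :set, :public])
  end

  # Returns a cached completion for this exact context, or runs `fun`
  # (the actual LLM call) and caches its result.
  def fetch(prefix, suffix, fun) do
    key = :erlang.phash2({prefix, suffix})

    case :ets.lookup(@table, key) do
      [{^key, completion}] ->
        completion

      [] ->
        completion = fun.()
        :ets.insert(@table, {key, completion})
        completion
    end
  end
end
```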

Model inference

  • llama.cpp HTTP API for rapid prototyping against a local llama.cpp server
  • llama.cpp NIF for in-process inference
  • Bumblebee model implementation (sketched after this list)
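
For the Bumblebee route, the serving setup would look roughly like this (the model repo is a placeholder, and whether a given checkpoint loads without a custom Bumblebee loader is exactly one of the open TODOs):

```elixir
# Placeholder model; see the fine-tuning section for candidates.
repo = {:hf, "deepseek-ai/deepseek-coder-1.3b-base"}

{:ok, model_info} = Bumblebee.load_model(repo)
{:ok, tokenizer} = Bumblebee.load_tokenizer(repo)
{:ok, generation_config} = Bumblebee.load_generation_config(repo)

serving =
  Bumblebee.Text.generation(model_info, tokenizer, generation_config,
    compile: [batch_size: 1, sequence_length: 1024],
    defn_options: [compiler: EXLA]
  )

prompt = "def add(a, b), do:"
%{results: [%{text: completion}]} = Nx.Serving.run(serving, prompt)
```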

Tests!

TODO for fine-tuning a model

The hardest task is actually fine-tuning the model:

  • Acquire large amounts of Elixir code and remove PII etc. (could perhaps use The Stack plus the source code of Elixir core and a few other open source projects)
  • Turn the code into fill-in-the-middle training examples (can use GPT-4 to e.g. generate comments to be used in those examples); a toy version is sketched after this list
  • Fine-tune a bunch of different models to see which one performs best (be mindful that they each have different infilling formats; this document has a list of LLMs to evaluate)
  • Create a mechanism for evaluating fill-in-the-middle performance
  • Implement Bumblebee model loaders for the most promising models
  • Produce model files to be used by Livebook, both for Bumblebee (HF transformers format) and quantised GGUF
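
As a toy illustration of the fill-in-the-middle data step, one can cut a random span out of a source file and treat it as the "middle" (a real pipeline would cut on line or AST boundaries rather than random characters):

```elixir
# Turns one source file into one FIM training example by removing a
# random character span; assumes `source` is non-empty.
def make_fim_example(source) do
  len = String.length(source)
  start = :rand.uniform(len) - 1
  span = :rand.uniform(len - start)

  %{
    prefix: String.slice(source, 0, start),
    middle: String.slice(source, start, span),
    suffix: String.slice(source, start + span, len)
  }
end
```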

One of the most fiddly bits seems to be properly tokenising the special infilling tokens (both in Bumblebee and llama.cpp): the models often output garbage if you get this wrong. There is some good context in these llama.cpp threads [1] [2].
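
For reference, this is Code Llama's infilling prompt layout (other model families use different sentinel tokens and orderings):

```elixir
# The model generates the "middle" that belongs between prefix and
# suffix. The fiddly part: <PRE>, <SUF> and <MID> must each be encoded
# as a single special token id, not as literal text, otherwise the
# model outputs garbage.
def build_infill_prompt(prefix, suffix) do
  "<PRE> #{prefix} <SUF>#{suffix} <MID>"
end
```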

CLAassistant commented Nov 9, 2023

CLA assistant check
All committers have signed the CLA.

@jonastemplestein (Author):

Ah sorry, I meant to open this under my own fork. Shall I move it there?

@josevalim (Contributor):

Feel free to leave it here for people to play with :)

@jonastemplestein changed the title from "MVP copilot completion using GPT4 API" to "[WIP] Elixir code completion" on Nov 10, 2023
This means you can now use any model for completion that llama.cpp can run.

Just compile llama.cpp and run the server like this:

```shell
./server -m codellama-7b.Q5_K_M.gguf -c 4096
```

I've tested this with quantised CodeLlama 7B (codellama-7b.Q5_K_M.gguf) and it works well. But I have no idea whether the special `/infill` endpoint works for other models, as I don't know how llama.cpp would know about their infilling tokens.
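
For reference, the server's /infill endpoint takes the surrounding code as JSON fields, so the HTTP backend boils down to roughly this (port and n_predict are assumptions; field names as documented by llama.cpp at the time of writing):

```elixir
prefix = "defmodule Math do\n  def add(a, b) do\n    "
suffix = "\n  end\nend"

resp =
  Req.post!("http://localhost:8080/infill",
    json: %{input_prefix: prefix, input_suffix: suffix, n_predict: 64}
  )

completion = resp.body["content"]
```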
- Refactored the way copilot completion backends work
- Added Livebook.Copilot.BumblebeeBackend (including attempting to run the Serving under a new DynamicSupervisor)
- Added Livebook.Copilot.DummyBackend for testing
- Added Livebook.Copilot.LlamaCppHttpBackend for running models locally in llama.cpp's server
- Added Livebook.Copilot.OpenaiBackend for running on OpenAI
- Added Livebook.Copilot.HuggingfaceBackend to use HF inference endpoints
- Played around with adding some user feedback via flash messages
- Fixed a whole bunch of edge cases and bugs in the client-side logic
- Request completions instantly (instead of debounced) when manually requested
- Added special comments you can put in Livebook cells to override the configured copilot backend (sketched below)
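
The special-comment override from the last bullet could look something like this (the comment syntax and backend names in the mapping are illustrative guesses, not necessarily what the commit implements):

```elixir
# Picks a backend from a magic comment such as `# copilot: dummy` at
# the top of a cell; `configured_backend/0` (assumed to exist) returns
# the globally configured one.
def backend_from_cell(source) do
  case Regex.run(~r/^#\s*copilot:\s*(\S+)/, source) do
    [_, "openai"] -> Livebook.Copilot.OpenaiBackend
    [_, "llama_cpp_http"] -> Livebook.Copilot.LlamaCppHttpBackend
    [_, "dummy"] -> Livebook.Copilot.DummyBackend
    _ -> configured_backend()
  end
end
```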
@jonastemplestein (Author) commented Nov 27, 2023

Just to give a little update on this:

  • I think we will most likely want to use a fine-tune of bumblebee-1.3b (and maybe 6.7b for beefier machines)
  • I got a bit stuck last week trying to fine-tune the model but learned a lot about how the models actually work

Will hopefully have a model that is demonstrably better than bumblebee-1.3b by the end of the week.

@josevalim (Contributor):

Bumblebee or deepseek? :)
