Use jupyter-collaboration to get the full notebook content for completions #708

krassowski · 2024-03-28T17:34:55Z

Users have repeatedly asked for the LLMs to consider the full notebook, rather than only the current cell, when generating completions, for example in jupyterlab/jupyterlab#15532.

However, sending the full notebook in each completion request would be slow, especially for large notebooks; completions are supposed to be nearly instant. Sending only the text of all cells might help a little (though not for very long notebooks), but it would be restrictive (what if the model needs the cell ID to generate a link? what if it needs cell outputs?); future multi-modal AIs could use much more than just the cell source (e.g. analysing the cell outputs or attachments).

When designing the inline completer implementation in jupyter-ai we included the file path. This helps a little because one could open the file on the disk to read the notebook. However, this has two limitations:

the notebook on disk may be outdated if the user has not saved it recently; in particular, the previous cell will often be outdated, even though it is the most important one for adding context (on top of the current cell)
the notebook may not be on the same disk as the jupyter-server (if using a remote kernel)

This is a proof of concept for using the shared notebook model retrieved via jupyter-collaboration to populate the prefix/suffix with the content of the previous/following cells. Because RTC synchronises the notebook state through delta updates as changes happen, this would not add any latency. The shared model might be a few characters behind at times, but this is not a problem: we only use it for the previous/following cells, which would already have synchronised, while the text of the current cell is used as provided by the frontend.
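To illustrate the idea (this is a minimal sketch, not the code in this PR), the prefix/suffix extension could look roughly like the following. It assumes the cells of the shared model have already been converted to plain dicts with a string `source` key, as in nbformat; the function name `extend_with_neighbours` and the `separator` parameter are hypothetical, and no jupyter-collaboration API is called here.

```python
from typing import Dict, List, Tuple


def extend_with_neighbours(
    cells: List[Dict[str, str]],
    cell_index: int,
    prefix: str,
    suffix: str,
    separator: str = "\n\n",
) -> Tuple[str, str]:
    """Extend the frontend-provided prefix/suffix with neighbouring cells.

    `cells` stands in for the cells of the shared notebook model
    (assumption: plain dicts with a string "source", as in nbformat).
    The source of the cell at `cell_index` is deliberately ignored,
    because the shared model may lag a few characters behind; the
    `prefix`/`suffix` sent by the frontend are used for it instead.
    """
    before = [cell["source"] for cell in cells[:cell_index]]
    after = [cell["source"] for cell in cells[cell_index + 1:]]
    full_prefix = separator.join(before + [prefix])
    full_suffix = separator.join([suffix] + after)
    return full_prefix, full_suffix
```

For example, with three cells where the middle one is being edited, the possibly stale copy of the middle cell never reaches the model; only the text provided by the frontend does.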

This PR is not intended to be merged in the current form, but to serve as a reference for discussion.

In particular:

we would not want to require the user to enable RTC to take advantage of this enhancement; the newly opened issue "Separate out the frontend and the backend" (jupyter-collaboration#269) discusses splitting the frontend and backend of the jupyter-collaboration extension
we would want to offload the code for discovering the YNotebook to jupyter-collaboration (or whatever the right package would be), which would offer a public API for getting the YNotebook (and other documents) from a jupyter-server extension (edit: removed questions about implementation details, as this is now tracked in "Public API to get a view of the shared document model", jupyter-collaboration#270)
the details of what to do with the document (whether to only extract the source of cells, how to concatenate them) may be best left for third parties to customize; I think that while DefaultInlineCompletionHandler could hold a utility method for retrieving the shared notebook model, advanced users should be allowed to adjust the way the notebook document gets used (how many cells get extracted, how they are concatenated, etc.), possibly by swapping the DefaultInlineCompletionHandler; see "Allow to swap the DefaultInlineCompletionHandler" (#702)
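As a rough sketch of what such customization might cover (how many cells get extracted, which cell types, how they are concatenated), a third party could express its choices as a small policy object. Everything here is hypothetical: `ContextPolicy`, its fields and `select` are illustrative names, not part of jupyter-ai or jupyter-collaboration, and cells are again assumed to be nbformat-like dicts with `cell_type` and a string `source`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ContextPolicy:
    """Hypothetical knobs for turning a notebook into completion context."""

    max_cells_before: int = 5
    max_cells_after: int = 2
    cell_types: Tuple[str, ...] = ("code",)
    separator: str = "\n\n"

    def select(self, cells: List[Dict[str, str]], cell_index: int) -> Tuple[str, str]:
        """Return (text before, text after) the cell at `cell_index`."""

        def keep(cell: Dict[str, str]) -> bool:
            return cell.get("cell_type") in self.cell_types

        before = [c["source"] for c in cells[:cell_index] if keep(c)]
        after = [c["source"] for c in cells[cell_index + 1:] if keep(c)]
        # Keep only the cells closest to the current one; a slice of
        # [-0:] would return the whole list, so zero is handled explicitly.
        before = before[-self.max_cells_before:] if self.max_cells_before > 0 else []
        after = after[: max(self.max_cells_after, 0)]
        return self.separator.join(before), self.separator.join(after)
```

A handler that can be swapped out (as proposed in #702) could then accept such a policy, or implement equivalent logic directly, without the default handler having to anticipate every use case.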

PoC for using jupter-collaboration for getting full notebook content

a26f453

krassowski added the enhancement New feature or request label Mar 28, 2024

krassowski mentioned this pull request Mar 29, 2024

Public API to get a view of the shared document model jupyterlab/jupyter-collaboration#270

Closed

5 tasks

Correct file format for files

d6c374b

krassowski mentioned this pull request Apr 11, 2024

Add a public API for getting a read-only view of the shared model jupyterlab/jupyter-collaboration#275

Merged

krassowski mentioned this pull request May 8, 2024

Document how to create completions using full notebook content #777

Draft
