Talk to your Jupyter Notebooks

Question-answering with RAG (Retrieval-Augmented Generation) over your Jupyter Notebooks.

Index Jupyter Notebooks and run RAG using LLMs locally:

Llama 2 (7B) using Ollama
bge-large-en embedding model

Use Case

Jupyter Notebooks have become the de facto go-to tool for development in Data Science, Machine Learning, and Data Analytics. They integrate code, visualizations, and explanatory text in one place, which makes it easy to explore, analyze, document, and share insights.

However, as data science, machine learning, and data analysis projects progress, they often accumulate a large number of notebooks. The notebooks become hard to use and manage, and answering questions about their contents gets difficult. This is commonly referred to as "notebook hell", as illustrated in the example below:

(Image Source)

Talk directly to your notebooks with local retrieval-augmented generation (RAG). Index your notebooks and start asking questions. RAG retrieves relevant code and explanations, providing instant answers within your local environment.
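The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not the code used in this repository: the `embed` function is a toy bag-of-words stand-in for a real embedding model such as bge-large-en, and the example chunks and the prompt wording are made up for the demonstration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Put the retrieved chunks in front of the question for the LLM."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer using only this notebook context:\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Hypothetical notebook chunks (code and markdown cells).
chunks = [
    "df = pd.read_csv('train.csv')  # load the Titanic training data",
    "model = RandomForestClassifier(n_estimators=100)",
    "## Feature engineering: extract the title from the passenger name",
]
question = "Which model was trained?"
top = retrieve(question, chunks, k=1)
print(build_prompt(question, top))
```

With a real embedding model, only `embed` and the similarity computation would change; the retrieve-then-prompt flow stays the same.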

Running Large Language Models (LLMs) locally can help avoid sending code or sensitive information to proprietary LLM services by keeping all computations and data processing confined to the local environment, ensuring complete privacy and control.

Setup

ollama.ai is used to run Llama 2 locally; see details on installation here.
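Once Ollama is installed and running, a quick way to check that the model responds is to call its local REST API directly. The sketch below assumes Ollama is listening on its default port (11434) and that the model was pulled under the tag `llama2`; the helper names `build_payload` and `generate` are hypothetical, and the app in this repository may talk to Ollama through a different client.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "llama2") -> dict:
    """Request body for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = "llama2") -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Usage (requires a running Ollama server):
# print(generate("Say hello in one sentence."))
```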

Set up the Python environment:

$ PIPENV_VENV_IN_PROJECT=1 pipenv shell
$ pipenv install

Workflow

Run the Streamlit app:

$ make run-notebooks-rag

Index your Jupyter Notebooks by providing the path to the folder that contains them.
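To give an idea of what indexing a folder involves, here is a minimal sketch of the first step: reading every `.ipynb` file and collecting its code and markdown cells as text. It relies only on the notebook JSON format (a top-level `cells` list whose entries have `cell_type` and `source`). The function names and the shape of the returned records are assumptions made for illustration, not the implementation in `qa_rag_over_notebooks.py`.

```python
import json
from pathlib import Path


def cell_text(cell: dict) -> str:
    """A cell's source is stored either as one string or as a list of lines."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else source


def load_notebook_cells(folder: str) -> list[dict]:
    """Collect non-empty code and markdown cells from all notebooks in a folder."""
    docs = []
    for path in sorted(Path(folder).rglob("*.ipynb")):
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in path.parts:
            continue
        notebook = json.loads(path.read_text(encoding="utf-8"))
        for index, cell in enumerate(notebook.get("cells", [])):
            text = cell_text(cell).strip()
            if text and cell.get("cell_type") in ("code", "markdown"):
                docs.append(
                    {
                        "notebook": path.name,
                        "cell": index,
                        "type": cell["cell_type"],
                        "text": text,
                    }
                )
    return docs
```

Each record keeps the notebook name and cell index, so an answer can later point back to where the retrieved text came from.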

NOTE: for testing purposes I used Kaggle notebooks (2 pinned notebooks) from the famous Titanic competition. This also explains some of the prompts below.

An example of a prompt for checking which notebooks were indexed:

Some examples of prompts/results:

Next steps

Improve chunking, metadata and retrieval
RAG Evaluation
Utilize images in RAG
Index figures
Embed small chunks, retrieve bigger chunks
Leverage notebook metadata
Hybrid Search
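
One of the ideas above, matching on small chunks but returning the bigger chunk they came from, could look roughly like the sketch below. It is only an illustration under simplifying assumptions: word overlap stands in for embedding similarity, the parent texts and their ids are invented, and none of this exists in the repository yet.

```python
def split_small(parent_id: str, text: str, size: int = 8) -> list[tuple[str, str]]:
    """Split a parent chunk into small word windows that remember their parent."""
    words = text.split()
    return [
        (parent_id, " ".join(words[i:i + size]))
        for i in range(0, len(words), size)
    ]


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of words shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def small_to_big(query: str, parents: dict[str, str], size: int = 8) -> str:
    """Match the query against small chunks, then return the whole parent chunk."""
    small = [
        chunk
        for parent_id, text in parents.items()
        for chunk in split_small(parent_id, text, size)
    ]
    best_parent, _ = max(small, key=lambda chunk: overlap(query, chunk[1]))
    return parents[best_parent]


# Hypothetical parent chunks keyed by "notebook#cell".
parents = {
    "titanic.ipynb#3": (
        "Load the training data with pandas and inspect the first rows. "
        "Then check for missing values in the age and cabin columns."
    ),
    "titanic.ipynb#7": (
        "Train a random forest classifier on the engineered features. "
        "Evaluate it with cross validation and report the mean accuracy."
    ),
}
print(small_to_big("missing values in age", parents))
```

Small chunks tend to match a question more precisely, while the larger parent gives the LLM enough surrounding context to answer.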

delta1epsilon/talk-to-your-jupyter-notebooks
