Work in progress. An LLM utility that serves as an evaluation step in RAG applications.

$R^4$ - Explain ranked retrieval with LLMs

$R^4$: retrieve, rerank, relevance, and reason. Uses Llama 3 8B by default.

Installation

Before anything else, install a torch version suitable for your system. Any version 2.x or later should suffice.

CPU only: pip install torch --index-url https://download.pytorch.org/whl/cpu

With CUDA: pip install torch

# downloads the huggingface models and llm (.gguf) to the models directory
make download  # or download-nor / any other language target added in the makefile with custom environments
# installs llama-cpp-python
make install  # or install-cuda / install-m1
# runs the server on port 8000
make

Overview/architecture

There are two main components:

  1. the server (model located in models/llm.gguf)
    • runnable with make or ./serve.sh
    • uses the llama-cpp-python bindings
  2. the client (or example notebook)
    • a RAG system that talks to the server (see the example request below)
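
Since the server is started through llama-cpp-python, the client simply sends HTTP requests to port 8000. A minimal smoke test from the client side might look like the sketch below, assuming the server exposes the OpenAI-compatible chat completions endpoint that llama-cpp-python's built-in server provides by default; the prompt and parameters are placeholders.

import requests

# assumes the server started by `make` (or ./serve.sh) is listening on port 8000
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Reply with a single word: ready?"}],
        "max_tokens": 16,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])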

RAG workflow

See rag-pipeline.ipynb.

  1. load + preprocess data (json, csv, ...)
  2. initialize chromadb for local persistent storage
    • stores to the /chroma dir
    • the embedding model is listed in the environment file (.env)
  3. compute embeddings for the loaded data
  4. retrieve and rerank (see the sketch after this list)
    • separate steps:
      • retrieve:
        • docs = collection.query(query, n_results=N)
      • with reranking:
        • ranked = rank_collection(collection, reranker, query=query, top_n=N)
    • reranker defined in .env
  5. combine the ranked results with LLMs
    1. Fetch documents with get_ranked_and_contextualized
      • Rank documents with llm_rerank
        1. rank with reranker (rank_collection)
        2. reason about each result and keep only those deemed relevant
    2. For each ID (original sentence), extract a sliding window context (e.g., -2, sent_id, +2)
  6. reason about the query in the larger context (sliding window)
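
A condensed sketch of step 4, assuming a persistent chromadb client backed by the /chroma directory and a sentence-transformers cross-encoder as the reranker. The collection name, model IDs, and the inline reranking shown here are illustrative; the repository's rank_collection and llm_rerank helpers wrap these steps.

import os
import chromadb
from sentence_transformers import CrossEncoder

# persistent local store, as in step 2 (collection name is illustrative)
client = chromadb.PersistentClient(path="./chroma")
collection = client.get_or_create_collection("docs")

query = "what is retrieval-augmented generation?"
N = 10

# retrieve: nearest neighbours by embedding similarity
docs = collection.query(query_texts=[query], n_results=N)["documents"][0]

# rerank: score (query, doc) pairs with a cross-encoder (read from .env in the real pipeline)
reranker = CrossEncoder(os.getenv("RERANKER", "cross-encoder/ms-marco-MiniLM-L-6-v2"))
scores = reranker.predict([(query, doc) for doc in docs])
ranked = [doc for _, doc in sorted(zip(scores, docs), reverse=True)]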

Offline usage

  1. make download
  2. copy the project directory (including the downloaded models) to the target offline computer.

Environment variables

Any program that uses Hugging Face models should load the correct environment variables with load_dotenv(). See the utils.generic module for details; its init_dotenv helper also allows overriding variables from custom env files:

from utils.generic import init_dotenv
init_dotenv(custom_environments=".env-nor")
# your program
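
For reference, a custom env file such as .env-nor typically just swaps the model identifiers. The variable names and model IDs below are illustrative, not taken from the repository:

# .env-nor (illustrative; the actual variable names are read in utils.generic)
EMBEDDING_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
RERANKER=cross-encoder/ms-marco-MiniLM-L-6-v2
LLM_PATH=models/llm.gguf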

Usage and Examples

from llm import categorize_and_reason

categorize_and_reason(
    "A large language model (LLM) is a language model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification",
    "neural networks",
)
#{'relevance': 'relevant',
# 'reason': 'The document discusses a large language model, which is related to neural networks as they are used in developing such models for various natural language processing tasks.'}

categorize_and_reason("coffee", "neural networks")
#{'relevance': 'irrelevant',
# 'reason': "The term 'neural networks' refers to a subfield of artificial intelligence that focuses on algorithms and models inspired by biological neural networks, while 'coffee' is a beverage made from roasted coffee beans. There doesn't seem to be any direct connection between the two."}