# R^4: retrieve, rerank, relevance and reason
Uses llama-3 8B by default.
Before anything else, install a torch version suitable for your system; any version >2 should suffice.
CPU only:

```bash
pip install torch --index-url https://download.pytorch.org/whl/cpu
```

With CUDA:

```bash
pip install torch
```
```bash
# downloads the huggingface models and the llm (.gguf) to the models directory
make download  # or download-nor (or any other language added in the makefile with custom environments)

# installs llama-cpp-python
make install  # or install-cuda / install-m1

# runs the server on port 8000
make
```
There are two main components:

- the server (model located in `models/llm.gguf`)
  - runnable with `make` or `./serve.sh`
  - uses llama-cpp python bindings
- the client (or example notebook)
  - a rag system that talks to the server
  - see `rag-pipeline.ipynb`
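For reference, a minimal client call against the running server might look like this. This is a sketch, assuming the server exposes llama-cpp-python's OpenAI-compatible chat completions endpoint on port 8000; the endpoint path and payload follow llama-cpp-python defaults and are not taken from this repo:

```python
import requests

# assumes the server started by `make` is listening on localhost:8000 and
# exposes llama-cpp-python's OpenAI-compatible chat endpoint (an assumption)
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "What does a reranker do?"}],
        "max_tokens": 128,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```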
The notebook goes through the following steps:

- load + preprocess data (json, csv, ...)
- initialize chromadb for local persistent storage
  - stores to the `/chroma` dir
  - the embedding model is listed in the environment file (`.env`)
- compute embeddings for the loaded data
- retrieve and rerank (see the sketch after this list)
  - separate steps:
    - retrieve: `docs = collection.query(query, n_results=N)`
    - with reranking: `ranked = rank_collection(collection, reranker, query=query, top_n=N)`
  - reranker defined in `.env`
- combine it with LLMs
  - fetch documents with `get_ranked_and_contextualized`
    - rank with reranker (`rank_collection`)
    - for each ID (original sentence), extract a sliding window context (e.g., -2, sent_id, +2)
  - rank documents with `llm_rerank`
    - reason about each result, and only use the ones that are deemed relevant
  - reason about the query in the larger context (sliding window)
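A minimal sketch of the retrieve, rerank and sliding-window steps above. The chromadb calls are standard chromadb API; the collection name, the commented-out `rank_collection` usage, and the window helper are assumptions for illustration (see `rag-pipeline.ipynb` for the real flow):

```python
import chromadb

# persistent client over the /chroma dir used by the notebook
client = chromadb.PersistentClient(path="chroma")
collection = client.get_or_create_collection("documents")  # name is an assumption

query = "what are neural networks used for?"

# plain retrieval: top-N nearest neighbours by embedding distance
docs = collection.query(query_texts=[query], n_results=10)

# with reranking (rank_collection comes from this repo; its import path is
# not shown here -- see rag-pipeline.ipynb)
# ranked = rank_collection(collection, reranker, query=query, top_n=10)

# sliding-window context: for a hit whose ID encodes an integer sentence
# position, fetch the -2..+2 neighbouring sentences as extra context
def window_ids(sent_id: int, radius: int = 2) -> list[str]:
    return [str(i) for i in range(sent_id - radius, sent_id + radius + 1) if i >= 0]

context = collection.get(ids=window_ids(42))  # 42 is a placeholder sentence id
```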
To run offline:

- run `make download` on a machine with internet access
- copy the project to the target offline computer
Any program utilizing huggingface models should call `load_dotenv()` to get the correct environment variables. See the `utils.generic` module for details; it also allows overriding variables from custom env files:
```python
from utils.generic import init_dotenv

init_dotenv(custom_environments=".env-nor")
# your program
```
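After `init_dotenv()` the model choices can be read from the environment as usual. A small sketch; the variable names below are hypothetical placeholders, check `.env` for the actual keys:

```python
import os

from utils.generic import init_dotenv

init_dotenv()

# key names are assumptions for illustration; see .env for the real ones
embedding_model = os.environ.get("EMBEDDING_MODEL")
reranker_model = os.environ.get("RERANKER_MODEL")
```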
Example: ask the LLM to categorize a document's relevance to a query and explain its reasoning:

```python
from llm import categorize_and_reason

categorize_and_reason(
    "A large language model (LLM) is a language model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification",
    "neural networks",
)
# {'relevance': 'relevant',
#  'reason': 'The document discusses a large language model, which is related to neural networks as they are used in developing such models for various natural language processing tasks.'}

categorize_and_reason("coffee", "neural networks")
# {'relevance': 'irrelevant',
#  'reason': "The term 'neural networks' refers to a subfield of artificial intelligence that focuses on algorithms and models inspired by biological neural networks, while 'coffee' is a beverage made from roasted coffee beans. There doesn't seem to be any direct connection between the two."}
```