Work in progress. An LLM utility that serves as an evaluation step in RAG applications.

$R^4$ - Explain ranked retrieval with LLMs

$R^4$: retrieve, rerank, relevance, and reason. Uses Llama 3 8B by default.

Installation

Before anything else, install a torch version suitable for your system. Any version 2.x or later should suffice.

CPU only: pip install torch --index-url https://download.pytorch.org/whl/cpu

With CUDA: pip install torch

# downloads the huggingface models and llm (.gguf) to the models directory
make download  # or download-nor / any other language target added in the makefile with custom environments
# installs llama-cpp-python
make install  # or install-cuda / install-m1
# runs the server on port 8000
make

Overview/architecture

There are two main components:

  1. the server (model located in models/llm.gguf)
    • runnable with make or ./serve.sh
    • uses the llama-cpp-python bindings
  2. the client (or example notebook)
    • a RAG system that talks to the server (see the example request below)
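
Since the server is started through llama-cpp-python, the client simply sends HTTP requests to port 8000. A minimal smoke test from the client side might look like the sketch below, assuming the server exposes the OpenAI-compatible chat completions endpoint that llama-cpp-python's built-in server provides by default; the prompt and parameters are placeholders.

import requests

# assumes the server started by `make` (or ./serve.sh) is listening on port 8000
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Reply with a single word: ready?"}],
        "max_tokens": 16,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])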

RAG workflow

See rag-pipeline.ipynb.

  1. load + preprocess data (json, csv, ...)
  2. initialize chromadb for local persistent storage
    • stores to the /chroma dir
    • the embedding model is listed in the environment file (.env)
  3. compute embeddings for the loaded data
  4. retrieve and rerank (see the sketch after this list)
    • separate steps:
      • retrieve:
        • docs = collection.query(query, n_results=N)
      • with reranking:
        • ranked = rank_collection(collection, reranker, query=query, top_n=N)
    • reranker defined in .env
  5. combine the ranked results with LLMs
    1. Fetch documents with get_ranked_and_contextualized
      • Rank documents with llm_rerank
        1. rank with reranker (rank_collection)
        2. reason about each result and keep only those deemed relevant
    2. For each ID (original sentence), extract a sliding window context (e.g., -2, sent_id, +2)
  6. reason about the query in the larger context (sliding window)
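
A condensed sketch of step 4, assuming a persistent chromadb client backed by the /chroma directory and a sentence-transformers cross-encoder as the reranker. The collection name, model IDs, and the inline reranking shown here are illustrative; the repository's rank_collection and llm_rerank helpers wrap these steps.

import os
import chromadb
from sentence_transformers import CrossEncoder

# persistent local store, as in step 2 (collection name is illustrative)
client = chromadb.PersistentClient(path="./chroma")
collection = client.get_or_create_collection("docs")

query = "what is retrieval-augmented generation?"
N = 10

# retrieve: nearest neighbours by embedding similarity
docs = collection.query(query_texts=[query], n_results=N)["documents"][0]

# rerank: score (query, doc) pairs with a cross-encoder (read from .env in the real pipeline)
reranker = CrossEncoder(os.getenv("RERANKER", "cross-encoder/ms-marco-MiniLM-L-6-v2"))
scores = reranker.predict([(query, doc) for doc in docs])
ranked = [doc for _, doc in sorted(zip(scores, docs), reverse=True)]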

Offline usage

  1. make download
  2. copy the project directory (including the downloaded models) to the target offline computer.

Environment variables

Any program that uses Hugging Face models should load the correct environment variables with load_dotenv(). See the utils.generic module for details; its init_dotenv helper also allows overriding variables from custom env files:

from utils.generic import init_dotenv
init_dotenv(custom_environments=".env-nor")
# your program
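
For reference, a custom env file such as .env-nor typically just swaps the model identifiers. The variable names and model IDs below are illustrative, not taken from the repository:

# .env-nor (illustrative; the actual variable names are read in utils.generic)
EMBEDDING_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
RERANKER=cross-encoder/ms-marco-MiniLM-L-6-v2
LLM_PATH=models/llm.gguf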

Usage and Examples

from llm import categorize_and_reason

categorize_and_reason(
    "A large language model (LLM) is a language model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification",
    "neural networks",
)
#{'relevance': 'relevant',
# 'reason': 'The document discusses a large language model, which is related to neural networks as they are used in developing such models for various natural language processing tasks.'}

categorize_and_reason("coffee", "neural networks")
#{'relevance': 'irrelevant',
# 'reason': "The term 'neural networks' refers to a subfield of artificial intelligence that focuses on algorithms and models inspired by biological neural networks, while 'coffee' is a beverage made from roasted coffee beans. There doesn't seem to be any direct connection between the two."}