RAGGA: Retrieval Augmented Generation General Assistant



Table of Contents

  • What is this?
  • Example Usage
  • Prerequisites
  • Installation
  • Model Selection
  • License

What is this?

Load quantized LLMs and run them on your local devices. Interact with your own notes (currently, markdown notes are supported by default), ask questions about what you have written, and get relevant answers drawn from your own writing! The recommended model is TinyLlama-Chat (https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/blob/main/README.md) based on its speed of token generation and its low system requirements. Try the 8-bit quantized model to start with; if that's still too beefy, you can always try the 5-bit, 4-bit, etc.

Basically, RAGGA is a local-LLM-based AI assistant that cares about your privacy and does not require running anything through untrusted third-party APIs. The one exception is the web search feature, which can be completely disabled, so that nothing ever leaves your system.

Example Usage

Download one of the TinyLlama-Chat quantized models (e.g. the 8-bit quantized TinyLlama-1.1B-Chat-v1.0-GGUF) and copy it to the models directory.
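For example, with the huggingface-cli tool from huggingface_hub (the exact quantization filename below is an assumption; check the file list in the model repository):

```sh
# Fetch the 8-bit quantized GGUF into the local models/ directory.
# The Q8_0 filename is an assumption -- adjust it for other quantizations.
huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF \
  tinyllama-1.1b-chat-v1.0.Q8_0.gguf --local-dir models
```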

Install all the prerequisites and then install RAGGA.

Edit the config.yaml file so that the generator: llm_path: key points to the model you downloaded, and update the dataset: path: key to point to the directory containing your markdown files.
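A minimal sketch of the two keys mentioned above (any structure of config.yaml beyond these two keys is an assumption; the file shipped with the repository is authoritative):

```yaml
generator:
  llm_path: models/tinyllama-1.1b-chat-v1.0.Q8_0.gguf  # the model downloaded above
dataset:
  path: /path/to/your/markdown/notes  # e.g. an Obsidian vault
```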

Run the tinyllama_example.py script to try it out.
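Before wiring the model into RAGGA, you can confirm the downloaded file loads at all with plain llama-cpp-python (a standalone sanity check, not RAGGA's own API; the path is an assumption and should match your config.yaml):

```python
from llama_cpp import Llama

# Load the quantized model downloaded above.
llm = Llama(model_path="models/tinyllama-1.1b-chat-v1.0.Q8_0.gguf")

# Generate a short completion to verify the model runs end to end.
out = llm("Q: What is retrieval augmented generation? A:", max_tokens=64)
print(out["choices"][0]["text"])
```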

Prerequisites

Due to issues with hatch not fully supporting pip options:

  • PyTorch needs to be installed manually
  • llama-cpp-python needs to be installed manually
  • which means you need a C++ compiler (e.g. Visual Studio 2022 Community C++, build-essential, etc.)

Once these issues with hatch are resolved, installation will become significantly easier.

Both the CPU and GPU setups require llama-cpp-python to be installed. The pip --pre flag is required because, as of writing, the current stable release does not support phi-2.

CPU (Windows, Linux, and macOS)

  • Any Python 3.11 installation (venv, conda, etc.)
  • Faiss (CPU) will be installed with the package
  • Install PyTorch
    • Windows / macOS
      • pip install torch
    • Linux
      • pip install torch --index-url https://download.pytorch.org/whl/cpu
  • Install llama-cpp-python
    • Windows / Linux / macOS: Default (supports CPU without acceleration)
      • pip install --pre llama-cpp-python
  • Or install llama-cpp-python with OpenBLAS hardware acceleration (optional)
    • Windows: OpenBLAS
      • $env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
      • pip install --pre llama-cpp-python
    • Linux / macOS: OpenBLAS
      • CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install --pre llama-cpp-python
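After these steps, a quick import check confirms the manually installed pieces are usable (run it inside the same environment you installed into):

```python
# Verify the manually installed prerequisites are importable.
import torch
import llama_cpp

print("torch:", torch.__version__)
print("llama-cpp-python:", llama_cpp.__version__)
```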

GPU (Windows and Linux only)

Make sure you have CUDA 12.1 or higher installed.

  • Install miniforge / miniconda
  • Create a new environment with Python 3.11: conda create -n ragga python=3.11
  • Activate that environment and install Faiss, PyTorch, and llama-cpp-python
    • conda activate ragga
    • conda install faiss-gpu pytorch pytorch-cuda=12.1 -c pytorch -c nvidia -c conda-forge
    • Windows: llama-cpp-python with CUBLAS acceleration
      • $env:CMAKE_ARGS="-DLLAMA_CUBLAS=on"
      • pip install --pre llama-cpp-python
    • Linux: llama-cpp-python with CUBLAS acceleration
      • CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install --pre llama-cpp-python

Note: You do not have to use conda/mamba to install faiss-gpu, but since there are no wheels for it, you will need to compile it yourself. That is not covered here; see the Faiss build documentation.
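Either way, a quick check that the GPU stack is wired up before installing RAGGA:

```python
# Check that PyTorch sees the GPU and was built against CUDA 12.1 or higher.
import torch

print(torch.cuda.is_available())  # should print True
print(torch.version.cuda)         # should report 12.1 or higher
```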

Installation

CPU (Windows, Linux, and macOS)

pip install 'ragga[cpu] @ https://github.com/zeyus/RAGGA/releases/download/v0.0.6/ragga-0.0.6-py3-none-any.whl'

GPU (Windows and Linux only)

pip install 'ragga @ https://github.com/zeyus/RAGGA/releases/download/v0.0.5/ragga-0.0.5-py3-none-any.whl'

Model Selection

A subjective evaluation of 3 different models was performed by generating responses to general and dataset-specific questions for 4 publicly available Obsidian notes repositories. Response quality was ranked from 0-10 for each question, and responses were shown anonymously, so there was no indication of which model generated each response.

[Figure: response quality scores by model (reports/scores_by_model.png)]

The phi-2 and phi-2-chat entries used the same base model but different prompt templates. For evaluation and rating details, see model_tests.py and scripts/03_rank_output.py.

License

ragga is distributed under the terms of the MIT license.
