# 📦 Installation (Code Llama / Llama 2)

This section shows how to install Incognito Pilot using the Llama models. Please note that you will only get satisfactory results with the largest model, `llama-2-70b-chat`, which needs considerable hardware resources. Even then, the experience will not be comparable to GPT-4, since the Llama models are not fine-tuned to interact with tools like a code interpreter. You can also try smaller models specialized for coding, such as `codellama-34b-instruct`, but the coding capabilities of Llama 2 are not really the bottleneck, so the largest model usually gives better results.

Nevertheless, it's a lot of fun to see what's already possible with open-source models. At the moment, there are two ways of using Incognito Pilot with the Llama models:

  • Using a cloud API from Replicate. While you don't get a fully local setup this way, it lets you try out the 70B model quickly without owning powerful hardware.
  • Using Hugging Face's Text Generation Inference container, which allows you to run Llama 2 locally with a simple `docker run` command.

## Replicate

Follow these steps:

  1. Install Docker.
  2. Create an empty folder somewhere on your system. This will be the working directory that Incognito Pilot has access to. The code interpreter can read the files in this folder and store any results there. In the following, we assume it to be `/home/user/ipilot`.
  3. Create a Replicate account, add a credit card, and copy your API key.
  4. Now, just run the following command (replace your working directory and API key):
```sh
docker run -i -t \
  -p 3030:80 \
  -e LLM="llama-replicate:replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf" \
  -e REPLICATE_API_KEY="your-replicate-api-key" \
  -e ALLOWED_HOSTS="localhost:3030" \
  -v /home/user/ipilot:/mnt/data \
  silvanmelchior/incognito-pilot:latest-slim
```

You can of course also choose a different model, but the smaller ones are much less suited for this task.

In the console, you should now see a URL. Open it, and you should see the Incognito Pilot interface.
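If nothing shows up in the browser, you can quickly check from another terminal whether the UI is being served at all. This is a sketch, assuming the default port mapping from the command above:

```sh
# Should print an HTTP status line such as "HTTP/1.1 200 OK"
curl -sI http://localhost:3030 | head -n 1
```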

Before you continue, remember:

  • Everything you type and every code result you approve is sent to the Replicate API
  • Your files are not uploaded; the code interpreter runs locally, so your data itself stays and is processed on your own machine

Does it work? Great, let's move to the Getting started section.

## Text Generation Inference

Follow these steps:

  1. Install Docker.
  2. Create an empty folder somewhere on your system. This will be the working directory that Incognito Pilot has access to. The code interpreter can read the files in this folder and store any results there. In the following, we assume it to be `/home/user/ipilot`.
  3. Create a Hugging Face account.
  4. Make sure you get access to the Llama 2 model weights on Hugging Face.
  5. In the Files and versions tab, download the following three files (we assume them to be in `/home/user/tokenizer`; for a command-line alternative, see the sketch after this list):
    • `tokenizer.json`
    • `tokenizer.model`
    • `tokenizer_config.json`
  6. Create an access token.
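If you prefer the command line over the browser for step 5, something like the following should also work. This is a sketch, assuming the huggingface_hub CLI is installed (`pip install -U huggingface_hub`) and your token has been granted Llama 2 access:

```sh
# Download only the three tokenizer files into /home/user/tokenizer
huggingface-cli download meta-llama/Llama-2-70b-chat-hf \
  tokenizer.json tokenizer.model tokenizer_config.json \
  --local-dir /home/user/tokenizer \
  --token hf_your-huggingface-api-token
```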

Now, let's first run the Text Generation Inference service. Check out their README. I had to run something similar to this:

```sh
docker run \
  --gpus all \
  --shm-size 1g \
  -p 8080:80 \
  -v /home/user/tgi_cache:/data \
  -e HUGGING_FACE_HUB_TOKEN=hf_your-huggingface-api-token \
  ghcr.io/huggingface/text-generation-inference \
  --model-id "meta-llama/Llama-2-70b-chat-hf"
```

You can of course also choose a different model, but the smaller ones are much less suited for this task. Once the container shows a success message, you are ready for the next step.
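Before wiring it up to Incognito Pilot, you can also verify that the service actually generates text. A minimal sketch, assuming TGI's standard `/generate` endpoint on the port mapped above:

```sh
curl -s http://localhost:8080/generate \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello, my name is", "parameters": {"max_new_tokens": 16}}'
```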

Visit http://localhost:8080/info. You should see a JSON response with model information. Check the value of `max_total_tokens`: it tells you how many tokens fit into the context for this model on your system. Incognito Pilot needs this information to avoid sending messages that exceed the context size.
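You can also read the value directly from the command line; a sketch assuming `jq` is installed:

```sh
curl -s http://localhost:8080/info | jq .max_total_tokens
```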

Now, just run the following command (replace your directories and max tokens):

```sh
docker run -i -t \
  -p 3030:80 \
  -e LLM="llama-tgi:http://host.docker.internal:8080" \
  -e MAX_TOKENS="your-max-tokens" \
  -e TOKENIZER_PATH="/mnt/tokenizer/tokenizer.model" \
  -v /home/user/tokenizer:/mnt/tokenizer \
  -v /home/user/ipilot:/mnt/data \
  silvanmelchior/incognito-pilot:latest-slim
```
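Note: on a Linux host, `host.docker.internal` may not resolve inside the container by default. If Incognito Pilot cannot reach the TGI service, adding a host mapping should help; this extra flag is an assumption for Docker 20.10+ and not part of the original command:

```sh
docker run -i -t \
  --add-host=host.docker.internal:host-gateway \
  -p 3030:80 \
  -e LLM="llama-tgi:http://host.docker.internal:8080" \
  -e MAX_TOKENS="your-max-tokens" \
  -e TOKENIZER_PATH="/mnt/tokenizer/tokenizer.model" \
  -v /home/user/tokenizer:/mnt/tokenizer \
  -v /home/user/ipilot:/mnt/data \
  silvanmelchior/incognito-pilot:latest-slim
```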

In the console, you should now see a URL. Open it, and you should see the Incognito Pilot interface.

Congrats! You have a fully local setup, everything is running on your own system 🥳.

# 🚀 Getting started (Code Llama / Llama 2)

The Incognito Pilot interface shows a chat window through which you can interact with the model. Let's try it out!

  1. File Access: Type "Create a text file with all numbers from 0 to 100". You will see how the Code part of the UI shows you a Python snippet. As soon as you approve it, the code will be executed on your machine (within the Docker container), and you will see the output in the Result part of the UI. As soon as you approve that, it is sent back to the model. If you are using an API (like Replicate), this of course also means that the result is sent to their services. After the approval, the model will confirm that the code was executed. Check your working directory now (e.g. `/home/user/ipilot`): you should see the file!
  2. Math: Type "What is 1 + 2 * 3 + 4 * 5 + 6 * 7 + 8 * 9?". The model will use the Python interpreter to arrive at the correct result.
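For reference, you can compute the expected answer yourself: multiplication binds before addition, so the sum is 1 + 6 + 20 + 42 + 72.

```sh
python3 -c 'print(1 + 2*3 + 4*5 + 6*7 + 8*9)'  # prints 141
```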

Now you should be ready to use Incognito Pilot for your own tasks. One more thing: the version you just used ships with nearly no packages in its Python interpreter, so things like reading images or Excel files will not work. To change this, head back to the console and press Ctrl-C to stop the container, then re-run the command with the `-slim` suffix removed from the image name. This downloads a much larger version, equipped with many packages.
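For example, with the Replicate setup from above, the command becomes:

```sh
docker run -i -t \
  -p 3030:80 \
  -e LLM="llama-replicate:replicate/llama-2-70b-chat:2796ee9483c3fd7aa2e171d38f4ca12251a30609463dcfd4cd76703f22e96cdf" \
  -e REPLICATE_API_KEY="your-replicate-api-key" \
  -e ALLOWED_HOSTS="localhost:3030" \
  -v /home/user/ipilot:/mnt/data \
  silvanmelchior/incognito-pilot:latest
```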

Let's head back to the Settings section.