- Ubuntu 22.04 LTS
- CUDA 11.8
- Python 3.10.12
- LLaVA v1.2.0 (LLaVA 1.6)
- Torch 2.1.2
- runpodctl
- croc
- rclone
- speedtest-cli
- screen
- tmux
- llava-v1.6-mistral-7b model
This image is designed to run on RunPod. You can launch it with my custom RunPod template.
docker run -d \
--gpus all \
-v /workspace \
-p 3000:3001 \
-p 8888:8888 \
-e JUPYTER_PASSWORD=Jup1t3R! \
ashleykza/llava:latest
You can substitute your own image name and tag.
Important
If you select a 13B or larger model, CUDA will run out of memory (OOM) on a GPU with less than 48GB of VRAM, so an A6000 or higher is recommended for 13B models.
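The rule in the note above can be expressed as a quick pre-flight check. This is a minimal sketch, not part of the image: the helper names `model_size_billions` and `note_predicts_oom` are hypothetical, and the size is simply parsed from the `-7b` / `-13b` / `-34b` suffix used in the model names. Only the 13B / 48GB threshold comes from this README.

```python
import re

# Threshold taken from the note above: 13B or larger models need 48GB of VRAM.
LARGE_MODEL_BILLIONS = 13
LARGE_MODEL_MIN_VRAM_GB = 48


def model_size_billions(model_id):
    """Return the size in billions parsed from a model ID, or None if absent."""
    # Matches suffixes such as "-7b", "-13b" or "-34b" (hypothetical convention).
    match = re.search(r"-(\d+)b\b", model_id.lower())
    return int(match.group(1)) if match else None


def note_predicts_oom(model_id, vram_gb):
    """Return True when the note above predicts a CUDA OOM error."""
    size = model_size_billions(model_id)
    if size is None:
        # No size in the name (e.g. BakLLaVA-1), so the note makes no prediction.
        return False
    return size >= LARGE_MODEL_BILLIONS and vram_gb < LARGE_MODEL_MIN_VRAM_GB
```

For example, `note_predicts_oom("liuhaotian/llava-v1.6-vicuna-13b", 24)` flags a 24GB GPU as too small. The check says nothing about whether smaller models fit on a given GPU.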
You can add an environment variable called MODEL to your Docker container to specify the model that should be downloaded. If the MODEL environment variable is not set, the model will default to liuhaotian/llava-v1.6-mistral-7b.
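As an illustration of setting MODEL, the sketch below assembles the `docker run` command from this README with an extra `-e MODEL=...` flag. The function `docker_run_command` is a hypothetical helper written for this example. The flags, ports, and image name are copied from the command above, and omitting `model` leaves the container on its default.

```python
import shlex


def docker_run_command(model=None, image="ashleykza/llava:latest",
                       jupyter_password="Jup1t3R!"):
    """Build the docker run argument list, optionally setting MODEL."""
    args = [
        "docker", "run", "-d",
        "--gpus", "all",
        "-v", "/workspace",
        "-p", "3000:3001",
        "-p", "8888:8888",
        "-e", f"JUPYTER_PASSWORD={jupyter_password}",
    ]
    if model is not None:
        # Only pass MODEL when a non-default model is requested.
        args += ["-e", f"MODEL={model}"]
    args.append(image)  # The image name must come after all options.
    return args


# Print a shell-safe command line that can be pasted into a terminal.
print(shlex.join(docker_run_command("liuhaotian/llava-v1.6-vicuna-7b")))
```

Building the command as a list and quoting it with `shlex.join` keeps the `!` in the default password from being interpreted by the shell.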
Model | Environment Variable Value | Version | LLM | Default |
---|---|---|---|---|
llava-v1.6-vicuna-7b | liuhaotian/llava-v1.6-vicuna-7b | LLaVA-1.6 | Vicuna-7B | no |
llava-v1.6-vicuna-13b | liuhaotian/llava-v1.6-vicuna-13b | LLaVA-1.6 | Vicuna-13B | no |
llava-v1.6-mistral-7b | liuhaotian/llava-v1.6-mistral-7b | LLaVA-1.6 | Mistral-7B | yes |
llava-v1.6-34b | liuhaotian/llava-v1.6-34b | LLaVA-1.6 | Hermes-Yi-34B | no |
Model | Environment Variable Value | Version | Size | Default |
---|---|---|---|---|
llava-v1.5-7b | liuhaotian/llava-v1.5-7b | LLaVA-1.5 | 7B | no |
llava-v1.5-13b | liuhaotian/llava-v1.5-13b | LLaVA-1.5 | 13B | no |
BakLLaVA-1 | SkunkworksAI/BakLLaVA-1 | LLaVA-1.5 | 7B | no |
Port | Description |
---|---|
3000 | LLaVA |
8888 | Jupyter Lab |
Variable | Description | Default |
---|---|---|
JUPYTER_PASSWORD | Password for Jupyter Lab | Jup1t3R! |
DISABLE_AUTOLAUNCH | Prevent LLaVA from launching automatically | not set (LLaVA launches automatically) |
MODEL | The Hugging Face path of the model to download | liuhaotian/llava-v1.6-mistral-7b |
If you are running the RunPod template, edit your pod and add HTTP port 5000.
If you are running locally, add a port mapping for port 5000.
# Stop model worker and controller to free up VRAM
fuser -k 10000/tcp 40000/tcp
# Install dependencies
source /workspace/venv/bin/activate
pip3 install flask protobuf
# Start the API server on port 5000
cd /workspace/LLaVA
export HF_HOME="/workspace"
python -m llava.serve.api -H 0.0.0.0 -p 5000
You can use the test script to test your API.
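Before running the test script, it can help to confirm that port 5000 is actually reachable. The following is a minimal sketch using only the Python standard library. The helper name `port_is_open` is hypothetical, and the host you pass in (for example `localhost` for a local container) is an assumption about your setup.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open("localhost", 5000)` only shows that something is listening on the port. It does not verify that the LLaVA API itself is responding correctly.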
- Matthew Berman for giving me a demo on LLaVA, as well as his amazing YouTube videos.
Pull requests and issues on GitHub are welcome. Bug fixes and new features are encouraged.
You can contact me and get help with deploying your container to RunPod on the RunPod Discord Server below. My username is ashleyk.