
Fixes #1602
pseudotensor committed May 13, 2024
1 parent 6bda835 commit 378c0ff
Showing 3 changed files with 64 additions and 1 deletion.
61 changes: 61 additions & 0 deletions docs/README_DOCKER.md
@@ -113,6 +113,67 @@ For single GPU use `--gpus '"device=0"'` or for 2 GPUs use `--gpus '"device=0,1"'`

See [README_GPU](README_GPU.md) for more details about what to run.

## Run h2oGPT in Docker offline

Ensure $HOME/users and $HOME/db_nonusers exist and are writable by the user running Docker.
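A minimal preparation sketch, assuming the default `$HOME` locations used by the volume mounts below (adjust paths as needed):
```bash
# create the host directories the container will write to
mkdir -p "$HOME"/users "$HOME"/db_nonusers "$HOME"/save \
         "$HOME"/user_path "$HOME"/db_dir_UserData "$HOME"/llamacpp_path
# ensure they are writable by the UID/GID the container runs as (id -u/id -g below)
chmod -R u+rwX "$HOME"/users "$HOME"/db_nonusers
```
Then run: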
```bash
export TRANSFORMERS_OFFLINE=1
export GRADIO_SERVER_PORT=7860
export OPENAI_SERVER_PORT=5000
export HF_HUB_OFFLINE=1
docker run --gpus all \
--runtime=nvidia \
--shm-size=2g \
-e TRANSFORMERS_OFFLINE=$TRANSFORMERS_OFFLINE \
-e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
-e HF_HUB_OFFLINE=$HF_HUB_OFFLINE \
-e HF_HOME="/workspace/.cache/huggingface/" \
-p $GRADIO_SERVER_PORT:$GRADIO_SERVER_PORT \
-p $OPENAI_SERVER_PORT:$OPENAI_SERVER_PORT \
--rm --init \
--network host \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u $(id -u):$(id -g) \
-v "${HOME}"/.cache/huggingface/:/workspace/.cache/huggingface \
-v "${HOME}"/.cache/torch/:/workspace/.cache/torch \
-v "${HOME}"/.cache/transformers/:/workspace/.cache/transformers \
-v "${HOME}"/save:/workspace/save \
-v "${HOME}"/user_path:/workspace/user_path \
-v "${HOME}"/db_dir_UserData:/workspace/db_dir_UserData \
-v "${HOME}"/users:/workspace/users \
-v "${HOME}"/db_nonusers:/workspace/db_nonusers \
-v "${HOME}"/llamacpp_path:/workspace/llamacpp_path \
-e GRADIO_SERVER_PORT=$GRADIO_SERVER_PORT \
gcr.io/vorvan/h2oai/h2ogpt-runtime:0.2.0 \
/workspace/generate.py \
--base_model=mistralai/Mistral-7B-Instruct-v0.2 \
--use_safetensors=False \
--prompt_type=mistral \
--save_dir='/workspace/save/' \
--use_gpu_id=False \
--user_path=/workspace/user_path \
--langchain_mode="LLM" \
--langchain_modes="['UserData', 'MyData', 'LLM']" \
--score_model=None \
--max_max_new_tokens=2048 \
--max_new_tokens=1024 \
--visible_visible_models=False \
--openai_port=$OPENAI_SERVER_PORT \
--gradio_offline_level=2
```
If any of these paths are symbolic links, more specific mappings to the direct (non-linked) locations may be required, since the linked locations cannot be used inside the container, e.g.
```bash
-v "${HOME}"/.cache/huggingface/hub:/workspace/.cache/huggingface/hub \
-v "${HOME}"/.cache:/workspace/.cache \
```
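Whether a given cache path is a symbolic link can be checked with `readlink` before deciding which mapping to use (a generic shell check, not specific to h2oGPT):
```bash
# print the fully resolved location; if it differs from the path itself,
# mount the resolved target rather than the link
readlink -f "$HOME"/.cache/huggingface
```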
You can also specify the cache location:
```bash
-e TRANSFORMERS_CACHE="/workspace/.cache/" \
```
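Offline mode requires the model to already be present in the mounted cache. One way to pre-fetch it while still online is via `huggingface_hub` (a sketch; the repo id matches `--base_model` above):
```bash
# run while online, before enabling TRANSFORMERS_OFFLINE/HF_HUB_OFFLINE
python -c "from huggingface_hub import snapshot_download; snapshot_download('mistralai/Mistral-7B-Instruct-v0.2')"
```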


## Run h2oGPT + vLLM or vLLM using Docker

One can run an inference server in one Docker container and h2oGPT in another.
2 changes: 2 additions & 0 deletions docs/README_offline.md
@@ -165,6 +165,8 @@ You can also do same for h2oGPT, but take note that if you pass absolute path fo
python generate.py --inference_server="vllm:0.0.0.0:5000" --base_model='$HOME/.cache/huggingface/hub/models--meta-llama--Llama-2-13b-chat-hf/snapshots/c2f3ec81aac798ae26dcc57799a994dfbf521496' --score_model=None --langchain_mode='UserData' --user_path=user_path --use_auth_token=True --max_seq_len=4096 --max_max_new_tokens=2048 --concurrency_count=64 --batch_size=16 --prompt_type=llama2 --add_disk_models_to_ui=False
```
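Before starting h2oGPT, one can verify that the vLLM endpoint responds (assuming the OpenAI-compatible vLLM server above on port 5000):
```bash
curl http://0.0.0.0:5000/v1/models
```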

See [README_DOCKER](README_DOCKER.md) for more details on running h2oGPT in Docker in offline mode.

### Disable access or port

To ensure nobody can access your gradio server, disable the port via a firewall. If that is a hassle, one can instead enable authentication by adding to the CLI when running `python generate.py`:
2 changes: 1 addition & 1 deletion src/version.py
@@ -1 +1 @@
__version__ = "044644450f10798f5484ed30ed24b0d851ee5646"
__version__ = "6bda835b462780f30e42e4453dc58a152119ab75"
