Add debug setup for inference server & worker #3575

Open · wants to merge 9 commits into base: main
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -29,7 +29,7 @@
exclude: build|stubs|^bot/templates/$|openassistant/templates|docs/docs/api/openapi.json|scripts/postprocessing/regex_pii_detector.py

default_language_version:
python: python3
python: python3.10
Contributor Author:

@olliestanley This is what was causing pre-commit to fail on my machine. python3 is interpreted as python3.7, which is not new enough for some of the syntax, or isort. When I am more specific, like here, it works. Should this be examined some more?

Collaborator:

I suspect this wouldn't be an issue if you ran the commands from inside a Python 3.10 virtual environment (I guess that's why others haven't had any similar issues), but I don't see any reason we can't make this config change if it helps make things easier.
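
For anyone hitting the same mismatch, a quick generic diagnostic (not part of this PR) to confirm which interpreter a bare `python3` on your `PATH` actually resolves to:

```python
import shutil
import subprocess
import sys

# Version of the interpreter running this script
print("current interpreter:", sys.version_info[:3])

# Executable that a bare `python3` resolves to on PATH
path = shutil.which("python3")
if path is not None:
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # `python --version` may write to stdout or stderr depending on the version
    print(path, (result.stdout or result.stderr).strip())
```

Pinning `default_language_version` in `.pre-commit-config.yaml`, as this commit does, tells pre-commit which interpreter to build its hook environments with, sidestepping this resolution entirely.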

Contributor Author:

I'm on a rolling-release Linux system with Python 3.10.10. It worked correctly inside an Ubuntu Docker container with the same Python version, so it's probably some other dependency that broke this.


ci:
autofix_prs: true
32 changes: 32 additions & 0 deletions .vscode/launch.json
@@ -106,6 +106,38 @@
"CUDA_VISIBLE_DEVICES": "1,2,3,4,5",
"OMP_NUM_THREADS": "1"
}
},
{
"name": "Debug: Inference Server",
"type": "python",
"request": "attach",
"connect": {
"host": "localhost",
"port": 5678
},
"pathMappings": [
{
"localRoot": "${workspaceFolder}/inference/server",
"remoteRoot": "/opt/inference/server"
}
],
"justMyCode": false
},
{
"name": "Debug: Worker",
"type": "python",
"request": "attach",
"connect": {
"host": "localhost",
"port": 5679
Contributor Author:

Note the different ports for server and worker

},
"pathMappings": [
{
"localRoot": "${workspaceFolder}/inference/worker",
"remoteRoot": "/opt/inference/worker"
}
],
"justMyCode": false
}
]
}
5 changes: 5 additions & 0 deletions docker-compose.yaml
@@ -231,12 +231,14 @@ services:
TRUSTED_CLIENT_KEYS: "6969"
ALLOW_DEBUG_AUTH: "True"
API_ROOT: "http://localhost:8000"
DEBUG: "True"
Contributor Author:

If I understand correctly, the compose file is only meant for local development, so setting this here shouldn't be a problem?

volumes:
- "./oasst-shared:/opt/inference/lib/oasst-shared"
- "./inference/server:/opt/inference/server"
restart: unless-stopped
ports:
- "8000:8000"
- "5678:5678" # Port to attach debugger
depends_on:
inference-redis:
condition: service_healthy
@@ -254,9 +256,12 @@
MODEL_CONFIG_NAME: ${MODEL_CONFIG_NAME:-distilgpt2}
BACKEND_URL: "ws://inference-server:8000"
PARALLELISM: 2
DEBUG: "True"
volumes:
- "./oasst-shared:/opt/inference/lib/oasst-shared"
- "./inference/worker:/opt/inference/worker"
ports:
- "5679:5679" # Port to attach debugger
deploy:
replicas: 1
profiles: ["inference"]
4 changes: 2 additions & 2 deletions docker/inference/Dockerfile.server
@@ -78,8 +78,8 @@ USER ${APP_USER}
VOLUME [ "${APP_BASE}/lib/oasst-shared" ]
VOLUME [ "${APP_BASE}/lib/oasst-data" ]


CMD uvicorn main:app --reload --host 0.0.0.0 --port "${PORT}"
# In the dev image, we start uvicorn from Python so that we can attach the debugger
CMD python main.py



20 changes: 20 additions & 0 deletions inference/README.md
@@ -60,6 +60,26 @@ python __main__.py
# You'll soon see a `User:` prompt, where you can type your prompts.
```

## Debugging

The inference server and the worker both support attaching a Python debugger. To do
this from VS Code, start the inference server and worker via docker compose as
described above (e.g. with `docker compose --profile inference up --build`),
then pick one of the following launch profiles, depending on which component
you want to debug:

- Debug: Inference Server
- Debug: Worker

### Waiting for Debugger on Startup

It can be helpful to wait for the debugger before starting the application. This
can be achieved by uncommenting `debugpy.wait_for_client()` in the appropriate
location:

- `inference/server/main.py` for the inference server
- `inference/worker/__main__.py` for the worker
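
The startup hook described above can be summarized in a minimal sketch, assuming `debugpy` is installed (it is added to both requirements files in this PR); the function name `maybe_start_debugpy` is illustrative, not from the PR:

```python
import os

def maybe_start_debugpy(port: int = 5678) -> bool:
    """Open a debugpy listener when the DEBUG env var is set.

    Returns True if a listener was started. Sketch of the pattern used by
    the server and worker entry points in this PR.
    """
    if os.getenv("DEBUG", "").strip().lower() not in ("true", "1"):
        return False
    import debugpy  # imported lazily so non-debug runs don't need it

    debugpy.listen(("0.0.0.0", port))
    # Uncomment to block startup until a debugger attaches:
    # debugpy.wait_for_client()
    return True
```

The worker would use its own port (5679 in this PR) so that both listeners can run side by side on the same host.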

## Distributed Testing

We run distributed load tests using the
18 changes: 18 additions & 0 deletions inference/server/main.py
@@ -148,3 +148,21 @@ async def maybe_add_debug_api_keys():
async def welcome_message():
logger.warning("Inference server started")
logger.warning("To stop the server, press Ctrl+C")


if __name__ == "__main__":
import os

import uvicorn

port = int(os.getenv("PORT", "8000"))
is_debug = os.getenv("DEBUG", "False").lower() in ("true", "1")  # bool() of a non-empty string like "False" is truthy

if is_debug:
import debugpy

debugpy.listen(("0.0.0.0", 5678))
# Uncomment to wait here until a debugger is attached
# debugpy.wait_for_client()

uvicorn.run("main:app", host="0.0.0.0", port=port, reload=is_debug)
Contributor Author (@0xfacade, Jul 16, 2023):

This method of starting the server is only used for development; the production Docker image still invokes the `uvicorn` command directly. I could change that to use `python main.py` as well for consistency, if desired.
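
A subtlety when reading `DEBUG` from the environment: `bool()` of any non-empty string is `True`, so `bool(os.getenv("DEBUG", "False"))` evaluates to `True` even when the value is literally `"False"`. A sketch of an explicit parse (the helper name `env_flag` is illustrative, not from the PR):

```python
import os

def env_flag(name: str, default: str = "False") -> bool:
    """Interpret an environment variable as a boolean flag.

    Any non-empty string is truthy in Python, so bool(os.getenv(...))
    would treat the literal string "False" as True.
    """
    return os.getenv(name, default).strip().lower() in ("1", "true", "yes")

print(bool("False"))        # True: non-empty string is truthy
os.environ["DEBUG"] = "False"
print(env_flag("DEBUG"))    # False
```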

1 change: 1 addition & 0 deletions inference/server/requirements.txt
@@ -4,6 +4,7 @@ asyncpg
authlib
beautifulsoup4 # web_retriever plugin
cryptography==39.0.0
debugpy
fastapi-limiter
fastapi[all]==0.88.0
google-api-python-client
10 changes: 10 additions & 0 deletions inference/worker/__main__.py
@@ -1,4 +1,5 @@
import concurrent.futures
import os
import signal
import sys
import time
@@ -130,4 +131,13 @@ def main():


if __name__ == "__main__":
is_debug = os.getenv("DEBUG", "False").lower() in ("true", "1")  # bool() of a non-empty string like "False" is truthy

if is_debug:
import debugpy

debugpy.listen(("0.0.0.0", 5679))
# Uncomment to wait here until a debugger is attached
# debugpy.wait_for_client()

main()
1 change: 1 addition & 0 deletions inference/worker/requirements.txt
@@ -1,4 +1,5 @@
aiohttp
debugpy
hf_transfer
huggingface_hub
langchain==0.0.142