Add debug setup for inference server & worker #3575

Open — wants to merge 9 commits into base `main` (changes from 5 commits)
34 changes: 33 additions & 1 deletion .vscode/launch.json
@@ -106,6 +106,38 @@
"CUDA_VISIBLE_DEVICES": "1,2,3,4,5",
"OMP_NUM_THREADS": "1"
}
}
},
{
"name": "Debug: Inference Server",
"type": "python",
"request": "attach",
"connect": {
"host": "localhost",
"port": 5678
},
"pathMappings": [
{
"localRoot": "${workspaceFolder}/inference/server",
"remoteRoot": "/opt/inference/server"
}
],
"justMyCode": false
},
{
"name": "Debug: Worker",
"type": "python",
"request": "attach",
"connect": {
"host": "localhost",
"port": 5679

Review comment (Contributor Author): Note the different ports for server and worker

},
"pathMappings": [
{
"localRoot": "${workspaceFolder}/inference/worker",
"remoteRoot": "/opt/inference/worker"
}
],
"justMyCode": false
},
]
}
5 changes: 5 additions & 0 deletions docker-compose.yaml
@@ -231,12 +231,14 @@ services:
TRUSTED_CLIENT_KEYS: "6969"
ALLOW_DEBUG_AUTH: "True"
API_ROOT: "http://localhost:8000"
DEBUG: "True"

Review comment (Contributor Author): If I understand correctly, the compose file is only meant for local development, so setting this here shouldn't be a problem?

volumes:
- "./oasst-shared:/opt/inference/lib/oasst-shared"
- "./inference/server:/opt/inference/server"
restart: unless-stopped
ports:
- "8000:8000"
- "5678:5678" # Port to attach debugger
depends_on:
inference-redis:
condition: service_healthy
@@ -254,9 +256,12 @@
MODEL_CONFIG_NAME: ${MODEL_CONFIG_NAME:-distilgpt2}
BACKEND_URL: "ws://inference-server:8000"
PARALLELISM: 2
DEBUG: "True"
volumes:
- "./oasst-shared:/opt/inference/lib/oasst-shared"
- "./inference/worker:/opt/inference/worker"
ports:
- "5679:5679" # Port to attach debugger
deploy:
replicas: 1
profiles: ["inference"]
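As an aside, a quick way to confirm that a published debug port (5678 for the server, 5679 for the worker in the compose file above) is actually reachable before pointing a debugger at it is a plain TCP connect. A stdlib-only sketch — the `port_open` helper is illustrative, not part of this PR:

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# With the containers running, the compose file above would make these reachable:
# port_open("localhost", 5678)  # inference server debugger
# port_open("localhost", 5679)  # worker debugger
```

This only proves the port is open, not that debugpy is on the other end, but it quickly distinguishes "container not running / port not published" from debugger-side problems.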
4 changes: 2 additions & 2 deletions docker/inference/Dockerfile.server
@@ -78,8 +78,8 @@ USER ${APP_USER}
VOLUME [ "${APP_BASE}/lib/oasst-shared" ]
VOLUME [ "${APP_BASE}/lib/oasst-data" ]


CMD uvicorn main:app --reload --host 0.0.0.0 --port "${PORT}"
# In the dev image, we start uvicorn from Python so that we can attach the debugger
CMD python main.py



16 changes: 16 additions & 0 deletions inference/server/main.py
@@ -148,3 +148,19 @@ async def maybe_add_debug_api_keys():
async def welcome_message():
logger.warning("Inference server started")
logger.warning("To stop the server, press Ctrl+C")


if __name__ == "__main__":
    import os

    import uvicorn

    port = int(os.getenv("PORT", "8000"))
    # bool() on any non-empty string is True, so compare the string explicitly
    is_debug = os.getenv("DEBUG", "False").lower() in ("true", "1")

    if is_debug:
        import debugpy

        # The server uses port 5678; the worker uses 5679 (see launch.json)
        debugpy.listen(("0.0.0.0", 5678))
        # Uncomment to wait here until a debugger is attached
        # debugpy.wait_for_client()

    uvicorn.run("main:app", host="0.0.0.0", port=port, reload=is_debug)
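One pitfall worth flagging with boolean environment variables: `bool()` on any non-empty string is `True`, so `bool("False")` evaluates to `True`. A stdlib-only parsing sketch — the `env_flag` helper name is illustrative, not part of this PR:

```python
import os


def env_flag(name: str, default: str = "False") -> bool:
    """Treat only explicit truthy strings as True; note that bool("False") is True."""
    return os.getenv(name, default).strip().lower() in ("1", "true", "yes", "on")


os.environ["DEBUG"] = "False"
print(env_flag("DEBUG"))  # → False
os.environ["DEBUG"] = "True"
print(env_flag("DEBUG"))  # → True
```

This matters here because docker-compose sets `DEBUG: "True"` and `"False"`-style strings are a natural way to disable it.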

Review comment (@0xfacade, Jul 16, 2023): This method of starting the server is only used for development - the docker image for production still invokes the uvicorn command. I could change that to also use python main.py instead for consistency, if desired.


1 change: 1 addition & 0 deletions inference/server/requirements.txt
@@ -4,6 +4,7 @@ asyncpg
authlib
beautifulsoup4 # web_retriever plugin
cryptography==39.0.0
debugpy
fastapi-limiter
fastapi[all]==0.88.0
google-api-python-client
10 changes: 10 additions & 0 deletions inference/worker/__main__.py
@@ -4,6 +4,8 @@
import time
from contextlib import closing

import os

import pydantic
import transformers
import utils
@@ -130,4 +132,12 @@ def main():


if __name__ == "__main__":
    # bool() on any non-empty string is True, so compare the string explicitly
    is_debug = os.getenv("DEBUG", "False").lower() in ("true", "1")

    if is_debug:
        import debugpy

        # The worker uses port 5679 to avoid clashing with the server's 5678
        debugpy.listen(("0.0.0.0", 5679))
        # Uncomment to wait here until a debugger is attached
        # debugpy.wait_for_client()

    main()
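The worker hard-codes its debug port. If one wanted it configurable, a stdlib sketch — the `DEBUGPY_PORT` variable and `debug_port` helper are hypothetical, not part of this PR:

```python
import os


def debug_port(default: int = 5679) -> int:
    """Read an integer port from the (hypothetical) DEBUGPY_PORT env var, else the default."""
    raw = os.getenv("DEBUGPY_PORT", "")
    return int(raw) if raw.isdigit() else default


print(debug_port())  # → 5679 when DEBUGPY_PORT is unset
```

The same port would then need to be published in docker-compose and matched in the launch.json attach config.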
1 change: 1 addition & 0 deletions inference/worker/requirements.txt
@@ -1,4 +1,5 @@
aiohttp
debugpy
hf_transfer
huggingface_hub
langchain==0.0.142