[tensorrt-llm backend] A question about launch_triton_server.py #455

Open
victorsoda opened this issue May 13, 2024 · 0 comments
Question

The code in launch_triton_server.py:

def get_cmd(world_size, tritonserver, grpc_port, http_port, metrics_port,
            model_repo, log, log_file, tensorrt_llm_model_name):
    cmd = ['mpirun', '--allow-run-as-root']
    for i in range(world_size):
        cmd += ['-n', '1', tritonserver, f'--model-repository={model_repo}']
        if log and (i == 0):
            cmd += ['--log-verbose=3', f'--log-file={log_file}']
        # If rank is not 0, skip loading of models other than `tensorrt_llm_model_name`
        if (i != 0):
            cmd += ['--model-control-mode=explicit']
            model_names = tensorrt_llm_model_name.split(',')
            for name in model_names:
                cmd += [f'--load-model={name}']
        cmd += [
            f'--grpc-port={grpc_port}', f'--http-port={http_port}',
            f'--metrics-port={metrics_port}', '--disable-auto-complete-config',
            f'--backend-config=python,shm-region-prefix-name=prefix{i}_', ':'
        ]
    return cmd
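
For concreteness, here is how I would call get_cmd to inspect the assembled command for world_size = 2 (my own sketch; the ports are the ones I use below, and the model name is illustrative):

# Hypothetical usage of get_cmd above; port values and model name are
# illustrative, not the script's defaults.
cmd = get_cmd(world_size=2,
              tritonserver='/opt/tritonserver/bin/tritonserver',
              grpc_port=8001,
              http_port=8000,
              metrics_port=8002,
              model_repo='./model_repository',
              log=False,
              log_file='',
              tensorrt_llm_model_name='tensorrt_llm')
# Both per-rank segments (separated by ':') contain --grpc-port=8001,
# --http-port=8000, and --metrics-port=8002.
print(' '.join(cmd))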

When world_size = 2, for example, two Triton servers are launched with the same gRPC port (e.g., 8001).
How is this possible?
When I tried something similar, the second server failed to launch with the following error:

I0513 03:43:28.353306 21205 grpc_server.cc:2466] Started GRPCInferenceService at 0.0.0.0:8001
I0513 03:43:28.353458 21205 http_server.cc:4636] Started HTTPService at 0.0.0.0:8000
E0513 03:43:28.353559006   21206 chttp2_server.cc:1080]      UNKNOWN:No address added out of total 1 resolved for '0.0.0.0:8001' {created_time:"2024-05-13T03:43:28.353510541+00:00", children:[UNKNOWN:Failed to add any wildcard listeners {created_time:"2024-05-13T03:43:28.353503146+00:00", children:[UNKNOWN:Address family not supported by protocol {target_address:"[::]:8001", syscall:"socket", os_error:"Address family not supported by protocol", errno:97, created_time:"2024-05-13T03:43:28.353465612+00:00"}, UNKNOWN:Unable to configure socket {fd:6, created_time:"2024-05-13T03:43:28.353493367+00:00", children:[UNKNOWN:Address already in use {syscall:"bind", os_error:"Address already in use", errno:98, created_time:"2024-05-13T03:43:28.353488259+00:00"}]}]}]}
E0513 03:43:28.353650 21206 main.cc:245] failed to start GRPC service: Unavailable - Socket '0.0.0.0:8001' already in use
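
A quick way to confirm the bind conflict, independent of Triton, is to try binding the port from plain Python (a standalone sketch; nothing here is Triton-specific):

import socket

def port_in_use(port, host='0.0.0.0'):
    # Try to bind the port ourselves; an OSError with errno 98
    # ("Address already in use") means another process holds it.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return False
        except OSError:
            return True

print(port_in_use(8001))  # prints True while the first server is running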

Background

I've been developing my own Triton backend, drawing on https://github.com/triton-inference-server/tensorrtllm_backend.

I have already built two engines (tensor parallel, tp_size = 2) for the llama2-7b model.
Running something like mpirun -np 2 python3.8 run.py loads the two engines, runs tensor-parallel inference, and produces correct results.

My goal now is to serve the same two engines through the Triton server.

I have already implemented the run.py logic in my Python backend's model.py (in the initialize() and execute() functions); a rough skeleton follows.
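
The skeleton looks roughly like this (load_tp_engine is a hypothetical placeholder for the run.py engine-loading logic; the rest uses the standard Python backend API):

import triton_python_backend_utils as pb_utils
from mpi4py import MPI

class TritonPythonModel:
    def initialize(self, args):
        # Each MPI rank loads its own tensor-parallel engine shard,
        # mirroring what run.py does under mpirun -np 2.
        self.rank = MPI.COMM_WORLD.Get_rank()
        self.runner = load_tp_engine(rank=self.rank)  # hypothetical helper

    def execute(self, requests):
        responses = []
        for request in requests:
            input_ids = pb_utils.get_input_tensor_by_name(request, 'input_ids')
            output_ids = self.runner.infer(input_ids.as_numpy())  # hypothetical
            responses.append(pb_utils.InferenceResponse(output_tensors=[
                pb_utils.Tensor('output_ids', output_ids)
            ]))
        return responses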

Following launch_triton_server.py, I tried the following command line:

mpirun --allow-run-as-root -n 1 /opt/tritonserver/bin/tritonserver --model-repository=./model_repository --grpc-port=8001 --http-port=8000 --metrics-port=8002 --disable-auto-complete-config --backend-config=python,shm-region-prefix-name=prefix0_ : -n 1 /opt/tritonserver/bin/tritonserver --model-repository=./model_repository --model-control-mode=explicit --load-model=llama2_7b --grpc-port=8001 --http-port=8000 --metrics-port=8002 --disable-auto-complete-config --backend-config=python,shm-region-prefix-name=prefix1_ :

Then I got the error shown above.

Could you please tell me what I did wrong and how I can fix the error? Thanks a lot!

rmccorm4 transferred this issue from triton-inference-server/server on May 14, 2024