getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1 #1584

Open
2 of 4 tasks
gloritygithub11 opened this issue May 13, 2024 · 7 comments
Labels: triaged (Issue has been triaged by maintainers)

System Info

tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm 0.10.0.dev2024050700

Who can help?

@byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Building with the following script succeeds:

set -e

export MODEL_DIR=/mnt/memory
export MODEL_NAME=Mixtral-8x7B-Instruct-v0.1
export LD_LIBRARY_PATH=/usr/local/tensorrt/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/tensorrt/bin:$PATH
export PRECISION=fp16
export DTYPE=bfloat16
export TP_SIZE=4


python ../llama/convert_checkpoint.py \
    --model_dir $MODEL_DIR/${MODEL_NAME} \
    --output_dir $MODEL_DIR/tmp/trt_models/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp${TP_SIZE} \
    --dtype $DTYPE \
    --tp_size $TP_SIZE

trtllm-build \
    --checkpoint_dir $MODEL_DIR/tmp/trt_models/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp${TP_SIZE} \
    --output_dir $MODEL_DIR/tmp/trt_engines/${MODEL_NAME}/$PRECISION/${TP_SIZE}-gpu-tp${TP_SIZE} \
    --gemm_plugin $DTYPE \
    --gpt_attention_plugin $DTYPE \
    --use_fused_mlp \
    --max_batch_size 1 \
    --max_input_len 2048 \
    --max_output_len 1024 
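
For reference, trtllm-build also writes a config.json next to the rank*.engine files that records the plugin configuration the engine was serialized with. A quick way to confirm the gemm plugin was baked in (a sketch; the path comes from the script above, and the key layout may differ across TensorRT-LLM versions):

import json

# Path produced by the build script above; adjust to your layout.
config_path = ("/mnt/memory/tmp/trt_engines/Mixtral-8x7B-Instruct-v0.1"
               "/fp16/4-gpu-tp4/config.json")
with open(config_path) as f:
    config = json.load(f)

# plugin_config lists the plugins the serialized engine expects to find
# at deserialization time.
print(json.dumps(config.get("build_config", {}).get("plugin_config", {}), indent=2))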

Load the engine:

import tensorrt as trt

# Initialize TensorRT logger
TRT_LOGGER = trt.Logger(trt.Logger.INFO)

# Function to load TensorRT engine
def load_engine(engine_path):
    with open(engine_path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

rank = 0
# Determine the engine file based on the rank
engine_path = f'/mnt/memory/tmp/trt_engines/Mixtral-8x7B-Instruct-v0.1/fp16/4-gpu-tp4/rank{rank}.engine'
print(f"Process {rank} loading engine from {engine_path}")
load_engine(engine_path)

Running it produces the following error:

Process 0 loading engine from /mnt/memory/tmp/trt_engines/Mixtral-8x7B-Instruct-v0.1/fp16/4-gpu-tp4/rank0.engine
[05/13/2024-02:59:35] [TRT] [I] Loaded engine size: 22480 MiB
[05/13/2024-02:59:37] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/13/2024-02:59:37] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/13/2024-02:59:37] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
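
For context, this error means the TensorRT plugin registry contains no creator named Gemm in the tensorrt_llm namespace at deserialization time; importing tensorrt alone never registers the TensorRT-LLM plugins. A minimal sketch of registering them before deserializing, assuming libnvinfer_plugin_tensorrt_llm.so is on the loader path and exports initTrtLlmPlugins as in the TensorRT-LLM source tree (the TensorRT-LLM runtime helpers normally do this step for you):

import ctypes

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

# Load the plugin library globally; pass an absolute path to CDLL if the
# .so is not on the loader path (both locations are assumptions).
handle = ctypes.CDLL("libnvinfer_plugin_tensorrt_llm.so", mode=ctypes.RTLD_GLOBAL)

# Register every TensorRT-LLM plugin creator, Gemm included, under the
# "tensorrt_llm" namespace.
assert handle.initTrtLlmPlugins(None, "tensorrt_llm".encode("utf-8"))

engine_path = ("/mnt/memory/tmp/trt_engines/Mixtral-8x7B-Instruct-v0.1"
               "/fp16/4-gpu-tp4/rank0.engine")
with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())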

Expected behavior

The engine should load successfully.

actual behavior

The engine fails to load.

additional notes

Plugin directory:

  llama git:(trtllm-build2) ✗ ll /app/tensorrt-llm/cpp/build/tensorrt_llm/plugins           
total 447M
drwxr-xr-x 3 root root  106 May 10 11:17 CMakeFiles
-rw-r--r-- 1 root root  52K May 10 11:17 Makefile
drwxr-xr-x 3 root root   67 May 10 11:17 bertAttentionPlugin
-rw-r--r-- 1 root root 5.3K May 10 11:17 cmake_install.cmake
drwxr-xr-x 3 root root   67 May 10 11:17 common
drwxr-xr-x 3 root root   67 May 10 11:17 cumsumLastDimPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 gemmPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 gptAttentionCommon
drwxr-xr-x 3 root root   67 May 10 11:17 gptAttentionPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 identityPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 layernormQuantizationPlugin
lrwxrwxrwx 1 root root   36 May 10 13:27 libnvinfer_plugin_tensorrt_llm.so -> libnvinfer_plugin_tensorrt_llm.so.10
lrwxrwxrwx 1 root root   40 May 10 13:27 libnvinfer_plugin_tensorrt_llm.so.10 -> libnvinfer_plugin_tensorrt_llm.so.10.0.1
-rwxr-xr-x 1 root root 447M May 10 13:27 libnvinfer_plugin_tensorrt_llm.so.10.0.1
drwxr-xr-x 3 root root   67 May 10 11:17 lookupPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 loraPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 lruPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 mambaConv1dPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 mixtureOfExperts
drwxr-xr-x 3 root root   67 May 10 11:17 ncclPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 quantizePerTokenPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 quantizeTensorPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 rmsnormQuantizationPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 selectiveScanPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 smoothQuantGemmPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 weightOnlyGroupwiseQuantMatmulPlugin
drwxr-xr-x 3 root root   67 May 10 11:17 weightOnlyQuantMatmulPlugin

Listing the registered plugins with this script:

import tensorrt as trt

# Initialize the TensorRT logger
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def list_plugins():
    plugin_registry = trt.get_plugin_registry()
    if plugin_registry is None:
        print("No plugin registry found.")
        return

    plugin_creators = plugin_registry.plugin_creator_list
    num_plugins = len(plugin_creators)
    print(f"Number of registered plugins: {num_plugins}")

    for i, plugin_creator in enumerate(plugin_creators):
        print(f"Plugin {i + 1}, Name: {plugin_creator.name}, Version: {plugin_creator.plugin_version}")

if __name__ == "__main__":
    list_plugins()

gives the following plugin list:

Number of registered plugins: 30
Plugin 1, Name: CaskDeconvV2RunnerWeightsTransformerPlugin, Version: 1
Plugin 2, Name: CaskDeconvV1RunnerWeightsTransformerPlugin, Version: 1
Plugin 3, Name: CaskConvolutionRunnerWeightsTransformerPlugin, Version: 1
Plugin 4, Name: CaskFlattenConvolutionRunnerWeightsTransformerPlugin, Version: 1
Plugin 5, Name: CaskConvActPoolWeightsTransformerPlugin, Version: 1
Plugin 6, Name: CaskDepSepConvWeightsTransformerPlugin, Version: 1
Plugin 7, Name: MyelinWeightsTransformPlugin, Version: 1
Plugin 8, Name: DisentangledAttention_TRT, Version: 1
Plugin 9, Name: CustomEmbLayerNormPluginDynamic, Version: 1
Plugin 10, Name: CustomEmbLayerNormPluginDynamic, Version: 2
Plugin 11, Name: CustomEmbLayerNormPluginDynamic, Version: 3
Plugin 12, Name: CustomFCPluginDynamic, Version: 1
Plugin 13, Name: CustomGeluPluginDynamic, Version: 1
Plugin 14, Name: GroupNormalizationPlugin, Version: 1
Plugin 15, Name: CustomSkipLayerNormPluginDynamic, Version: 3
Plugin 16, Name: CustomSkipLayerNormPluginDynamic, Version: 4
Plugin 17, Name: CustomSkipLayerNormPluginDynamic, Version: 1
Plugin 18, Name: CustomSkipLayerNormPluginDynamic, Version: 2
Plugin 19, Name: RnRes2Br1Br2c_TRT, Version: 1
Plugin 20, Name: RnRes2Br1Br2c_TRT, Version: 2
Plugin 21, Name: RnRes2Br2bBr2c_TRT, Version: 1
Plugin 22, Name: RnRes2Br2bBr2c_TRT, Version: 2
Plugin 23, Name: RnRes2FullFusion_TRT, Version: 1
Plugin 24, Name: SingleStepLSTMPlugin, Version: 1
Plugin 25, Name: CustomQKVToContextPluginDynamic, Version: 3
Plugin 26, Name: CustomQKVToContextPluginDynamic, Version: 1
Plugin 27, Name: CustomQKVToContextPluginDynamic, Version: 2
Plugin 28, Name: DLRM_BOTTOM_MLP_TRT, Version: 1
Plugin 29, Name: SmallTileGEMM_TRT, Version: 1
Plugin 30, Name: RNNTEncoderPlugin, Version: 1
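
Every creator in this list is a stock TensorRT plugin; nothing from the tensorrt_llm namespace appears, which is consistent with the deserialization failure. After loading the plugin library (see the sketch in the Reproduction section), a check along these lines should print True:

import tensorrt as trt

# The error message concatenates plugin name and namespace, so the missing
# creator is name "Gemm" in namespace "tensorrt_llm".
registry = trt.get_plugin_registry()
found = any(
    c.name == "Gemm" and c.plugin_namespace == "tensorrt_llm"
    for c in registry.plugin_creator_list
)
print("tensorrt_llm Gemm creator registered:", found)
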
@gloritygithub11 added the bug (Something isn't working) label May 13, 2024
@byshiue (Collaborator) commented May 15, 2024

This is caused by a TensorRT version mismatch. Did you rebuild the docker image when you upgraded TensorRT-LLM to 0.10.0? TensorRT-LLM 0.10.0 uses TensorRT 10, while older TensorRT-LLM releases used TensorRT 9.
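
A quick way to check for such a mismatch is to print the versions both packages actually import (a sketch; both attributes exist in the respective packages):

import tensorrt as trt
import tensorrt_llm

# An engine must be deserialized with the same TensorRT major version it
# was built with; TensorRT-LLM 0.10 pairs with TensorRT 10.
print("tensorrt:", trt.__version__)
print("tensorrt_llm:", tensorrt_llm.__version__)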

@byshiue self-assigned this May 15, 2024
@byshiue added triaged (Issue has been triaged by maintainers) and removed bug (Something isn't working) labels May 15, 2024
@gloritygithub11 (Author)

@byshiue yes, I've rebuilt the docker image. You can see in the listing above that TensorRT is already 10.0.1: libnvinfer_plugin_tensorrt_llm.so.10.0.1

ll /usr/local/tensorrt/lib/
total 3.5G
lrwxrwxrwx 1 root root   20 Apr 15 23:25 libnvinfer.so -> libnvinfer.so.10.0.1
lrwxrwxrwx 1 root root   20 Apr 15 23:25 libnvinfer.so.10 -> libnvinfer.so.10.0.1
-rwxr-xr-x 1 root root 224M Apr 15 23:25 libnvinfer.so.10.0.1
-rwxr-xr-x 1 root root 1.3G Apr 15 23:26 libnvinfer_builder_resource.so.10.0.1
lrwxrwxrwx 1 root root   29 Apr 15 23:22 libnvinfer_dispatch.so -> libnvinfer_dispatch.so.10.0.1
lrwxrwxrwx 1 root root   29 Apr 15 23:22 libnvinfer_dispatch.so.10 -> libnvinfer_dispatch.so.10.0.1
-rwxr-xr-x 1 root root 965K Apr 15 23:22 libnvinfer_dispatch.so.10.0.1
-rw-r--r-- 1 root root 751K Apr 15 23:22 libnvinfer_dispatch_static.a
lrwxrwxrwx 1 root root   25 Apr 15 23:22 libnvinfer_lean.so -> libnvinfer_lean.so.10.0.1
lrwxrwxrwx 1 root root   25 Apr 15 23:22 libnvinfer_lean.so.10 -> libnvinfer_lean.so.10.0.1
-rwxr-xr-x 1 root root  33M Apr 15 23:22 libnvinfer_lean.so.10.0.1
-rw-r--r-- 1 root root 243M Apr 15 23:22 libnvinfer_lean_static.a
lrwxrwxrwx 1 root root   27 Apr 15 23:26 libnvinfer_plugin.so -> libnvinfer_plugin.so.10.0.1
lrwxrwxrwx 1 root root   27 Apr 15 23:26 libnvinfer_plugin.so.10 -> libnvinfer_plugin.so.10.0.1
-rwxr-xr-x 1 root root  33M Apr 15 23:26 libnvinfer_plugin.so.10.0.1
-rw-r--r-- 1 root root  37M Apr 15 23:26 libnvinfer_plugin_static.a
-rw-r--r-- 1 root root 1.7G Apr 15 23:26 libnvinfer_static.a
lrwxrwxrwx 1 root root   30 Apr 15 23:26 libnvinfer_vc_plugin.so -> libnvinfer_vc_plugin.so.10.0.1
lrwxrwxrwx 1 root root   30 Apr 15 23:26 libnvinfer_vc_plugin.so.10 -> libnvinfer_vc_plugin.so.10.0.1
-rwxr-xr-x 1 root root 965K Apr 15 23:26 libnvinfer_vc_plugin.so.10.0.1
-rw-r--r-- 1 root root 442K Apr 15 23:26 libnvinfer_vc_plugin_static.a
lrwxrwxrwx 1 root root   21 Apr 15 23:26 libnvonnxparser.so -> libnvonnxparser.so.10
lrwxrwxrwx 1 root root   25 Apr 15 23:26 libnvonnxparser.so.10 -> libnvonnxparser.so.10.0.1
-rwxr-xr-x 1 root root 3.4M Apr 15 23:22 libnvonnxparser.so.10.0.1
-rw-r--r-- 1 root root  19M Apr 15 23:22 libnvonnxparser_static.a
-rw-r--r-- 1 root root 675K Apr 15 23:26 libonnx_proto.a
drwxr-xr-x 2 root root  168 Apr 15 23:26 stubs

@byshiue (Collaborator) commented May 17, 2024

> @byshiue yes, I've rebuilt the docker image. You can see in the listing above that TensorRT is already 10.0.1: libnvinfer_plugin_tensorrt_llm.so.10.0.1 [...]

How did you build the docker image and tensorrt_llm?

@gloritygithub11 (Author)

With the following Dockerfile:

# Use an official NVIDIA CUDA image as a parent image
FROM nvidia/cuda:12.4.1-devel-ubuntu20.04

# Set the working directory
WORKDIR /app


# Install software-properties-common to add repositories
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y software-properties-common

# Add deadsnakes PPA for newer Python versions
RUN add-apt-repository ppa:deadsnakes/ppa

# Install necessary packages
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    python3.10 \
    python3.10-distutils \
    python3-pip \
    openmpi-bin \
    libopenmpi-dev \
    git \
    && rm -rf /var/lib/apt/lists/*

RUN apt-get update \
  && apt-get install -y python3.10-venv \
  && python3.10 -m venv venv_dev

RUN apt-get update \
  && apt-get install -y python3.10-dev

RUN . venv_dev/bin/activate \
  && python3 -m pip install -U pip \
  && pip3 install tensorrt_llm --pre --extra-index-url https://pypi.nvidia.com --timeout 3600

RUN apt-get install -y wget \
  && wget https://github.com/Kitware/CMake/releases/download/v3.29.2/cmake-3.29.2-linux-x86_64.sh \
  && chmod +x cmake-3.29.2-linux-x86_64.sh \
  && ./cmake-3.29.2-linux-x86_64.sh --skip-license --prefix=/usr/local


RUN git clone https://github.com/NVIDIA/TensorRT-LLM.git tensorrt-llm \
  && cd tensorrt-llm \
  && ENV=/root/.bashrc bash docker/common/install_tensorrt.sh

RUN apt-get install -y vim git-lfs

RUN export PYTHONPATH=/app/tensorrt-llm/3rdparty/cutlass/python:$PYTHONPATH \
  && . /app/venv_dev/bin/activate \
  && cd tensorrt-llm \
  && git lfs install \
  && git lfs pull \
  && python scripts/build_wheel.py -c -D"TRT_INCLUDE_DIR=/usr/local/tensorrt/include" -D"TRT_LIB_DIR=/usr/local/tensorrt/lib"

# Make port 80 available to the world outside this container
EXPOSE 80

# Define environment variable
ENV NAME World

# Run app.py when the container launches
CMD ["bash", "echo Hello World!"]

@byshiue (Collaborator) commented May 23, 2024

It seems you are not using the official Dockerfile. Could you give it a try?

@gloritygithub11 (Author)

@byshiue

I followed the steps in https://nvidia.github.io/TensorRT-LLM/installation/linux.html to create a new docker environment and get a similar error:

Process 0 loading engine from /root/models/tmp/trt_engines/Meta-Llama-3-8B-Instruct/fp16/1-gpu-tp1/rank0.engine
[05/24/2024-08:20:11] [TRT] [I] Loaded engine size: 15323 MiB
[05/24/2024-08:20:13] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/24/2024-08:20:13] [TRT] [E] 3: getPluginCreator could not find plugin: Gemmtensorrt_llm version: 1
[05/24/2024-08:20:13] [TRT] [E] 1: [pluginV2Runner.cpp::load::294] Error Code 1: Serialization (Serialization assertion creator failed.Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)

The following are the tensorrt-related modules:

root@e8fbc031fb35:~/TensorRT-LLM/examples/llama# pip list | grep tensorrt
tensorrt                 10.0.1
tensorrt-cu12            10.0.1
tensorrt-cu12-bindings   10.0.1
tensorrt-cu12-libs       10.0.1
tensorrt-llm             0.11.0.dev2024052100

PS: I didn't find TensorRT under /usr/local/tensorrt/lib. Is it located somewhere else, or are additional steps needed?
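
A side note on that last question: when TensorRT comes from the pip wheels rather than a tarball, the shared libraries live inside the Python environment instead of /usr/local/tensorrt. A sketch for locating them, assuming the tensorrt-cu12-libs wheel lays them out in a tensorrt_libs package:

import os

# Assumption: tensorrt-cu12-libs ships its shared libraries in a
# "tensorrt_libs" package inside site-packages.
import tensorrt_libs

lib_dir = os.path.dirname(tensorrt_libs.__file__)
print("TensorRT libraries under:", lib_dir)
print(sorted(f for f in os.listdir(lib_dir) if f.startswith("libnvinfer")))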

@byshiue (Collaborator) commented May 27, 2024
