Different prediction with tensorrt on refinedet model for the version v0.18.0 #1324

YaYaB opened this issue Aug 10, 2021 · 3 comments

@YaYaB
Contributor

YaYaB commented Aug 10, 2021

pred_trt_refinedet_issue.zip

Configuration

  • Version of DeepDetect:
    • Locally compiled on:
      • Ubuntu 18.04 LTS
      • Other:
    • Docker CPU
    • Docker GPU
    • Amazon AMI
  • Commit (shown by the server when starting):
    23bd913

Your question / the problem you're facing:

I am seeing wrong predictions with TensorRT and a RefineDet model on the latest version of DeepDetect.
The predicted bounding boxes are far from the expected ones.

I have written a script to reproduce the problem.
It runs predictions on DeepDetect versions v0.15.0 through v0.18.0, with and without TensorRT.
It then dumps the predictions and computes a hash of each prediction file (only the predictions list is kept). The v0.18.0 TensorRT output is consistent neither with its Caffe counterpart nor with the TensorRT output of the previous versions.
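
For readers without the attached script, the hashing step can be sketched in Python as follows. This is only an illustration, not the code from the zip file: the function name `predictions_digest` is hypothetical, and the assumption that the predictions list sits under `body.predictions` in the server response should be checked against your own output. Because the JSON is re-serialized in a canonical form here, the digests will not match the `sha256sum` values reported in this issue.

```python
import hashlib
import json


def predictions_digest(response: dict) -> str:
    """Return a SHA-256 hex digest of the predictions list only.

    Assumption: the predictions live under response["body"]["predictions"].
    Other fields (timings, status, ...) are ignored on purpose, so that two
    runs differ in digest only when the predictions themselves differ.
    """
    predictions = response["body"]["predictions"]
    # Sort keys and fix separators so the digest does not depend on
    # key order or whitespace in the dumped file.
    canonical = json.dumps(predictions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Comparing the digests of two dumps is then a plain string comparison; any difference means the predictions lists are not identical.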

Please set the following environment variables in the script and make sure that a GPU is available for testing.
BASE_PATH=TODO
LOGGING_FOLDER=TODO

and then simply launch the script

bash pred_trt_refinedet_issue.sh

You should get the following output at the end (not all the Docker logs are shown here):

Here we compute the sha256sum of the predictions obtained.
For the caffe models nothing changes; however, we observe differences for the trt model of the latest version of dd, v0.18.0.
Compare deepdetect_gpu
PATH_LOGS/prediction_deepdetect_gpu_v0.15.0.json: 9e056b235be08f7245bdd324ac8ca756c41353771fcb3004df2f6b6347326d63  -
PATH_LOGS/prediction_deepdetect_gpu_v0.16.0.json: 9e056b235be08f7245bdd324ac8ca756c41353771fcb3004df2f6b6347326d63  -
PATH_LOGS/prediction_deepdetect_gpu_v0.17.0.json: 9e056b235be08f7245bdd324ac8ca756c41353771fcb3004df2f6b6347326d63  -
PATH_LOGS/prediction_deepdetect_gpu_v0.18.0.json: 9e056b235be08f7245bdd324ac8ca756c41353771fcb3004df2f6b6347326d63  -

Compare deepdetect_gpu_tensorrt
PATH_LOGS/prediction_deepdetect_gpu_tensorrt_v0.15.0.json: 51767470062ecba3d77e765c34bed6000cf175400d5ff59dda9b4727356f49b5  -
PATH_LOGS/prediction_deepdetect_gpu_tensorrt_v0.16.0.json: 51767470062ecba3d77e765c34bed6000cf175400d5ff59dda9b4727356f49b5  -
PATH_LOGS/prediction_deepdetect_gpu_tensorrt_v0.17.0.json: 51767470062ecba3d77e765c34bed6000cf175400d5ff59dda9b4727356f49b5  -
PATH_LOGS/prediction_deepdetect_gpu_tensorrt_v0.18.0.json: 1508b68447819ff281231ad5c757e88f4a651f50570115565438ac9fee88d566  -

Expected predictions
[
  {
    "classes": [
      {
        "last": true,
        "bbox": {
          "ymax": 350.2694091796875,
          "xmax": 745.9049682617188,
          "ymin": 108.38544464111328,
          "xmin": 528.0482788085938
        },
        "prob": 0.9999849796295166,
        "cat": "1"
      }
    ],
    "uri": "https://icour.fr/ELeveSeconde/ajout/yann_lecum_vidal/images/yann_LeCun.jpg"
  }
]

Abnormal predictions for trt v0.18.0
[
  {
    "classes": [
      {
        "last": true,
        "bbox": {
          "ymax": 239.68505859375,
          "xmax": 425.599365234375,
          "ymin": 0,
          "xmin": 211.946044921875
        },
        "prob": 1,
        "cat": "1"
      }
    ],
    "uri": "https://icour.fr/ELeveSeconde/ajout/yann_lecum_vidal/images/yann_LeCun.jpg"
  }
]

@fantes
Contributor

fantes commented Aug 18, 2021

Hi there, and thank you for the bug report.

We were finally able to fix this here: #1329

This PR updates the TRT dependency (to TensorRT 8.0.x), and unfortunately this version has a bug https://forums.developer.nvidia.com/t/build-engine-error-when-use-pointnet-like-structure-and-tensorrt-8-0-1-6/183569/6
that affects SSD models.

Hopefully it will be fixed in the next TRT update, and everything should then work as expected.

@YaYaB
Contributor Author

YaYaB commented Aug 18, 2021

Thanks a lot, I'll try your fix!
It could be a good idea to add unit tests based on expected values for different models' predictions to catch those, no?
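
As a rough sketch of what such a check might look like (not DeepDetect's actual test suite), the Python snippet below compares a predicted bounding box against the expected one with an absolute tolerance, since small floating-point differences between backends are plausible. The helper name `bbox_close` and the 1-pixel tolerance are assumptions; the two boxes are the ones posted in this issue.

```python
import math

# Expected box (caffe and earlier trt versions) and the abnormal box
# returned by trt v0.18.0, both copied from this issue.
EXPECTED_BBOX = {
    "xmin": 528.0482788085938,
    "ymin": 108.38544464111328,
    "xmax": 745.9049682617188,
    "ymax": 350.2694091796875,
}
ABNORMAL_BBOX = {
    "xmin": 211.946044921875,
    "ymin": 0,
    "xmax": 425.599365234375,
    "ymax": 239.68505859375,
}


def bbox_close(expected: dict, actual: dict, abs_tol: float = 1.0) -> bool:
    """True if every coordinate of `actual` is within `abs_tol` pixels
    of `expected`. The 1.0 pixel default is an arbitrary choice."""
    return all(
        math.isclose(expected[key], actual[key], abs_tol=abs_tol)
        for key in ("xmin", "ymin", "xmax", "ymax")
    )
```

With these values, `bbox_close(EXPECTED_BBOX, ABNORMAL_BBOX)` is false, so a test asserting it would have flagged the v0.18.0 regression, while a box that drifts by a fraction of a pixel would still pass.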

@fantes
Contributor

fantes commented Aug 18, 2021

Indeed, we have a few tests (we need to add some more), but they are deactivated due to dependency problems (compatibility between versions of tensorrt, tensorrt-oss, cudnn, ubuntu and the corresponding docker images...).
Hopefully we will be able to integrate and activate them with TRT 8.x.
