Testing Using Pretrained Model Results in Different Performance Across Multiple Runs #12

YunzeMan opened this issue Aug 12, 2023 · 6 comments

@YunzeMan

Hi, thanks for sharing your great work. As the title indicates, I not only get lower performance than reported (mAP@0.25: 0.7069, mAP@0.5: 0.5699) but also slightly different results across multiple runs.

Here are several runs using the following command (yes, I'm using the provided pretrained model on ScanNet):

```
python tools/test.py configs/tr3d/tr3d_scannet-3d-18class.py ./tr3d_scannet.pth --eval mAP
```

Run 1: (mAP@0.25: 0.7069, mAP@0.5: 0.5699)
Run 2: (mAP@0.25: 0.7068, mAP@0.5: 0.5716)
Run 3: (mAP@0.25: 0.7069, mAP@0.5: 0.5710)

I'm using PyTorch 1.12, CUDA 11.3, and cuDNN 8.

I cannot figure out where the stochasticity may come from, especially during evaluation (testing). Could you shed some light on the possible reasons for this behavior?

Here is one of my outputs during testing:
test_log.txt

@filaPro (Contributor) commented Aug 12, 2023

Hi @YunzeMan,

We didn't run into nondeterminism issues on ScanNet. Can you please share the .log file of this run? Also, maybe check the suggestions in the torch randomness guide?
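
For reference, here's a minimal sketch of the knobs that guide covers (generic PyTorch, not code from this repo; the helper name is just for illustration):

```python
# Generic reproducibility checklist from the PyTorch randomness guide;
# not TR3D code, just the knobs worth verifying.
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:  # hypothetical helper name
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU and all CUDA RNGs
    # Required by some deterministic CUDA (cuBLAS) kernels.
    os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'
    # Raise an error when an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```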

@YunzeMan

This comment was marked as outdated.

@YunzeMan (Author)

Regarding randomness, I strictly followed your steps. I checked the torch randomness guide but didn't find anything that helped, and passing the --deterministic flag didn't seem to make a difference.
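
For context, my understanding is that the flag roughly reduces to mmdet's set_random_seed, something like this (paraphrased, not the exact source):

```python
# Roughly what mmdet's set_random_seed(seed, deterministic=True) does,
# paraphrased from memory rather than copied from the source:
import random

import numpy as np
import torch


def set_random_seed(seed: int, deterministic: bool = False) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    if deterministic:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```

If that's right, the flag only pins Python/NumPy/PyTorch seeding and the cuDNN backend, so randomness coming from anywhere else wouldn't be affected.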

I also trained the model following your steps. Here is the log file.
20230812_013000.log

Both the mAP@0.25 and mAP@0.5 are lower than your reported values. Have you perhaps altered the codebase or parameters slightly without noticing?

@filaPro (Contributor) commented Aug 13, 2023

No, this code is definitely able to reproduce our metrics...

Maybe you can try our implementation in the mmdetection3d codebase?

@filaPro (Contributor) commented Aug 13, 2023

Btw, I think I understand this little randomness at test time. Here, in the SparseTensor construction, the default quantization_mode is RANDOM_SUBSAMPLE, following MinkowskiEngine. Can you try UNWEIGHTED_AVERAGE here instead?
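
Something like this sketch against MinkowskiEngine's public API (the tensors below are placeholders, not the actual inputs in this repo):

```python
# The two quantization modes of MinkowskiEngine's SparseTensor;
# `feats` / `coords` are placeholder tensors, not real TR3D inputs.
import torch
import MinkowskiEngine as ME

feats = torch.rand(100, 3)                       # e.g. per-point RGB features
coords = torch.randint(0, 10, (100, 3)).int()    # quantized xyz coordinates
coords = ME.utils.batched_coordinates([coords])  # prepend the batch index

# Default mode: one random point is kept per voxel, so repeated runs
# can feed slightly different inputs to the network.
x_random = ME.SparseTensor(
    features=feats,
    coordinates=coords,
    quantization_mode=ME.SparseTensorQuantizationMode.RANDOM_SUBSAMPLE,
)

# Alternative: average the features of all points falling into the
# same voxel, removing the per-voxel random point selection.
x_avg = ME.SparseTensor(
    features=feats,
    coordinates=coords,
    quantization_mode=ME.SparseTensorQuantizationMode.UNWEIGHTED_AVERAGE,
)
```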

@YunzeMan (Author)

Thanks for pointing that out. However, after changing the quantization_mode of x to UNWEIGHTED_AVERAGE, the slight randomness still persists. Here are the results of three separate runs:

Run 1: (mAP@0.25: 0.7068, mAP@0.5: 0.5702)
Run 2: (mAP@0.25: 0.7068, mAP@0.5: 0.5697)
Run 3: (mAP@0.25: 0.7069, mAP@0.5: 0.5720)

What's more frustrating than the slight randomness is the marginally lower performance. But since the gap isn't large, I can probably work with the current version.
