
Update UNet3D inference to use MLPerf's evaluation set #4518

Draft · wants to merge 2 commits into master

Conversation

francislata (Contributor)

Inference set changes

I noticed that MLPerf's inference set includes one extra case, case_00400, which the official kits19 dataset doesn't contain. To compensate, the official MLPerf inference script copies case_00185 as case_00400.
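As a rough illustration of what that copy looks like on disk, here is a minimal sketch; the BASEDIR path, the helper name, and the exact directory layout are assumptions for illustration, not the code in this PR:

```python
# Minimal sketch of materializing case_00400 from case_00185 before building the
# eval set. The layout (BASEDIR/case_XXXXX/...) follows the kits19 convention;
# the path and helper name are illustrative only.
import shutil
from pathlib import Path

BASEDIR = Path("/path/to/kits19/data")  # hypothetical location of the raw dataset

def ensure_case_00400(basedir: Path = BASEDIR) -> None:
    src, dst = basedir / "case_00185", basedir / "case_00400"
    if not dst.exists():
        # the official MLPerf inference preprocessing does the equivalent of this copy
        shutil.copytree(src, dst)
```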

For comparison, here's the result of running model_eval.py using get_val_set (old):
[screenshot: val_set results]

Meanwhile, here's the result of running model_eval.py using get_eval_set (new):
[screenshot: eval_set results]

Both of these runs use the pretrained model, and both are above the expected mean Dice score of 0.86170.

Support for loading model checkpoint inside model_eval.py

Lastly, this also adds support for loading a model checkpoint for the UNet3D model when running model_eval.py. Here's an example run of a converged training run on a tinybox green:
[screenshot: example evaluation run on a tinybox green]
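For context, a minimal sketch of what checkpoint loading before evaluation can look like; the module path extra.models.unet3d and the checkpoint filename are assumptions, while safe_load/load_state_dict are tinygrad's standard state helpers:

```python
# Minimal sketch of restoring a UNet3D checkpoint before running the eval loop.
from tinygrad.nn.state import safe_load, load_state_dict
from extra.models.unet3d import UNet3D  # assumed location of the model definition

model = UNet3D()
state_dict = safe_load("unet3d_checkpoint.safetensors")  # hypothetical checkpoint path
load_state_dict(model, state_dict)
# model is now ready to be passed to the evaluation code in model_eval.py
```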

wozeparrot (Collaborator)

The Dice score should be the same as the reference, no?

francislata (Contributor, Author)

> The Dice score should be the same as the reference, no?

This one is odd, because I was looking at the old PR that introduced this eval function, and they got 0.86632.

I'm pretty sure we're using the same pretrained model from the MLPerf inference README, but I can double-check that again.

wozeparrot (Collaborator)

Yeah, looking at the original PR, they say the reference target is 0.86330, but I'm not sure where that comes from, considering that the mlcommons upstream https://github.com/mlcommons/inference/blob/master/vision/medical_imaging/3d-unet-kits19/README.md lists it as 0.86170 for all frameworks.

francislata marked this pull request as ready for review on May 10, 2024, 21:19
francislata (Contributor, Author)

Yeah, I agree. The only place I found 0.86630 was in Dell's blog regarding MLPerf:
[screenshot: Dell MLPerf blog excerpt]

And that blog was pretty recent (last month).

francislata marked this pull request as draft on May 13, 2024, 12:50
francislata marked this pull request as ready for review on May 13, 2024, 15:42
francislata (Contributor, Author)

@wozeparrot I did a quick run of MLPerf's inference script, and it looks like they are getting 0.86172 for the mean Dice score:
[screenshot: MLPerf inference script output]

It is interesting that we are getting a higher score for the same checkpoint. I did write a test in another PR for the Dice score metric, and it matches the original Dice score implementation in MLPerf's training reference implementation. I'll try to find out where the difference between their implementation and ours comes from.
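For reference, a minimal sketch of the standard per-class Dice formula being compared here, Dice = 2·|A ∩ B| / (|A| + |B|); the function name and smoothing constant are illustrative, not the exact MLPerf reference code:

```python
# Per-class Dice score on boolean masks; a small smoothing term avoids
# division by zero when both masks are empty.
import numpy as np

def dice_score(prediction: np.ndarray, target: np.ndarray, smooth: float = 1e-6) -> float:
    intersection = np.logical_and(prediction, target).sum()
    return (2.0 * intersection + smooth) / (prediction.sum() + target.sum() + smooth)
```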

francislata marked this pull request as draft on May 14, 2024, 11:50