
Update UNet3D inference to use MLPerf's evaluation set #4518

Draft · wants to merge 2 commits into master

Conversation

francislata (Contributor)

Inference set changes

I noticed that MLPerf's inference set includes one extra case, case_00400, which the official kits19 dataset doesn't contain. To compensate, the official MLPerf inference script copies case_00185 as case_00400.
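As a rough illustration of what that copy looks like on disk, here is a minimal sketch; the BASEDIR path, the helper name, and the exact directory layout are assumptions for illustration, not the code in this PR:

```python
# Minimal sketch of materializing case_00400 from case_00185 before building the
# eval set. The layout (BASEDIR/case_XXXXX/...) follows the kits19 convention;
# the path and helper name are illustrative only.
import shutil
from pathlib import Path

BASEDIR = Path("/path/to/kits19/data")  # hypothetical location of the raw dataset

def ensure_case_00400(basedir: Path = BASEDIR) -> None:
    src, dst = basedir / "case_00185", basedir / "case_00400"
    if not dst.exists():
        # the official MLPerf inference preprocessing does the equivalent of this copy
        shutil.copytree(src, dst)
```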

For comparison, here's the result of running model_eval.py using get_val_set (old):
[screenshot: val_set results]

Meanwhile, here's the result of running model_eval.py using get_eval_set (new):
[screenshot: eval_set results]

Both of these runs use the pretrained model, and both are above the expected mean Dice score of 0.86170.

Support for loading model checkpoint inside model_eval.py

Lastly, this also adds support for loading a model checkpoint for the UNet3D model when running model_eval.py. Here's an example run of a converged training run on a tinybox green:
[screenshot: example evaluation run on a tinybox green]
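For context, a minimal sketch of what checkpoint loading before evaluation can look like; the module path extra.models.unet3d and the checkpoint filename are assumptions, while safe_load/load_state_dict are tinygrad's standard state helpers:

```python
# Minimal sketch of restoring a UNet3D checkpoint before running the eval loop.
from tinygrad.nn.state import safe_load, load_state_dict
from extra.models.unet3d import UNet3D  # assumed location of the model definition

model = UNet3D()
state_dict = safe_load("unet3d_checkpoint.safetensors")  # hypothetical checkpoint path
load_state_dict(model, state_dict)
# model is now ready to be passed to the evaluation code in model_eval.py
```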

wozeparrot (Collaborator)

The Dice score should be the same as the reference, no?

francislata (Contributor, Author)

> The Dice score should be the same as the reference, no?

This one is odd, because I was looking at the old PR that introduced this eval function, and they got 0.86632.

I'm pretty sure we're using the same pretrained model from the MLPerf inference README, but I can double-check that again.

wozeparrot (Collaborator)

Yeah, looking at the original PR, they say the reference target is 0.86330, but I'm not sure where that comes from, considering that the mlcommons upstream https://github.com/mlcommons/inference/blob/master/vision/medical_imaging/3d-unet-kits19/README.md lists it as 0.86170 for all frameworks.

francislata marked this pull request as ready for review on May 10, 2024, 21:19
francislata (Contributor, Author)

Yeah, I agree. The only place I found 0.86630 was in Dell's blog regarding MLPerf:
[screenshot: Dell MLPerf blog excerpt]

And that blog was pretty recent (last month).

francislata marked this pull request as draft on May 13, 2024, 12:50
francislata marked this pull request as ready for review on May 13, 2024, 15:42
francislata (Contributor, Author)

@wozeparrot I did a quick run of MLPerf's inference script, and it looks like they are getting 0.86172 for the mean Dice score:
[screenshot: MLPerf inference script output]

It is interesting that we are getting a higher score for the same checkpoint. I did write a test in another PR for the Dice score metric, and it matches the original Dice score implementation in MLPerf's training reference implementation. I'll try to find out where the difference between their implementation and ours comes from.
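For reference, a minimal sketch of the standard per-class Dice formula being compared here, Dice = 2·|A ∩ B| / (|A| + |B|); the function name and smoothing constant are illustrative, not the exact MLPerf reference code:

```python
# Per-class Dice score on boolean masks; a small smoothing term avoids
# division by zero when both masks are empty.
import numpy as np

def dice_score(prediction: np.ndarray, target: np.ndarray, smooth: float = 1e-6) -> float:
    intersection = np.logical_and(prediction, target).sum()
    return (2.0 * intersection + smooth) / (prediction.sum() + target.sum() + smooth)
```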

francislata marked this pull request as draft on May 14, 2024, 11:50