
Evaluation Metrics #8

Open
infusion-zero-edit opened this issue Apr 9, 2023 · 16 comments

Comments

@infusion-zero-edit

There are no clear instructions in the repo on how to calculate and verify the metrics published in the paper. The metrics are not calculated during the training and validation steps either; only the input and denoised images are saved as results in the experiments folder.

There is also no reference to the PSNR and SSIM metrics defined in the file: https://github.com/StanfordMIMI/DDM2/blob/4f5a551a7f16e18883e3bf7451df7b46e691236d/core/metrics.py

Can you please add instructions for calculating the metrics after the third stage of training is completed?

@prinshul

Hi @tiangexiang, can you please clarify how one can reproduce the results reported in the paper on the test data after Stage III training is over?

@tiangexiang
Collaborator

Hi, thanks for your interest! The metrics were calculated based on the script provided by DIPY (https://dipy.org/documentation/1.1.0./examples_built/snr_in_cc/), since it provides the foreground region used to calculate SNR/CNR. Note that we only reported the metrics on the Stanford HARDI dataset, following the steps in the DIPY script.
We will release our evaluation script in a few days!
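Roughly, the DIPY procedure looks like this (a condensed sketch of that tutorial, not our exact evaluation code; file names are placeholders and the CFA threshold is the one from the DIPY example):

```python
import numpy as np
from scipy.ndimage import binary_dilation
import dipy.reconst.dti as dti
from dipy.core.gradients import gradient_table
from dipy.io.gradients import read_bvals_bvecs
from dipy.io.image import load_nifti
from dipy.segment.mask import median_otsu, segment_from_cfa, bounding_box

data, affine = load_nifti('denoised_hardi.nii.gz')          # 4D array (x, y, z, volumes)
bvals, bvecs = read_bvals_bvecs('hardi.bval', 'hardi.bvec')
gtab = gradient_table(bvals, bvecs)

# Brain mask and tensor fit, then segment the corpus callosum from the color FA map.
_, mask = median_otsu(data, vol_idx=[0])
tensorfit = dti.TensorModel(gtab).fit(data, mask=mask)

mins, maxs = bounding_box(mask)
mins, maxs = np.array(mins), np.array(maxs)
diff = (maxs - mins) // 4
CC_box = np.zeros(data.shape[:3])
CC_box[mins[0] + diff[0]:maxs[0] - diff[0],
       mins[1] + diff[1]:maxs[1] - diff[1],
       mins[2] + diff[2]:maxs[2] - diff[2]] = 1

threshold = (0.6, 1, 0, 0.1, 0, 0.1)  # red (left-right) CC threshold from the DIPY tutorial
mask_cc, cfa = segment_from_cfa(tensorfit, CC_box, threshold, return_cfa=True)

# Noise region: voxels well outside the dilated brain mask.
mask_noise = binary_dilation(mask, iterations=10)
mask_noise[..., :mask_noise.shape[-1] // 2] = 1
mask_noise = ~mask_noise

mean_signal = np.mean(data[mask_cc], axis=0)   # one value per diffusion volume
noise_std = np.std(data[mask_noise, :])
SNR = mean_signal / noise_std                  # per-volume SNR in the corpus callosum
```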

@prinshul

Thank you. Waiting eagerly for the evaluation script. Also, for the Sherbrooke dataset, is there a way to compute any quantitative metric as well? Can we use https://dipy.org/documentation/1.1.0./examples_built/snr_in_cc/ to evaluate the Sherbrooke test data too?

@tiangexiang
Collaborator

I think the primary concern is the definition of the foreground ROI, which is used to calculate SNR/CNR. Without the medical expertise to know which region should be defined as the ROI, we didn't try to calculate metrics on other datasets. If your team is able to localize the ROIs for different datasets precisely, you can directly use the same script :)
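The definitions themselves are straightforward once the ROI is fixed; here is a minimal sketch with illustrative names (the ROI and background masks are assumed to come from a segmentation like the DIPY one above):

```python
import numpy as np

def snr_cnr(volume, roi_mask, bg_mask, eps=1e-7):
    """SNR/CNR for one 3D volume, given a foreground ROI and a noise-only background."""
    mean_signal = volume[roi_mask].mean()   # mean intensity inside the ROI
    mean_bg = volume[bg_mask].mean()        # mean intensity of the background
    noise_std = volume[bg_mask].std()       # noise level estimated from the background

    snr = mean_signal / (noise_std + eps)
    cnr = (mean_signal - mean_bg) / (noise_std + eps)
    return snr, cnr
```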

@prinshul

Hi @tiangexiang

Can you please update the metric code? Thanks.

@tiangexiang
Collaborator

Hi, our script for the quantitative metric calculation has been uploaded. Please see the README for details :) Note that in the notebook, we tested a different set of denoised data, which yields slightly different scores than the ones we reported in the paper.

@prinshul

Thank you for the prompt response @tiangexiang.

@infusion-zero-edit
Author

Hi @tiangexiang, I have run your notebook and the saved denoised volume has size (81, 106, 76, 1). It gives an error at SNR = SNR[sel_b], saying sel_b has size 160 and SNR has size 11. Can you please help?

```python
SNR = mean_signal_denoised[k] / (denoised_noise_std[k] + 1e-7)
CNR = (mean_signal_denoised[k] - denoised_mean_bg[k]) / (denoised_noise_std[k] + 1e-7)

SNR = SNR[sel_b]
```

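A small (hypothetical) shape check before that line makes the mismatch explicit:

```python
# SNR should hold one value per denoised diffusion volume, and sel_b is a
# boolean mask over all gradient directions, so their lengths must match.
assert SNR.shape[0] == sel_b.shape[0], (
    f"SNR has {SNR.shape[0]} entries but sel_b expects {sel_b.shape[0]}; "
    "the denoised array seems to contain only part of the volumes/slices."
)
```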
@infusion-zero-edit
Author

infusion-zero-edit commented Apr 20, 2023

Hi @tiangexiang, I have run the denoising script for all the slices. I saw in your code that it takes only one slice, the 32nd one; after changing that, the evaluation metrics code runs fine. But following your steps in the GitHub repo, we are getting the following results:

raw [SNR] mean: 5.1141 std: 2.4988
raw [CNR] mean: 4.6567 std: 2.4976

our [SNR delta] mean: 1.0223 std: 1.3709 best: 3.8002 worst: -1.7891
our [CNR delta] mean: 0.9643 std: 1.3711 best: 3.7478 worst: -1.8394

The results reported in the evaluation_metrics notebook are different:

our [SNR delta] mean: 1.8284 std: 1.6969 best: 4.9025 worst: -1.7451
our [CNR delta] mean: 1.7486 std: 1.6949 best: 4.8205 worst: -1.8113

The box plot produced in the notebook with these results does not match the box plot in the paper.

So we are not sure how to arrive at the results reported in the paper. Please help.

@tiangexiang
Collaborator

Hi @anantkha, sorry for the confusion. The evaluation script needs to be run on ALL the slices for ALL the volumes (except the b=0 volumes). In 'denoise.py' we set the slice index to 32 just as a quick demo; you need to change 32 to 'all' to denoise all slices in order to calculate the metrics. This can take a relatively long time :)

After denoising ALL non-b0 volumes, you need to append the original b0 volumes to the denoised results. Don't worry about different intensity scales; there is a normalization step in the notebook.
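A rough sketch of what I mean (file names are placeholders, and it assumes the denoised volumes are in the same order as the non-b0 entries of the gradient table; the intensity normalization happens later in the notebook):

```python
import numpy as np
from dipy.core.gradients import gradient_table
from dipy.io.gradients import read_bvals_bvecs
from dipy.io.image import load_nifti, save_nifti

bvals, bvecs = read_bvals_bvecs('hardi.bval', 'hardi.bvec')
gtab = gradient_table(bvals, bvecs)

raw, affine = load_nifti('raw_hardi.nii.gz')          # (x, y, z, 160) original data
denoised, _ = load_nifti('denoised_non_b0.nii.gz')    # (x, y, z, 150) denoised non-b0 volumes

# Rebuild a full 4D series: original b0 volumes + denoised diffusion-weighted volumes.
full = np.zeros_like(raw, dtype=denoised.dtype)
full[..., gtab.b0s_mask] = raw[..., gtab.b0s_mask]
full[..., ~gtab.b0s_mask] = denoised
save_nifti('denoised_full.nii.gz', full, affine)
```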

Lastly, as explained earlier in this thread, the denoised results we provided are different from the ones used in the paper, so the metric scores can be a bit different as well.

@infusion-zero-edit
Author

Yes @tiangexiang, I have run it on all the slices (which takes a relatively long time) and calculated the results based on that, but they do not match the paper results. Any thoughts? I have followed the exact same steps as in your GitHub repo, and there is a huge difference: as per the box plot in the paper the mean should be around 5, but we are getting only 1.02.

@infusion-zero-edit
Author

infusion-zero-edit commented Apr 20, 2023

Yes, after saving the denoised results the shape is (81, 106, 76, 150), and the remaining 10 b0 volumes are concatenated back before calculating the results. Still, we are not getting the same results as in the paper. Is there any way we can get the exact same results as in the paper?

@tiangexiang
Collaborator

Frankly, I am also not sure why your quantitative scores are this low. It could be variation in the generations, something wrong during training, or a problem with software/hardware versions, etc. Can you please double-check the visual quality and make sure it is reasonable throughout Stage III training?

We ran the code in our software/hardware environment multiple times, and we get similar results every time. At least the 1.02 delta SNR is still better than all the other comparison methods.

@infusion-zero-edit
Author

GPU used: Nvidia Tesla V100 32 GB

I also ran this two times, but observed that training happens quite fast: Stage I training completed in 15 minutes and Stage III training completed in two to three hours, all without any error. If you want, I can share the logs with you. I just wanted you to check whether the training scripts contain any hard constraint like the one we found in the denoising script, which computes only on the 32nd slice. A similar constraint during training would explain why it is so fast; ideally it should not be, since here we train a diffusion model, not an MLP network.

@infusion-zero-edit
Author

Inference over all the slices takes quite a long time, so I am wondering how the training can be so fast. Requesting you to please check the training scripts for any such constraint there. Thanks.

@tiangexiang
Collaborator

Yeah you are right, the training is abnormally fast. Let's both go through the script to see if there are suspicious constraints/errors. I will keep you posted whenever I make an update! Thank you!
