
How are IoU values computed? #108

Open
GabrielePaolini opened this issue Jan 27, 2024 · 6 comments

Comments

@GabrielePaolini

Thank you for your efforts on this repo!
I'm trying to train the toy dataset by following the documentation.
I was able to train the network and also to infer on the same point cloud (by first removing the classification labels from the las file).

From the log of the training, it seems that the IoU value never gets close to 1.
For example, for the building class, the highest IoU value is something around 0.30.
This doesn't make sense to me, since a visual inspection of the inferred labels shows an almost perfect score
for ground, building and vegetation classes!

So, how are IoU values computed for individual classes? How should I interpret these results?

Thank you in advance for your help!

@CharlesGaydon
Collaborator

Hi @GabrielePaolini, thanks for using Myria3D.

I was able to replicate your observation, but I also took a look at the training IoUs, which show overfitting as expected. See the screenshot below:

[Screenshot: training IoU curves]

There is indeed a difference in how IoUs are calculated at eval time. Let me refer you to the documentation on design choices, in particular the last two sections: "Speed is of the essence" and "Evaluation is key to select the right approach". In a nutshell, we supervise learning on the subsampled cloud for efficiency, which I guess could impact the IoUs computed during evaluation.
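For reference, per-class IoU is the per-class Jaccard index, i.e. TP / (TP + FP + FN) accumulated over all points. Here is a minimal sketch with torchmetrics, assuming four classes for the toy dataset (this is not necessarily the exact metric configuration Myria3D uses):

```python
# Hedged sketch: per-class IoU (Jaccard index) computed with torchmetrics.
import torch
from torchmetrics.classification import MulticlassJaccardIndex

num_classes = 4  # assumed: unclassified, ground, vegetation, building
iou_per_class = MulticlassJaccardIndex(num_classes=num_classes, average=None)

preds = torch.randint(0, num_classes, (10_000,))   # predicted label per point
target = torch.randint(0, num_classes, (10_000,))  # reference label per point

# IoU_c = TP_c / (TP_c + FP_c + FN_c), one value per class
print(iou_per_class(preds, target))
```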

I'm still surprised, since I would imagine that the impact would be rather small if the model got the exact same data during evaluation. PyTorch Lightning's documentation says that the same dataloader is used for training and evaluation when overfitting. This may mean that, at evaluation time, the model sees a point cloud subsampled differently from the one seen during training, leading to degraded IoU in an overfitting setting. So this observation could simply be due to Lightning's behavior and not to the computation of IoUs itself.
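For context, a minimal sketch of how Lightning's overfit_batches option is typically enabled (the model and datamodule names are placeholders, not Myria3D's actual entry point):

```python
# Hedged sketch: with overfit_batches, Lightning reuses the training
# dataloader for validation, so random per-step transforms can still make
# the "validation" cloud differ from the one seen during training.
import pytorch_lightning as pl

trainer = pl.Trainer(
    overfit_batches=1,   # train and validate on a single batch
    max_epochs=100,
)
# trainer.fit(model, datamodule=datamodule)  # placeholders for illustration
```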

@GabrielePaolini
Author

Hi @CharlesGaydon and thank you so much for the explanation!
I wasn't entirely accurate in my original question: I was referring to the val/IoU values, which I assume are computed on the validation dataset (which for the toy dataset is the same as the training dataset).

In fact, I see that your val/IoU values don't approach 1 either (apart from the ground class).
Why does this happen? Shouldn't the model overfit and give good IoU values on the validation step?

I really need to understand how I should evaluate the model's performance, since I want to train RandLA-Net on new data.
Should I rely on train/IoU values to assess the generalization capability of my model?

Thank you again for your support!

@CharlesGaydon
Collaborator

> Shouldn't the model overfit and give good IoU values on the validation step?
That is what I would have expected as well. My best guess is that the model is thrown off at evaluation time by a different subsampling, and might have been robust to that if not for the different method of IoU computation at eval time.

This only happens when overfitting this data. So yes, you can totally rely on validation IoUs for your own trainings. I have never seen this outside of overfitting, and I even had the occasion to compute IoUs outside of Myria3D with different code, with the same results. So I am fairly confident that this is an edge case that happens solely when overfitting on a single cloud.

Sorry if this is causing some confusion! I'm keeping this issue open until I have time to check out what causes this behavior during overfitting.

@GabrielePaolini
Author

Everything is clearer now, I hope it is an easily solved edge case!
Anyway, thank you Charles, I am curious to know where the problem comes from.

@CharlesGaydon
Collaborator

I gave this a quick look. The data transforms are run at each step, but the batch has the same size and the average of each feature is constant across runs, so the model should be seeing the same data.
The point clouds come in a different order, so I cannot say for sure whether the points themselves are shuffled within a cloud. If they are, this could have a high impact on RandLA-Net, since the model uses decimation subsampling, which is sensitive to point order. But I don't think this is the explanation, since it would also affect training metrics from one step to the next.
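To illustrate the point-order sensitivity mentioned above, here is a small sketch (not Myria3D code) of RandLA-Net-style decimation: it keeps a fixed-ratio slice of the point list, so shuffling the points beforehand changes which points survive.

```python
# Hedged sketch: decimation by slicing is sensitive to point order.
import torch

torch.manual_seed(0)
cloud = torch.randn(1000, 3)            # toy point cloud (XYZ only)
decimation = 4                          # keep 1 point out of 4
n_keep = cloud.shape[0] // decimation

subset_a = cloud[:n_keep]               # decimation on the original order
perm = torch.randperm(cloud.shape[0])
subset_b = cloud[perm][:n_keep]         # decimation after shuffling

# The two subsets overlap only partially (roughly 1/decimation of the points).
shared = (subset_a.unsqueeze(1) == subset_b.unsqueeze(0)).all(-1).any(-1)
print(f"fraction of subset_a also in subset_b: {shared.float().mean():.2f}")
```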

Then, I tried removing the kNN interpolation at evaluation time. This did not change anything (phew!), so the issue is unrelated to the interpolation of points.
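For context, this is the kind of interpolation step being discussed, sketched with torch_geometric's knn_interpolate under assumed tensor shapes (the names are placeholders, not Myria3D's exact code): propagating logits predicted on the subsampled cloud back to the full-resolution cloud.

```python
# Hedged sketch: distance-weighted kNN interpolation of subsampled logits
# back onto the full-resolution cloud.
import torch
from torch_geometric.nn import knn_interpolate

num_sub, num_full, num_classes = 512, 4096, 4
logits_sub = torch.randn(num_sub, num_classes)   # predictions on subsampled points
pos_sub = torch.rand(num_sub, 3)                 # XYZ of subsampled points
pos_full = torch.rand(num_full, 3)               # XYZ of the full cloud

# Each full-resolution point receives a distance-weighted average of the
# logits of its k nearest subsampled points.
logits_full = knn_interpolate(logits_sub, pos_sub, pos_full, k=3)
print(logits_full.shape)  # torch.Size([4096, 4])
```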

So my guess is that something weird happens because of evaluation mode, maybe due to batch normalization or dropout.
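To make that train/eval difference concrete, here is a minimal generic sketch (not tied to Myria3D's architecture): BatchNorm uses batch statistics in train() mode and running statistics in eval() mode, and dropout is only active in train() mode, so the same input can produce different outputs.

```python
# Hedged sketch: the same input gives different outputs in train() vs eval().
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Sequential(nn.Linear(3, 8), nn.BatchNorm1d(8), nn.Dropout(p=0.5))
x = torch.randn(16, 3)

layer.train()
out_train = layer(x)   # batch statistics + active dropout

layer.eval()
out_eval = layer(x)    # running statistics + dropout disabled

print((out_train - out_eval).abs().max())  # non-zero difference
```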

@GabrielePaolini
Author

Hi @CharlesGaydon, thanks for the update!
I wanted to clarify the situation: at the time I opened this issue, I had trained on the toy dataset using the default settings (the RandLaNet-Overfit experiment). The thing is, I mistakenly ran inference using the model checkpoint provided in the repo (proto151_V2.0_epoch_100_Myria3DV3.1.0.ckpt), which is why I got near-perfect results in contrast with the poor validation values.

To confirm your point, I downloaded the latest version of the code and ran the default RandLaNet-Overfit experiment, plus another experiment using the same settings as proto151_V2.0_epoch_100_Myria3DV3.1.0_predict_config_V3.7.0.yaml (with and without overfit).
This time, the validation values seem indicative of the quality of the training (though I'm not 100% sure).
However, the results are not good.

I attach here the config file, logs, and the result of the inference (displayed with CloudCompare) for the experiment run with the proto151 settings without overfit_batches. Inference was done using a checkpoint from epoch 71.
The model doesn't seem to learn the building and vegetation classes, and at some point the number of unclassified points starts increasing.

Why do I get such bad results? How was the checkpoint proto151_V2.0_epoch_100_Myria3DV3.1.0.ckpt obtained?

Attachments:
- config_tree.txt
- Screenshots of the inference in CloudCompare (2024-02-07)
- Training curves: train_loss_epoch, train_iou_epoch, train_iou_CLASS_vegetation, train_iou_CLASS_ground, train_iou_CLASS_building, train_iou_CLASS_unclassified
