
How are IoU values computed? #108

Open
GabrielePaolini opened this issue Jan 27, 2024 · 6 comments

Comments

@GabrielePaolini

Thank you for your efforts on this repo!
I'm trying to train the toy dataset by following the documentation.
I was able to train the network and also to infer on the same point cloud (by first removing the classification labels from the las file).

From the log of the training, it seems that the IoU value never gets close to 1.
For example, for the building class, the highest IoU value is something around 0.30.
This doesn't make sense to me, since a visual inspection of the inferred labels shows an almost perfect score
for ground, building and vegetation classes!

So, how are IoU values computed for individual classes? How should I interpret these results?

Thank you in advance for your help!

@CharlesGaydon
Collaborator

Hi @GabrielePaolini, thanks for using Myria3D.

I was able to replicate your observation, but I also took a look at the training IoUs, which show overfitting as expected. See the screenshot below:

[Screenshot: training IoU curves]

There is indeed a difference in how IoUs are calculated at eval time. Let me refer you to the documentation on design choices, in particular the last two sections: "Speed is of the essence" and "Evaluation is key to select the right approach". In a nutshell, we supervise learning on the subsampled cloud for efficiency, which I guess could impact the IoUs computed during evaluation.
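For reference, per-class IoU is the per-class Jaccard index, i.e. TP / (TP + FP + FN) accumulated over all points. Here is a minimal sketch with torchmetrics, assuming four classes for the toy dataset (this is not necessarily the exact metric configuration Myria3D uses):

```python
# Hedged sketch: per-class IoU (Jaccard index) computed with torchmetrics.
import torch
from torchmetrics.classification import MulticlassJaccardIndex

num_classes = 4  # assumed: unclassified, ground, vegetation, building
iou_per_class = MulticlassJaccardIndex(num_classes=num_classes, average=None)

preds = torch.randint(0, num_classes, (10_000,))   # predicted label per point
target = torch.randint(0, num_classes, (10_000,))  # reference label per point

# IoU_c = TP_c / (TP_c + FP_c + FN_c), one value per class
print(iou_per_class(preds, target))
```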

I'm still surprised, since I would imagine that the impact would be rather small if the model got the exact same data during evaluation. PyTorch Lightning's documentation says that the same dataloader is used for training and evaluation when overfitting. This may mean that, at evaluation time, the model sees a point cloud subsampled differently from the one seen during training, leading to degraded IoU in an overfitting setting. So this observation could simply be due to Lightning's behavior and not to the computation of IoUs itself.
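For context, a minimal sketch of how Lightning's overfit_batches option is typically enabled (the model and datamodule names are placeholders, not Myria3D's actual entry point):

```python
# Hedged sketch: with overfit_batches, Lightning reuses the training
# dataloader for validation, so random per-step transforms can still make
# the "validation" cloud differ from the one seen during training.
import pytorch_lightning as pl

trainer = pl.Trainer(
    overfit_batches=1,   # train and validate on a single batch
    max_epochs=100,
)
# trainer.fit(model, datamodule=datamodule)  # placeholders for illustration
```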

@GabrielePaolini
Author

Hi @CharlesGaydon and thank you so much for the explanation!
I wasn't entirely accurate in my original question: I was referring to the val/IoU values, which I assume are computed on the validation dataset (which for the toy dataset is the same as the training dataset).

In fact, I see that your val/IoU values don't approach 1 either (apart from the ground class).
Why does this happen? Shouldn't the model overfit and give good IoU values on the validation step?

I really need to understand how I should evaluate the model's performance, since I want to train RandLA-Net on new data.
Should I rely on train/IoU values to assess the generalization capability of my model?

Thank you again for your support!

@CharlesGaydon
Collaborator

> Shouldn't the model overfit and give good IoU values on the validation step?
That is what I would have expected as well. My best guess is that the model is thrown off at evaluation time by a different subsampling, and might have been robust to that if not for the different method of IoU computation at eval time.

This only happens when overfitting this data. So yes, you can totally rely on validation IoUs for your own trainings. I have never seen this outside of overfitting, and I even had the occasion to compute IoUs outside of Myria3D with different code, with the same results. So I am fairly confident that this is an edge case that happens solely when overfitting on a single cloud.

Sorry if this is causing some confusion! I'm keeping this issue open until I have time to check out what causes this behavior during overfitting.

@GabrielePaolini
Author

Everything is clearer now, I hope it is an easily solved edge case!
Anyway, thank you Charles, I am curious to know where the problem comes from.

@CharlesGaydon
Collaborator

I gave this a quick look. The data transforms are run at each step, but the batch has the same size and the average of each feature is constant across runs, so the model should be seeing the same data.
The point clouds come in a different order, so I cannot say for sure whether the points themselves are shuffled within a cloud. If they are, this could have a high impact on RandLA-Net, since the model uses decimation subsampling, which is sensitive to point order. But I don't think this is the explanation, since it would also affect training metrics from one step to the next.
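To illustrate the point-order sensitivity mentioned above, here is a small sketch (not Myria3D code) of RandLA-Net-style decimation: it keeps a fixed-ratio slice of the point list, so shuffling the points beforehand changes which points survive.

```python
# Hedged sketch: decimation by slicing is sensitive to point order.
import torch

torch.manual_seed(0)
cloud = torch.randn(1000, 3)            # toy point cloud (XYZ only)
decimation = 4                          # keep 1 point out of 4
n_keep = cloud.shape[0] // decimation

subset_a = cloud[:n_keep]               # decimation on the original order
perm = torch.randperm(cloud.shape[0])
subset_b = cloud[perm][:n_keep]         # decimation after shuffling

# The two subsets overlap only partially (roughly 1/decimation of the points).
shared = (subset_a.unsqueeze(1) == subset_b.unsqueeze(0)).all(-1).any(-1)
print(f"fraction of subset_a also in subset_b: {shared.float().mean():.2f}")
```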

Then, I tried removing the kNN interpolation at evaluation time. This did not change anything (phew!), so the issue is unrelated to the interpolation of points.
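For context, this is the kind of interpolation step being discussed, sketched with torch_geometric's knn_interpolate under assumed tensor shapes (the names are placeholders, not Myria3D's exact code): propagating logits predicted on the subsampled cloud back to the full-resolution cloud.

```python
# Hedged sketch: distance-weighted kNN interpolation of subsampled logits
# back onto the full-resolution cloud.
import torch
from torch_geometric.nn import knn_interpolate

num_sub, num_full, num_classes = 512, 4096, 4
logits_sub = torch.randn(num_sub, num_classes)   # predictions on subsampled points
pos_sub = torch.rand(num_sub, 3)                 # XYZ of subsampled points
pos_full = torch.rand(num_full, 3)               # XYZ of the full cloud

# Each full-resolution point receives a distance-weighted average of the
# logits of its k nearest subsampled points.
logits_full = knn_interpolate(logits_sub, pos_sub, pos_full, k=3)
print(logits_full.shape)  # torch.Size([4096, 4])
```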

So my guess is that something weird happens because of evaluation mode, maybe due to batch normalization or dropout.
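To make that train/eval difference concrete, here is a minimal generic sketch (not tied to Myria3D's architecture): BatchNorm uses batch statistics in train() mode and running statistics in eval() mode, and dropout is only active in train() mode, so the same input can produce different outputs.

```python
# Hedged sketch: the same input gives different outputs in train() vs eval().
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Sequential(nn.Linear(3, 8), nn.BatchNorm1d(8), nn.Dropout(p=0.5))
x = torch.randn(16, 3)

layer.train()
out_train = layer(x)   # batch statistics + active dropout

layer.eval()
out_eval = layer(x)    # running statistics + dropout disabled

print((out_train - out_eval).abs().max())  # non-zero difference
```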

@GabrielePaolini
Author

Hi @CharlesGaydon, thanks for the update!
I wanted to clarify the situation: at the time I opened this issue, I had trained on the toy dataset using the default settings (the RandLaNet-Overfit experiment). The thing is, I mistakenly ran inference using the model checkpoint provided in the repo (proto151_V2.0_epoch_100_Myria3DV3.1.0.ckpt), which is why I got near-perfect results in contrast with the poor validation values.

To confirm your point, I downloaded the latest version of the code and ran the default RandLaNet-Overfit experiment, plus another experiment using the same settings as proto151_V2.0_epoch_100_Myria3DV3.1.0_predict_config_V3.7.0.yaml (with and without overfit).
This time, the validation values seem indicative of the quality of the training (though I'm not 100% sure).
However, the results are not good.

I attach here the config file, logs, and the result of the inference (displayed with CloudCompare) for the experiment run with the proto151 settings without overfit_batches. Inference was done using a checkpoint from epoch 71.
The model doesn't seem to learn the building and vegetation classes, and at some point the number of unclassified points starts increasing.

Why do I get such bad results? How was the checkpoint proto151_V2.0_epoch_100_Myria3DV3.1.0.ckpt obtained?

Attachments:
- config_tree.txt
- Screenshots of the inference in CloudCompare (2024-02-07)
- Training curves: train_loss_epoch, train_iou_epoch, train_iou_CLASS_vegetation, train_iou_CLASS_ground, train_iou_CLASS_building, train_iou_CLASS_unclassified
