
Problem reproducing Camelyon16 result #54

Open
sy2es94098 opened this issue Sep 19, 2022 · 8 comments

@sy2es94098

Hello, thank you for your excellent work!
I recently tried to reproduce the Camelyon16 results. I used all 271 training slides and a batch size of 512 to train SimCLR for 3 days, then trained the aggregator, but the results are not as good as those from the SimCLR weights you provided from 3 days of training (model-v0 on Google Drive).
As in #46, the accuracy of the aggregator gets stuck at about 60% and cannot be improved, and I found that in this case every patch produces the same attention score.
Could you provide the relevant training parameters, such as the -o or -t arguments to deepzoom_tiler.py, and the learning rate, batch size, number of epochs, etc. for SimCLR?

@binli123 (Owner)

It could be that the trained model weights are not loaded correctly. You can remove the warning filter to check the warnings from load_state_dict. Make sure instance normalization is used in both scripts, and that the loss indeed decreases during SimCLR training.
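For example, something along these lines (a minimal sketch; the stand-in model and the checkpoint path are placeholders for whatever you are actually loading):

```python
import warnings
import torch
import torch.nn as nn

warnings.simplefilter("always")  # stop suppressing the load_state_dict warnings

# Stand-in for the embedder/aggregator you are actually checking.
model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 2))

# Hypothetical checkpoint path; replace with your own.
state_dict = torch.load("aggregator.pth", map_location="cpu")

# With strict=False, mismatches are returned instead of raising,
# so you can see exactly which weights were not loaded.
result = model.load_state_dict(state_dict, strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```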

@sy2es94098 (Author)

During training I can observe the loss dropping, but it plateaus at around 5.5.
[screenshot: SimCLR loss curve]
By the way, I found that the aggregator I trained gives higher attention to the normal regions, even though I followed the README instructions and placed the negative-sample folder at index 0 in alphabetical order. I am confused as to why this happens.
This is tumor_026.tif from Camelyon16. [image]
Here is the attention map using the features you provided. [image]
This is the attention map obtained from the features produced by my own trained embedder. [image]
And this is the ground truth. [image]

My attention map looks fine apart from focusing on the normal areas, but I'm curious why the model behaves so differently.
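One way to sanity-check the label mapping (a minimal sketch, assuming class indices are derived from alphabetically sorted folder names, as torchvision's ImageFolder does; the dataset path and folder names are illustrative):

```python
import os

# Illustrative dataset root; replace with your actual patch directory.
classes = sorted(os.listdir("datasets/Camelyon16"))
class_to_idx = {c: i for i, c in enumerate(classes)}

# The negative folder should map to 0, e.g. {'0-normal': 0, '1-tumor': 1}.
print(class_to_idx)
```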

@furlat commented Nov 29, 2022

Any update? I have the same SimCLR training curve on Camelyon16 as @sy2es94098: it gets stuck at around 5 during training. Unfortunately, I can only fit a batch size of 280 images, since I only have 2x 2080 Ti GPUs.

@binli123 (Owner)

Based on the loss curve of SimCLR, I think you should let the model converge further. Make sure you have loaded the model weights correctly, and that the type of the normalization layer is consistent.
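For reference, a back-of-the-envelope check (assuming the standard NT-Xent/InfoNCE objective that SimCLR uses): a randomly initialized encoder scores roughly uniformly over the 2N-1 candidates per view, so the starting loss is about ln(2N-1), and a plateau well below that means some learning but not necessarily convergence:

```python
import math

for batch_size in (280, 512):
    # Each of the 2N augmented views has 1 positive and 2N-2 negatives,
    # so a random encoder's cross-entropy is ~log of the candidate count.
    chance_loss = math.log(2 * batch_size - 1)
    print(f"batch size {batch_size}: chance-level NT-Xent loss ~ {chance_loss:.2f}")

# batch size 280: ~6.33; batch size 512: ~6.93.
# A plateau at ~5-5.5 is below chance but still far from a converged run.
```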

@wangxinghangcnn

> Hello, thank you for your excellent work! Earlier I tried to reproduce the Camelyon16 results […] Could you provide the relevant training parameters, such as the -o or -t arguments to deepzoom_tiler.py, and the learning rate, batch size, number of epochs, etc. for SimCLR? (quoting the opening comment)

Hello, I would like to ask why all my test results are black when I try to draw attention maps. Thank you.

@binli123 (Owner)

binli123 commented Apr 6, 2023

> Hello, I would like to ask why all my test results are black when I try to draw attention maps. Thank you.

You can try two things:

  1. Remove the dimension normalization (see "Same attention score and the pre-trained aggregators" #59 (comment)), i.e. the line
     `A = F.softmax(A / torch.sqrt(torch.tensor(Q.shape[1], dtype=torch.float32, device=device)), 0)  # normalize attention scores, A in shape N x C`
     A sketch of this change is shown below.
  2. Make sure the weights are loaded appropriately. Turn off the warning filter and check whether there are missing keys in the weights.
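For item 1, a minimal sketch of what removing the scaling looks like (toy shapes; `A`, `Q`, and `device` stand in for the tensors in the aggregator code):

```python
import torch
import torch.nn.functional as F

# Toy shapes for illustration: N patches, C classes, d query dimension.
N, C, d = 100, 2, 128
A = torch.randn(N, C)  # raw attention scores
Q = torch.randn(N, d)  # query tensor; only its last dim is used below
device = A.device

# Original: scores are divided by sqrt(d) before the softmax over patches.
A_scaled = F.softmax(
    A / torch.sqrt(torch.tensor(Q.shape[1], dtype=torch.float32, device=device)), 0
)

# Suggested change: drop the dimension normalization entirely.
A_unscaled = F.softmax(A, 0)  # softmax over the N patches, A in shape N x C
```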

@blz822 commented Feb 28, 2024

How was the ground truth of tumor_026.tif obtained?

@binli123 (Owner)

I incorporated training/testing into the same pipeline in the latest commit. This change allows you to read the evaluation results on a reserved test set. I also added a simple weight-initialization method, which helps stabilize training. You can set --eval_scheme=5-fold-cv-standalone-test, which will perform a train/valid/test split like this:

1. A standalone test set consisting of 20% of the samples is reserved; the remaining 80% of the samples are used to construct a 5-fold cross-validation.
2. For each fold, the best model and the corresponding threshold are saved.
3. After the 5-fold cross-validation, the 5 best models, along with their optimal thresholds, are used to perform inference on the reserved test set. The final prediction for a test sample is the majority vote of the 5 models (see the sketch below).
4. For binary classification, accuracy and balanced accuracy are computed. For multi-label classification, Hamming loss (smaller is better) and subset accuracy are computed.
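A minimal sketch of the majority-vote step on the reserved test set (the probability arrays, thresholds, and binary setting here are illustrative assumptions, not the pipeline's exact code):

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

def majority_vote(fold_probs, fold_thresholds):
    """fold_probs: (5, n_test) positive-class probabilities, one row per fold's
    best model; fold_thresholds: the 5 optimal thresholds saved with them."""
    votes = np.stack([p >= t for p, t in zip(fold_probs, fold_thresholds)])
    return (votes.sum(axis=0) >= 3).astype(int)  # majority of the 5 models

# Toy example: 5 folds, 4 test bags.
probs = np.random.rand(5, 4)
thresholds = [0.5, 0.45, 0.55, 0.5, 0.6]
y_true = np.array([0, 1, 1, 0])
y_pred = majority_vote(probs, thresholds)
print("accuracy:", accuracy_score(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))
```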

You can also simply run a 5-fold CV with --eval_scheme=5-fold-cv.

There were some issues with the testing script when loading pretrained weights (i.e., sometimes the weights are not fully loaded or there are missing weights; setting strict=False can reveal the problems). The purpose of the testing script is to generate the heatmap; you should now read the performance directly from the training script. I will fix the issues in a couple of days.
