
Problem reproducing Camelyon16 result #54

Open
sy2es94098 opened this issue Sep 19, 2022 · 8 comments

@sy2es94098

Hello, thank you for your excellent work!
I recently tried to reproduce the Camelyon16 results. I used all 271 training slides and a batch size of 512 to train SimCLR for 3 days, then trained the aggregator, but the results are not as good as those from the SimCLR weights you provided from 3 days of training (model-v0 on Google Drive).
As in #46, the accuracy of the aggregator gets stuck at about 60% and cannot be improved, and I found that in this case every patch produces the same attention score.
Could you provide the relevant training parameters, such as the -o or -t arguments to deepzoom_tiler.py, and the learning rate, batch size, number of epochs, etc. for SimCLR?

@binli123 (Owner)

It could be that the trained model weights are not loaded correctly. You can remove the warning filter to check the warnings from load_state_dict. Make sure instance normalization is used in both scripts, and that the loss indeed decreases during SimCLR training.
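For example, something along these lines (a minimal sketch; the stand-in model and the checkpoint path are placeholders for whatever you are actually loading):

```python
import warnings
import torch
import torch.nn as nn

warnings.simplefilter("always")  # stop suppressing the load_state_dict warnings

# Stand-in for the embedder/aggregator you are actually checking.
model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 2))

# Hypothetical checkpoint path; replace with your own.
state_dict = torch.load("aggregator.pth", map_location="cpu")

# With strict=False, mismatches are returned instead of raising,
# so you can see exactly which weights were not loaded.
result = model.load_state_dict(state_dict, strict=False)
print("missing keys:", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)
```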

@sy2es94098 (Author)

During training I can observe the loss dropping, but it plateaus at around 5.5.
[screenshot: SimCLR loss curve]
By the way, I found that the aggregator I trained gives higher attention to the normal regions, even though I followed the README instructions and placed the negative-sample folder at index 0 in alphabetical order. I am confused as to why this happens.
This is tumor_026.tif from Camelyon16. [image]
Here is the attention map using the features you provided. [image]
This is the attention map obtained from the features produced by my own trained embedder. [image]
And this is the ground truth. [image]

My attention map looks fine apart from focusing on the normal areas, but I'm curious why the model behaves so differently.
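One way to sanity-check the label mapping (a minimal sketch, assuming class indices are derived from alphabetically sorted folder names, as torchvision's ImageFolder does; the dataset path and folder names are illustrative):

```python
import os

# Illustrative dataset root; replace with your actual patch directory.
classes = sorted(os.listdir("datasets/Camelyon16"))
class_to_idx = {c: i for i, c in enumerate(classes)}

# The negative folder should map to 0, e.g. {'0-normal': 0, '1-tumor': 1}.
print(class_to_idx)
```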

@furlat commented Nov 29, 2022

Any update? I have the same SimCLR training curve on Camelyon16 as @sy2es94098: it gets stuck at around 5 during training. Unfortunately, I can only fit a batch size of 280 images, since I only have 2x 2080 Ti GPUs.

@binli123 (Owner)

Based on the loss curve of SimCLR, I think you should let the model converge further. Make sure you have loaded the model weights correctly, and that the type of the normalization layer is consistent.
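For reference, a back-of-the-envelope check (assuming the standard NT-Xent/InfoNCE objective that SimCLR uses): a randomly initialized encoder scores roughly uniformly over the 2N-1 candidates per view, so the starting loss is about ln(2N-1), and a plateau well below that means some learning but not necessarily convergence:

```python
import math

for batch_size in (280, 512):
    # Each of the 2N augmented views has 1 positive and 2N-2 negatives,
    # so a random encoder's cross-entropy is ~log of the candidate count.
    chance_loss = math.log(2 * batch_size - 1)
    print(f"batch size {batch_size}: chance-level NT-Xent loss ~ {chance_loss:.2f}")

# batch size 280: ~6.33; batch size 512: ~6.93.
# A plateau at ~5-5.5 is below chance but still far from a converged run.
```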

@wangxinghangcnn

> Hello, thank you for your excellent work! Earlier I tried to reproduce the Camelyon16 results […] Could you provide the relevant training parameters, such as the -o or -t arguments to deepzoom_tiler.py, and the learning rate, batch size, number of epochs, etc. for SimCLR? (quoting the opening comment)

Hello, I would like to ask why all my test results are black when I try to draw attention maps. Thank you.

@binli123 (Owner)

binli123 commented Apr 6, 2023

> Hello, I would like to ask why all my test results are black when I try to draw attention maps. Thank you.

You can try two things:

  1. Remove the dimension normalization (see "Same attention score and the pre-trained aggregators" #59 (comment)), i.e. the line
     `A = F.softmax(A / torch.sqrt(torch.tensor(Q.shape[1], dtype=torch.float32, device=device)), 0)  # normalize attention scores, A in shape N x C`
     A sketch of this change is shown below.
  2. Make sure the weights are loaded appropriately. Turn off the warning filter and check whether there are missing keys in the weights.
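For item 1, a minimal sketch of what removing the scaling looks like (toy shapes; `A`, `Q`, and `device` stand in for the tensors in the aggregator code):

```python
import torch
import torch.nn.functional as F

# Toy shapes for illustration: N patches, C classes, d query dimension.
N, C, d = 100, 2, 128
A = torch.randn(N, C)  # raw attention scores
Q = torch.randn(N, d)  # query tensor; only its last dim is used below
device = A.device

# Original: scores are divided by sqrt(d) before the softmax over patches.
A_scaled = F.softmax(
    A / torch.sqrt(torch.tensor(Q.shape[1], dtype=torch.float32, device=device)), 0
)

# Suggested change: drop the dimension normalization entirely.
A_unscaled = F.softmax(A, 0)  # softmax over the N patches, A in shape N x C
```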

@blz822 commented Feb 28, 2024

How was the ground truth of tumor_026.tif obtained?

@binli123 (Owner)

I incorporated training/testing into the same pipeline in the latest commit. This change allows you to read the evaluation results on a reserved test set. I also added a simple weight-initialization method, which helps stabilize training. You can set --eval_scheme=5-fold-cv-standalone-test, which will perform a train/valid/test split like this:

1. A standalone test set consisting of 20% of the samples is reserved; the remaining 80% of the samples are used to construct a 5-fold cross-validation.
2. For each fold, the best model and the corresponding threshold are saved.
3. After the 5-fold cross-validation, the 5 best models, along with their optimal thresholds, are used to perform inference on the reserved test set. The final prediction for a test sample is the majority vote of the 5 models (see the sketch below).
4. For binary classification, accuracy and balanced accuracy are computed. For multi-label classification, Hamming loss (smaller is better) and subset accuracy are computed.
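A minimal sketch of the majority-vote step on the reserved test set (the probability arrays, thresholds, and binary setting here are illustrative assumptions, not the pipeline's exact code):

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

def majority_vote(fold_probs, fold_thresholds):
    """fold_probs: (5, n_test) positive-class probabilities, one row per fold's
    best model; fold_thresholds: the 5 optimal thresholds saved with them."""
    votes = np.stack([p >= t for p, t in zip(fold_probs, fold_thresholds)])
    return (votes.sum(axis=0) >= 3).astype(int)  # majority of the 5 models

# Toy example: 5 folds, 4 test bags.
probs = np.random.rand(5, 4)
thresholds = [0.5, 0.45, 0.55, 0.5, 0.6]
y_true = np.array([0, 1, 1, 0])
y_pred = majority_vote(probs, thresholds)
print("accuracy:", accuracy_score(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy_score(y_true, y_pred))
```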

You can also simply run a 5-fold CV with --eval_scheme=5-fold-cv.

There were some issues with the testing script when loading pretrained weights (i.e., sometimes the weights are not fully loaded or there are missing weights; setting strict=False can reveal the problems). The purpose of the testing script is to generate the heatmap; you should now read the performance directly from the training script. I will fix the issues in a couple of days.
