
SimCLR training vs test sets configuration #39

Bontempogianpaolo1 opened this issue May 17, 2022 · 16 comments

@Bontempogianpaolo1
Hi @binli123 ,

I'm trying to replicate your results on Camelyon16, without success. I set the number of classes to 1 and also tried the weights available online for computing the features on both the training and test sets. Even with that, I still obtain an AUC of only about 0.70... So I started thinking about how my data organization differs from yours. I downloaded the data from here: https://ftp.cngb.org/pub/gigadb/pub/10.5524/100001_101000/100439/CAMELYON16/
The data is divided into training and test sets. I used a threshold of 25 for filtering out background, and I used only the training set for training the self-supervised model.
After that, even with the model you published on Drive, I extracted features with the compute_feats script for both the training and test sets (specifically with the fusion option). Finally, I modified train_tcga to use them as sources for the training and test sets (270/130 bags).

If instead I use the features you precomputed, the MIL model works. So the problem could be how I split the data or how I extract the embeddings. What am I missing?

@binli123 (Owner)

Could you check out the CSV files containing the features and labels?
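
For reference, a quick check along these lines would surface the row-count and value-scale problems discussed below (an illustrative sketch, not from the thread; the filename and headerless CSV layout are assumptions):

```python
# Sanity-check a per-slide feature CSV: the row count should match the number
# of patches, and SimCLR features should all sit on roughly the same scale.
import pandas as pd

df = pd.read_csv("normal143.csv", header=None)   # hypothetical path/layout
print(df.shape)                                  # (num_patches, feature_dim)
print(df.values.min(), df.values.max())          # look for abnormal values > 10
print(df.isna().any().any())                     # any missing entries?
```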

@Bontempogianpaolo1 (Author) commented May 23, 2022

The CSVs seem correct... Here are some screenshots of embeddings extracted using your pretrained model model_v2.pth, found at https://drive.google.com/drive/folders/1_mumfTU3GJRtjfcJK_M0fWm048sYYFqi, on patches extracted using 19 as the threshold:

camelyon.csv
[screenshot of feature values]
normal143.csv
[screenshot of feature values]

However, comparing your features with mine, the number of rows differs... So is it possible that the number of patches is influencing the results? Here are the patch counts using different background thresholds for 5 slides:

| Slide name | th=19 | th=25 | your features |
|------------|-------|-------|---------------|
| tumor_108  | 29905 | 402   | 23263 |
| test_124   | 6693  | 3001  | 2402  |
| tumor_095  | 39960 | 1002  | 31791 |
| normal_137 | 33396 | 505   | 23443 |
| tumor_076  | 61670 | 42057 | 19708 |

Maybe the image quality is not right for your embedder? Here is an example of a patch extracted at level=0, magnification 20x:

[patch image]
[patch image]

With this configuration the MIL training stays below 0.70 AUC.
Thanks in advance for your reply.
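
The large swings in patch count between th=19 and th=25 are consistent with an intensity-style background filter, where a small change in the cutoff keeps or discards most of a slide. A minimal sketch of that idea (illustrative only; the actual criterion in deepzoom_tiler.py may differ, and the patch filename is hypothetical):

```python
# Toy background filter: keep a patch only if it is "dark enough" relative to
# the near-white slide background. The threshold decides how aggressive it is.
import numpy as np
from PIL import Image

def is_tissue(patch_path, threshold):
    gray = np.asarray(Image.open(patch_path).convert("L"), dtype=np.float32)
    # H&E background is near-white (~255); tissue pulls the mean down.
    return (255.0 - gray.mean()) > threshold

print(is_tissue("normal_141_48_112.jpeg", threshold=25))
```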

@binli123 (Owner)

The feature values look strange. There are some abnormal values > 10. Did you use BatchNorm or InstanceNorm consistently during training and feature computation?

@Bontempogianpaolo1 (Author) commented May 23, 2022

I took your embedder directly, without training, and passed it to the compute_feats script with InstanceNorm2d, since that is the default parameter.
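
For context, the norm layer has to match between SimCLR training and feature extraction, because BatchNorm and InstanceNorm behave differently at inference time. A minimal sketch of building a ResNet-18 backbone with a configurable norm layer (illustrative; not the repo's exact code):

```python
# The embedder must use the SAME norm layer at SimCLR training time and at
# feature-extraction time, otherwise the feature scales will not match.
import torch
import torch.nn as nn
from torchvision.models import resnet18

def build_backbone(norm="instance"):
    norm_layer = nn.InstanceNorm2d if norm == "instance" else nn.BatchNorm2d
    model = resnet18(norm_layer=norm_layer)
    model.fc = nn.Identity()          # keep the 512-d features, drop classifier
    return model

backbone = build_backbone("instance").eval()
with torch.no_grad():
    feats = backbone(torch.randn(1, 3, 224, 224))
print(feats.shape)                    # torch.Size([1, 512])
```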

@binli123 (Owner)

Have you tried model_v0.pth and model_v1.pth? Did they also not work?
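
Whichever checkpoint is used, it is worth confirming that its state dict actually matches the backbone before extracting features (a sketch; only the checkpoint filename comes from the thread):

```python
# Inspect a published checkpoint before loading it into the embedder.
import torch

state = torch.load("model_v2.pth", map_location="cpu")
print(type(state))       # a dict-like state dict is expected
print(list(state)[:5])   # peek at parameter names to verify they line up
                         # with the backbone's layer names
```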

@Bontempogianpaolo1 (Author)

Not yet... I considered the v2 model to be the best one.

@Bontempogianpaolo1 (Author)

Screenshot of features using model-v0:
[screenshot]

Screenshot of features using model-v1:
[screenshot]

@binli123 (Owner)


Those are very different from mine. There should not be values > 10; they should all be on roughly the same scale. If you are using a newer GPU card, please make sure you have CUDA >= 11.0, not 10.2.
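
A quick way to verify the environment matches that advice (standard PyTorch calls):

```python
# Check which CUDA version the installed PyTorch build was compiled against
# and what GPU it is driving.
import torch

print(torch.__version__)                    # PyTorch build
print(torch.version.cuda)                   # CUDA version of the build
print(torch.cuda.get_device_name(0))        # installed GPU
print(torch.cuda.get_device_capability(0))  # e.g. (8, 6) for RTX 30-series,
                                            # which requires CUDA >= 11.0
```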

@Bontempogianpaolo1 (Author) commented May 23, 2022

Sorry, Excel introduced some errors in the visualization... The real screenshots are these:

model-v0:
[screenshot]

model-v1:
[screenshot]

So all the numbers do seem to be on the same scale.

@binli123 (Owner)

[image: normal_141_42_54.jpg]
Does your normal_141_42_54.jpg look like this?
My feature CSV using v2: normal_141.csv

@Bontempogianpaolo1 (Author) commented May 23, 2022

I don't have it... What parameters did you use for the deepzoom_tiler.py script in the case of Camelyon?

This is my normal_141 48_112.jpeg:
[patch image]
This is my tumor_047 101_546.jpeg:
[patch image]

Maybe mine has a higher magnification?

@binli123 (Owner)

It turns out that Camelyon16 consists of mixed magnifications, so the correct configuration, found by experimenting, is: `python deepzoom_tiler.py -m 1 -b 20 -d Camelyon16-pilot -v tif`

@Bontempogianpaolo1 (Author)

In this way the magnification becomes 10x, right? Is your embedder trained at this magnification? Since it is inside the folder called x20, I didn't expect that.

@binli123 (Owner)


I think it is still 20x, because the base magnification is ~0.25 micron/pixel, which corresponds to 40x on an Aperio scanner (the FDA standard). A 20x magnification corresponds to ~0.5 micron/pixel. Camelyon16 uses a mixture of scanners with different micron/pixel values.

[image: table of Camelyon16 scanner magnifications and micron/pixel values]

Notice how their 20x and 40x scanners have almost the same micron/pixel? You would call RUMC's "20x" a "40x" image by UMCU's standard, so it is better to just use the FDA standard.
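
One way to check this directly is to read each slide's microns-per-pixel property with OpenSlide (an illustrative sketch; the slide filename is a placeholder, and not every Camelyon16 file exposes this property):

```python
# Infer a slide's true base magnification from its resolution.
import openslide

slide = openslide.OpenSlide("tumor_001.tif")  # hypothetical slide path
mpp_x = float(slide.properties[openslide.PROPERTY_NAME_MPP_X])
# Under the Aperio/FDA convention: ~0.25 um/pixel ~ 40x, ~0.5 um/pixel ~ 20x.
# A ~0.25 um/pixel Camelyon16 slide therefore needs one extra downsampling
# level (deepzoom_tiler.py -m 1) to produce 20x patches.
print(f"{mpp_x:.3f} um/pixel")
```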

@Bontempogianpaolo1 (Author) commented May 23, 2022

OK! I'm trying it now, and inside the "temp" folder the patches are stored in a "10" subfolder (I imagine it refers to the magnification). Anyway, thank you very much for your replies! I'll run the entire pipeline again with these new patches and report the results as soon as possible.

@Bontempogianpaolo1 (Author)

It worked!! But I still have problems :(... I'm opening a new issue for that, since it is not related to the dataset but to the embedder.
