
Pretrained embedders #41

Open
yangsenwxy opened this issue Jun 1, 2022 · 6 comments
@yangsenwxy
I have a question about your SimCLR pre-training: does it include all of the Camelyon16 data (both the training set and the test set)? I believe your feature extractor leaks test-set information. When I pre-trained only on the training set, I could not reproduce such high results. Please check this problem carefully; it may be why your results are so high.

@binli123
Owner

binli123 commented Jun 2, 2022

https://drive.google.com/drive/folders/1_mumfTU3GJRtjfcJK_M0fWm048sYYFqi
There are several model weights trained using only the training data. I also tested SimCLR using both the training set and the testing set; the difference in the results is minor. What batch size did you use? Please make sure the batch size is at least 512 and train for enough iterations to get a genuinely useful embedder from SimCLR, as pointed out in their paper. A bigger batch size and a longer training time lead to a better embedder, and they have quite a big impact on the performance of the downstream task. The best embedder we obtained was trained for 2 months because of the large number of patches.
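As context for the batch-size point: SimCLR's NT-Xent objective contrasts each positive pair against the other 2N − 2 examples in the batch, so a larger batch supplies more negatives per step. Below is a minimal NumPy sketch of the loss for illustration only; it is not the training code used in this repository.

```python
import numpy as np

def nt_xent_loss(z, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z: array of shape (2N, d) holding two augmented views of N patches,
       arranged so that rows i and i+N form a positive pair.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize embeddings
    sim = z @ z.T / temperature                        # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = z.shape[0] // 2
    # index of the positive partner for each row
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # cross-entropy of the positive pair against all 2N-1 other rows;
    # a bigger batch means more negatives in this sum
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return (logsumexp - sim[np.arange(2 * n), pos]).mean()
```

With well-aligned positive pairs the loss approaches its lower bound of 0, while mismatched pairs are penalized against every other patch in the batch.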

Plus, we are not the only ones who have had success with self-supervised learning on Camelyon16; see https://arxiv.org/pdf/2012.03583.pdf, where very high results were also reported.

@yangsenwxy
Author

Thank you very much. I found that with the features you extracted, training directly with the CLAM method only reaches 0.86.

@raycaohmu

Hi, are those weights trained using TCGA data?

@GeorgeBatch
Contributor

GeorgeBatch commented Feb 18, 2023

@raycaohmu

Camelyon16 weights: https://drive.google.com/drive/folders/1_mumfTU3GJRtjfcJK_M0fWm048sYYFqi

  • see folder names for magnifications

TCGA-lung weights: https://drive.google.com/drive/folders/1Rn_VpgM82VEfnjiVjDbObbBFHvs0V1OE

  • magnification: low=2.5x, high=10x
  • pre-training: v0 for 3 days, v1 for 2 weeks (better results)

@PiumiDS

PiumiDS commented Feb 25, 2023

> Camelyon16 weights: https://drive.google.com/drive/folders/1_mumfTU3GJRtjfcJK_M0fWm048sYYFqi
>
>   • see folder names for magnifications
>
> TCGA-lung weights: https://drive.google.com/drive/folders/1Rn_VpgM82VEfnjiVjDbObbBFHvs0V1OE
>
>   • magnification: low=2.5x, high=10x
>   • pre-training: v0 for 3 days, v1 for 2 weeks (better results)

Hi @GeorgeBatch,

I have seen the previous discussion on the magnification change for TCGA-lung patches. Could I please verify: when the above pre-trained model is specified as

  • magnification: low=2.5x, high=10x

does this apply only to the 20x portion of the dataset? (That is, is the pre-trained model trained on 20x/5x patches for 40x images and 10x/2.5x patches for 20x images?)

Many thanks in advance.
Piumi.

@GeorgeBatch
Contributor

GeorgeBatch commented Feb 25, 2023

Hi @PiumiDS,

> this is only for 20x patches of the whole dataset? (so the pre-trained model is trained on 20x,5x (for 40x images) and 10x,2.5x (for 20x images))

I am afraid I do not know the answer to your question myself, so we will both need to wait for @binli123's answer.

George
