Poor performance on PRAD downstream #69

Open
mbu93 opened this issue Dec 16, 2023 · 0 comments

mbu93 commented Dec 16, 2023

Hi People,
First, thanks for your astonishing work! I am currently testing how HIPT performs on a grading task and noticed that performance is underwhelming with the features I extracted using the proposed two-stage HIPT4K (with the weights provided in this repo). I then realized that pre-extracted features are supplied here and repeated my experiments with them. Sadly, to no avail :( I think there is another ticket with similar issues, #19. This is the scatter plot of a 2-component PCA on the slide-level (mean) TCGA-PRAD features:

[image: scatter plot of the 2-component PCA]

Note that I filtered 16 relevant features out of the total 192 by computing Pearson's r against the corresponding Gleason score (which the labels in the scatter plot also refer to). There appears to be some clustering for Gleason 7 and 9, but overall the pretrained models do not seem to capture the important properties. My theory is that this is because the other Gleason scores have too few examples. I have, however, already worked with SSL vision transformers for prostate cancer histopathology and found that the models had good feature extraction capabilities. I am also aware of work that confirms good SSL feature extraction capabilities on the TCGA-PRAD data.
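For reference, the filtering and projection described above can be sketched roughly as follows. This is a minimal sketch, not the exact script used: the helper names `select_by_pearson` and `pca_2d` are hypothetical, the data is synthetic stand-in data rather than the actual TCGA-PRAD embeddings, and it assumes "16 relevant features" means the 16 columns with the largest absolute Pearson r.

```python
import numpy as np


def select_by_pearson(features, scores, k=16):
    """Return indices of the k columns with the largest |Pearson r| against scores."""
    x = features - features.mean(axis=0)
    y = scores - scores.mean()
    denom = np.sqrt((x ** 2).sum(axis=0) * (y ** 2).sum())
    # Constant columns have zero variance; treat their correlation as 0.
    r = np.divide(x.T @ y, denom, out=np.zeros(features.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(r))[:k], r


def pca_2d(features):
    """Project rows onto the first two principal components via SVD."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Synthetic stand-in data (hypothetical): 200 slides, 192-dim mean embeddings.
rng = np.random.default_rng(0)
gleason = rng.integers(6, 11, size=200).astype(float)  # scores 6..10
slide_features = rng.normal(size=(200, 192))
slide_features[:, :4] += 0.5 * gleason[:, None]  # plant signal in 4 columns

top_idx, r = select_by_pearson(slide_features, gleason, k=16)
coords = pca_2d(slide_features[:, top_idx])  # shape (200, 2), ready to scatter
```

One caveat with this kind of filtering: the features are selected using the labels of the same slides that are then plotted, so the scatter can look somewhat cleaner than it would on held-out slides.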

Therefore, I wanted to ask if I really got things right here. So:

  1. I am using the tensors stored in HIPT/3-Self-Supervised-Eval/embeddings_slide_lib/embeddings_slide_lib/vit256mean_tcga_slide_embeddings. Am I correct in assuming that these are the features of each WSI's extracted 4k patches?
  2. The number of tumours used differs between this repo and the paper, but the provided models HAVE indeed been pretrained on TCGA-PRAD as well, right?

Besides my technical questions, I'd really love to hear what you think about this. I'm a little short on time currently, but if you think it's worth the effort, I'd also volunteer to adapt the approach so that it achieves sufficient results on TCGA-PRAD. Maybe working at a different magnification could already do the trick, since for prostate tumours cell-level information is not of the utmost importance (AFAIK). This would be of great value for the digital pathology world :)

Kind regards
M
