
PyTorch implementation of Self-Rule to Adapt (SRA):

Self-Rule to Adapt: Generalized Multi-source Feature Learning Using Unsupervised Domain Adaptation for Colorectal Cancer Tissue Detection

Supervised learning is constrained by the availability of labeled data, which are especially expensive to acquire in the field of digital pathology. Making use of open-source data for pre-training or using domain adaptation can be a way to overcome this issue. However, pre-trained networks often fail to generalize to new test domains that are not distributed identically, due to variations in tissue staining, type, and texture. Additionally, current domain adaptation methods mainly rely on fully-labeled source datasets.

In this work, we propose SRA, which takes advantage of self-supervised learning to perform domain adaptation and removes the necessity of a fully-labeled source dataset. SRA can effectively transfer the discriminative knowledge obtained from a few labeled source-domain samples to a new target domain without requiring additional tissue annotations. Our method harnesses both domains' structures by capturing visual similarity with intra-domain and cross-domain self-supervision. Moreover, we present a generalized formulation of our approach that allows the architecture to learn from multiple source domains. We show that our proposed method outperforms baselines for domain adaptation of colorectal tissue type classification, and we further validate our approach on our in-house clinical cohort. The code and models are available open-source in this repository.

Pipeline


Usage & requirements

This section describes how to use SRA to train your own model. First, clone the repository and install the dependencies.

# To clone the repo
git clone git@github.com:christianabbet/SRA.git
cd SRA

# Create environment and activate it
conda create --name sra python=3.8 -y
conda activate sra

# Install pytorch 
conda install -y pytorch==1.6.0 torchvision==0.7.0 -c pytorch

# Install other packages
conda install -y matplotlib shapely tqdm tensorboard==2.3.0
pip install albumentations openslide-python
pip install git+https://github.com/lucasb-eyer/pydensecrf.git
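To check that the environment is set up correctly, you can run a short Python snippet inside the activated environment (the expected version numbers simply mirror the pins above):

# Quick sanity check of the installation (run inside the "sra" environment)
import torch
import torchvision
print(torch.__version__, torchvision.__version__)  # expected: 1.6.0 and 0.7.0
print("CUDA available:", torch.cuda.is_available())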

Pretrained models

You can download part of the models used for the publication. The pretrained version consists of the two branches of the architecture without the linear classifier, while the classification model consists of a single branch together with the classification (source) layer. The source and target datasets used for each training run are listed below.

| Arch | Source            | Target   | # classes | Download                    |
|------|-------------------|----------|-----------|-----------------------------|
| sra  | Kather19          | In-house | 9         | pretrained, classification  |
| srma | Kather19          | In-house | 9         | pretrained, classification  |
| sra  | Kather19 + CRCTP  | In-house | 10        | pretrained, classification  |
| srma | Kather19 + CRCTP  | In-house | 10        | pretrained, classification  |
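To get started with a downloaded checkpoint, it can be loaded into a PyTorch backbone roughly as follows. This is a minimal sketch: the dictionary keys, backbone depth, and class count are assumptions, so refer to the training scripts for the exact loading code.

import torch
from torchvision.models import resnet18

# Hypothetical loading example; key names and backbone depth depend on the released checkpoint
ckpt = torch.load("checkpoint_sra_k19_inhouse.pth", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # some checkpoints nest the weights
model = resnet18(num_classes=9)            # 9 tissue classes for Kather19 (assumption)
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", len(missing), "unexpected keys:", len(unexpected))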

Training

Step 1: Download publicly available data (source)

Here is a non-exhaustive list of publicly available colorectal tissue datasets:

| Name     | # Samples | # Classes | Links           |
|----------|-----------|-----------|-----------------|
| Kather16 | 5,000     | 8         | download, paper |
| Kather19 | 100,000   | 9         | download, paper |
| CRCTP    | 196,000   | 7         | download, paper |

[Dec 2023] !!! The CRCTP dataset is no longer publicly available !!!

You can download the datasets above using the following commands:

# Create data folder
mkdir data

# Download Kather16 training/test data
wget -O Kather_texture_2016_image_tiles_5000.zip https://zenodo.org/record/53169/files/Kather_texture_2016_image_tiles_5000.zip?download=1
unzip Kather_texture_2016_image_tiles_5000.zip && rm Kather_texture_2016_image_tiles_5000.zip
mv Kather_texture_2016_image_tiles_5000 data

# Download Kather19 training data
wget -O NCT-CRC-HE-100K.zip https://zenodo.org/record/1214456/files/NCT-CRC-HE-100K.zip?download=1
unzip NCT-CRC-HE-100K.zip && rm NCT-CRC-HE-100K.zip
mv NCT-CRC-HE-100K data

# Download CRCTP training/test data (Before Dec 2023)
wget -O fold2.zip https://warwick.ac.uk/fac/cross_fac/tia/data/crc-tp/fold2.zip
7z x fold2.zip && rm fold2.zip
mv Fold2 data/CRCTP

Step 2: Create your Dataset (target)

To perform domain alignment, we need to create a target set. To do so, either use your own dataset or generate one with the following script. The data_query argument should be a query matching the target whole-slide images (*.mrxs, *.svs, ...). The script extracts n_subset tiles, picked at random from the foreground of each whole slide, and saves them under the export path.

python create_targets.py --data_query "/path/to/data/*.mrxs" --export data/GENERATED_TARGETS --n_subset 200
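For reference, the tile extraction performed by create_targets.py roughly amounts to sampling random foreground patches from each slide. The sketch below illustrates the idea with openslide; the tile size, saturation threshold, and sampling loop are assumptions for illustration only, and create_targets.py remains the reference implementation.

import numpy as np
from openslide import OpenSlide

# Illustrative foreground tile sampling (simplified; threshold and tile size are assumptions)
def sample_tiles(wsi_path, n_subset=200, tile_size=224, seed=0):
    slide = OpenSlide(wsi_path)
    width, height = slide.dimensions
    rng = np.random.default_rng(seed)
    tiles = []
    while len(tiles) < n_subset:
        x = int(rng.integers(0, width - tile_size))
        y = int(rng.integers(0, height - tile_size))
        tile = slide.read_region((x, y), 0, (tile_size, tile_size)).convert("RGB")
        saturation = np.asarray(tile.convert("HSV"))[..., 1]
        if saturation.mean() > 20:  # crude tissue/background separation
            tiles.append(tile)
    return tiles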

Step 3: Train the model

To train the model with a single source domain:

# Define variables
DATASET_SRC="data/NCT-CRC-HE-100K"
DATASET_TAR="data/GENERATED_TARGETS"
# Train unsupervised architecture
python train_sra.py --root "${DATASET_SRC}:${DATASET_TAR}" --exp_name sra_k19
# Train linear classifier on top
# Note: You can use the model provided on Google Drive (checkpoint_sra_k19_inhouse.pth)
python train_sra_cls.py --name="kather19" --root "${DATASET_SRC}" --loadpath=best_model_sra_k19.pth
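For intuition, the linear-classification step freezes the SRA encoder and fits only a linear head on the labeled source tiles. The sketch below shows the general idea; the backbone, feature dimension, and training loop are assumptions, and train_sra_cls.py remains the reference implementation.

import torch
import torch.nn as nn
from torchvision.models import resnet18

# Illustrative linear evaluation: the pretrained encoder is frozen, only the linear head trains.
# The backbone (ResNet-18), feature dimension (512), and class count (9) are assumptions here.
encoder = resnet18(num_classes=512)  # stand-in for the pretrained SRA backbone
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)

head = nn.Linear(512, 9)
optimizer = torch.optim.SGD(head.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# One dummy optimization step; in practice, iterate over a DataLoader of labeled source tiles
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 9, (8,))
with torch.no_grad():
    features = encoder(images)
loss = criterion(head(features), labels)
loss.backward()
optimizer.step()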

To train the model with multiple source domains:

# Define variables
DATASET_SRC1="data/CRCTP/Training"
DATASET_SRC2="data/NCT-CRC-HE-100K"
DATASET_TAR="data/GENERATED_TARGETS"
# Train unsupervised architecture
python train_sra.py --root="${DATASET_SRC1}:${DATASET_SRC2}:${DATASET_TAR}"  --exp_name sra_crctp_k19
# Train linear classifier on top
# Note: You can use the model provided on Google Drive (checkpoint_sra_crctp_k19_inhouse.pth)
python train_sra_cls.py --name="crctp-cstr+kather19" --root "${DATASET_SRC1}:${DATASET_SRC2}" --loadpath=/path/to/pretrained/model.pth

Step 4: WSI Classification

The pretrained models (with and without the linear classifier) are available in the pretrained models section above. Here we show how to classify a slide taken from the TCGA cohort. The slides are available for download.

# Infer WSI using K19 label
python infer_wsi_classification.py \
  --wsi_path TCGA-CK-6747-01Z-00-DX1.7824596c-84db-4bee-b149-cd8f617c285f.svs \
  --model_path best_model_srma_cls_k19.pth \
  --config conf_wsi_classification_k19.yaml

# Infer WSI using K19+CRCTP label
python infer_wsi_classification.py \
  --wsi_path TCGA-CK-6747-01Z-00-DX1.7824596c-84db-4bee-b149-cd8f617c285f.svs \
  --model_path best_model_srma_cls_k19crctp.pth \
  --config conf_wsi_classification_k19crctp.yaml

To run the prediction on multiple slides, you can use Unix-style wildcard queries. Be careful to keep the quotes around the wsi_path argument, as shown below.

python infer_wsi_classification.py \
  --wsi_path "/PATH/TO/DATA/*.svs" \
  --model_path best_model_srma_cls_k19.pth \
  --config conf_wsi_classification_k19.yaml

You can find the predictions under the outputs folder.
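Each prediction is written as a JSON file (the same file that QuPath loads in the next step). A quick way to inspect one from Python; the folder layout and glob pattern below are assumptions:

import json
from pathlib import Path

# Inspect one of the generated detection files (path pattern is an assumption)
pred_file = next(Path("outputs").glob("**/*.json"))
with open(pred_file) as f:
    detections = json.load(f)
print(pred_file, "->", type(detections).__name__, "with", len(detections), "entries")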

Step 5: QuPath Visualization

You can visualize the predictions using QuPath. To do so, follow these steps:

  1. Open QuPath.
  2. Open the WSI image (*.mrxs, *.svs, ...).
  3. Select Automate->Show script editor.
  4. Copy-paste the script located under SRA/script_qupath/annotation_loader.groovy.
  5. Run the script (Run->Run or CTRL+R).
  6. Select the JSON file containing the detection output. This file is generated by the infer_wsi_classification.py script mentioned above.
  7. Enjoy.

The expected output is displayed below in the results section. If the detections are not showing up, make sure filled detections are activated (press F or View->Fill detections).


Results

WSI Classification

The expected classification results using the SRMA model on the selected slide from the TCGA cohort.

Figures: original WSI, multiclass overlay, and tumor detection heatmap for slide TCGA-CK-6747.

t-SNE

We present the t-SNE projection of the domain adaptation results from Kather19 to our in-house dataset (figure: Kather19 to in-house), as well as the multi-source case (figure: CRCTP + Kather19 to in-house).
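These projections can be reproduced from the encoder features with scikit-learn (not listed in the install commands above). A minimal sketch, assuming the tile embeddings and tissue labels have already been extracted with the pretrained SRA encoder and saved to disk; file names and shapes are illustrative:

import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Illustrative t-SNE of pre-extracted embeddings; file names and label encoding are assumptions
features = np.load("features.npy")  # shape (n_tiles, feature_dim)
labels = np.load("labels.npy")      # integer tissue-type labels, shape (n_tiles,)
embedding = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(features)
plt.figure(figsize=(6, 6))
plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=2, cmap="tab10")
plt.savefig("tsne_projection.png", dpi=200)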

Crop Segmentation

To validate our approach in a real-case scenario, we perform domain adaptation with our proposed model from Kather19 to whole-slide image sections from our in-house dataset. The results are presented here alongside the original H&E images, the corresponding labels annotated by an expert pathologist, and comparative results of previous approaches smoothed using conditional random fields as in L. Chan (2018). The sections were selected such that, overall, they represent all tissue types equally.

Figures: segmentation result; segmentation cstr result.
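The conditional random field smoothing mentioned above can be applied with the pydensecrf package installed earlier. A minimal sketch, assuming per-pixel softmax scores have already been assembled into a dense probability map; the pairwise parameters are illustrative, not the values used in the paper:

import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

# Illustrative CRF smoothing of a per-pixel class probability map (parameters are assumptions)
def crf_smooth(image_rgb, probs, n_iters=5):
    # image_rgb: (H, W, 3) uint8 H&E crop, probs: (n_classes, H, W) softmax scores
    n_classes, h, w = probs.shape
    d = dcrf.DenseCRF2D(w, h, n_classes)
    d.setUnaryEnergy(unary_from_softmax(probs))
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=80, srgb=13, rgbim=np.ascontiguousarray(image_rgb), compat=10)
    q = np.array(d.inference(n_iters)).reshape(n_classes, h, w)
    return q.argmax(axis=0)  # smoothed label map of shape (H, W)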


Citation

If you use this work, please cite it using the following references :).

# Single-source domain adaptation
@inproceedings{
    abbet2021selfrule,
    title={Self-Rule to Adapt: Learning Generalized Features from Sparsely-Labeled Data Using Unsupervised Domain Adaptation for Colorectal Cancer Tissue Phenotyping},
    author={Christian Abbet and Linda Studer and Andreas Fischer and Heather Dawson and Inti Zlobec and Behzad Bozorgtabar and Jean-Philippe Thiran},
    booktitle={Medical Imaging with Deep Learning},
    year={2021},
    url={https://openreview.net/forum?id=VO7asaS5GUk}
}

# Multi-source domain adaptation (a generalization of previous work to multi-source domains)
@article{
    abbet2022selfrulemulti,
    title = {Self-Rule to Multi-Adapt: Generalized Multi-source Feature Learning Using Unsupervised Domain Adaptation for Colorectal Cancer Tissue Detection},
    journal = {Medical Image Analysis},
    pages = {102473},
    year = {2022},
    issn = {1361-8415},
    doi = {10.1016/j.media.2022.102473},
    author = {Christian Abbet and Linda Studer and Andreas Fischer and Heather Dawson and Inti Zlobec and Behzad Bozorgtabar and Jean-Philippe Thiran},
}

# Application to tumor-stroma ratio quantification
@inproceedings{
   abbet2022toward,
   title={Toward Automatic Tumor-Stroma Ratio Assessment for Survival Analysis in Colorectal Cancer},
   author={Christian Abbet and Linda Studer and Inti Zlobec and Jean-Philippe Thiran},
   booktitle={Medical Imaging with Deep Learning},
   year={2022},
   url={https://openreview.net/forum?id=PMQZGFtItHJ}
}