
tomealbuquerque/multimodal-glioma-biomarkers-detection


Multimodal Context-Aware Detection of Glioma Biomarkers using MRI and WSI

https://link.springer.com/chapter/10.1007/978-3-031-47425-5_15

by Tomé Albuquerque, Mei Ling Fang, Benedikt Wiestler, Claire Delbridge, Maria João M. Vasconcelos, Jaime S. Cardoso and Peter Schüffler

The most malignant tumors of the central nervous system are adult-type diffuse gliomas. Historically, glioma classification has been based on morphological features. However, since 2016 the WHO has recognized that molecular evaluation is critical for subtype classification. Among molecular markers, the mutation status of IDH1 and the codeletion of 1p/19q are crucial for the precise diagnosis of these malignancies. In pathology laboratories, manual screening is time-consuming and susceptible to error. To overcome these limitations, we propose a novel multimodal biomarker classification method that integrates image features derived from brain Magnetic Resonance Imaging (MRI) and histopathological whole slide images (WSI). The proposed model consists of two branches: the first takes as input a multi-scale Hematoxylin and Eosin (H&E) whole slide image, and the second uses the pre-segmented region of interest from the MRI. Both branches are based on Convolutional Neural Networks (CNNs). After the exams pass through the two embedding branches, the output feature vectors are concatenated and a multi-layer perceptron classifies the glioma biomarkers as a multi-class problem. Several fusion strategies were studied in this work, including a cascade model with mid-fusion, a mid-fusion model, a late fusion model, and a mid-context fusion model. The models were tested using a publicly available dataset from The Cancer Genome Atlas (TCGA). The overall cross-validated classification obtained an area under the curve (AUC) of 87.48%, 86.32%, and 81.54% for the proposed multimodal model, MRI alone, and H&E stained slide images alone, respectively, outperforming both unimodal counterparts and state-of-the-art glioma biomarker classification methods.
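A minimal PyTorch sketch of the mid-fusion idea described above: the two CNN branches produce feature vectors that are concatenated and passed to a multi-layer perceptron. The embedding sizes, hidden width, and the three-class output (e.g. IDH1 wildtype / IDH1 mutant / IDH1 mutant + 1p/19q codeleted) are illustrative assumptions, not the exact configuration used in the paper.

import torch
import torch.nn as nn

class MidFusionHead(nn.Module):
    def __init__(self, wsi_dim=512, mri_dim=512, hidden=256, n_classes=3):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(wsi_dim + mri_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, wsi_feat, mri_feat):
        # concatenate the WSI and MRI embeddings, then classify
        fused = torch.cat([wsi_feat, mri_feat], dim=1)
        return self.mlp(fused)

# dummy usage with random embeddings for a batch of 4 cases
logits = MidFusionHead()(torch.randn(4, 512), torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 3])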

Schematic representation of the different models used in this work:

Documentation

Requirements
  • Image==1.5.33
  • monai==1.0.0
  • opencv_python_headless==4.5.5.62
  • openslide_python==1.2.0
  • nibabel==5.0.1
  • Pillow==9.4.0
  • scikit_image==0.19.2
  • scikit_learn==1.2.1
  • seaborn==0.11.2
  • skimage==0.0
  • torch==1.10.0
  • torchvision==0.11.1
pip install -r requirements.txt
Usage

1) Pre-processing

First, create a "data.pickle" containing an array of dictionaries with all the MRI and WSI information needed for training and testing over the 5 folds. The dictionaries have the following structure:

X[fold] = {
            'flair': str (path_to_image),
            't1': str (path_to_image),
            't1ce': str (path_to_image),
            't2': str (path_to_image),
            'flair_block': str (path_to_block),
            't1_block': str (path_to_block),
            't1ce_block': str (path_to_block),
            't2_block': str (path_to_block),
            'slide': str (path_to_slide),
            'tiles_coords': list of tuples (int, int),
            'tiles_coords_level': list of tuples (int, int),
            'gender': int,
            'age': int
          }

Y[fold] = {
            'idh1': int,
            'ioh1p19q': int
          }

Note that the '_block' suffix denotes the segmented region of interest. To create the pickle, run:

python data_multimodal_tcga/pre_process_data_multi_level.py

P.S.: The "data.pickle" for the TCGA dataset is already provided in this repo (a minimal loading sketch is shown below).
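The following sketch loads and inspects the provided pickle. It assumes the file sits next to the pre-processing script and holds the fold-indexed X/Y structures described above; the exact nesting of the pickled object may differ slightly.

import pickle

# path assumed from the pre-processing script location; adjust if needed
with open("data_multimodal_tcga/data.pickle", "rb") as f:
    data = pickle.load(f)

print(type(data), len(data))
# Indexing follows the X[fold] / Y[fold] structure described above, e.g.
# X[fold]['flair'] is the path to the FLAIR volume and Y[fold]['idh1'] the IDH1 label.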

2) Train embedders (MRI and WSI)

  • MRI embedder
python MRI_embedder\classifier.py --modalities t1ce flair --fold 0
  • WSI embedder

To train the embedder on 512x512 tiles, run the following command:

python WSI_embedder\MIL_512_tiles\mil_train_bin.py --fold 0

For 2048x2048 tiles, just change the path: WSI_embedder\MIL_2048_tiles\mil_train_bin.py

After training the model, it is necessary to generate a list of all tiles per slide with their output probabilities (e.g. "predictions_grid_{typee}fold{args.fold}_bin.csv"). To do so, run:

python WSI_embedder\MIL_512_tiles\MIL_get_GRIDS.py --fold 0 --model 'checkpoint_best_512_bin_fold_0.pth'
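Optionally, a small convenience sketch (not part of the repo) can run training and grid generation for all 5 folds in sequence; it assumes the checkpoints follow the fold-numbered naming used in the example commands above.

import subprocess

for fold in range(5):
    subprocess.run(["python", r"WSI_embedder\MIL_512_tiles\mil_train_bin.py",
                    "--fold", str(fold)], check=True)
    subprocess.run(["python", r"WSI_embedder\MIL_512_tiles\MIL_get_GRIDS.py",
                    "--fold", str(fold),
                    "--model", f"checkpoint_best_512_bin_fold_{fold}.pth"], check=True)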

You can skip training the embedders and use the provided pre-trained model weights for WSI and MRI:

fold | WSI | Original MRI (T1ce) | Original MRI (T1ce + FLAIR) | Segmented MRI (T1ce) | Segmented MRI (T1ce + FLAIR)
0    | [x] | [x]                 | [x]                         | [x]                  | [x]
1    | [x] | [x]                 | [x]                         | [x]                  | [x]
2    | [x] | [x]                 | [x]                         | [x]                  | [x]
3    | [x] | [x]                 | [x]                         | [x]                  | [x]
4    | [x] | [x]                 | [x]                         | [x]                  | [x]

3) Train/test multimodal aggregator (MRI + WSI)

To run the aggregation models, you must have four different files at this point:

  • data array of dictionaries (from step 1, Pre-processing);
  • MRI_embedder_weights;
  • WSI_embedder_weights;
  • list of output probabilities ("predictions_grid_{typee}fold{args.fold}_bin.csv").

If you have these files, you are ready to train and test the proposed aggregators. There are 5 different .py files for the aggregators (one unimodal WSI baseline and four multimodal fusion variants).
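Before launching them, a small sanity-check sketch (not part of the repo) can confirm the prerequisites exist. The paths below follow the example commands in this section; the predictions CSV name depends on the {typee} value used when running MIL_get_GRIDS.py, so it is left as a commented placeholder.

from pathlib import Path

required = [
    "data_multimodal_tcga/data.pickle",
    r"0_WSI_embedder_weights\checkpoint_best_512_bin_fold_0.pth",
    r"0_MRI_embedder_weights\multiclass_t1ce_flair\multiclass_fold0_t1ce_flair\multiclass_checkpoint_best_tiles.pth",
    # "predictions_grid_{typee}fold0_bin.csv",  # fill in {typee} as produced by MIL_get_GRIDS.py
]
for p in required:
    print(("OK      " if Path(p).exists() else "MISSING ") + p)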

You can run them using the following commands:

#For Unimodal WSI
python MLP_train_test_unimodal_WSI.py --fold 0 --s 10 --mix 'expected' --model 'checkpoint_best_512_bin_fold_0.pth' --results_folder 'path_results_folder'

#For multimodal MRI+WSI

# a) cascade model with mid fusion:
python MLP_train_test_multimodal_cascade_fusion.py --fold 0 --s 10 --mix 'expected' --model_PATH '0_WSI_embedder_weights\checkpoint_best_512_bin_fold_0.pth' --model_MRI '0_MRI_embedder_weights\multiclass_t1ce_flair\multiclass_fold0_t1ce_flair\multiclass_checkpoint_best_tiles.pth' --weights 'CE' --results_folder 'name_results_folder'

# b) mid fusion model:
python MLP_train_test_multimodal_mid_fusion.py --fold 0 --s 10 --mix 'expected' --model_PATH '0_WSI_embedder_weights\checkpoint_best_512_bin_fold_0.pth' --model_MRI '0_MRI_embedder_weights\multiclass_t1ce_flair\multiclass_fold0_t1ce_flair\multiclass_checkpoint_best_tiles.pth' --weights 'CE' --results_folder 'name_results_folder'

# c) late fusion model:
python MLP_train_test_multimodal_late_fusion.py --fold 0 --s 10 --mix 'expected' --model_PATH '0_WSI_embedder_weights\checkpoint_best_512_bin_fold_0.pth' --model_MRI '0_MRI_embedder_weights\multiclass_t1ce_flair\multiclass_fold0_t1ce_flair\multiclass_checkpoint_best_tiles.pth' --weights 'CE' --results_folder 'name_results_folder'

# d) mid context fusion:
python MLP_train_test_multimodal_fusion_context_aware.py --fold 0 --s 10 --mix 'expected' --model_PATH '0_WSI_embedder_weights\checkpoint_best_512_bin_fold_0.pth' --model_MRI '0_MRI_embedder_weights\multiclass_t1ce_flair\multiclass_fold0_t1ce_flair\multiclass_checkpoint_best_tiles.pth' --weights 'CE' --results_folder 'name_results_folder'
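Since the four multimodal scripts share the same arguments, a small convenience sketch (not part of the repo) can run all of them for one fold with the example arguments shown above.

import subprocess

scripts = [
    "MLP_train_test_multimodal_cascade_fusion.py",
    "MLP_train_test_multimodal_mid_fusion.py",
    "MLP_train_test_multimodal_late_fusion.py",
    "MLP_train_test_multimodal_fusion_context_aware.py",
]
common_args = ["--fold", "0", "--s", "10", "--mix", "expected",
               "--model_PATH", r"0_WSI_embedder_weights\checkpoint_best_512_bin_fold_0.pth",
               "--model_MRI", r"0_MRI_embedder_weights\multiclass_t1ce_flair\multiclass_fold0_t1ce_flair\multiclass_checkpoint_best_tiles.pth",
               "--weights", "CE", "--results_folder", "name_results_folder"]
for script in scripts:
    subprocess.run(["python", script] + common_args, check=True)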

Next, to generate a file per model with several metrics (e.g. ACC, AUC, MAE, F1, and the confusion matrix), run:

python print_MIL_results_tables.py --fold 0 --s 10 --mix 'expected'  --weights CE --results_folder 'name_results_folder'

4) Plot results

To generate a results table for each modality use the following code (change it according to your needs):

python print_tables.py

To generate ROC and box plots for each modality use the following code (change it according to your needs):

python Plot_ROC_curves.py
python Plot_box_plots.py
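If you prefer to build your own plots, here is a minimal one-vs-rest ROC sketch, independent of the repo scripts above. It assumes you have per-case true labels and predicted class probabilities (hypothetical arrays below) and uses matplotlib, which is installed as a seaborn dependency.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.preprocessing import label_binarize

y_true = np.array([0, 1, 2, 1, 0, 2])              # hypothetical ground-truth labels
y_prob = np.random.dirichlet(np.ones(3), size=6)   # hypothetical class probabilities

y_bin = label_binarize(y_true, classes=[0, 1, 2])
for c in range(3):
    fpr, tpr, _ = roc_curve(y_bin[:, c], y_prob[:, c])
    plt.plot(fpr, tpr, label=f"class {c} (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], "k--")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.savefig("roc_curves_example.png")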

Citation

If you find this work useful for your research, please cite our paper:

@InProceedings{10.1007/978-3-031-47425-5_15,
author="Albuquerque, Tom{\'e}
and Fang, Mei Ling
and Wiestler, Benedikt
and Delbridge, Claire
and Vasconcelos, Maria Jo{\~a}o M.
and Cardoso, Jaime S.
and Sch{\"u}ffler, Peter",
editor="Woo, Jonghye
and Hering, Alessa
and Silva, Wilson
and Li, Xiang
and Fu, Huazhu
and Liu, Xiaofeng
and Xing, Fangxu
and Purushotham, Sanjay
and Mathai, Tejas S.
and Mukherjee, Pritam
and De Grauw, Max
and Beets Tan, Regina
and Corbetta, Valentina
and Kotter, Elmar
and Reyes, Mauricio
and Baumgartner, Christian F.
and Li, Quanzheng
and Leahy, Richard
and Dong, Bin
and Chen, Hao
and Huo, Yuankai
and Lv, Jinglei
and Xu, Xinxing
and Li, Xiaomeng
and Mahapatra, Dwarikanath
and Cheng, Li
and Petitjean, Caroline
and Presles, Beno{\^i}t",
title="Multimodal Context-Aware Detection of Glioma Biomarkers Using MRI and WSI",
booktitle="Medical Image Computing and Computer Assisted Intervention -- MICCAI 2023 Workshops",
year="2023",
publisher="Springer Nature Switzerland",
address="Cham",
pages="157--167",
abstract="The most malignant tumors of the central nervous system are adult-type diffuse gliomas. Historically, glioma subtype classification has been based on morphological features. However, since 2016, WHO recognizes that molecular evaluation is critical for subtyping. Among molecular markers, the mutation status of IDH1 and the codeletion of 1p/19q are crucial for the precise diagnosis of these malignancies. In pathology laboratories, however, manual screening for those markers is time-consuming and susceptible to error. To overcome these limitations, we propose a novel multimodal biomarker classification method that integrates image features derived from brain magnetic resonance imaging and histopathological exams. The proposed model consists of two branches, the first branch takes as input a multi-scale Hematoxylin and Eosin whole slide image, and the second branch uses the pre-segmented region of interest from the magnetic resonance imaging. Both branches are based on convolutional neural networks. After passing the exams by the two embedding branches, the output feature vectors are concatenated, and a multi-layer perceptron is used to classify the glioma biomarkers as a multi-class problem. In this work, several fusion strategies were studied, including a cascade model with mid-fusion; a mid-fusion model, a late fusion model, and a mid-context fusion model. The models were tested using a publicly available data set from The Cancer Genome Atlas. Our cross-validated classification models achieved an area under the curve of 0.874, 0.863, and 0.815 for the proposed multimodal, magnetic resonance imaging, and Hematoxylin and Eosin stain slide images respectively, indicating our multimodal model outperforms its unimodal counterparts and the state-of-the-art glioma biomarker classification methods.",
isbn="978-3-031-47425-5"
}

If you have any questions about our work, please do not hesitate to contact tome.albuquerque@gmail.com
