
Real deep learning can generalise to more than one species: A Comparative Three Species Whole Slide Image Dataset

License: MIT

This repository contains code to replicate the results from the paper Real deep learning can generalise to more than one species: A Comparative Three Species Whole Slide Image Dataset, together with links to the corresponding Jupyter notebooks. The dataset can be examined on EXACT with the username SDATA_EIPH_2021 and the password SDATA_ALBA.

Paper

@article{marzahl2021MultipleSpecies,
  author    = {Christian Marzahl and
               Jenny Hill and
               Jason Stayt and
               Dorothee Bienzle and
               Lutz Welker and
               Frauke Wilm and
               J{\"{o}}rn Voigt and
               Marc Aubreville and
               Andreas K. Maier and
               Robert Klopfleisch and
               Katharina Breininger and
               Christof A. Bertram},
  title     = {Inter-Species Cell Detection: Datasets on pulmonary hemosiderophages
               in equine, human and feline specimens},
  journal   = {CoRR},
  volume    = {abs/2108.08529},
  year      = {2021},
  url       = {https://arxiv.org/abs/2108.08529},
  abstract  = {Pulmonary hemorrhage (P-Hem) occurs among multiple species and can have various causes. Cytology of bronchoalveolar lavage fluid (BALF) using a 5-tier scoring system of alveolar macrophages based on their hemosiderin content is considered the most sensitive diagnostic method. We introduce a novel, fully annotated multi-species P-Hem dataset which consists of 74 cytology whole slide images (WSIs) with equine, feline and human samples. To create this high-quality and high-quantity dataset, we developed an annotation pipeline combining human expertise with deep learning and data visualisation techniques. We applied a deep learning-based object detection approach trained on 17 expertly annotated equine WSIs to the remaining 39 equine, 12 human and 7 feline WSIs. The resulting annotations were semi-automatically screened for errors on multiple types of specialised annotation maps and finally reviewed by a trained pathologist. Our dataset contains a total of 297,383 hemosiderophages classified into five grades. It is one of the largest publicly available WSI datasets with respect to the number of annotations, the scanned area and the number of species covered.},
  eprinttype = {arXiv},
  eprint    = {2108.08529},
  timestamp = {Mon, 23 Aug 2021 14:07:13 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2108-08529.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}


Start and Structure

  • Install the requirements: pip install -r requirements.txt
  • Download the slides with Download.ipynb, or download individual files via the links in the Excel file
  • Install OpenSlide: on Linux, apt-get install python3-openslide; on Windows, see https://openslide.org/download/
  • The folder Statistics contains notebooks that analyse the dataset annotations and general information about the slides
  • Inference contains code to train the described models and to perform inference on slides
  • Regression trains the regression models that predict a continuous EIPH grade and is used for creating the density maps
  • Cluster contains code to create custom annotation maps and to synchronise the generated images and annotations with EXACT

Annotation formats

The annotations for the cells are available in different formats.

The following Excel file contains the links to download the individual files. For an automated download, please follow the instructions in the Start and Structure section.
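For a scripted download outside of Download.ipynb, a minimal sketch using only the Python standard library, assuming the download URLs have already been extracted from the Excel file into a list (the `local_name`/`download_all` helpers and the "slides" folder name are illustrative, not part of the repository):

```python
import os
import urllib.request

def local_name(url, out_dir="slides"):
    # Derive a local file name from the last path segment of the URL.
    return os.path.join(out_dir, url.rsplit("/", 1)[-1])

def download_all(urls, out_dir="slides"):
    # Fetch every slide URL, skipping files that already exist locally
    # so an interrupted download can simply be restarted.
    os.makedirs(out_dir, exist_ok=True)
    for url in urls:
        target = local_name(url, out_dir)
        if not os.path.exists(target):
            urllib.request.urlretrieve(url, target)
```

Download.ipynb remains the recommended route; this sketch only illustrates the idea.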

Notebooks

| Section | Species | Name | Description |
| --- | --- | --- | --- |
| - | - | Download.ipynb | Download the WSI files from Figshare |
| Cluster | - | ClusterCellsBySize.ipynb | Notebook to cluster the cells by size (paper section 2.6, Semi-automatic data cleaning via customised clustering) |
| Cluster | Feline | Create_DensityWSI-Cat.ipynb | Notebook to cluster the feline cells by their EIPH score (paper section 2.8, Density map) |
| Cluster | Equine | Create_DensityWSI-Equine.ipynb | Notebook to cluster the equine cells by their EIPH score (paper section 2.8, Density map) |
| Cluster | Human | Create_DensityWSI-Human.ipynb | Notebook to cluster the human cells by their EIPH score (paper section 2.8, Density map) |
| Cluster | - | SyncSizeClusterResults.ipynb | Notebook to sync changes made by the pathologist in EXACT on the density maps back to the original WSIs (paper section 2.8, Density map) |
| Inference | - | FilterInferenceResults.ipynb | Notebook to perform non-maximum suppression and confidence thresholding on the inference results (paper section 2.5, Inter-species inference from a pre-trained model) |
| Inference | - | UploadPickleToEXACT | Notebook to upload the inference results to EXACT |
| Inference | - | Fine-tune-SREP-V2 | Notebook to fine-tune the deep learning-based object detection model on the new V2 annotations (paper section 4.1, Reevaluation of the inference step) |
| Inference | Equine | EquineFold-1-VS-HumanCat | Train on the first equine fold and validate on the human and feline samples |
| Inference | Equine | EquineFold-2-VS-HumanCat | Train on the second equine fold and validate on the human and feline samples |
| Inference | Equine | EquineFold-3-VS-HumanCat | Train on the third equine fold and validate on the human and feline samples |
| Inference | - | Fine-tune-SREP-V2-Ablation | Notebook to fine-tune the object detection model on the new V2 annotations with an increasing number of annotations (paper section 4.1, Reevaluation of the inference step) |
| Inference | - | TrainSREP-V2 | Notebook to train the object detection model on the new V2 annotations (paper section 4.1, Reevaluation of the inference step) |
| Domain Adaptation | Feline | CatvsCat | Train the object detection model on feline WSIs and validate on feline WSIs (paper section 4.2, Inter-species domain adaptation) |
| Domain Adaptation | Feline | CatvsEquine | Train the object detection model on feline WSIs and validate on equine WSIs (paper section 4.2, Inter-species domain adaptation) |
| Domain Adaptation | Feline | CatvsHuman | Train the object detection model on feline WSIs and validate on human WSIs (paper section 4.2, Inter-species domain adaptation) |
| Domain Adaptation | Equine | EquineVsCat | Train the object detection model on equine WSIs and validate on feline WSIs (paper section 4.2, Inter-species domain adaptation) |
| Domain Adaptation | Equine | EquineVsEquine | Train the object detection model on equine WSIs and validate on equine WSIs (paper section 4.2, Inter-species domain adaptation) |
| Domain Adaptation | Equine | EquineVsHuman | Train the object detection model on equine WSIs and validate on human WSIs (paper section 4.2, Inter-species domain adaptation) |
| Domain Adaptation | Human | HumanVsCat | Train the object detection model on human WSIs and validate on feline WSIs (paper section 4.2, Inter-species domain adaptation) |
| Domain Adaptation | Human | HumanVsEquine | Train the object detection model on human WSIs and validate on equine WSIs (paper section 4.2, Inter-species domain adaptation) |
| Domain Adaptation | Human | HumanVsHuman | Train the object detection model on human WSIs and validate on human WSIs (paper section 4.2, Inter-species domain adaptation) |
| Domain Adaptation results | - | CrossSpeciesValidation | Notebook to analyse the inter-species cross-validation results (paper section 4.2, Inter-species domain adaptation) |
| Domain Adaptation results | - | SourceTargetDomainVisualisation | Notebook to visualise the inter-species cross-validation results (paper section 4.2, Inter-species domain adaptation) |
| Ablation study | Feline | CatVsCatAblation | Ablation study training the object detection model on an increasing number of feline slides (paper section 4.3, Ablation study) |
| Ablation study | Equine | EquineVsEquineAblation | Ablation study training the object detection model on an increasing number of equine slides (paper section 4.3, Ablation study) |
| Ablation study | Human | HumanVsHumanAblation | Ablation study training the object detection model on an increasing number of human slides (paper section 4.3, Ablation study) |
| Ablation study results | - | AblationStudy | Notebook to present the ablation study results as figures for the final paper (paper section 4.3, Ablation study) |
| Ablation study results | - | TableAblationStudyAnnotations | Notebook to show the annotation results of the ablation study (paper section 4.3, Ablation study) |
| Regression | Equine | baseline-Regression | Notebook to train the deep learning-based regression model to predict continuous EIPH scores (paper section 2.8, Density map) |
| Regression | Equine | SetScoresAtEXACT | Notebook to set the predicted EIPH scores for all cells on EXACT (paper section 2.8, Density map) |
| Annotation statistics | Equine | AnnotationVersionStatistics-SDATA-Equine | Equine annotation statistics (paper section 2, Methods) |
| Annotation statistics | Feline | AnnotationVersionStatistics-SDATA-Feline | Feline annotation statistics (paper section 2, Methods) |
| Annotation statistics | Human | AnnotationVersionStatistics-SDATA-Human | Human annotation statistics (paper section 2, Methods) |
| Annotation statistics SREP | Human | AnnotationVersionStatistics-SREP | Annotation statistics from the SREP publication (paper section 2, Methods) |
| Annotation statistics total | Human | AnnotationVersionStatistics-Total | Annotation statistics from the final publication (paper section 2, Methods) |
| WSIs area | - | AreaAndScreeing | Calculates the area of the WSIs (paper section 2, Methods) |
| Download annotations SREP | Equine | DownloadAnnotationsSREP | Download the annotations for the original SREP dataset (paper section 2, Methods) |
| Download annotations | Equine | DownloadAnnotationsSDATA_Equine | Download the equine annotations for the SDATA dataset (paper section 2, Methods) |
| Download annotations | Feline | DownloadAnnotationsSDATA_CAT | Download the feline annotations for the SDATA dataset (paper section 2, Methods) |
| Download annotations | Human | DownloadAnnotationsSDATA_Human | Download the human annotations for the SDATA dataset (paper section 2, Methods) |
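The FilterInferenceResults.ipynb notebook applies confidence thresholding and non-maximum suppression to the raw detections. A minimal pure-Python sketch of that standard post-processing step (not the repository's exact implementation; boxes as `(x1, y1, x2, y2)` tuples):

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    # Drop low-confidence detections, then greedily keep the highest-scoring
    # box and suppress any remaining box that overlaps it too strongly.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep  # indices of the surviving detections
```

For example, two heavily overlapping detections of the same cell collapse to the single higher-scoring one, while a distant cell is kept.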

Troubleshooting

If you see the following error message when viewing a notebook on GitHub:

Sorry, something went wrong. Reload?

please use nbviewer instead:

https://nbviewer.jupyter.org/github/ChristianMarzahl/EIPH_WSI/tree/master/SDATA/

Object detection library

For object detection on whole slide images (WSIs), we use code from the repository https://github.com/ChristianMarzahl/ObjectDetection. If you use the repository or parts thereof, please cite the corresponding paper:

@article{marzahl2020deep,
  title={Deep learning-based quantification of pulmonary hemosiderophages in cytology slides},
  author={Marzahl, Christian and Aubreville, Marc and Bertram, Christof A. and Stayt, Jason and Jasensky, Anne-Katherine and Bartenschlager, Florian and Fragoso-Garcia, Marco and Barton, Ann K. and Elsemann, Svenja and Jabari, Samir and Krauth, Jens and Madhu, Prathmesh and Voigt, J{\"o}rn and Hill, Jenny and Klopfleisch, Robert and Maier, Andreas},
  journal={Scientific Reports},
  volume={10},
  number={1},
  pages={1--10},
  year={2020},
  publisher={Nature Publishing Group}
}

Overview

Overview of the macrophage annotation and validation pipeline: the RetinaNet object detection model trained on 16 equine slides [4] is used to perform inference on the remaining slides, followed by a semi-automatic clustering step that clusters cells by size. Error-prone cells are highlighted and can then be efficiently deleted by a human expert. Afterwards, a human expert screens all WSIs to increase the dataset consistency. Finally, a regression-based clustering system is applied to support experts in searching for misclassifications of the hemosiderin grade.
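The per-cell output of this pipeline is a hemosiderin grade from 0 to 4. As a toy illustration of how such per-cell grades can be summarised at slide level (the `grade_distribution`/`mean_grade` helpers are illustrative; the notebooks' actual scoring may differ):

```python
from collections import Counter

def grade_distribution(grades):
    # Absolute count and relative frequency of each hemosiderin grade 0-4.
    counts = Counter(grades)
    total = len(grades)
    return {g: (counts.get(g, 0), counts.get(g, 0) / total) for g in range(5)}

def mean_grade(grades):
    # Mean grade over all detected hemosiderophages: a simple continuous
    # slide-level summary, not necessarily the paper's exact score.
    return sum(grades) / len(grades)
```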

Ablation study

Results of the ablation study using our customised RetinaNet object detector on an increasing number of human, equine and feline training patches of size 1024 × 1024 pixels from one WSI, or on up to five complete WSIs. The boxes give the total number of hemosiderophages used for training, in combination with the mAP graphs for each species.
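The 1024 × 1024 pixel training patches referenced above can be enumerated over a slide with a simple grid helper; a sketch under the assumption of non-overlapping tiling (the notebooks may sample patches differently, e.g. randomly, and `patch_grid` is a hypothetical name):

```python
def patch_grid(width, height, patch=1024, stride=1024):
    # Top-left corners of patches tiling a WSI of the given pixel size;
    # border regions smaller than one full patch are skipped.
    return [(x, y)
            for y in range(0, height - patch + 1, stride)
            for x in range(0, width - patch + 1, stride)]
```

Each returned coordinate can then be passed to OpenSlide's region reader to extract the actual pixel data.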

Species confusion matrix

Each of the nine figures shows the source training species on the left and the target species, together with the obtained mAP, on the top. Green bounding boxes represent grade zero hemosiderophages, while red boxes show grade one.