
mmdet-nucleus-instance-segmentation

License: MIT

by Zhi-Yi Chin

This repository is the implementation of homework 3 for the IOC5008 Selected Topics in Visual Recognition using Deep Learning course, offered in the 2021 fall semester at National Yang Ming Chiao Tung University.

In this homework, we participate in the nuclei segmentation challenge on CodaLab, performing instance segmentation on the TCGA nuclei dataset from the 2018 Kaggle Data Science Bowl. The dataset contains 24 training images with 14,598 nuclei and 6 test images with 2,360 nuclei. Pre-trained models are allowed for training, but no external data may be used. We apply four existing methods to this challenge.

Getting the code

You can download a copy of all the files in this repository by cloning it:

git clone https://github.com/joycenerd/mmdet-nucleus-instance-segmentation.git

Requirements

You need to have Anaconda or Miniconda already installed in your environment. To install requirements:

1. Create a conda environment

conda create -n openmmlab python=3.7 -y
conda activate openmmlab

2. Install mmdetection

conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch
pip install openmim
mim install mmdet

3. Install the customized mmdetection included in this repository

cd mmdetection
python setup.py install

For more information, please see get_started.md.

4. Install imantics (used to convert binary masks into COCO segmentation annotations)

pip install imantics
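
To verify the environment, you can run a quick check in Python (a minimal sanity check, assuming the steps above completed without errors):

# Minimal environment sanity check (assumes the install steps above succeeded).
import torch
import mmcv
import mmdet
print(torch.__version__, torch.cuda.is_available())  # expect 1.7.0 and True on a GPU machine
print(mmcv.__version__, mmdet.__version__)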

Dataset

You can either download the data we have already pre-processed (option 1) or download the raw data (option 2).

Option 1: Download the pre-processed data

  1. Download the data from the Google Drive link: nucleus_data.zip
  2. After decompressing the zip file, the data folder structure should look like this:
nucleus_data
├── all_train
│   ├── TCGA-18-5592-01Z-00-DX1.png
│   ├── TCGA-21-5784-01Z-00-DX1.png
│   ├── TCGA-21-5786-01Z-00-DX1.png
│   ├── ......
├── annotations
│   ├── instance_all_train.json
│   ├── instance_test.json
│   ├── instance_train.json
│   ├── instance_val.json
│   └── test_img_ids.json
├── classes.txt
├── test
│   ├── TCGA-50-5931-01Z-00-DX1.png
│   ├── TCGA-A7-A13E-01Z-00-DX1.png
│   ├── ......
├── train
│   ├── TCGA-18-5592-01Z-00-DX1.png
│   ├── TCGA-21-5786-01Z-00-DX1.png
│   ├── ......
└── val
    ├── TCGA-21-5784-01Z-00-DX1.png
    ├── TCGA-B0-5711-01Z-00-DX1.png
    ├── ......

Option 2: Download the raw data

  1. Download the data from the Google Drive link: dataset.zip
  2. After decompressing the zip file, the data folder structure should look like this:
dataset
├── test
│   ├── .ipynb_checkpoints
│   │   ├── TCGA-50-5931-01Z-00-DX1-checkpoint.png
│   │   ├── TCGA-AY-A8YK-01A-01-TS1-checkpoint.png
│   │   ├── TCGA-G9-6336-01Z-00-DX1-checkpoint.png
│   │   └── TCGA-G9-6348-01Z-00-DX1-checkpoint.png
│   ├── TCGA-50-5931-01Z-00-DX1.png
│   ├── TCGA-A7-A13E-01Z-00-DX1.png
│   ├── TCGA-AY-A8YK-01A-01-TS1.png
│   ├── TCGA-G2-A2EK-01A-02-TSB.png
│   ├── TCGA-G9-6336-01Z-00-DX1.png
│   └── TCGA-G9-6348-01Z-00-DX1.png
├── test_img_ids.json
└── train
    ├── TCGA-18-5592-01Z-00-DX1
    │   ├── images
    │   │   └── TCGA-18-5592-01Z-00-DX1.png
    │   └── masks
    │       ├── .ipynb_checkpoints
    │       │   └── mask_0002-checkpoint.png
    │       ├── mask_0001.png
    │       ├── mask_0002.png
    │       ├── ......
    ├── TCGA-RD-A8N9-01A-01-TS1
    │   ├── images
    │   │   └── TCGA-RD-A8N9-01A-01-TS1.png
    │   └── masks
    │       ├── mask_0001.png
    │       ├── mask_0002.png
    │       ├── ......
    └── ......

Data pre-processing

Note: If you downloaded the data following option 1, you can skip this step.

If your raw data folder structure is different, you will need to modify train_valid_split.py and mask2coco.py before executing the code.

1. Train/validation split

By default, we split the whole training set into 80% for training and 20% for validation.

python train_valid_split.py --data-root <save_dir>/dataset/train --ratio 0.2 --out-dir <save_dir>/nucleus_data
  • input: the original whole-training-set image directory
  • output: a new data directory named nucleus_data containing two folders, train/ and val/, with the split images inside (see the sketch below)
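
For reference, the split logic can be sketched as follows. This is a simplified, hypothetical version; the actual train_valid_split.py may differ in its details and argument handling.

# Hypothetical sketch of an 80/20 train/validation split over the raw dataset layout.
import random
import shutil
from pathlib import Path
data_root = Path('dataset/train')    # each subfolder holds images/ and masks/
out_dir = Path('nucleus_data')
ratio = 0.2                          # fraction of samples used for validation
samples = sorted(p for p in data_root.iterdir() if p.is_dir())
random.seed(0)
random.shuffle(samples)
n_val = int(len(samples) * ratio)
splits = {'val': samples[:n_val], 'train': samples[n_val:]}
for split, folders in splits.items():
    split_dir = out_dir / split
    split_dir.mkdir(parents=True, exist_ok=True)
    for folder in folders:
        # copy each sample's single image into the flat split folder
        shutil.copy(folder / 'images' / f'{folder.name}.png', split_dir / f'{folder.name}.png')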

2. Convert binary mask images into COCO segmentation annotations

python mask2coco.py --mode <train_or_val> --data_root <save_dir>/nucleus_data/<train_or_val> --mask_root <save_dir>/dataset/train --out_dir <save_dir>/nucleus_data/annotations
  • input:
    1. the train or val folder path from the previous step
    2. the root directory where the binary masks are saved
  • output: instance_train.json or instance_val.json in nucleus_data/annotations/ (see the sketch below)
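
Conceptually, each binary mask becomes one COCO annotation with a polygon segmentation. Below is a minimal, hypothetical sketch using imantics; the real mask2coco.py handles more details and may differ.

# Hypothetical sketch: turn per-nucleus binary masks into a COCO-style annotation file.
import json
import numpy as np
from pathlib import Path
from PIL import Image
from imantics import Mask
mask_root = Path('dataset/train')    # <image_name>/masks/mask_*.png
images, annotations, ann_id = [], [], 1
for img_id, sample in enumerate(sorted(p for p in mask_root.iterdir() if p.is_dir()), 1):
    img = Image.open(sample / 'images' / f'{sample.name}.png')
    images.append({'id': img_id, 'file_name': f'{sample.name}.png',
                   'width': img.width, 'height': img.height})
    for mask_path in sorted((sample / 'masks').glob('mask_*.png')):
        binary = np.array(Image.open(mask_path).convert('L')) > 0
        ys, xs = np.where(binary)
        bbox = [int(xs.min()), int(ys.min()), int(xs.max() - xs.min() + 1), int(ys.max() - ys.min() + 1)]
        annotations.append({'id': ann_id, 'image_id': img_id, 'category_id': 1,
                            'segmentation': Mask(binary).polygons().segmentation,
                            'bbox': bbox, 'area': int(binary.sum()), 'iscrowd': 0})
        ann_id += 1
coco = {'images': images, 'annotations': annotations,
        'categories': [{'id': 1, 'name': 'nucleus'}]}
with open('instance_train.json', 'w') as f:
    json.dump(coco, f)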

Training

You need a graphics card (GPU) to train the model. For reference, we trained on a single NVIDIA Tesla V100.

1. Download the pre-trained weights (pre-trained on COCO)

| Model | Backbone | Lr schd | Download |
| --- | --- | --- | --- |
| Mask RCNN | R50 | 3x | model |
| Mask RCNN | X101 | 3x | model |
| Cascade Mask RCNN | R50 | 3x | model |
| Cascade Mask RCNN | X101 | 3x | model |
| PointRend | R50 | 3x | model |
| Mask Scoring RCNN | X101 | 1x | model |

2. Modify config file

Go to Results and Models and find the model configuration you want to train. You will need to modify the configuration file before training. The things you need to modify are:

  • ann_file and img_prefix in the data section
  • Put the downloaded pre-trained weights path in load_from

Tip: We got better results when using all 24 images for training. You can try img_prefix: all_train and ann_file: instance_all_train.json.

You can also find all of our custom configuration files in mmdetection/configs/nucleus and modify them to your own needs; a sketch of the typical overrides is shown below.
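
As a concrete illustration, the overrides usually look like the snippet below. This is only a sketch with placeholder paths; the actual configuration files under mmdetection/configs/nucleus are the authoritative versions.

# Sketch of the fields to adjust in an mmdetection config (paths are placeholders).
data_root = '/path/to/nucleus_data/'
classes = ('nucleus',)
data = dict(
    train=dict(
        classes=classes,
        img_prefix=data_root + 'all_train/',
        ann_file=data_root + 'annotations/instance_all_train.json'),
    val=dict(
        classes=classes,
        img_prefix=data_root + 'val/',
        ann_file=data_root + 'annotations/instance_val.json'),
    # for validation, point this test section to the val split instead (see Validation below)
    test=dict(
        classes=classes,
        img_prefix=data_root + 'test/',
        ann_file=data_root + 'annotations/instance_test.json'))
# point load_from to the COCO pre-trained checkpoint downloaded in step 1
load_from = '/path/to/pretrained_checkpoint.pth'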

3. Train the model

python tools/train.py <config_file_path> --work-dir <save_dir>/train
  • input: model configuration file
  • output: a checkpoint for every epoch and the training logs, saved in <save_dir>/train

Validation

In the configuration file, the test ann_file and img_prefix should point to the validation data, not the test data, because the test data has no ground truth.

python tools/test.py <config_file_path> <save_dir>/train/epoch_<X>.pth --eval bbox segm --work-dir <save_dir>/val
  • input:
    • model configuration file
    • the checkpoint saved during training in the previous step
  • output: validation logs

Testing

1. Convert the test images to COCO format

Note: If you downloaded the data following option 1 in the Dataset section, you can skip this step.

python tools/dataset_converters/images2coco.py <data_dir>/nucleus_data/test <data_dir>/nucleus_data/classes.txt instance_test.json --imgid_json <data_dir>/nucleus_data/annotations/test_img_ids.json
  • input:
    • test image directory
    • classes.txt: class names
    • test_img_ids.json: test image IDs
  • output: instance_test.json (see the sketch below)
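
The resulting instance_test.json is a standard COCO-style file listing the test images and the class, with no ground-truth labels. A quick way to check it (a sketch; the exact keys may vary slightly):

# Quick look at the generated COCO-style test file (exact keys may vary).
import json
with open('instance_test.json') as f:
    coco = json.load(f)
print(list(coco.keys()))     # expect at least 'images' and 'categories'
print(coco['images'][0])     # e.g. {'id': 1, 'file_name': 'TCGA-....png', 'width': ..., 'height': ...}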

2. Generate testing results

Before testing, please ensure that the test image folder path and the path to instance_test.json are correct in the model configuration file.

python tools/test.py <config_file_path> <save_dir>/train/epoch_<X>.pth --format-only --options "jsonfile_prefix=test" --show
  • input:
    • model configuration file
    • trained model checkpoint
  • output:
    • test.segm.json: instance segmentation results (see the sketch below)
    • test.bbox.json: detection results
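
Each entry in test.segm.json is one predicted instance with an RLE-encoded mask. It can be inspected with pycocotools, which mmdetection already depends on (a small sketch, assuming mmdetection's default COCO-style result format):

# Inspect the predicted instances in test.segm.json (sketch).
import json
from pycocotools import mask as mask_utils
with open('test.segm.json') as f:
    results = json.load(f)                    # a list of per-instance predictions
first = results[0]
print(first['image_id'], first['category_id'], first['score'])
binary = mask_utils.decode(first['segmentation'])   # RLE -> HxW uint8 array
print(binary.shape, int(binary.sum()))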

Submit the results

  1. rename the result file: mv test.segm.json answer.json
  2. compress the file: zip answer.zip answer.json
  3. upload the result to CodaLab to get the testing score

Results and Models

| Model | Backbone | Lr schd | Mask AP | Config | Download |
| --- | --- | --- | --- | --- | --- |
| Mask RCNN | R50 | 3x | 0.2323 | config | model |
| Mask RCNN | X101 | 3x | 0.2316 | config | - |
| Cascade Mask RCNN | R50 | 3x | 0.2428 | config | model |
| Cascade Mask RCNN | X101 | 3x | 0.2444 | config | model |
| PointRend | R50 | 3x | 0.2439 | config | model |
| Mask Scoring RCNN | X101 | 1x | 0.2420 | config | model |

Inference

Note: we use Cascade Mask RCNN with an X101 backbone as our best model. To reproduce our best results, follow these steps:

  1. Getting the code
  2. Install the dependencies
  3. Download the data: please download the data by following option#1
  4. Download pre-trained weights
  5. Modify the config file
  6. Download checkpoints
  7. Testing
  8. Submit the results

FAQ

If any problem occurs while you are using this project, first check faq.md to see if there is already a solution to your problem.

GitHub Acknowledgement

We thank the authors of these repositories:

Citation

If you find our work useful in your project, please cite:

@misc{chin2021mmdetnucleus,
    title = {mmdet-nucleus-instance-segmentation},
    author = {Zhi-Yi Chin},
    url = {https://github.com/joycenerd/mmdet-nucleus-instance-segmentation},
    year = {2021}
}

Contributing

If you'd like to contribute, or have any suggestions, you can contact us at joycenerd.cs09@nycu.edu.tw or open an issue on this GitHub repository.

All contributions are welcome! All content in this repository is licensed under the MIT license.