Histopathologic Cancer Detection

Overview

In this competition, you must create an algorithm to identify metastatic cancer in small image patches taken from larger digital pathology scans. The data for this competition is a slightly modified version of the PatchCamelyon (PCam) benchmark dataset.

PCam is interesting for its size, its simplicity, and its approachability. In the authors' words:

[PCam] packs the clinically-relevant task of metastasis detection into a straight-forward binary image classification task, akin to CIFAR-10 and MNIST. Models can easily be trained on a single GPU in a couple hours, and achieve competitive scores in the Camelyon16 tasks of tumor detection and whole-slide image diagnosis. Furthermore, the balance between task-difficulty and tractability makes it a prime suspect for fundamental machine learning research on topics as active learning, model uncertainty, and explainability.

Data

In this dataset, you are provided with a large number of small pathology images to classify. Files are named with an image id. The train_labels.csv file provides the ground truth for the images in the train folder. You are predicting the labels for the images in the test folder. A positive label indicates that the center 32x32px region of a patch contains at least one pixel of tumor tissue. Tumor tissue in the outer region of the patch does not influence the label. This outer region is provided to enable fully-convolutional models that do not use zero-padding, to ensure consistent behavior when applied to a whole-slide image.
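The labeling rule above can be illustrated with a minimal sketch. It assumes a per-pixel binary tumor mask is available for a patch, which this dataset does not ship (only the final labels are provided), and it assumes 96x96 patches; `center_label` and both example masks are hypothetical names introduced for illustration only.

```python
def center_label(mask, center=32):
    """Return 1 if the central `center` x `center` window of a 2D binary
    mask contains at least one tumor pixel, otherwise 0."""
    height = len(mask)
    width = len(mask[0])
    top = (height - center) // 2    # first row of the central window
    left = (width - center) // 2    # first column of the central window
    for row in mask[top:top + center]:
        if any(row[left:left + center]):
            return 1
    return 0


# Hypothetical 96x96 patch masks (1 = tumor pixel, 0 = normal tissue).
SIZE = 96

border_only = [[0] * SIZE for _ in range(SIZE)]
border_only[5][5] = 1      # tumor pixel in the outer region only

center_hit = [[0] * SIZE for _ in range(SIZE)]
center_hit[48][48] = 1     # tumor pixel inside the central 32x32 window

print(center_label(border_only))  # 0
print(center_label(center_hit))   # 1
```

The first mask shows that tumor tissue in the outer region alone leaves the label negative, while a single tumor pixel in the central window is enough to make it positive.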

The original PCam dataset contains duplicate images due to its probabilistic sampling; the version presented on Kaggle, however, does not contain duplicates. We have otherwise maintained the same data and splits as the PCam benchmark.

Evaluation

Submissions are evaluated on area under the ROC curve between the predicted probability and the observed target.
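To make the metric concrete, here is a minimal pure-Python sketch of the area under the ROC curve, computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counting as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not part of the repository or of the competition's scoring code.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example is
    scored higher than a randomly chosen negative one (ties count 0.5).
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: ground-truth labels and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise loop is quadratic in the number of examples, so it is only meant to show what the score measures. For real validation sets, `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.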

Code

The code is still under development, so bear that in mind. To facilitate reproducibility, the file package-list-txt contains a list of all the packages present in the conda environment used for the challenge.

Folders and files (history: 47 commits)

.dvc
images
submissions
.gitignore
Experimental_diary.md
LICENSE
README.md
cnn_analyzer.py
data.dvc
data_preprocess.py
environment.yml
main.py

Axel-Bravo/19_kaggle_Histopathologic-Cancer-Detection

Folders and files

Latest commit

History

Repository files navigation

Histopathologic Cancer Detection

Overview

Data

Evaluation

Code

About

Topics

Resources

License

Stars

Watchers

Forks

Languages