Name	Name	Last commit message	Last commit date
parent directory ..
configs	configs
datasets	datasets
README.md	README.md
__init__.py	__init__.py
setup.py	setup.py
test.py	test.py
vis.py	vis.py

Semantic Segmentation Models of BDD100K

The semantic segmentation task involves predicting a segmentation mask for each image indicating a class label for every pixel.

The BDD100K dataset contains fine-grained semantic segmentation annotations for 10K images (7K/1K/2K for train/val/test). Each annotation is a segmentation mask containing labels for 19 diverse object classes. For details about downloading the data and the annotation format for this task, see the official documentation.

Model Zoo

For training the models listed below, we follow the common settings used by MMSegmentation (details here), unless otherwise stated. All models are trained on either 8 GeForce RTX 2080 Ti GPUs or 8 TITAN RTX GPUs with a batch size of 2x8=16.

Models
- FCN
- PSPNet
- Deeplabv3
- Deeplabv3+
- UPerNet
- PSANet
- NLNet
- Semantic FPN
- EMANet
- DMNet
- APCNet
- HRNet
- CCNet
- GCNet
- DNLNet
- PointRend
- Vision Transformer
- DeiT
- Swin Transformer
- DPT
- ConvNeXt
Usage
Contribution

FCN

Fully Convolutional Networks for Semantic Segmentation [CVPR 2015 / TPAMI 2017]

Authors: Jonathan Long, Evan Shelhamer, Trevor Darrell

Abstract

Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build "fully convolutional" networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a novel architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20\% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes one third of a second for a typical image.

Results

Backbone	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
R-50-D8	40K	769 * 769	59.87	scores	52.59	scores	config	model \| MD5	preds \| masks	visuals
R-50-D8	40K	512 * 1024	59.80	scores	53.06	scores	config	model \| MD5	preds	visuals

Backbone	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
R-50-D8	40K	512 * 1024	61.88	scores	54.50	scores	config	model \| MD5	preds \| masks	visuals
R-50-D8	80K	512 * 1024	62.03	scores	54.99	scores	config	model \| MD5	preds \| masks	visuals
R-101-D8	80K	512 * 1024	63.62	scores	56.32	scores	config	model \| MD5	preds \| masks	visuals

Backbone	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
R-50-D8	40K	769 * 769	61.62	scores	55.17	scores	config	model \| MD5	preds \| masks	visuals
R-50-D8	40K	512 * 1024	62.16	scores	55.20	scores	config	model \| MD5	preds \| masks	visuals
R-50-D8	80K	512 * 1024	62.55	scores	55.19	scores	config	model \| MD5	preds \| masks	visuals
R-101-D8	80K	512 * 1024	63.23	scores	56.24	scores	config	model \| MD5	preds \| masks	visuals

Backbone	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
R-50-D8	40K	769 * 769	61.22	scores	55.61	scores	config	model \| MD5	preds \| masks	visuals
R-50-D8	40K	512 * 1024	62.51	scores	55.14	scores	config	model \| MD5	preds \| masks	visuals
R-50-D8	80K	512 * 1024	63.96	scores	56.08	scores	config	model \| MD5	preds \| masks	visuals
R-101-D8	80K	512 * 1024	64.49	scores	57.00	scores	config	model \| MD5	preds \| masks	visuals

Backbone	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
R-50-D8	40K	769 * 769	60.01	scores	54.39	scores	config	model \| MD5	preds \| masks	visuals
R-50-D8	40K	512 * 1024	61.12	scores	53.97	scores	config	model \| MD5	preds	visuals

Backbone	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
R-50-D8	40K	512 * 1024	61.41	scores	54.56	scores	config	model \| MD5	preds \| masks	visuals
R-50-D8	80K	512 * 1024	61.99	scores	54.59	scores	config	model \| MD5	preds \| masks	visuals

Backbone	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
R-50-D8	40K	512 * 1024	61.38	scores	54.11	scores	config	model \| MD5	preds \| masks	visuals
R-50-D8	80K	512 * 1024	60.98	scores	55.00	scores	config	model \| MD5	preds	visuals

Backbone	GN	Deform. Conv.	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
R-50-FPN			40K	512 * 1024	59.24	scores	52.89	scores	config	model \| MD5	preds	visuals
R-50-FPN			80K	512 * 1024	60.36	scores	52.92	scores	config	model \| MD5	preds	visuals
R-50-FPN	✓		40K	512 * 1024	59.44	scores	53.42	scores	config	model \| MD5	preds	visuals
R-50-FPN	✓		80K	512 * 1024	60.21	scores	53.00	scores	config	model \| MD5	preds	visuals
R-50-FPN	✓	✓	40K	512 * 1024	61.53	scores	54.31	scores	config	model \| MD5	preds	visuals
R-50-FPN	✓	✓	80K	512 * 1024	60.55	scores	53.91	scores	config	model \| MD5	preds	visuals

Backbone	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
R-50-D8	40K	769 * 769	62.05	scores	54.52	scores	config	model \| MD5	preds \| masks	visuals
R-50-D8	80K	769 * 769	62.30	scores	55.46	scores	config	model \| MD5	preds \| masks	visuals

Backbone	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
R-50-D8	40K	512 * 1024	60.94	scores	54.08	scores	config	model \| MD5	preds \| masks	visuals
R-50-D8	80K	512 * 1024	62.30	scores	54.82	scores	config	model \| MD5	preds \| masks	visuals

Backbone	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
HRNet48	40K	512 * 1024	63.37	scores	56.01	scores	config	model \| MD5	preds \| masks	visuals
HRNet48	80K	512 * 1024	63.93	scores	55.89	scores	config	model \| MD5	preds \| masks	visuals

Backbone	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
R-50-D8	40K	512 * 1024	62.11	scores	54.61	scores	config	model \| MD5	preds \| masks	visuals
R-50-D8	80K	512 * 1024	62.52	scores	55.10	scores	config	model \| MD5	preds \| masks	visuals
R-101-D8	80K	512 * 1024	60.44	scores	55.93	scores	config	model \| MD5	preds \| masks	visuals

Backbone	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
R-50-D8	40K	512 * 1024	61.93	scores	54.35	scores	config	model \| MD5	preds \| masks	visuals
R-50-D8	80K	512 * 1024	62.64	scores	54.72	scores	config	model \| MD5	preds \| masks	visuals
R-101-D8	80K	512 * 1024	59.54	scores	56.31	scores	config	model \| MD5	preds \| masks	visuals

Backbone	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
R-50-FPN	40K	512 * 1024	61.80	scores	53.61	scores	config	model \| MD5	preds \| masks	visuals
R-50-FPN	80K	512 * 1024	61.02	scores	52.53	scores	config	model \| MD5	preds	visuals

Backbone	FP16	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
Swin-T		40K	512 * 1024	62.00	scores	54.33	scores	config	model \| MD5	preds	visuals
Swin-T		80K	512 * 1024	63.10	scores	54.81	scores	config	model \| MD5	preds	visuals
Swin-S		80K	512 * 1024	65.76	scores	58.00	scores	config	model \| MD5	preds	visuals
Swin-S	✓	80K	512 * 1024	65.51	scores	57.67	scores	config	model \| MD5	preds	visuals
Swin-B	✓	80K	512 * 1024	65.98	scores	58.33	scores	config	model \| MD5	preds	visuals

Backbone	FP16	Iters	Input	mIoU-val	Scores-val	mIoU-test	Scores-test	Config	Weights	Preds	Visuals
ConvNeXt-T	✓	40K	512 * 1024	63.21	scores	56.09	scores	config	model \| MD5	preds	visuals
ConvNeXt-T	✓	80K	512 * 1024	64.36	scores	57.02	scores	config	model \| MD5	preds	visuals
ConvNeXt-S	✓	80K	512 * 1024	66.13	scores	58.15	scores	config	model \| MD5	preds	visuals
ConvNeXt-B	✓	80K	512 * 1024	67.26	scores	59.82	scores	config	model \| MD5	preds	visuals

Files

sem_seg

Directory actions

More options

Directory actions

More options

Latest commit

History

sem_seg

Folders and files

parent directory

Semantic Segmentation Models of BDD100K

Model Zoo

Table of Contents

FCN

Results

PSPNet

Results

Deeplabv3

Results

Deeplabv3+

Results

UPerNet

Results

PSANet

Results

NLNet

Results

Semantic FPN

Results

EMANet

Results

DMNet

Results

APCNet

Results

HRNet

Results

CCNet

Results

GCNet

Results

DNLNet

Results

PointRend

Results

Vision Transformer

Results

DeiT

Results

Swin Transformer

Results

DPT

Results

ConvNeXt

Results

Install

Usage

Model Inference

Output Evaluation

Validation Set

Test Set

Output Visualization

Contribution