This is an official implementation for "AutoFocusFormer: Image Segmentation off the Grid".

apple/ml-autofocusformer-segmentation

AutoFocusFormer

This software project accompanies the research paper, AutoFocusFormer: Image Segmentation off the Grid (CVPR 2023).

Chen Ziwen, Kaushik Patnaik, Shuangfei Zhai, Alvin Wan, Zhile Ren, Alex Schwing, Alex Colburn, Li Fuxin

arXiv | video narration | AFF-Classification | AFF-Segmentation (this repo)

Introduction

AutoFocusFormer (AFF) is the first adaptive-downsampling network capable of dense prediction tasks such as semantic/instance segmentation.

AFF abandons the traditional grid structure of image feature maps, and automatically learns to retain the most important pixels with respect to the task goal.


AFF consists of a local-attention transformer backbone and a task-specific head. The backbone consists of four stages, each stage containing three modules: balanced clustering, local-attention transformer blocks, and adaptive downsampling.
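To make the idea of adaptive downsampling concrete, here is a toy sketch of score-based token selection. It is not the repository's implementation: the function name `keep_top_tokens`, the hand-written scores, and the plain top-k rule are assumptions made for illustration, whereas AFF learns which pixels matter with respect to the task goal.

```python
import math


def keep_top_tokens(positions, scores, rate):
    """Keep the ceil(rate * n) highest-scoring tokens (toy top-k rule, an assumption)."""
    n_keep = math.ceil(len(scores) * rate)
    # Rank token indices by descending importance score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore the original ordering of the survivors for readability.
    kept = sorted(ranked[:n_keep])
    return [positions[i] for i in kept], [scores[i] for i in kept]


# Eight tokens at (x, y) locations with hypothetical importance scores.
positions = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.7, 0.3, 0.6]

kept_positions, kept_scores = keep_top_tokens(positions, scores, rate=1 / 4)
print(kept_positions)
```

The surviving tokens are chosen by score rather than by position, so they need not lie on a regular grid, which is the sense in which the feature map goes "off the grid".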


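AFF saves a significant share of FLOPs: in the ADE20K and COCO tables below, the models with a 1/5 downsampling rate use roughly 17% to 27% fewer FLOPs than the corresponding models without the 1/5 suffix, while mIoU and AP change by at most 0.8 points. AFF also significantly improves recognition of small objects.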

Notably, AFF-Small achieves 44.0 instance segmentation AP and 66.9 panoptic segmentation PQ on Cityscapes val with a backbone of only 42.6M parameters. This is on par with Swin-Large, whose backbone has 197M parameters (a 78% saving).
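The 78% figure follows directly from the two parameter counts quoted above. A minimal check, with variable names chosen here for illustration:

```python
# Backbone sizes in millions of parameters, as quoted in the text above.
aff_small_params_m = 42.6
swin_large_params_m = 197.0

# Fraction of parameters saved by using AFF-Small instead of Swin-Large.
saving = 1.0 - aff_small_params_m / swin_large_params_m
print(f"parameter saving: {saving:.1%}")  # prints "parameter saving: 78.4%"
```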



This repository contains the AFF backbone and the point-cloud version of the Mask2Former segmentation head.

We also add a few conveniences, such as visualizing prediction results on blurred versions of the images and evaluating on COCO-fied LVIS v1 annotations.

Main Results with Pretrained Models

ADE20K Semantic Segmentation (val)

| backbone | method | pretrain | crop size | mIoU | FLOPs | checkpoint |
|---|---|---|---|---|---|---|
| AFF-Mini | Mask2Former | ImageNet-1K | 512x512 | 46.5 | 48.3G | Apple ML |
| AFF-Mini-1/5 | Mask2Former | ImageNet-1K | 512x512 | 46.0 | 39.9G | Apple ML |
| AFF-Tiny | Mask2Former | ImageNet-1K | 512x512 | 50.2 | 64.6G | Apple ML |
| AFF-Tiny-1/5 | Mask2Former | ImageNet-1K | 512x512 | 50.0 | 51.1G | Apple ML |
| AFF-Small | Mask2Former | ImageNet-1K | 512x512 | 51.2 | 87G | Apple ML |
| AFF-Small-1/5 | Mask2Former | ImageNet-1K | 512x512 | 51.9 | 67.2G | Apple ML |
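The trade-off between the two downsampling rates can be read off the ADE20K table above with a few lines of arithmetic. The sketch below copies the mIoU and FLOPs columns by hand; the dictionary layout and the helper `compare` are assumptions for illustration, not part of the repository.

```python
# (mIoU, GFLOPs) copied from the ADE20K table above.
full = {"AFF-Mini": (46.5, 48.3), "AFF-Tiny": (50.2, 64.6), "AFF-Small": (51.2, 87.0)}
fifth = {"AFF-Mini": (46.0, 39.9), "AFF-Tiny": (50.0, 51.1), "AFF-Small": (51.9, 67.2)}


def compare(full, fifth):
    """Return {backbone: (FLOPs saved in %, mIoU change)} for the 1/5 variants."""
    rows = {}
    for name, (miou, gflops) in full.items():
        miou_5, gflops_5 = fifth[name]
        flops_saved = round(100 * (1 - gflops_5 / gflops), 1)
        miou_change = round(miou_5 - miou, 1)
        rows[name] = (flops_saved, miou_change)
    return rows


rows = compare(full, fifth)
for name, (flops_saved, miou_change) in rows.items():
    print(f"{name}: {flops_saved}% fewer FLOPs, {miou_change:+} mIoU")
```

On these numbers, the 1/5 variants save between about 17% and 23% of the FLOPs, and AFF-Small-1/5 gains 0.7 mIoU despite the lower cost.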

Cityscapes Instance Segmentation (val)

| backbone | method | pretrain | AP | checkpoint |
|---|---|---|---|---|
| AFF-Mini | Mask2Former | ImageNet-1K | 40.0 | Apple ML |
| AFF-Tiny | Mask2Former | ImageNet-1K | 42.7 | Apple ML |
| AFF-Small | Mask2Former | ImageNet-1K | 44.0 | Apple ML |
| AFF-Base | Mask2Former | ImageNet-22K | 46.2 | Apple ML |

Cityscapes Panoptic Segmentation (val)

| backbone | method | pretrain | PQ (s.s.) | checkpoint |
|---|---|---|---|---|
| AFF-Mini | Mask2Former | ImageNet-1K | 62.7 | Apple ML |
| AFF-Tiny | Mask2Former | ImageNet-1K | 65.7 | Apple ML |
| AFF-Small | Mask2Former | ImageNet-1K | 66.9 | Apple ML |
| AFF-Base | Mask2Former | ImageNet-22K | 67.7 | Apple ML |

COCO Instance Segmentation (val)

| backbone | method | pretrain | epochs | AP | FLOPs | checkpoint |
|---|---|---|---|---|---|---|
| AFF-Mini | Mask2Former | ImageNet-1K | 50 | 42.3 | 148G | Apple ML |
| AFF-Mini-1/5 | Mask2Former | ImageNet-1K | 50 | 42.3 | 120G | Apple ML |
| AFF-Tiny | Mask2Former | ImageNet-1K | 50 | 45.3 | 204G | Apple ML |
| AFF-Tiny-1/5 | Mask2Former | ImageNet-1K | 50 | 44.5 | 152G | Apple ML |
| AFF-Small | Mask2Former | ImageNet-1K | 50 | 46.4 | 281G | Apple ML |
| AFF-Small-1/5 | Mask2Former | ImageNet-1K | 50 | 45.7 | 206G | Apple ML |

Getting Started

Clone this repo

git clone git@github.com:apple/ml-autofocusformer-segmentation.git
cd ml-autofocusformer-segmentation

The pre-trained checkpoints can be downloaded through the links in the tables above.

Create environment and install requirements

sh create_env.sh

See further documentation inside the script file.

Our experiments were run with CUDA 11.6 and PyTorch 1.12.

Prepare data

Please refer to dataset README.

Prepare pre-trained backbone checkpoint

Use tools/convert-pretrained-model-to-d2.py to convert any PyTorch .pth checkpoint of a backbone trained on ImageNet into a .pkl file in the Detectron2 model zoo format.

python tools/convert-pretrained-model-to-d2.py aff_mini.pth aff_mini.pkl

Without this conversion, Detectron2 (d2) assumes the checkpoint covers the entire segmentation model and does not add the "backbone." prefix to the parameter names, so the backbone weights are not loaded properly.
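The naming mismatch can be illustrated with plain dictionaries. This is a toy sketch, not the conversion script or Detectron2's loader: the parameter names and the helper `add_prefix` are hypothetical, and real checkpoints hold tensors rather than floats.

```python
def add_prefix(state_dict, prefix="backbone."):
    """Return a copy of state_dict with prefix prepended to every parameter name."""
    return {prefix + name: value for name, value in state_dict.items()}


# Hypothetical parameter names of a backbone trained on its own on ImageNet.
backbone_ckpt = {"patch_embed.weight": 0.1, "layers.0.norm.bias": 0.2}

# Hypothetical names under which the full segmentation model stores the same parameters.
expected = {"backbone.patch_embed.weight", "backbone.layers.0.norm.bias"}

# Without the prefix no name lines up, so no backbone weight would be loaded.
matched_raw = expected & set(backbone_ckpt)
# With the prefix every backbone parameter finds its counterpart.
matched_prefixed = expected & set(add_prefix(backbone_ckpt))

print(len(matched_raw), len(matched_prefixed))  # prints "0 2"
```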

Train and evaluate

Modify the arguments in the script run_aff_segmentation.sh and run

sh run_aff_segmentation.sh

for training or evaluation.

The config files in configs/ can also be modified directly.

Visualize predictions for pre-trained models

See the script run_demo.sh. More details can be found in the Mask2Former GETTING_STARTED.md.

Analyze model FLOPs

See tools README.

Citing AutoFocusFormer

@inproceedings{autofocusformer,
    title = {AutoFocusFormer: Image Segmentation off the Grid},
    booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    author = {Ziwen, Chen and Patnaik, Kaushik and Zhai, Shuangfei and Wan, Alvin and Ren, Zhile and Schwing, Alex and Colburn, Alex and Fuxin, Li},
    year = {2023},
}