SHI-Labs/OneFormer

OneFormer: One Transformer to Rule Universal Image Segmentation

Jitesh Jain, Jiachen Li†, MangTik Chiu†, Ali Hassani, Nikita Orlov, Humphrey Shi

† Equal Contribution

[Project Page] [arXiv] [pdf] [BibTeX]

This repo contains the code for our paper OneFormer: One Transformer to Rule Universal Image Segmentation.

Features

  • OneFormer is the first multi-task universal image segmentation framework based on transformers.
  • OneFormer needs to be trained only once with a single universal architecture, a single model, and on a single dataset, to outperform existing frameworks across semantic, instance, and panoptic segmentation tasks.
  • OneFormer uses a task-conditioned joint training strategy, uniformly sampling different ground truth domains (semantic, instance, or panoptic) by deriving all labels from panoptic annotations to train its multi-task model.
  • OneFormer uses a task token to condition the model on the task in focus, making our architecture task-guided for training, and task-dynamic for inference, all with a single model.
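
The joint training strategy described above can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the segment dictionary layout, the function names, and the `"the task is {task}"` prompt template are assumptions made for illustration. It only shows how one panoptic annotation can yield targets for all three tasks, with the task drawn uniformly at random.

```python
import random

# The three ground-truth domains named in the feature list.
TASKS = ("semantic", "instance", "panoptic")


def derive_targets(segments, task):
    """Derive (category, pixel-set) targets for one task from panoptic segments.

    Each segment is a dict with hypothetical keys:
      "category": class name
      "is_thing": True for countable objects, False for stuff regions
      "pixels":   set of (row, col) coordinates covered by the segment
    """
    if task == "panoptic":
        # Panoptic: every annotated segment is its own target.
        return [(s["category"], set(s["pixels"])) for s in segments]
    if task == "instance":
        # Instance: keep only "thing" segments, one target per object.
        return [(s["category"], set(s["pixels"])) for s in segments if s["is_thing"]]
    if task == "semantic":
        # Semantic: merge all segments of the same category into one target.
        merged = {}
        for s in segments:
            merged.setdefault(s["category"], set()).update(s["pixels"])
        return list(merged.items())
    raise ValueError(f"unknown task: {task!r}")


def sample_training_example(segments, rng=random):
    """Pick a task uniformly at random and build its task text and targets."""
    task = rng.choice(TASKS)
    task_text = f"the task is {task}"  # assumed prompt template for the task token
    return task_text, derive_targets(segments, task)
```

For an image annotated with two cars and one road region, this sketch produces three panoptic targets, two instance targets (the cars), and two semantic targets (one merged "car" mask and one "road" mask). Passing a seeded `random.Random` as `rng` makes the sampled task reproducible.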

Contents

  1. News
  2. Installation Instructions
  3. Dataset Preparation
  4. Execution Instructions
  5. Results
  6. Citation

News

  • [February 27, 2023]: OneFormer is accepted to CVPR 2023!
  • [January 26, 2023]: OneFormer sets new SOTA performance on the Mapillary Vistas val (both panoptic & semantic segmentation) and Cityscapes test (panoptic segmentation) sets. We've released the checkpoints too!
  • [January 19, 2023]: OneFormer is now available as a part of the 🤗 HuggingFace transformers library and model hub! 🚀
  • [December 26, 2022]: Checkpoints for Swin-L OneFormer and DiNAT-L OneFormer trained on ADE20K with 1280×1280 resolution released!
  • [November 23, 2022]: Roboflow covers OneFormer on YouTube! Thanks to @SkalskiP for making the video!
  • [November 18, 2022]: Our demo is available on 🤗 Hugging Face Spaces!
  • [November 10, 2022]: Project Page, ArXiv Preprint and GitHub Repo are public!
    • OneFormer sets new SOTA on Cityscapes val with single-scale inference on Panoptic Segmentation with 68.5 PQ score and Instance Segmentation with 46.7 AP score!
    • OneFormer sets new SOTA on ADE20K val on Panoptic Segmentation with 51.5 PQ score and on Instance Segmentation with 37.8 AP!
    • OneFormer sets new SOTA on COCO val on Panoptic Segmentation with 58.0 PQ score!

Installation Instructions

  • We use Python 3.8, PyTorch 1.10.1 (CUDA 11.3 build).
  • We use Detectron2-v0.6.
  • For complete installation instructions, please see INSTALL.md.
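
As a quick sanity check before following INSTALL.md, the versions listed above can be compared against the local environment. The helper below is an illustrative sketch, not part of the repository: the function names are hypothetical, and it assumes that `torch` and `detectron2` expose a `__version__` string.

```python
import sys

# Versions stated in the installation notes above.
EXPECTED = {"python": "3.8", "torch": "1.10.1", "detectron2": "0.6"}


def version_mismatches(found, expected=EXPECTED):
    """Return {name: (found, expected)} for every version that does not match.

    A found version matches if it equals the expected one or extends it with
    a "." or "+" suffix (for example "1.10.1+cu113" matches "1.10.1").
    """
    mismatches = {}
    for name, want in expected.items():
        have = found.get(name)
        ok = have is not None and (
            have == want
            or have.startswith(want + ".")
            or have.startswith(want + "+")
        )
        if not ok:
            mismatches[name] = (have, want)
    return mismatches


def installed_versions():
    """Collect local versions; a package that is not installed maps to None."""
    found = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in ("torch", "detectron2"):
        try:
            module = __import__(name)
            found[name] = getattr(module, "__version__", None)
        except ImportError:
            found[name] = None
    return found
```

Calling `version_mismatches(installed_versions())` returns an empty dict when the environment matches the versions above. A mismatch is not necessarily fatal, but it is a reasonable first thing to check when the build steps fail.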

Dataset Preparation

  • We experiment on three major benchmark datasets: ADE20K, Cityscapes and COCO 2017.
  • Please see Preparing Datasets for OneFormer for complete instructions for preparing the datasets.

Execution Instructions

Training

  • We train all our models using 8 A6000 (48 GB each) GPUs.
  • We use 8 A100 (80 GB each) for training Swin-L† OneFormer and DiNAT-L† OneFormer on COCO and all models with ConvNeXt-XL† backbone. We also train the 896×896 models on ADE20K on 8 A100 GPUs.
  • Please see Getting Started with OneFormer for training commands.

Evaluation

Demo

  • We provide quick-to-run demos on Colab and Hugging Face Spaces.
  • Please see OneFormer Demo for command line instructions on running the demo.

Results

  • † denotes the backbones were pretrained on ImageNet-22k.
  • Pre-trained models can be downloaded following the instructions given under tools.

ADE20K

| Method | Backbone | Crop Size | PQ | AP | mIoU (s.s) | mIoU (ms+flip) | #params | config | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OneFormer | Swin-L† | 640×640 | 49.8 | 35.9 | 57.0 | 57.7 | 219M | config | model |
| OneFormer | Swin-L† | 896×896 | 51.1 | 37.6 | 57.4 | 58.3 | 219M | config | model |
| OneFormer | Swin-L† | 1280×1280 | 51.4 | 37.8 | 57.0 | 57.7 | 219M | config | model |
| OneFormer | ConvNeXt-L† | 640×640 | 50.0 | 36.2 | 56.6 | 57.4 | 220M | config | model |
| OneFormer | DiNAT-L† | 640×640 | 50.5 | 36.0 | 58.3 | 58.4 | 223M | config | model |
| OneFormer | DiNAT-L† | 896×896 | 51.2 | 36.8 | 58.1 | 58.6 | 223M | config | model |
| OneFormer | DiNAT-L† | 1280×1280 | 51.5 | 37.1 | 58.3 | 58.7 | 223M | config | model |
| OneFormer (COCO-Pretrained) | DiNAT-L† | 1280×1280 | 53.4 | 40.2 | 58.4 | 58.8 | 223M | config | model, pretrained |
| OneFormer | ConvNeXt-XL† | 640×640 | 50.1 | 36.3 | 57.4 | 58.8 | 372M | config | model |

Cityscapes

| Method | Backbone | PQ | AP | mIoU (s.s) | mIoU (ms+flip) | #params | config | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OneFormer | Swin-L† | 67.2 | 45.6 | 83.0 | 84.4 | 219M | config | model |
| OneFormer | ConvNeXt-L† | 68.5 | 46.5 | 83.0 | 84.0 | 220M | config | model |
| OneFormer (Mapillary Vistas-Pretrained) | ConvNeXt-L† | 70.1 | 48.7 | 84.6 | 85.2 | 220M | config | model, pretrained |
| OneFormer | DiNAT-L† | 67.6 | 45.6 | 83.1 | 84.0 | 223M | config | model |
| OneFormer | ConvNeXt-XL† | 68.4 | 46.7 | 83.6 | 84.6 | 372M | config | model |
| OneFormer (Mapillary Vistas-Pretrained) | ConvNeXt-XL† | 69.7 | 48.9 | 84.5 | 85.8 | 372M | config | model, pretrained |

COCO

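| Method | Backbone | PQ | PQ^Th | PQ^St | AP | mIoU | #params | config | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OneFormer | Swin-L† | 57.9 | 64.4 | 48.0 | 49.0 | 67.4 | 219M | config | model |
| OneFormer | DiNAT-L† | 58.0 | 64.3 | 48.4 | 49.2 | 68.1 | 223M | config | model |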

Mapillary Vistas

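| Method | Backbone | PQ | mIoU (s.s) | mIoU (ms+flip) | #params | config | Checkpoint |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OneFormer | Swin-L† | 46.7 | 62.9 | 64.1 | 219M | config | model |
| OneFormer | ConvNeXt-L† | 47.9 | 63.2 | 63.8 | 220M | config | model |
| OneFormer | DiNAT-L† | 47.8 | 64.0 | 64.9 | 223M | config | model |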

Citation

If you found OneFormer useful in your research, please consider starring ⭐ us on GitHub and citing 📚 us in your research!

@inproceedings{jain2023oneformer,
      title={{OneFormer: One Transformer to Rule Universal Image Segmentation}},
      author={Jitesh Jain and Jiachen Li and MangTik Chiu and Ali Hassani and Nikita Orlov and Humphrey Shi},
      booktitle={CVPR},
      year={2023}
    }

Acknowledgement

We thank the authors of Mask2Former, GroupViT, and Neighborhood Attention Transformer for releasing their helpful codebases.