Add ViG models [NeurIPS 2022] #1578

Open
wants to merge 15 commits into
base: main

Conversation

iamhankai
Contributor

Add ViG models from paper: Vision GNN: An Image is Worth Graph of Nodes (NeurIPS 2022), https://arxiv.org/abs/2206.00272

Network architecture plays a key role in the deep learning-based computer vision system. The widely-used convolutional neural network and transformer treat the image as a grid or sequence structure, which is not flexible to capture irregular and complex objects. In this paper, we propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level feature for visual tasks. We first split the image to a number of patches which are viewed as nodes, and construct a graph by connecting the nearest neighbors. Based on the graph representation of images, we build our ViG model to transform and exchange information among all the nodes. ViG consists of two basic modules: Grapher module with graph convolution for aggregating and updating graph information, and FFN module with two linear layers for node feature transformation. Both isotropic and pyramid architectures of ViG are built with different model sizes. Extensive experiments on image recognition and object detection tasks demonstrate the superiority of our ViG architecture. We hope this pioneering study of GNN on general visual tasks will provide useful inspiration and experience for future research.
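The graph construction and aggregation steps described above can be sketched in a few lines. This is only an illustrative, dependency-free sketch, not the code in this PR: the names `knn_graph` and `max_relative_aggregate` are hypothetical, max-relative aggregation is assumed as one plausible choice of graph convolution (the description above does not specify which one is used), and the learned linear update and FFN that would follow are omitted. A real implementation would operate on batched tensors rather than Python lists.

```python
import math


def knn_graph(nodes, k):
    """For each node, return the indices of its k nearest other nodes (Euclidean)."""
    neighbors = []
    for i, xi in enumerate(nodes):
        # Rank every other node by distance to node i; the index breaks ties.
        ranked = sorted(
            (j for j in range(len(nodes)) if j != i),
            key=lambda j: (math.dist(xi, nodes[j]), j),
        )
        neighbors.append(ranked[:k])
    return neighbors


def max_relative_aggregate(nodes, neighbors):
    """Concatenate each node feature with the element-wise max of (neighbor - node)."""
    out = []
    for xi, nbrs in zip(nodes, neighbors):
        rel = [max(nodes[j][d] - xi[d] for j in nbrs) for d in range(len(xi))]
        out.append(list(xi) + rel)
    return out


# Four toy "patch" features in 2-D stand in for patch embeddings.
patches = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
graph = knn_graph(patches, k=2)
features = max_relative_aggregate(patches, graph)
```

Each output row has twice the input width: the node's own feature followed by the aggregated neighbor information. In a full model a learned projection would typically map this back to the original width.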

| Model | Params (M) | FLOPs (B) | Top-1 (%) |
|---|---|---|---|
| Pyramid ViG-Ti | 10.7 | 1.7 | 78.5 |
| Pyramid ViG-S | 27.3 | 4.6 | 82.1 |
| Pyramid ViG-M | 51.7 | 8.9 | 83.1 |
| Pyramid ViG-B | 82.6 | 16.8 | 83.7 |

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@rwightman
Collaborator

@iamhankai FYI, you can use register_notrace_function and @register_notrace_module to register leaf functions or modules in your model that won't trace in FX due to boolean and other flow control concerns...
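As a rough illustration of the suggestion above, a helper with value-dependent branching could be registered as a leaf function like this. The import path is an assumption (it has moved between timm versions), `pad_to_multiple` is a hypothetical helper that does not come from this PR, and the fallback decorator exists only so the sketch runs where timm is not installed.

```python
try:
    # Assumed import path; the decorator's location has varied across timm versions.
    from timm.models import register_notrace_function
except ImportError:
    # Fallback no-op so this sketch still runs without timm installed.
    def register_notrace_function(func):
        return func


@register_notrace_function
def pad_to_multiple(size: int, multiple: int) -> int:
    """Hypothetical helper: padding needed to make `size` divisible by `multiple`."""
    remainder = size % multiple
    # A value-dependent branch like this is what FX symbolic tracing cannot follow
    # when `size` is a traced proxy rather than a concrete int.
    if remainder == 0:
        return 0
    return multiple - remainder
```

Registering the function asks the tracer to treat the call as a single opaque node instead of tracing into its body.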

@rwightman
Collaborator

Hmm, it seems the tracing issue is harder to solve: just preventing the trace won't bypass the bool issue without some restructuring. I'd also need to tweak some other interface issues with respect to other models.

Trying the model out, the 'base' as example seems roughly on par with a Swin (v1) base for accuracy and param/flops, but it runs at < 1/2 the speed. Any way to improve the runtime performance?

Are there any weights from, or attempts at, scaling the training to larger datasets? Any interesting performance differences there vs other ViT or ViT-related hybrid architectures?

@iamhankai
Contributor Author

iamhankai commented Dec 8, 2022

We have pretrained ViG on ImageNet-22K. It performs slightly better than Swin Transformer:

| Model | Params (M) | FLOPs (B) | IN1K Top-1 (%) |
|---|---|---|---|
| Swin-S | 50 | 8.7 | 83.2 |
| Pyramid ViG-M | 51.7 | 8.9 | 83.8 |

As for the runtime, accelerating GNN is an open problem.

@iamhankai
Contributor Author

@rwightman Hi, we have released weights pretrained on the larger ImageNet-22K dataset: https://github.com/huawei-noah/Efficient-AI-Backbones/releases/download/pyramid-vig/pvig_m_im21k_90e.pth

It performs slightly better than the ImageNet-22K-pretrained Swin Transformer:

| Model | Params (M) | FLOPs (B) | IN1K Top-1 (%) |
|---|---|---|---|
| Swin-S | 50 | 8.7 | 83.2 |
| Pyramid ViG-M | 51.7 | 8.9 | 83.8 |

@@ -295,12 +295,6 @@ def test_model_features_pretrained(model_name, batch_size):
"""Create that pretrained weights load when features_only==True."""
create_model(model_name, pretrained=True, features_only=True)

EXCLUDE_JIT_FILTERS = [
Contributor Author


It seems these lines cannot be removed

Collaborator


I managed to fix the other JIT exceptions, so I would rather not add more. I feel it's likely this model can be supported with appropriate type declarations, etc.

3 participants