
[Feature] Add PoseFormer backbone #1215

Open · wants to merge 24 commits into base: dev-0.26
Conversation

@QwQ2000 QwQ2000 (Contributor) commented Mar 3, 2022

Motivation

Modification

Add the backbone, head and config of PoseFormer (ICCV 2021) to the repository.
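For reviewers' context, here is a minimal sketch of how the new modules might be wired into an MMPose 0.x config. The registry names (`PoseFormer`, `PoseFormerHead`), the `PoseLifter` detector type and all parameter values are assumptions for illustration, not the final API from this PR:

```python
# Hypothetical config sketch; module names, parameters and values are
# assumptions for illustration and may differ from the merged version.
model = dict(
    type='PoseLifter',              # 2D-to-3D lifting detector
    backbone=dict(
        type='PoseFormer',          # spatial + temporal transformer backbone (ICCV 2021)
        num_frames=81,              # length of the input 2D pose sequence
        num_joints=17,
        in_chans=2,                 # (x, y) coordinates per joint
        spatial_embed_dim=32,       # per-joint embedding dimension
        depth=4,
    ),
    keypoint_head=dict(
        type='PoseFormerHead',      # regresses the 3D pose of the center frame
        num_joints=17,
    ),
)
```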

BC-breaking (Optional)

Use cases (Optional)

Checklist

Before PR:

  • I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
  • Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
  • Bug fixes are covered by unit tests; the case that causes the bug should be added to the unit tests.
  • New functionalities are covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  • The documentation has been modified accordingly, including docstring or example tutorials.

After PR:

  • CLA has been signed and all committers have signed the CLA in this PR.

ly015 and others added 15 commits February 24, 2022 17:40
* update mmcv installation CI and doc

* fix lint
* add deprecation message for deploy tools

* change import warnings positions

* do yapf

* do isort
* hrformer

* modify cfg

* update url and readme for hrformer.

* add readme for hrformer paper

* modify readme

* fix publish year

Co-authored-by: ly015 <liyining0712@gmail.com>
* clean inference code

* LoadImageFromFile supports given img
(open-mmlab#1214)

* switch to open-mmlab/pre-commit-hooks

* deprecate .dev_scripts/github/update_copyright.py
* add windows-ci

* reduce input size to save memory

* fix codecov

* reduce input size to save memory

* skip voxelpose unittest for Windows

* remove win+cuda test
@ly015 ly015 changed the base branch from dev-0.24 to dev-0.25 March 8, 2022 03:04
@codecov codecov bot commented Mar 8, 2022

Codecov Report

Attention: 17 lines in your changes are missing coverage. Please review.

Files Coverage Δ
mmpose/models/backbones/__init__.py 100.00% <100.00%> (ø)
mmpose/models/heads/__init__.py 100.00% <100.00%> (ø)
mmpose/datasets/pipelines/pose3d_transform.py 83.78% <66.66%> (+8.45%) ⬆️
mmpose/models/heads/poseformer_head.py 75.00% <75.00%> (ø)
mmpose/models/backbones/poseformer.py 91.58% <91.58%> (ø)

... and 26 files with indirect coverage changes


@QwQ2000 QwQ2000 requested a review from ly015 March 8, 2022 11:28
@ly015 ly015 requested review from liqikai9 and jin-s13 March 9, 2022 05:28
@jin-s13 jin-s13 mentioned this pull request Mar 12, 2022
from .base_backbone import BaseBackbone


class MultiheadAttention(BaseModule):
Collaborator

mmcv also has MultiheadAttention module. Can we use that one?

Contributor Author

This MultiheadAttention module is adapted from mmcls. Compared with the MultiheadAttention in mmcv, its counterpart in mmcls is more similar to the implementation in the official PoseFormer repository.

Contributor Author

I will give the mmcv Transformer blocks a try. However, if the outputs and model weights of a PoseFormer re-implemented on the mmcv Transformer blocks cannot match the official PoseFormer, keeping the mmcls Transformer modules might be the better choice.
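For reference, the mmcls-style attention that the official PoseFormer follows fuses the Q, K and V projections into a single linear layer. A minimal self-contained sketch of that pattern (not the exact code in this PR):

```python
import torch
import torch.nn as nn


class MultiheadAttentionSketch(nn.Module):
    """ViT-style attention with a fused qkv projection, as in mmcls and the
    official PoseFormer. Illustrative sketch only, not the PR code."""

    def __init__(self, embed_dims, num_heads, attn_drop=0., proj_drop=0.):
        super().__init__()
        self.num_heads = num_heads
        self.head_dims = embed_dims // num_heads
        self.scale = self.head_dims ** -0.5
        self.qkv = nn.Linear(embed_dims, embed_dims * 3)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(embed_dims, embed_dims)
        self.proj_drop = nn.Dropout(proj_drop)

    def forward(self, x):
        B, N, C = x.shape
        # (B, N, 3, heads, head_dims) -> (3, B, heads, N, head_dims)
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads,
                                  self.head_dims).permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = self.attn_drop(attn.softmax(dim=-1))
        x = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj_drop(self.proj(x))
```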

num_joints=17,
in_chans=2,
embed_dim_ratio=32,
depth=4,
Collaborator

Is the depth for spatial and temporal transformer always the same? If not, do we need to distinguish them?

Contributor Author

@QwQ2000 QwQ2000 Mar 26, 2022

In the official implementation of PoseFormer, the spatial and temporal transformers share the same depth.
However, I agree that distinguishing the two depths is clearer, and I will follow this suggestion.
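A minimal sketch of what the updated constructor could expose, with equal defaults so the official setting is preserved (argument names are illustrative, not necessarily the final API):

```python
import torch.nn as nn


class PoseFormerSketch(nn.Module):
    """Illustrative only: the two transformer depths become separate arguments.

    The defaults stay equal, matching the official implementation, but the
    spatial and temporal depths can now be configured independently.
    """

    def __init__(self, spatial_depth=4, temporal_depth=4, **kwargs):
        super().__init__()
        self.spatial_depth = spatial_depth
        self.temporal_depth = temporal_depth
```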

if norm_cfg is None:
norm_cfg = dict(type='LN')
# Temporal embed_dim is num_joints * spatial embedding dim ratio
embed_dim = embed_dim_ratio * num_joints
Collaborator

Does embed_dim mean the embedding dimension for each frame? How about using embed_dim_per_frame ?

Contributor Author

Now I use spatial_embed_dim and temporal_embed_dim instead of embed_dim_ratio and embed_dim.
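So the renamed quantities relate as follows (a tiny illustration; the numbers are just the defaults shown in the diff above):

```python
# Relation between the renamed embedding dimensions (illustrative values).
num_joints = 17
spatial_embed_dim = 32                                # embedding dimension per joint
temporal_embed_dim = spatial_embed_dim * num_joints   # 544, embedding dimension per frame
```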

init_cfg=init_cfg) for i in range(depth)
])

self.blocks = nn.ModuleList([
Collaborator

Does self.blocks mean the temporal transformer blocks? How about using self.temporal_blocks for clarity?

Contributor Author

self.temporal_blocks is a better choice.
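With the clearer names, the two block lists might look roughly like this (illustrative sketch only; a stock PyTorch encoder layer stands in for the PR's TransformerEncoderLayer so the snippet runs on its own):

```python
import torch.nn as nn

# Illustrative only: clearly named block lists for the two transformers.
spatial_embed_dim, temporal_embed_dim = 32, 32 * 17
spatial_depth, temporal_depth = 4, 4

spatial_blocks = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model=spatial_embed_dim, nhead=8, batch_first=True)
    for _ in range(spatial_depth)
])
temporal_blocks = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model=temporal_embed_dim, nhead=8, batch_first=True)
    for _ in range(temporal_depth)
])
```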

return x


class TransformerEncoderLayer(BaseModule):
Collaborator

mmcv has a similar layer, BaseTransformerLayer, under mmcv/cnn/bricks/transformer.py, which also includes MultiheadAttention. Can we use that one?

Contributor Author

This TransformerEncoderLayer module is adapted from mmcls. Compared with the BaseTransformerLayer in mmcv, its counterpart in mmcls is more similar to the implementation in the official PoseFormer repository.

Contributor Author

I will give the mmcv Transformer blocks a try. However, if the outputs and model weights of a PoseFormer re-implemented on the mmcv Transformer blocks cannot match the official PoseFormer, keeping the mmcls Transformer modules might be the better choice.
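For reference, the mmcls/ViT-style encoder layer that the official PoseFormer follows is pre-norm: LayerNorm is applied before the attention and before the feed-forward network, each wrapped in a residual connection. A minimal self-contained sketch of that pattern (not the exact code in this PR):

```python
import torch
import torch.nn as nn


class TransformerEncoderLayerSketch(nn.Module):
    """Pre-norm encoder layer in the ViT/mmcls style used by PoseFormer.
    Illustrative sketch only; the PR builds its layer from mmcls components."""

    def __init__(self, embed_dims, num_heads, feedforward_channels, drop=0.):
        super().__init__()
        self.norm1 = nn.LayerNorm(embed_dims)
        self.attn = nn.MultiheadAttention(embed_dims, num_heads,
                                          dropout=drop, batch_first=True)
        self.norm2 = nn.LayerNorm(embed_dims)
        self.ffn = nn.Sequential(
            nn.Linear(embed_dims, feedforward_channels),
            nn.GELU(),
            nn.Dropout(drop),
            nn.Linear(feedforward_channels, embed_dims),
            nn.Dropout(drop),
        )

    def forward(self, x):
        # Residual around (norm -> attention)
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        # Residual around (norm -> feed-forward)
        x = x + self.ffn(self.norm2(x))
        return x
```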

@ly015 ly015 mentioned this pull request Apr 8, 2022
7 tasks
@liqikai9 liqikai9 changed the base branch from dev-0.25 to dev-0.26 April 20, 2022 05:43
@darcula1993

What's wrong with this branch? Is there any update?
@QwQ2000 @ly015

@QwQ2000
Contributor Author

QwQ2000 commented Sep 7, 2022

Sorry, I resigned from the Shanghai AI Lab quite a while ago, and I'm busy with other projects, so I don't have much time for this. If someone could help finish this feature, I will always be willing to assist.
