Video Transformer Network (https://arxiv.org/abs/2102.00719) #388

Open
wants to merge 1 commit into
base: main
from

Conversation

bomri

bomri commented Mar 24, 2021

  • VTN model setup
  • Add the ability to return the entire video
  • Add support for returning the frame index
  • Update defaults
  • VIT_B_VTN.yaml
  • Adjust the if-else in pack_pathway_output
  • VTN README.md + update main README.md + update MODEL_ZOO

facebook-github-bot added the CLA Signed label Mar 24, 2021
bomri (Author) commented Mar 31, 2021

Hi @feichtenhofer, we recently published our work on video action recognition using Transformers (https://arxiv.org/abs/2102.00719). As PySlowFast aims to provide novel research implementations in this domain, we modified our codebase and models to make them available via this repository. We'd appreciate it if you could consider merging our pull request; we think it would be great to share it here with the community.

@devksingh4

+1, we would also appreciate the inclusion of this model in PySlowFast.

@Isminoula

+1 would be great to have this model as a backbone for experiments, thank you!

@feichtenhofer
Contributor

Hi @bomri, thanks for this pull request; glad PySlowFast is of help for your research. We would need to do a careful review before merging this, because it adds nontrivial overhead to the main logic, especially as it adds several functionalities and configurations to the core PySF code.

Generally, we would prefer that you use a fork, which we can then link to, similar to how external projects are linked in detectron2: https://github.com/facebookresearch/detectron2/tree/master/projects#external-projects.

Related to this, we will be updating the codebase around next week with some ViT baselines from a concurrent work, which should hopefully provide one more base for future work on video transformers.

I'm adding @haooooooqi here for further help on this pull request.

bomri (Author) commented Apr 25, 2021

Thank you @feichtenhofer for your response.
We tried to keep the changes to the minimum needed to support our approach and to add only the missing functionality, such as processing the full video at inference and fetching the relevant frame index for the positional embedding.
If we can make any adjustments, please let me know.
If you prefer using external projects, could you please link our fork at https://github.com/bomri/SlowFast?
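
For readers following the thread, the two additions mentioned above (returning the whole video at inference, and returning the index of each frame so it can drive the positional embedding) can be illustrated with a minimal sketch. This is not the code in this pull request: the function names `sample_frame_indices` and `gather_position_embeddings`, the even-spacing sampling rule, and the list-based embedding table are all assumptions made purely for illustration.

```python
def sample_frame_indices(num_video_frames, num_samples, full_video=False):
    """Return the original-video indices of the frames to feed the model.

    Hypothetical helper (not from the PR). With full_video=True every frame
    is returned, as one might do at inference; otherwise num_samples indices
    are spread evenly over the video.
    """
    if num_video_frames <= 0 or num_samples <= 0:
        raise ValueError("num_video_frames and num_samples must be positive")
    if full_video or num_samples >= num_video_frames:
        return list(range(num_video_frames))
    if num_samples == 1:
        return [0]
    # Evenly spaced positions from the first frame to the last frame.
    step = (num_video_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


def gather_position_embeddings(table, frame_indices):
    """Look up one embedding per frame using its index in the original video.

    `table` is assumed to be a sequence with one embedding vector per
    possible frame position (a plain list here, to stay dependency-free).
    """
    return [table[i] for i in frame_indices]


# Usage: 4 frames sampled from a 10-frame video keep their original indices,
# so each frame is paired with the embedding of its true temporal position.
indices = sample_frame_indices(10, 4)
toy_table = [[float(i)] for i in range(10)]  # toy 1-dimensional embeddings
embeddings = gather_position_embeddings(toy_table, indices)
```

The point of the sketch is only that the sampled frames carry their position in the original video rather than their position within the sampled clip, which is why the data loader has to return the indices alongside the frames.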

Labels: CLA Signed

5 participants