Vggish feature vs i3d flow visual feature #121

1980x opened this issue Feb 5, 2024 · 2 comments

Comments

1980x commented Feb 5, 2024

Hi. I am trying to extract visual and audio features from raw video clips. For visual features, I run
python main.py stack_size=24 step_size=8 extraction_fps=25 feature_type=i3d
For example, on a video converted to 25 fps, this command gives 112x1024-dimensional RGB and flow features.

For audio features, after converting the video to 25 fps, I run
python main.py feature_type=vggish
which produces features whose first dimension does not match that of the visual features.
For example, it gives only a 32x128-dimensional feature.

Can you please tell me what needs to be done so that I get a 112x128 audio feature, matching the visual one?

Thank you

Owner

v-iashin commented Feb 5, 2024

I see. VGGish extracts features from 0.96 sec windows without overlap. With the command above, I3D extracts features from 0.96 sec windows (24 frames at 25 fps) with a 0.32 sec step (8 frames), so consecutive windows overlap by 0.64 sec. Hence the I3D features should be about 3 times longer, but they are not in your case, and I don't know why.
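The window arithmetic above can be checked with a small sketch. It is not taken from the repository: `num_windows` is a hypothetical helper, the 30-second clip length is an assumed example, and the figure of 100 log-mel frames per second for VGGish is an approximation. The real extractors may differ by a window or so at the clip boundaries.

```python
def num_windows(num_samples: int, window: int, step: int) -> int:
    """Count the full windows of `window` samples taken every `step` samples."""
    if num_samples < window:
        return 0
    return (num_samples - window) // step + 1


fps = 25
duration_sec = 30                      # assumed clip length, for illustration only
num_video_frames = duration_sec * fps  # 750 frames

# I3D with stack_size=24, step_size=8 (0.96 sec window, 0.32 sec step).
i3d_overlap = num_windows(num_video_frames, window=24, step=8)

# I3D with stack_size=24, step_size=24 (no overlap).
i3d_no_overlap = num_windows(num_video_frames, window=24, step=24)

# VGGish-style framing: 0.96 sec window and 0.96 sec hop, assuming roughly
# 100 log-mel frames per second (96 frames per window).
num_audio_frames = duration_sec * 100
vggish = num_windows(num_audio_frames, window=96, step=96)

print(i3d_overlap, i3d_no_overlap, vggish)
```

Under these assumptions the overlapping I3D setting yields roughly three times as many feature vectors as VGGish, while the non-overlapping setting yields the same count.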

You may want to change the VGGish feature extraction code to support overlap, which might solve the issue. Alternatively, you may try to use no overlap for the I3D features (step_size=24) if your application permits. This should make both features the same size.
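A minimal sketch of what overlap support could look like, assuming the audio has already been turned into a sequence of log-mel frames at about 100 frames per second. `frame_with_hop` is a hypothetical helper, not a function from the repository or from VGGish, and the list of integers is only a stand-in for real log-mel frames.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def frame_with_hop(frames: Sequence[T], window: int, hop: int) -> List[Sequence[T]]:
    """Cut `frames` into full windows of length `window`, starting every `hop` frames."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    last_start = len(frames) - window
    return [frames[s:s + window] for s in range(0, last_start + 1, hop)]


# Stand-in for 30 sec of log-mel frames (assumed 100 frames per second).
log_mel = list(range(3000))

# Default VGGish-style framing: 0.96 sec window, 0.96 sec hop (no overlap).
default_examples = frame_with_hop(log_mel, window=96, hop=96)

# Overlapping framing that mirrors the I3D step: 0.96 sec window, 0.32 sec hop.
overlapped_examples = frame_with_hop(log_mel, window=96, hop=32)

print(len(default_examples), len(overlapped_examples))
```

Each resulting window would then be passed through the VGGish network as usual. With a 0.32 sec hop, the number of audio features should roughly match the number of I3D features extracted with step_size=8, though the exact counts depend on how each extractor handles the end of the clip.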

Author

1980x commented Feb 5, 2024

Thank you.
