
Some extracted audio and video features of the same video have different lengths! #66

Open
ttgeng233 opened this issue Aug 28, 2022 · 3 comments

Comments

@ttgeng233

Thanks for the great project!
I used the same sampling strategy for the audio data and the video frames: I resampled all videos to 25 fps and extracted one i3d feature from every 24 frames (24 / 25 = 0.96 s), while each audio feature represents a 0.96 s audio clip. However, I got features of different lengths, e.g., audio with shape (162, 128) and video with shape (165, 1024). The video feature length is correct, but the audio feature length is wrong.
How should I deal with this?
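Since 24 frames at 25 fps span exactly one 0.96 s audio window, both modalities should yield roughly the same number of features for a given clip duration. A minimal sketch of that arithmetic, assuming non-overlapping windows in both streams and that the optical-flow stream needs one extra RGB frame (the explanation offered later in this thread); `expected_counts` is a hypothetical helper, not part of the project.

```python
import math


def expected_counts(duration_s: float, fps: float = 25.0, stack: int = 24,
                    audio_win_s: float = 0.96, flow: bool = True) -> tuple[int, int]:
    """Estimate (num_video_feats, num_audio_feats) for a clip.

    Assumption: both streams use non-overlapping windows (step == window size).
    """
    n_frames = int(round(duration_s * fps))
    # Assumption: a flow stack of `stack` fields needs `stack + 1` RGB frames,
    # so the last frame can only serve as the second frame of a flow pair.
    usable = n_frames - 1 if flow else n_frames
    n_video = usable // stack
    # Small epsilon guards against float error such as 164.99999999999997.
    n_audio = math.floor(duration_s / audio_win_s + 1e-9)
    return n_video, n_audio


print(expected_counts(158.4))              # (164, 165)
print(expected_counts(158.4, flow=False))  # (165, 165)
```

When the frame count is an exact multiple of 24, the flow requirement alone costs one video feature, which would explain a constant one-feature gap but not a video stream that is longer than the audio.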

@v-iashin
Owner

Hi.

With the information you provided, it is hard to give recommendations.

Only about 2% of the features are missing in one modality, so I would just trim both to the shorter sequence (162 in your case).
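Trimming to the shorter sequence amounts to slicing both `(T, D)` feature arrays to a common `T`, dropping the tail of the longer one. A minimal sketch with NumPy; `trim_to_shortest` is a hypothetical helper and the zero arrays stand in for real features.

```python
import numpy as np


def trim_to_shortest(audio: np.ndarray, video: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Cut two (T, D) feature arrays to the shorter T, dropping trailing features."""
    t = min(audio.shape[0], video.shape[0])
    return audio[:t], video[:t]


# Placeholder arrays with the shapes reported in this issue.
audio = np.zeros((162, 128), dtype=np.float32)
video = np.zeros((165, 1024), dtype=np.float32)
a, v = trim_to_shortest(audio, video)
print(a.shape, v.shape)  # (162, 128) (162, 1024)
```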

By the way, is this happening for every video you tried or only for some? Can you calculate the ratio of videos where the shape mismatch occurs? Is this ratio large enough to worry about?
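One way to answer this is to tabulate the feature lengths per video and count the mismatches in each direction. A minimal sketch, assuming the `(num_audio_feats, num_video_feats)` pairs have already been collected (e.g. from the saved feature arrays); `mismatch_stats` and the video IDs are hypothetical.

```python
def mismatch_stats(lengths: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Summarize length mismatches.

    `lengths` maps a video ID to (num_audio_feats, num_video_feats).
    """
    n = len(lengths)
    video_longer = sum(1 for a, v in lengths.values() if v > a)
    video_shorter = sum(1 for a, v in lengths.values() if v < a)
    return {
        "mismatch_ratio": (video_longer + video_shorter) / n if n else 0.0,
        "video_longer": video_longer,
        "video_shorter": video_shorter,
    }


# Hypothetical lengths for four videos.
stats = mismatch_stats({"vid_a": (162, 165), "vid_b": (100, 99),
                        "vid_c": (50, 50), "vid_d": (10, 10)})
print(stats)  # {'mismatch_ratio': 0.5, 'video_longer': 1, 'video_shorter': 1}
```

Splitting the count by direction matters here, because a shorter video stream and a longer one likely have different causes.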

@ttgeng233
Author

I extracted features for 3000+ videos: 6 videos have longer visual features than audio features, and 400+ videos have shorter visual features than audio features.
I think the videos whose visual features are one feature shorter than the audio features are reasonable, since one extra frame is needed each time to extract optical flow. But the videos whose visual features are longer than the audio features are abnormal.
If I simply trim to the shorter sequence, I'm afraid the two modalities will not correspond well with each other.
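On the alignment concern: trimming only drops features from the end, so feature `i` in both streams still covers the same time span as long as both streams start at t = 0 and advance by the same hop. A minimal sketch of the per-index start-time offset under those assumptions (constant hops, common start); `start_offsets` is a hypothetical helper, and the 0.961 s hop is an illustrative value, not a measurement.

```python
def start_offsets(n: int, video_hop_s: float, audio_hop_s: float) -> list[float]:
    """Start-time difference (video minus audio), in seconds, for indices 0..n-1.

    Assumption: both streams begin at t = 0 and use a constant hop.
    """
    return [i * (video_hop_s - audio_hop_s) for i in range(n)]


# 24 frames at 25 fps vs. 0.96 s audio windows: identical hops, no drift.
same = start_offsets(162, video_hop_s=24 / 25, audio_hop_s=0.96)
print(max(abs(x) for x in same))  # 0.0

# A hypothetical 1 ms hop mismatch accumulates over the clip.
drift = start_offsets(162, video_hop_s=0.961, audio_hop_s=0.96)
print(round(drift[-1], 3))  # 0.161
```

If the hops really match, trimming the tail is harmless. If the effective hops differ, for example because the real frame rate is not exactly what the resampling assumed, the misalignment grows along the clip and trimming does not fix it.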

@v-iashin
Owner

I think one track (audio or visual) is slightly longer than the other. Maybe an error is accumulating somewhere; it is hard to tell from the information you are providing.

Does the difference grow as the video gets longer?
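This can be checked by regressing the length gap on clip duration. A minimal sketch using NumPy's `polyfit`; the durations would have to be obtained from the video files themselves, and `diff_trend` is a hypothetical helper.

```python
import numpy as np


def diff_trend(durations_s, audio_lens, video_lens) -> tuple[float, float]:
    """Fit (video_len - audio_len) = slope * duration + intercept.

    A slope near 0 suggests a constant offset; a clearly non-zero slope
    suggests the gap accumulates as clips get longer.
    """
    d = np.asarray(durations_s, dtype=float)
    diff = np.asarray(video_lens, dtype=float) - np.asarray(audio_lens, dtype=float)
    # polyfit returns coefficients from the highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(d, diff, deg=1)
    return float(slope), float(intercept)


# Hypothetical data: video is always exactly one feature short.
slope, intercept = diff_trend([10, 20, 30, 40], [10, 20, 30, 40], [9, 19, 29, 39])
print(slope, intercept)
```

With real data, a slope of about 1 / 0.96 features per second per unit of relative rate error would point to a frame-rate or sample-rate mismatch rather than a fixed boundary effect.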
