Word-level timestamps are off by some multiplier #786

Open
nikans opened this issue Apr 12, 2024 · 1 comment
Comments

nikans commented Apr 12, 2024

Hello. I've just updated the library from version 0.1.0 to 1.0.1 and noticed that the timings are incorrect, as if it were transcribing a longer audio file.

For example, 23 seconds of subtitles:

{"segments": [{"id": 1, "end": 12.16, "start": 4.500000000000001, "words": [{"end": 5.38, "start": 4.500000000000001}, {"end": 5.74, "start": 5.38}, {"end": 6.18, "start": 5.74}, {"end": 6.5, "start": 6.18}, {"end": 6.78, "start": 6.5}, {"end": 7.2, "start": 6.78}, {"end": 8.18, "start": 7.2}, {"end": 9.12, "start": 8.94}, {"end": 9.52, "start": 9.12}, {"end": 9.94, "start": 9.52}, {"end": 10.42, "start": 9.94}, {"end": 10.72, "start": 10.42}, {"end": 11.14, "start": 10.72}, {"end": 11.48, "start": 11.14}, {"end": 12.16, "start": 11.48}]}, {"id": 2, "end": 16.44, "start": 12.9, "words": [{"end": 13.08, "start": 12.9}, {"end": 13.36, "start": 13.08}, {"end": 13.74, "start": 13.36}, {"end": 14.62, "start": 13.74}, {"end": 15.28, "start": 14.62}, {"end": 15.94, "start": 15.28}, {"end": 16.44, "start": 15.94}]}, {"id": 3, "end": 22.58, "start": 19.38, "words": [{"end": 19.68, "start": 19.38}, {"end": 19.94, "start": 19.68}, {"end": 20.46, "start": 19.94}, {"end": 21.06, "start": 20.46}, {"end": 21.58, "start": 21.06}, {"end": 22.14, "start": 21.58}, {"end": 22.58, "start": 22.14}]}]}

for an 18-second audio file:

20240412040528-962.wav.zip
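To check whether another file shows the same mismatch, the timestamps can be compared against the audio's real length. The following is a minimal Python sketch, assuming the `{"segments": [...]}` JSON shape shown above and an uncompressed PCM WAV file that the standard-library `wave` module can read. The helper names (`wav_duration_seconds`, `latest_timestamp`, `timestamps_overrun`) and the 0.5-second tolerance are hypothetical and are not part of the library.

```python
import json
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds, using the stdlib wave module."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def latest_timestamp(segments_json):
    """Return the largest segment-level or word-level 'end' value.

    Assumes the JSON shape posted in this issue: {"segments": [{"end": ..., "words": [...]}]}.
    """
    data = json.loads(segments_json)
    ends = []
    for segment in data["segments"]:
        ends.append(segment["end"])
        ends.extend(word["end"] for word in segment.get("words", []))
    return max(ends)


def timestamps_overrun(segments_json, audio_seconds, tolerance=0.5):
    """True if any timestamp lies past the end of the audio (plus a small, hypothetical tolerance)."""
    return latest_timestamp(segments_json) > audio_seconds + tolerance
```

For the example above, the latest `end` is 22.58 s against an 18 s recording, so `timestamps_overrun(json_text, 18.0)` would return `True`, where `json_text` is the posted JSON as a string.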

I've tried with the VAD filter both off and on, although I don't understand exactly how VAD should affect this. I also tried the distil and regular fw models (all medium), with the same result.

What could have gone wrong? Thanks.

nikans (author) commented Apr 12, 2024

The multiplier seems to be ~0.72: when the timings are scaled by this factor, they point to the correct positions in the audio.
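Until the root cause is found, one stopgap is to rescale the output by that empirical factor. This is a minimal Python sketch, assuming the same JSON shape as above (already parsed into a dict) and that a single constant factor applies to the whole file. `rescale_timestamps` and `CORRECTION_FACTOR` are hypothetical names, and 0.72 was measured on one recording, so it may not carry over to other files, models, or settings.

```python
# Empirical factor reported in this issue for one 18-second file.
# It is a workaround estimate, not a value derived from the library.
CORRECTION_FACTOR = 0.72


def rescale_timestamps(data, factor=CORRECTION_FACTOR):
    """Return a copy of the segments dict with every start/end multiplied by factor.

    The input dict is left unchanged; values are rounded to 2 decimals to match
    the precision of the original output.
    """
    scaled = {"segments": []}
    for segment in data["segments"]:
        new_segment = dict(segment)  # shallow copy keeps extra keys such as "id"
        new_segment["start"] = round(segment["start"] * factor, 2)
        new_segment["end"] = round(segment["end"] * factor, 2)
        # Build new word dicts so the original word list is not mutated.
        new_segment["words"] = [
            {**word,
             "start": round(word["start"] * factor, 2),
             "end": round(word["end"] * factor, 2)}
            for word in segment.get("words", [])
        ]
        scaled["segments"].append(new_segment)
    return scaled
```

For example, the last word's `end` of 22.58 becomes about 16.26, which falls inside the 18-second recording. A constant factor only hides the symptom; if the true offset varies across the file, this correction will drift.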
