❓ Same .wav file but got different timestamps #415

DarrenChengdu · 2024-01-18T13:50:10Z

❓ Questions and Help

I found that the speech timestamps obtained from PyTorch and from silero-vad-onnx.cpp differ somewhat. The input file is 'en_example.wav', which was downloaded via torch.hub.
The speech timestamps from PyTorch are as follows (with USE_ONNX set to either True or False):
[{'end': 31200, 'start': 1568},
{'end': 73696, 'start': 42528},
{'end': 108512, 'start': 79392},
{'end': 163808, 'start': 149024},
{'end': 181728, 'start': 166944},
{'end': 211936, 'start': 183328},
{'end': 227808, 'start': 216608},
{'end': 241120, 'start': 229920},
{'end': 252896, 'start': 245280},
{'end': 285664, 'start': 260640},
{'end': 301024, 'start': 294432},
{'end': 311776, 'start': 303648},
{'end': 420320, 'start': 325664},
{'end': 455136, 'start': 422432},
{'end': 490976, 'start': 458784},
{'end': 520160, 'start': 493088},
{'end': 566752, 'start': 523808},
{'end': 601056, 'start': 572448},
{'end': 621024, 'start': 607264},
{'end': 669152, 'start': 638496},
{'end': 691680, 'start': 671776},
{'end': 712672, 'start': 697888},
{'end': 748512, 'start': 720928},
{'end': 798688, 'start': 781856},
{'end': 853984, 'start': 817696},
{'end': 865248, 'start': 856608},
{'end': 903648, 'start': 871968},
{'end': 916960, 'start': 906272},
{'end': 952288, 'start': 920096}]

The number of timestamps is 29.
The speech timestamps from silero-vad-onnx.cpp are as follows:
{start:00002048,end:00031744}
{start:00043008,end:00074752}
{start:00079872,end:00108544}
{start:00149504,end:00164864}
{start:00166912,end:00182272}
{start:00183296,end:00195584}
{start:00195584,end:00212992}
{start:00217088,end:00228352}
{start:00230400,end:00241664}
{start:00245760,end:00252928}
{start:00261120,end:00286720}
{start:00294912,end:00302080}
{start:00304128,end:00312320}
{start:00325632,end:00352256}
{start:00352256,end:00373760}
{start:00373760,end:00419840}
{start:00422912,end:00455680}
{start:00458752,end:00491520}
{start:00493568,end:00521216}
{start:00524288,end:00555008}
{start:00555008,end:00567296}
{start:00572416,end:00602112}
{start:00607232,end:00621568}
{start:00638976,end:00669696}
{start:00671744,end:00680960}
{start:00680960,end:00692224}
{start:00698368,end:00713728}
{start:00720896,end:00739328}
{start:00739328,end:00744448}
{start:00745472,end:00749568}
{start:00782336,end:00798720}
{start:00818176,end:00854016}
{start:00857088,end:00866304}
{start:00872448,end:00904192}
{start:00906240,end:00917504}
{start:00920576,end:00941056}
{start:00941056,end:00949248}
{start:00949248,end:00952320}
{start:00958464,end:00960000}

The number of timestamps is 39.
I wonder whether the above differences are tolerable and acceptable?
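
One way to put a number on how far the two outputs disagree is to measure how many samples both lists mark as speech. The following is a minimal sketch, not part of silero-vad: the helper names `total_overlap`, `total_length`, and `iou` are hypothetical, and only the first three segments of each list above are used as sample data.

```python
def total_length(segs):
    """Total number of samples marked as speech."""
    return sum(s["end"] - s["start"] for s in segs)


def total_overlap(a, b):
    """Samples marked as speech in both lists.

    Both lists must be sorted by start and contain no overlapping segments.
    """
    i = j = overlap = 0
    while i < len(a) and j < len(b):
        lo = max(a[i]["start"], b[j]["start"])
        hi = min(a[i]["end"], b[j]["end"])
        overlap += max(0, hi - lo)
        # Advance whichever segment finishes first.
        if a[i]["end"] < b[j]["end"]:
            i += 1
        else:
            j += 1
    return overlap


def iou(a, b):
    """Intersection over union of the two speech masks, in samples."""
    inter = total_overlap(a, b)
    union = total_length(a) + total_length(b) - inter
    return inter / union if union else 1.0


# First three segments from each list in this post.
py_segs = [
    {"start": 1568, "end": 31200},
    {"start": 42528, "end": 73696},
    {"start": 79392, "end": 108512},
]
cpp_segs = [
    {"start": 2048, "end": 31744},
    {"start": 43008, "end": 74752},
    {"start": 79872, "end": 108544},
]

print(f"overlap = {total_overlap(py_segs, cpp_segs)} samples")
print(f"IoU     = {iou(py_segs, cpp_segs):.3f}")
```

A value close to 1 means the two implementations agree on almost all speech samples even when the segment counts differ.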

Simon-chai · 2024-03-08T01:21:23Z

The VAD works by applying a voting mechanism to the model output to decide where a speech segment starts and ends, so the result depends on both the implementation of the voting mechanism and the model output. For example, there is a control parameter min_silence_duration_ms; its default value may differ between the Python and C++ implementations, and it can affect the number of segments in the final output (e.g. a segment containing 80 ms of silence would be treated as one segment by the Python voting, but might be split into two segments by the C++ voting). Several other parameters of this kind can also affect the number of segments or their start and end values. Of course, another possibility is that the C++ voting implementation differs from the Python version; I am not sure, because I am not familiar with C++. But as I see it, the differences are tolerable and acceptable: most segments are valid speech.
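
To illustrate how a minimum-silence threshold changes the segment count, here is a simplified sketch. It is not the actual silero-vad voting code: `merge_close_segments` and `ms_to_samples` are hypothetical helpers that merge already-detected segments after the fact, the 16 kHz sample rate is an assumption, and the 30 ms and 100 ms thresholds are picked only for illustration. The input is four consecutive segments from the silero-vad-onnx.cpp output above.

```python
SAMPLE_RATE = 16000  # assumption: the example file is sampled at 16 kHz


def ms_to_samples(ms, sample_rate=SAMPLE_RATE):
    """Convert a duration in milliseconds to a number of samples."""
    return ms * sample_rate // 1000


def merge_close_segments(segments, min_silence_samples):
    """Merge neighbouring segments separated by less than min_silence_samples.

    Segments must be sorted by start. The input list is not modified.
    """
    merged = []
    for seg in segments:
        if merged and seg["start"] - merged[-1]["end"] < min_silence_samples:
            # The gap is too short to count as silence: extend the previous segment.
            merged[-1]["end"] = max(merged[-1]["end"], seg["end"])
        else:
            merged.append(dict(seg))
    return merged


cpp_segs = [
    {"start": 166912, "end": 182272},
    {"start": 183296, "end": 195584},
    {"start": 195584, "end": 212992},  # starts exactly where the previous one ends
    {"start": 217088, "end": 228352},
]

short = merge_close_segments(cpp_segs, ms_to_samples(30))   # 480 samples
long_ = merge_close_segments(cpp_segs, ms_to_samples(100))  # 1600 samples

print(len(cpp_segs), len(short), len(long_))
for seg in short:
    print(seg)
```

The same four detections collapse to a different number of segments depending on the threshold, which is the kind of effect a differing default could have between the two implementations.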

DarrenChengdu added the help wanted Extra attention is needed label Jan 18, 2024

DarrenChengdu assigned snakers4 Jan 18, 2024
