Is it possible to get the timing of phonemes, instead of full words? #687

Open · tscizzlebg opened this issue Sep 15, 2021 · 6 comments · May be fixed by #1377

Comments


tscizzlebg commented Sep 15, 2021

I searched the docs and the docstrings in the source code, but couldn't find a nice summary of the available output options, so I figured I'd ask here; it might be a super quick answer.

(Apologies if this is not the right place for questions. I posted on StackOverflow as well, but the vosk tag doesn't have that many total questions so I wasn't sure what y'all prefer.)

@nshmyrev (Collaborator)

We do not support phones yet. There is a pull request, though: #528
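For reference, word-level timings are already supported; here is a minimal sketch using the Python bindings, assuming an unpacked model directory named "model" and a 16 kHz mono 16-bit PCM WAV file:

```python
import json
import wave

from vosk import Model, KaldiRecognizer

wf = wave.open("speech.wav", "rb")            # 16 kHz, mono, 16-bit PCM
model = Model("model")                        # path to an unpacked Vosk model
rec = KaldiRecognizer(model, wf.getframerate())
rec.SetWords(True)                            # include per-word start/end times

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        for w in json.loads(rec.Result()).get("result", []):
            print(w["word"], w["start"], w["end"])

for w in json.loads(rec.FinalResult()).get("result", []):
    print(w["word"], w["start"], w["end"])
```

Each entry in the `result` array carries the word plus its start and end time in seconds.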

> I posted on StackOverflow as well, but the vosk tag doesn't have that many total questions so I wasn't sure what y'all prefer.

Some time ago Stack Overflow blocked me from answering Vosk questions there, so I left it altogether.

@tscizzlebg (Author)

Cool, thanks @nshmyrev! I'm definitely looking forward to that PR getting in.

For getting more into the nitty-gritty of speech, and trying to create training sets for speech decoding models (as opposed to what I'm guessing are the more mainstream use cases of subtitling videos and stuff like that), output by phone is key.

Re StackOverflow, that's too bad. Good to know.

@nshmyrev (Collaborator)

> trying to create training sets for speech decoding models (as opposed to what I'm guessing are the more mainstream use cases of subtitling videos and stuff like that), output by phone is key.

What are "speech decoding models" exactly? Could you please clarify?

@tscizzlebg (Author)

Ah. For decoding intended speech from neural activity.

Here's an example of research toward restoring the communication ability of people with severe paralysis: http://changlab.ucsf.edu/s/anumanchipalli_chartier_2019.pdf

@Shallowmallow

Shouldn't it be possible to make a model that recognizes all phones? Like this one, for example: https://github.com/xinjli/allosaurus
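For reference, a minimal sketch following the allosaurus README; it assumes `pip install allosaurus` and a 16 kHz mono WAV file, and the timestamp option is as documented there:

```python
# Minimal sketch following the allosaurus README; assumes
# `pip install allosaurus` and a 16 kHz mono WAV file.
from allosaurus.app import read_recognizer

model = read_recognizer()                    # load the default universal phone model
print(model.recognize("sample.wav"))         # space-separated IPA phones

# The README also documents per-phone timestamps (start time, duration, phone):
print(model.recognize("sample.wav", timestamp=True))
```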


madhephaestus commented Jun 1, 2023

For anyone looking for Java lip-sync software based on Vosk, I have a small stand-alone example for you: https://github.com/madhephaestus/TextToSpeechASDRTest.git I was able to use the partial results with the word timing to calculate the timing of the phonemes (after looking up the phonemes in a phoneme dictionary). I then down-mapped the phonemes to visemes and stored the visemes in a list with timestamps. The timestamped visemes process in a static 200 ms, and then the audio can begin playing, with the mouth movements synchronized precisely to the phoneme start times precomputed ahead of time. This is in contrast to Rhubarb, which takes as long to run as the audio file is long.
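The core idea, splitting each Vosk word timing across its dictionary phonemes and then mapping those to visemes, can be sketched as follows. The linked project is Java; this Python sketch, the tiny dictionary, and the viseme table are illustrative assumptions, not the project's actual code:

```python
# Hedged sketch of the approach described above: split a word's time span
# evenly across its phonemes, then map each phoneme to a mouth shape.

# A CMUdict-style pronunciation dictionary: word -> list of ARPAbet phonemes.
PHONEME_DICT = {"hello": ["HH", "AH", "L", "OW"]}

# Hypothetical many-to-one mapping from phonemes to visemes.
VISEME_MAP = {"HH": "rest", "AH": "open", "L": "tongue", "OW": "round"}

def word_to_timed_visemes(word, start, end):
    """Split a word timing evenly across its phonemes and map to visemes."""
    phonemes = PHONEME_DICT[word.lower()]
    step = (end - start) / len(phonemes)
    return [(round(start + i * step, 3), VISEME_MAP.get(p, "rest"))
            for i, p in enumerate(phonemes)]

# Example with a word timing as produced by Vosk with SetWords(True):
print(word_to_timed_visemes("hello", 0.42, 0.80))
# [(0.42, 'rest'), (0.515, 'open'), (0.61, 'tongue'), (0.705, 'round')]
```

The even split is the simplifying assumption here; per-phoneme durations from a forced aligner would be more accurate, but for lip sync the viseme start times only need to be roughly right.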
