[new-feature]: Add horse power in to speech engines #593

chrsmj · 2024-02-09T17:21:39Z

Feature Description

The current Speech...() apps only let you work with one engine at a time. This presents challenges in multilingual IVR environments, in situations where you are testing different engines, during cloud evaporation events, etc.

The forthcoming patch will allow you to configure the horses inside your engine and sequentially feed them audio frames. It is still a little leaky, eschews linked lists for arrays, and requires additional patches to the engines themselves to do anything useful (patches are also forthcoming there, e.g. for Vosk).

But the dialplan will look like this -- note the carets used to separate the horses (because 🐴 ❤️ 🥕 :)

exten => 46773,1,NoOp(H-O-R-S-E) 
 same = n,Set(TIMEOUT(D)=1)
 same = n,Set(SPEECH_DTMF_MAXLEN=2)
 same = n,SpeechCreate(vosk^mage)
 same = n,SpeechCreate(vosk^secretariat)
 same = n,SpeechBackground(dial-here-often,5,,mage^secretariat)
 same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
 same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
 same = n,SpeechDestroy(vosk^mage)
 same = n,SpeechDestroy(vosk^secretariat)
 same = n,SpeechCreate(vosk^mage)
 same = n,SpeechCreate(vosk^secretariat)
 same = n,SpeechBackground(that-tickles,5,p,mage^secretariat)
 same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
 same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
 same = n,SpeechDestroy(vosk^mage)
 same = n,SpeechDestroy(vosk^secretariat)

And the corresponding res_speech_vosk.conf:

[general]

[mage]
type=horse
url = ws://localhost:2700

[secretariat]
type=horse
url = ws://localhost:2701

You might replace mage and secretariat with en and es...
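As a rough illustration of "sequentially feed them audio frames", here is a minimal Python model. It is not taken from the patch: the `Horse` and `Engine` classes and their `write` methods are hypothetical names, and appending to a list stands in for sending a frame to the horse's websocket URL.

```python
from dataclasses import dataclass, field


@dataclass
class Horse:
    """One backend connection, e.g. a [mage] section with type=horse."""
    name: str
    url: str
    frames: list = field(default_factory=list)  # stand-in for a websocket send

    def write(self, frame: bytes) -> None:
        self.frames.append(frame)


@dataclass
class Engine:
    """A speech engine holding its horses in a plain list (an array, not a linked list)."""
    name: str
    horses: list = field(default_factory=list)

    def write(self, frame: bytes, selected: str) -> int:
        """Feed one frame to each selected horse in turn; return how many received it."""
        wanted = selected.split("^")
        fed = 0
        for horse in self.horses:
            if horse.name in wanted:
                horse.write(frame)
                fed += 1
        return fed


# Mirrors the res_speech_vosk.conf example: two horses on one engine.
vosk = Engine("vosk", [
    Horse("mage", "ws://localhost:2700"),
    Horse("secretariat", "ws://localhost:2701"),
])
for frame in (b"\x00" * 160, b"\x01" * 160):
    vosk.write(frame, "mage^secretariat")
```

Every selected horse sees the same frames in the same order, one horse after another, which is why the processing is sequential rather than truly parallel.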


jcolp · 2024-02-09T17:34:57Z

This sounds like a major rearchitecture. Such things should really be discussed ahead of time, for example this would need to be backwards compatible to go into any current branches or even master.

jcolp · 2024-02-09T17:36:37Z

With the place to discuss such things being https://groups.io/g/asterisk-dev

Speech Engines can now have multiple horses inside that race to get you results from different backends.

UserNote: The various Speech...() dialplan applications and the SPEECH_TEXT function now support the use of carets to control the horses in your engines.

SpeechCreate(engine^horse1)
SpeechCreate(engine^horse2)
SpeechBackground(audio,,,horse1^horse2)
SPEECH_TEXT(0^horse1)
SPEECH_TEXT(0^horse2)
SpeechDestroy(engine^horse1)
SpeechDestroy(engine^horse2)

Resolves: asterisk#593
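To make the caret syntax concrete, the following Python sketch shows one way such arguments could be split. It is an assumption about the behaviour, not the parsing code from the patch, and `split_caret` and `split_horses` are hypothetical helper names. An argument without a caret yields no horse, which is one plausible reading of how existing dialplans would keep working.

```python
def split_caret(arg: str):
    """Split 'left^right' at the first caret; right is None when no caret is present."""
    left, sep, right = arg.partition("^")
    return left, (right if sep else None)


def split_horses(arg: str):
    """Split a horse list such as 'horse1^horse2' into names, dropping empty entries."""
    return [name for name in arg.split("^") if name]


engine, horse = split_caret("vosk^mage")         # SpeechCreate(vosk^mage)
legacy_engine, no_horse = split_caret("vosk")    # SpeechCreate(vosk), no horse given
index, result_horse = split_caret("0^mage")      # SPEECH_TEXT(0^mage)
racers = split_horses("mage^secretariat")        # 4th SpeechBackground() argument
```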

chrsmj · 2024-02-09T17:48:48Z

Patch is backwards compatible. May we speak about it next week at AstriDevCon?

jcolp · 2024-02-09T17:50:18Z

We could, but things should still happen or be recorded in a location where others can participate over a period of time and where it can be referenced for historical purposes.

jcolp · 2024-02-09T17:55:54Z

As well, the way that AEAP handles this is it registers multiple engines each with a unique name, so I don't see why this needs to be that aware of the special ^ thing as it is.

Allows multiple horses (URLs) for semi-concurrent processing. These horses could be multiple language models racing against each other. Requires additional patches to Asterisk. See [Asterisk #593](asterisk/asterisk#593) for further discussion and links to additional required patches. Resolves: alphacep#8, alphacep#35

chrsmj · 2024-02-09T19:06:07Z

There is precedent elsewhere for the caret separator when things get clever, e.g. the Dial() 'b' and 'B' options. Colon is for string trimming. Comma is over-used; the patch adds only one more, heh. Extra parenthesis wrapping ))))) is a reason why some shudder and reach for a GPL instead. Bringing some of the cool parts of AEAP functionality back to the plain-jane dialplan DSL is one goal of this patch design.

And in the future, AEAP could be extended with this new horse decorator as well:

[my-speech-to-text]
type=client
codecs=!all,ulaw
url=ws://127.0.0.1:9099
protocol=speech_to_text

[nyquist]
type=horse
url=ws://127.0.0.1:2016

[majestic_prince]
type=horse
url=ws://127.0.0.1:1969

Then you could race all four:

exten => 33729,1,NoOp(D-E-R-B-Y) 
 same = n,Set(TIMEOUT(D)=1)
 same = n,Set(SPEECH_DTMF_MAXLEN=2)

 same = n,SpeechCreate(vosk^mage)
 same = n,SpeechCreate(vosk^secretariat)
 same = n,SpeechCreate(my-speech-to-text^nyquist)
 same = n,SpeechCreate(my-speech-to-text^majestic_prince)

 same = n,SpeechBackground(dial-here-often,5,,mage^secretariat^nyquist^majestic_prince)

 same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
 same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
 same = n,Set(speechtext0nyquist=${SPEECH_TEXT(0^nyquist)})
 same = n,Set(speechtext0majestic_prince=${SPEECH_TEXT(0^majestic_prince)})

 same = n,SpeechDestroy(vosk^mage)
 same = n,SpeechDestroy(vosk^secretariat)
 same = n,SpeechDestroy(my-speech-to-text^nyquist)
 same = n,SpeechDestroy(my-speech-to-text^majestic_prince)

 same = n,SpeechCreate(vosk^mage)
 same = n,SpeechCreate(vosk^secretariat)
 same = n,SpeechCreate(my-speech-to-text^nyquist)
 same = n,SpeechCreate(my-speech-to-text^majestic_prince)

 same = n,SpeechBackground(that-tickles,5,p,mage^secretariat^nyquist^majestic_prince)

 same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
 same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
 same = n,Set(speechtext0nyquist=${SPEECH_TEXT(0^nyquist)})
 same = n,Set(speechtext0majestic_prince=${SPEECH_TEXT(0^majestic_prince)})

 same = n,SpeechDestroy(vosk^mage)
 same = n,SpeechDestroy(vosk^secretariat)
 same = n,SpeechDestroy(my-speech-to-text^nyquist)
 same = n,SpeechDestroy(my-speech-to-text^majestic_prince)

InterLinked1 · 2024-02-09T19:08:19Z

> There is precedent elsewhere for the Caret separator when things get clever eg. Dial() 'b' and 'B' options.

The ^ and other characters are generally only used for suboptions within an option (or when the , would not work due to special parsing considerations).

> Comma is over-used

The comma is the standard argument separator - so that would be expected.

nshmyrev · 2024-02-10T08:34:12Z

I think it's better to control this by the grammar field than to have multiple engine connections.

jcolp · 2024-02-12T16:18:46Z

@nshmyrev That is what grammar was originally for in older engines so would make sense.

chrsmj · 2024-02-12T16:57:31Z

Current engine:grammar relationship on a channel is 1:N. This patch improves that to N:N.

jcolp · 2024-02-12T17:04:20Z

Personally as a user, I would not want that. I would want to just be able to specify multiple grammars and have the engine provide the results back to me ordered in confidence.

chrsmj · 2024-02-12T17:50:01Z

Agreed, that is nice if the engine ranks your results in order of confidence. But not all of them do (yet) *cough* vosk *cough*.

And what if your main cloud engine is lagged out, but your less-preferred local backup engines are still available?

This patch lets users solve it themselves in the dial plan.
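The fallback decision described here could look roughly like the Python sketch below. It only models the selection logic a dialplan might apply to the per-horse `${SPEECH_TEXT(0^horse)}` values. The `first_available` helper, the sample results, and the preference order are all made up for illustration and do not come from the patch.

```python
def first_available(results: dict, preference: list):
    """Return (horse, text) for the first preferred horse with non-empty text, else None."""
    for horse in preference:
        text = results.get(horse, "")
        if text:
            return horse, text
    return None


# Hypothetical per-horse results after one SpeechBackground() call:
# the preferred horse returned nothing in time, a local backup did.
results = {"nyquist": "", "mage": "one two three"}
choice = first_available(results, ["nyquist", "mage", "secretariat"])
```

Here `choice` names the local `mage` horse, because the preferred `nyquist` horse produced no text. The same idea could be extended to compare scores when the backends report them.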

jcolp · 2024-02-12T17:52:20Z

Okay, I think this really needs a set of non implementation specific requirements and user stories.

chrsmj · 2024-05-09T20:39:20Z

Astricon discussion was great, thanks!

Following the Kentucky Derby, I was reminded to clean up the patch a little and fix some memory leaks, but otherwise there is no big rewrite to address (potential) concerns about feeding the same codec frames to multiple different ASR backends. If you can live with that (say, you are only using local Vosk-Kaldi-Docker containers to translate one speaker into multiple languages simultaneously), then this might be a good fit for you.

chrsmj added new-feature triage labels Feb 9, 2024

chrsmj linked a pull request Feb 9, 2024 that will close this issue

app_speech_utils.c: Add horses to speech engines. #594

Draft

jcolp removed the triage label Feb 9, 2024

jcolp assigned chrsmj Feb 9, 2024

chrsmj mentioned this issue Feb 9, 2024

res_speech_vosk.c: Add horse power. alphacep/vosk-asterisk#47

Open
