[new-feature]: Add horse power in to speech engines #593

Open
chrsmj opened this issue Feb 9, 2024 · 14 comments · May be fixed by #594

Contributor

chrsmj commented Feb 9, 2024

Feature Description

Current Speech...() apps only let you work with one engine at a time. This presents challenges in multi-lingual IVR environments, situations where you are testing different engines, cloud evaporation events, etc.

The forthcoming patch will allow you to configure the horses inside your engine and sequentially feed them audio frames. It is still a little leaky, uses arrays rather than linked lists, and requires additional patches to the engines themselves to do anything useful (patches are also forthcoming there, e.g. for Vosk).

But the dialplan will look like this -- note the carets used to separate the horses (because 🐴 ❤️ 🥕 :)

exten => 46773,1,NoOp(H-O-R-S-E) 
 same = n,Set(TIMEOUT(D)=1)
 same = n,Set(SPEECH_DTMF_MAXLEN=2)
 same = n,SpeechCreate(vosk^mage)
 same = n,SpeechCreate(vosk^secretariat)
 same = n,SpeechBackground(dial-here-often,5,,mage^secretariat)
 same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
 same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
 same = n,SpeechDestroy(vosk^mage)
 same = n,SpeechDestroy(vosk^secretariat)
 same = n,SpeechCreate(vosk^mage)
 same = n,SpeechCreate(vosk^secretariat)
 same = n,SpeechBackground(that-tickles,5,p,mage^secretariat)
 same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
 same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
 same = n,SpeechDestroy(vosk^mage)
 same = n,SpeechDestroy(vosk^secretariat)
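
For readers who want to see the caret convention spelled out, here is a minimal Python sketch of how the arguments in the dialplan above could be split. It is only an illustration: the actual patch is C code inside Asterisk, and the helper names `split_engine_horse`, `split_horses`, and `split_result_ref` are hypothetical. An argument without a caret still parses, which is one way existing dialplans could keep working.

```python
def split_engine_horse(arg):
    """Split 'engine^horse' into (engine, horse); horse is None without a caret."""
    engine, _, horse = arg.partition("^")
    return engine, (horse or None)


def split_horses(arg):
    """Split 'h1^h2^...' into a list of horse names, dropping empty pieces."""
    return [name for name in arg.split("^") if name]


def split_result_ref(arg):
    """Split '0^mage' into (result index, horse); horse is None without a caret."""
    index, _, horse = arg.partition("^")
    return int(index), (horse or None)


# Arguments as they appear in the dialplan above.
print(split_engine_horse("vosk^mage"))
print(split_horses("mage^secretariat"))
print(split_result_ref("0^mage"))
```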

And the corresponding res_speech_vosk.conf:

[general]

[mage]
type=horse
url = ws://localhost:2700

[secretariat]
type=horse
url = ws://localhost:2701

You might replace mage and secretariat with en and es...
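
As a rough illustration of what the config above expresses, the sketch below collects every section marked `type=horse` into a name-to-URL map. It uses Python's `configparser` as a stand-in for Asterisk's own config loader, which is an assumption that only holds for simple INI-style files like this one; `load_horses` is a hypothetical helper, not part of the patch.

```python
import configparser

# The res_speech_vosk.conf contents shown above.
SAMPLE = """\
[general]

[mage]
type=horse
url = ws://localhost:2700

[secretariat]
type=horse
url = ws://localhost:2701
"""


def load_horses(text):
    """Return {horse name: url} for every section with type=horse."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    horses = {}
    for name in parser.sections():
        section = parser[name]
        if section.get("type") == "horse":
            horses[name] = section["url"]
    return horses


print(load_horses(SAMPLE))
```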

Member

jcolp commented Feb 9, 2024

This sounds like a major rearchitecture. Such things should really be discussed ahead of time. For example, this would need to be backwards compatible to go into any current branch or even master.

Member

jcolp commented Feb 9, 2024

With the place to discuss such things being https://groups.io/g/asterisk-dev

chrsmj added a commit to chrsmj/asterisk that referenced this issue Feb 9, 2024
Speech Engines can now have multiple horses inside
that race to get you results from different backends.

UserNote: The various Speech...() dialplan applications and the SPEECH_TEXT
function now support the use of carets to control the horses in your engines.
SpeechCreate(engine^horse1)
SpeechCreate(engine^horse2)
SpeechBackground(audio,,,horse1^horse2)
SPEECH_TEXT(0^horse1)
SPEECH_TEXT(0^horse2)
SpeechDestroy(engine^horse1)
SpeechDestroy(engine^horse2)

Resolves: asterisk#593
@chrsmj chrsmj linked a pull request Feb 9, 2024 that will close this issue
@jcolp jcolp removed the triage label Feb 9, 2024
Contributor Author

chrsmj commented Feb 9, 2024

The patch is backwards compatible. May we speak about it next week at AstriDevCon?

Member

jcolp commented Feb 9, 2024

We could, but things should still happen or be recorded in a location where others can participate over a period of time and where it can be referenced for historical purposes.

Member

jcolp commented Feb 9, 2024

As well, AEAP handles this by registering multiple engines, each with a unique name, so I don't see why this needs to be as aware of the special ^ syntax as it is.
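
To make the alternative concrete, here is a small Python sketch of a flat registry in which each backend is its own engine under a unique name, so no caret parsing is needed. This is not AEAP's or Asterisk's actual code; the `EngineRegistry` class and the names `vosk-en` and `vosk-es` are invented for illustration.

```python
class EngineRegistry:
    """Flat registry: one entry per backend, keyed by a unique engine name."""

    def __init__(self):
        self._engines = {}

    def register(self, name, url):
        # Unique names are what make a separate 'horse' level unnecessary.
        if name in self._engines:
            raise ValueError(f"engine {name!r} is already registered")
        self._engines[name] = url

    def find(self, name):
        """Return the URL for an engine name, or None if it is unknown."""
        return self._engines.get(name)


registry = EngineRegistry()
registry.register("vosk-en", "ws://localhost:2700")
registry.register("vosk-es", "ws://localhost:2701")
print(registry.find("vosk-es"))
```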

chrsmj added a commit to chrsmj/vosk-asterisk that referenced this issue Feb 9, 2024
Allows multiple horses (URLs) for semi-concurrent processing.
These horses could be multiple language models racing against each other.

Requires additional patches to Asterisk.
See [Asterisk #593](asterisk/asterisk#593)
for further discussion and links to additional required patches.

Resolves: alphacep#8, alphacep#35
Contributor Author

chrsmj commented Feb 9, 2024

There is precedent elsewhere for the caret separator when things get clever, e.g. the Dial() 'b' and 'B' options. Colon is for string trimming. Comma is over-used; the patch adds only one more, heh. Extra parenthesis wrapping ))))) is one reason why some shudder and reach for a general-purpose language instead. Bringing some of the cool parts of AEAP functionality back to the plain-jane dialplan DSL is one goal of this patch design.

And in the future, AEAP could be extended with this new horse decorator as well:

[my-speech-to-text]
type=client
codecs=!all,ulaw
url=ws://127.0.0.1:9099
protocol=speech_to_text

[nyquist]
type=horse
url=ws://127.0.0.1:2016

[majestic_prince]
type=horse
url=ws://127.0.0.1:1969

Then you could race all four:

exten => 33729,1,NoOp(D-E-R-B-Y) 
 same = n,Set(TIMEOUT(D)=1)
 same = n,Set(SPEECH_DTMF_MAXLEN=2)

 same = n,SpeechCreate(vosk^mage)
 same = n,SpeechCreate(vosk^secretariat)
 same = n,SpeechCreate(my-speech-to-text^nyquist)
 same = n,SpeechCreate(my-speech-to-text^majestic_prince)

 same = n,SpeechBackground(dial-here-often,5,,mage^secretariat^nyquist^majestic_prince)

 same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
 same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
 same = n,Set(speechtext0nyquist=${SPEECH_TEXT(0^nyquist)})
 same = n,Set(speechtext0majestic_prince=${SPEECH_TEXT(0^majestic_prince)})

 same = n,SpeechDestroy(vosk^mage)
 same = n,SpeechDestroy(vosk^secretariat)
 same = n,SpeechDestroy(my-speech-to-text^nyquist)
 same = n,SpeechDestroy(my-speech-to-text^majestic_prince)

 same = n,SpeechCreate(vosk^mage)
 same = n,SpeechCreate(vosk^secretariat)
 same = n,SpeechCreate(my-speech-to-text^nyquist)
 same = n,SpeechCreate(my-speech-to-text^majestic_prince)

 same = n,SpeechBackground(that-tickles,5,p,mage^secretariat^nyquist^majestic_prince)

 same = n,Set(speechtext0mage=${SPEECH_TEXT(0^mage)})
 same = n,Set(speechtext0secretariat=${SPEECH_TEXT(0^secretariat)})
 same = n,Set(speechtext0nyquist=${SPEECH_TEXT(0^nyquist)})
 same = n,Set(speechtext0majestic_prince=${SPEECH_TEXT(0^majestic_prince)})

 same = n,SpeechDestroy(vosk^mage)
 same = n,SpeechDestroy(vosk^secretariat)
 same = n,SpeechDestroy(my-speech-to-text^nyquist)
 same = n,SpeechDestroy(my-speech-to-text^majestic_prince)

Contributor

InterLinked1 commented Feb 9, 2024

> There is precedent elsewhere for the Caret separator when things get clever e.g. Dial() 'b' and 'B' options.

The ^ and other characters are generally only used for suboptions within an option (or when the , would not work due to special parsing considerations).

> Comma is over-used

The comma is the standard argument separator - so that would be expected.

@nshmyrev

I think it's better to control this with the grammar field than to have multiple engine connections.

Member

jcolp commented Feb 12, 2024

@nshmyrev That is what grammar was originally for in older engines, so that would make sense.

Contributor Author

chrsmj commented Feb 12, 2024

Current engine:grammar relationship on a channel is 1:N. This patch improves that to N:N.

Member

jcolp commented Feb 12, 2024

Personally as a user, I would not want that. I would want to just be able to specify multiple grammars and have the engine provide the results back to me ordered in confidence.
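
The behaviour described here, multiple grammars in and one confidence-ordered list out, can be sketched in a few lines of Python. The `Result` type, the grammar names, and the scores are all made up for illustration; real engines define their own score scale.

```python
from typing import NamedTuple


class Result(NamedTuple):
    grammar: str  # which grammar produced this hypothesis
    text: str     # recognized text
    score: int    # engine-defined confidence, higher is better (assumed)


def rank_results(results):
    """Return results ordered from most to least confident."""
    return sorted(results, key=lambda r: r.score, reverse=True)


# Hypothetical hypotheses for the same utterance under three grammars.
results = [
    Result("en", "sinker", 410),
    Result("es", "cinco", 620),
    Result("fr", "cinq", 550),
]
for r in rank_results(results):
    print(r.grammar, r.text, r.score)
```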

Contributor Author

chrsmj commented Feb 12, 2024

Agreed, it is nice when the engine ranks your results in order of confidence. But not all of them do (yet) cough vosk cough.

And what if your main cloud engine is lagged out, but your less-preferred local backup engines are still available?

This patch lets users solve it themselves in the dial plan.
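
The fallback idea can be sketched as follows, in Python rather than dialplan so that it can be run standalone. It assumes the per-horse texts have already been collected (as the `SPEECH_TEXT(0^horse)` calls above would do) and simply takes the first non-empty one in order of preference. `pick_result` and the horse name `cloud` are hypothetical.

```python
def pick_result(texts_by_horse, preference):
    """Return (horse, text) for the first preferred horse that produced text."""
    for horse in preference:
        text = texts_by_horse.get(horse, "").strip()
        if text:
            return horse, text
    return None, ""


# The preferred cloud engine timed out and returned nothing,
# so the local backup wins.
texts = {"cloud": "", "mage": "four two", "secretariat": "cuatro dos"}
print(pick_result(texts, ["cloud", "mage", "secretariat"]))
```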

Member

jcolp commented Feb 12, 2024

Okay, I think this really needs a set of non-implementation-specific requirements and user stories.

Contributor Author

chrsmj commented May 9, 2024

Astricon discussion was great, thanks!

Following the Kentucky Derby, I was reminded to clean up the patch a little and fix some memory leaks, but otherwise there is no big rewrite to address the (potential) concerns about feeding the same codec frames to multiple different ASR backends. If you can live with that, for example because you are only using local Vosk-Kaldi-Docker containers to translate one speaker into multiple languages simultaneously, then this might be a good fit for you.
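
For anyone trying to picture the frame handling under discussion, here is a minimal Python sketch of sequentially feeding the same audio frame to several backends held in a plain list. It is not the patch itself, which is C inside Asterisk; the `Horse` class and `feed_all` function are stand-ins, and each horse just records frames instead of sending them over a websocket.

```python
class Horse:
    """Stand-in for one ASR backend; records frames instead of sending them."""

    def __init__(self, name):
        self.name = name
        self.frames = []

    def write(self, frame):
        self.frames.append(frame)


def feed_all(horses, frame):
    """Hand the same frame to each horse in turn; return how many were fed."""
    for horse in horses:
        horse.write(frame)
    return len(horses)


horses = [Horse("mage"), Horse("secretariat")]
# Two dummy 160-byte frames (160 bytes is 20 ms of 8 kHz ulaw audio).
for frame in (b"\x00" * 160, b"\xff" * 160):
    feed_all(horses, frame)

print([(h.name, len(h.frames)) for h in horses])
```

Every horse sees identical bytes, so all backends must accept the same codec, which is the concern raised above.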
