[new-feature]: Add horse power in to speech engines #593
Comments
This sounds like a major rearchitecture. Such things should really be discussed ahead of time. For example, this would need to be backwards compatible to go into any current branch, or even master.
The place to discuss such things is https://groups.io/g/asterisk-dev
Speech Engines can now have multiple horses inside that race to get you results from different backends.

UserNote: The various Speech...() dialplan applications and the SPEECH_TEXT function now support the use of carets to control the horses in your engines.

SpeechCreate(engine^horse1)
SpeechCreate(engine^horse2)
SpeechBackground(audio,,,horse1^horse2)
SPEECH_TEXT(0^horse1)
SPEECH_TEXT(0^horse2)
SpeechDestroy(engine^horse1)
SpeechDestroy(engine^horse2)

Resolves: asterisk#593
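To make the caret convention in the commit message above concrete, here is a minimal Python sketch of how such arguments could be split. It is not the patch's parser (the patch itself lives in Asterisk's C code); the function names are hypothetical, the engine name `vosk` is an assumption, and the only thing taken from the thread is the `^` separator and the three argument shapes `engine^horse`, `horse1^horse2` and `0^horse1`.

```python
from typing import List, Optional, Tuple

SEP = "^"  # separator used in the commit message; everything else here is illustrative


def split_engine_horse(arg: str) -> Tuple[str, Optional[str]]:
    """Split 'engine^horse' into (engine, horse).

    A bare 'engine' (no caret) yields horse=None, which mirrors the
    claim that existing dialplans keep working unchanged.
    """
    engine, sep, horse = arg.partition(SEP)
    return engine, (horse if sep and horse else None)


def split_horses(arg: str) -> List[str]:
    """Split 'horse1^horse2' into a list, dropping empty entries."""
    return [h for h in arg.split(SEP) if h]


def split_result_horse(arg: str) -> Tuple[int, Optional[str]]:
    """Split a SPEECH_TEXT-style argument such as '0^horse1' into (index, horse)."""
    index, horse = split_engine_horse(arg)
    return int(index), horse


if __name__ == "__main__":
    print(split_engine_horse("vosk^mage"))     # engine plus horse
    print(split_engine_horse("vosk"))          # legacy form, no horse
    print(split_horses("mage^secretariat"))    # horses to race
    print(split_result_horse("0^mage"))        # result index plus horse
```

Treating a missing caret as "no horse" is the design choice that keeps the single-engine form valid in this sketch.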
Patch is backwards compatible. May we speak about it next week at AstriDevCon?
We could, but things should still happen or be recorded in a location where others can participate over a period of time and where it can be referenced for historical purposes.
Also, AEAP handles this by registering multiple engines, each with a unique name, so I don't see why this needs to be as aware of the special ^ thing as it is.
Allows multiple horses (URLs) for semi-concurrent processing. These horses could be multiple language models racing against each other.

Requires additional patches to Asterisk. See [Asterisk #593](asterisk/asterisk#593) for further discussion and links to additional required patches.

Resolves: alphacep#8, alphacep#35
There is precedent elsewhere for the caret separator when things get clever, e.g. the Dial() 'b' and 'B' options. Colon is for string trimming. Comma is over-used; the patch adds only one more, heh. Extra parenthesis wrapping ))))) is one reason why some shudder and reach for a GPL instead. Bringing some of the cool parts of AEAP functionality back to the plain-jane dialplan DSL is one goal of this patch design. And in the future, AEAP could be extended with this new horse decorator as well:
Then you could race all four:
The ^ and other characters are generally only used for suboptions within an option (or when the , would not work due to special parsing considerations).
The comma is the standard argument separator, so that would be expected.
I think it's better to control this by the grammar field than to have multiple engine connections.
@nshmyrev That is what grammar was originally for in older engines, so that would make sense.
Current engine:grammar relationship on a channel is 1:N. This patch improves that to N:N.
Personally, as a user, I would not want that. I would want to just be able to specify multiple grammars and have the engine provide the results back to me ordered by confidence.
Agreed, it is nice if the engine ranks your results in order of confidence. But not all of them do (yet) cough vosk cough. And what if your main cloud engine is lagged out, but your less-preferred local backup engines are still available? This patch lets users solve it themselves in the dialplan.
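The "lagged-out cloud engine, local backup still available" scenario can be illustrated outside Asterisk with a small Python sketch: submit the same audio to several backends and take whichever answers first. This is only an analogy under stated assumptions. The patch is described in this thread as feeding frames to the horses sequentially, not with threads, and `race`, `slow_cloud` and `local_backup` are hypothetical stand-ins, not real engine APIs.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout


def race(horses, audio, timeout=2.0):
    """Return (name, text) from the first horse that finishes without error.

    `horses` maps a name to a callable taking the audio bytes.
    Returns None if every horse fails or nobody finishes within `timeout`.
    """
    if not horses:
        return None
    pool = ThreadPoolExecutor(max_workers=len(horses))
    futures = {pool.submit(fn, audio): name for name, fn in horses.items()}
    try:
        for fut in as_completed(futures, timeout=timeout):
            try:
                return futures[fut], fut.result()
            except Exception:
                continue  # this horse fell; keep waiting for the others
    except FuturesTimeout:
        pass  # nobody crossed the line in time
    finally:
        # Do not block on the losers; their threads finish in the background.
        pool.shutdown(wait=False)
    return None


def slow_cloud(audio):
    """Hypothetical lagged cloud backend."""
    time.sleep(0.5)
    return "cloud transcript"


def local_backup(audio):
    """Hypothetical fast local backend."""
    time.sleep(0.05)
    return "local transcript"


if __name__ == "__main__":
    print(race({"cloud": slow_cloud, "local": local_backup}, b""))
```

Whether "first to answer" or "highest confidence" should win is exactly the policy question debated in this thread; the sketch picks the former only because it is the simplest to show.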
Okay, I think this really needs a set of non-implementation-specific requirements and user stories.
The AstriCon discussion was great, thanks! Following the Kentucky Derby, I was reminded to clean up the patch a little to fix some memory leaks. Otherwise there is no big rewrite to address (potential) concerns about feeding the same codec frames to multiple different ASR backends. If you can live with that, say because you are only using local Vosk-Kaldi-Docker containers to translate one speaker into multiple languages simultaneously, then this might be a good fit for you.
Feature Description
Current Speech...() apps only let you work with one engine at a time. This presents challenges in multilingual IVR environments, situations where you are testing different engines, cloud evaporation events, etc.
The forthcoming patch will allow you to configure the horses inside your engine and sequentially feed them audio frames. It is still a little leaky, eschews linked lists for arrays, and requires additional patches to the engines themselves to do anything useful (patches also forthcoming there, e.g. Vosk).
But the dialplan will look like this -- note the carets used to separate the horses (because 🐴 ❤️ 🥕 :)
And the corresponding res_speech_vosk.conf:
You might replace mage and secretariat with en and es...