
RPi client with external SEPIA (STT) server cannot work without internet #228

Open
royrogermcfreely opened this issue Mar 3, 2023 · 5 comments



royrogermcfreely commented Mar 3, 2023

Hey Florian,

last week I had no internet because they were renewing the street in my city.

So I thought that was the perfect case to test my smart home / voice assistant, since it should work completely offline.

My SEPIA server with STT runs on a Proxmox VM, as does my Home Assistant instance. Everything is reachable within my network.

SEPIA on my phones worked as expected, but not my RPi client. I always get this info in the client-connection remote terminal:

Broadcaster event: {"broadcast":{"client":"raspi_chrome_app_v0.25.0","deviceId":"raspi","sepia-speech":{"type":"asr_error","msg":"no connection to server"}}}

So I looked at my settings.js file, where I have this config for the ASR:

"voiceEngine": "sepia",
"voiceCustomServerURI": "http://192.168.0.xx:59125",
"en-voice": "",
"de-voice": "de-DE marytts de_DE/m-ailabs_low#karlsson", //"de-DE marytts de_DE/m-ailabs_low#eva_k",
"asrEngine": "native",
"asrServerURI": "http://192.168.0.xx:20726/sepia/stt",
"asrServerUser": "any",
"asrServerToken": "test1234",
"en-asrModel": "",
"de-asrModel": "",
"big-screen-mode": true,
"virtualKeyboard": false,

Then I checked the wiki and changed my config to:

"voiceEngine": "sepia",
"voiceCustomServerURI": "http://192.168.0.xx:59125",
"en-voice": "",
"de-voice": "de-DE marytts de_DE/m-ailabs_low#karlsson", //"de-DE marytts de_DE/m-ailabs_low#eva_k",
"asrEngine": "sepia",
"asrServerURI": "",
"asrServerUser": "any",
"asrServerToken": "test1234",
"en-asrModel": "",
"de-asrModel": "",
"big-screen-mode": true,
"virtualKeyboard": false,

This worked: no internet and a working voice assistant. But now my wake-word detection rate is very poor and the STT quality is horrible.
I have to speak commands more than two times until it gets what I mean.

The usability with asrEngine set to "native" was so much better than with "sepia",
but with "native" I cannot use it without internet.

On my smartphones I have this config:

"device":{
"asrEngine": "sepia",
"voiceEngine": "custom-mary-api",
"host-name": "https://192.168.0.xx",
deviceId": "a40",
"de-asr-model": "vosk-model-small-de:assistant",
wakeWordSensitivity":[ "0.6"],

Do you know what happens here?
Does the Chromium version need an internet connection?


fquirin commented Mar 3, 2023

This worked: no internet and a working voice assistant. But now my wake-word detection rate is very poor and the STT quality is horrible.

Wake-word detection is always fully offline and should not be affected by the choice of the "asrEngine" value 🤔. Since the SEPIA STT server works fine via the mobile app, I think there might be an issue with the DIY microphone setup, although it's weird that this didn't seem to be an issue before.
The first thing you could do is run the microphone test. This can be done remotely via the Control-HUB CLEXI terminal. The terminal command is `call mictest play recording`. It will ask you to speak for about 8 s and then analyze the volume. If the volume is very low, you have two options to increase it: the first is via the Linux terminal, using pulsemixer to raise the input volume; the second is via the microphone gain value in settings.js:

"microphoneSettings": {
  "gain": 5.0,
  ...
},
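As a rough illustration of what the gain value does (just a sketch, not SEPIA's actual implementation): the gain multiplies the raw audio samples, and anything pushed past the 16-bit PCM range clips, which is why very high gains can distort the signal:

```python
def apply_gain(samples, gain):
    # Scale 16-bit PCM samples by `gain` and clip to the valid range.
    return [max(-32768, min(32767, int(s * gain))) for s in samples]

quiet = [1000, -2000, 3000]
print(apply_gain(quiet, 5.0))   # → [5000, -10000, 15000]
print(apply_gain(quiet, 20.0))  # 3000 * 20 = 60000 clips to 32767
```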

I often use a gain of 5-10, depending on the microphone. Unfortunately, the quality of your STT results depends heavily on the microphone and your distance to the device; that is usually why it works better on your phone. Google's servers seem to use some unknown black magic to get crazy good results, idk how they do it ^^, but I'm always working on improving the open-source systems.

Does the Chromium version need an internet connection?

Only for the native "asrEngine" setting. I called it "native", because it is up to the browser and OS to decide what to do with it, Chrome and Edge for Desktop will usually use their cloud-servers while the same on Chrome/Edge/Samsung Internet/etc. for Android depends on the vendor setup and Apple can use on-device STT for newer iPhone/iPad models.

@royrogermcfreely
Author

Hmm... but why is the rate of correctly recognized sentences so much better with "native" than with "sepia"? I didn't change anything else, and Chrome should use the same STT engine from the server.
Also, why is the time between wake-word activation and the listening state faster on "native" than on "sepia"?


fquirin commented Mar 5, 2023

Hmm... but why is the rate of correctly recognized sentences so much better with "native" than with "sepia"?

You mean the actual transcription? Or the wake-word? For the wake-word I can't see any logical reason right now; for the transcription, the Google service is just better :-/. They probably trained their system on thousands of hours of audio from a very large number of different devices and microphone setups (presumably a large part of their data is "illegally" recorded without user consent or knowledge).

Also, why is the time between wake-word activation and the listening state faster on "native" than on "sepia"?

Yes. Google can directly access the audio interface and optimize everything, while SEPIA has to work with the official browser APIs. There is still room for improvement though, because currently I destroy and recreate the audio interface after the wake-word trigger for compatibility reasons.


royrogermcfreely commented Mar 5, 2023

you mean the actual transcription?

Yes, the transcription is bad with the "sepia" engine. The wake-word is working.

for the transcription the Google service is just better :-/. They probably trained their system on thousands of hours of audio from a very large number of different devices and microphone setups (presumably a large part of their data is "illegally" recorded without user consent or knowledge).

So the STT with the native engine is done via Google/Chromium?

Yes. Google can directly access the audio interface and optimize everything, while SEPIA has to work with the official browser APIs.

Ah ok, so I will maybe get a faster response when I use an RPi 4. But just for a client the RPi 4 is too overpowered 😄, at least for me.


fquirin commented Mar 5, 2023

So the STT with the native engine is done via Google/Chromium?

The short answer: on the Raspberry Pi DIY client, "native" is the Chromium implementation of the Web Speech API, which accesses Google servers. But:
As mentioned above, "native" depends on the browser/app, device and OS and can theoretically change with any update. That happened on Raspberry Pi OS, for example, where at some point the Raspberry Pi dev team removed the support from Chromium by removing the Google API keys from their build (that's why you need to downgrade or add the keys manually in the DIY client), effectively breaking the Web Speech API. Firefox, for example, never had "native" support because they cannot afford the cloud infrastructure and were not able to build a useful offline ASR, although their efforts live on in Coqui STT, available via the SEPIA STT server. Edge, Safari and Samsung Internet all introduced a working "native" engine in the last ~2 years, using either their own cloud (Microsoft/Apple) or on-device capabilities (iOS/Android -> can be on-device or cloud).

Currently the consensus in open-source speech recognition is that you either need a lot of compute power or have to work with limited vocabulary and language models to get good accuracy. This is especially true for all non-English languages, including German :-/. My goal for 2023 is to do 1-2 larger SEPIA STT server updates with specialized language models and new engines for more powerful devices 🤞🙂.

Ah ok, so I will maybe get a faster response when I use an RPi 4. But just for a client the RPi 4 is too overpowered 😄, at least for me.

Yes. I have plans for smaller clients as well that will work on even smaller chips, but right now the priority is improving open-source STT.
Do you currently use "pseudo-headless" or "headless" mode? If you don't have a screen, "headless" might be a bit more responsive.
