
Sonus in the browser #28

Open
evancohen opened this issue Jan 6, 2017 · 8 comments

@evancohen
Owner

Continuing our discussion from TalAter/annyang#100

@ghost

ghost commented Jan 7, 2017

@evancohen
I had some issues with Sonus. Recognizing the "Jarvis" hotword was a bit of a hassle: sometimes it worked, sometimes it didn't, and sometimes it triggered without anyone in the house saying anything (that was a funny moment; my girlfriend got scared of my little project when it started talking without being asked anything).

My setup is a Raspberry Pi with Chrome open (for SpeechRecognition, so I effectively have unlimited cloud speech API access).
Sonus ran on the same Pi, but directly on Linux rather than in the browser. I got hotword detection working with Sonus and then started SpeechRecognition in the browser. Be aware that you cannot record from Chrome and Sonus at the same time, so I set up a small WebSocket connection between them: when Sonus detected a hotword (Sonus stopped listening after a detection), it sent a message over the WebSocket, so the browser knew a hotword had been detected and started SpeechRecognition. After the speech ended and the commands were processed, the browser restarted Sonus via the WebSocket.
I think this is a really quick and dirty setup for "Sonus in the browser".
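A minimal sketch of the browser side of that handoff, in TypeScript. The JSON message shapes (`hotword`, `resume-sonus`) are assumptions, since the comment does not describe the actual messages, and `MicHandoff` is a hypothetical name. The callbacks stand in for `ws.send(...)`, `recognition.start()` and the page's own command handling, so the class only has to track who owns the microphone at any moment.

```typescript
// Hypothetical wire format: the thread does not specify one.
type SonusMessage = { type: "hotword"; hotword: string };
type BrowserMessage = { type: "resume-sonus" };

// Who currently has the microphone.
type Phase = "sonus-listening" | "browser-recognizing" | "processing";

class MicHandoff {
  private phase: Phase = "sonus-listening";

  constructor(
    private readonly send: (msg: BrowserMessage) => void, // e.g. ws.send(JSON.stringify(msg))
    private readonly startRecognition: () => void, // e.g. recognition.start()
    private readonly handleCommand: (transcript: string) => void | Promise<void>,
  ) {}

  get current(): Phase {
    return this.phase;
  }

  // Raw text from the WebSocket. Sonus has already stopped recording when it
  // sends "hotword", so the browser can take over the microphone.
  onSocketMessage(raw: string): void {
    let msg: Partial<SonusMessage> | null;
    try {
      msg = JSON.parse(raw);
    } catch {
      return; // ignore malformed frames
    }
    if (msg?.type !== "hotword" || this.phase !== "sonus-listening") return;
    this.phase = "browser-recognizing";
    this.startRecognition();
  }

  // Final transcript from the browser's speech recognition.
  async onTranscript(transcript: string): Promise<void> {
    if (this.phase !== "browser-recognizing") return;
    this.phase = "processing";
    try {
      await this.handleCommand(transcript);
    } finally {
      this.giveMicBack();
    }
  }

  // Recognition ended without a usable result (silence, error, timeout).
  onRecognitionEnd(): void {
    if (this.phase === "browser-recognizing") this.giveMicBack();
  }

  // Tell Sonus it may start recording again.
  private giveMicBack(): void {
    this.phase = "sonus-listening";
    this.send({ type: "resume-sonus" });
  }
}
```

Wire `onSocketMessage` to the WebSocket's message handler and `onTranscript`/`onRecognitionEnd` to the recognizer's final-result and end events. One simplification: Sonus is resumed as soon as the command has been handled; if the browser has not released the microphone by then, it may be safer to also wait for the recognizer's end event before sending `resume-sonus`.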

In my opinion, Sonus is OK, but speech processing still needs to be done in the cloud for the moment.

@evancohen
Owner Author

I like that approach, a few suggestions for you:

First, you can record multiple audio streams on the Pi with dsnoop - if I were you I would just use snowboy directly because you are already doing your streaming recognition in the browser, no reason to use Sonus in your scenario (although I wish I could get Sonus to the point that you could).

Second, if you are getting false positives I recommend playing with the recognition sensitivity. Also, short activation phrases tend to be more prone to false positives, so you could also try something like "hey Jarvis".

Anyhow...
Getting truly "free" speech recognition is tricky: unless you are using Chrome (which you are), you're not getting it. A few other approaches I've taken in the past:

  • Electron with webkitSpeechRecognition - used to be free until my smart mirror project used up all the requests.
  • Custom Chromium Keys - Get 50 free requests per day. The 51st+ request will always fail. Instructions.
  • Google Cloud Speech - Free to a point, then very reasonably priced (~a cup of coffee every month)
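If you go the custom Chromium keys route, one way to avoid wasting the request that is guaranteed to fail is to count requests on the client. A minimal sketch follows; the 50-per-day figure comes from the list above, while resetting the count at UTC midnight is an assumption (check when the quota actually resets for your keys), and `DailyQuota` is a hypothetical helper, not part of any Chromium API.

```typescript
// Hypothetical client-side guard for a fixed per-day request quota.
// Assumes the quota resets at UTC midnight; adjust dayKey() if it does not.
class DailyQuota {
  private day = "";
  private used = 0;

  constructor(
    private readonly limit: number,
    private readonly now: () => Date = () => new Date(), // injectable clock for testing
  ) {}

  private dayKey(): string {
    return this.now().toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  }

  // Start a fresh count when the calendar day changes.
  private rollOver(): void {
    const today = this.dayKey();
    if (today !== this.day) {
      this.day = today;
      this.used = 0;
    }
  }

  // Consumes one request and returns true if today's budget allows it.
  tryConsume(): boolean {
    this.rollOver();
    if (this.used >= this.limit) return false;
    this.used += 1;
    return true;
  }

  remaining(): number {
    this.rollOver();
    return this.limit - this.used;
  }
}
```

Create it once with `new DailyQuota(50)` and call `tryConsume()` before each recognition request; when it returns false, skip or queue the request instead of sending one that will fail.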

At the end of the day, if you're super dedicated to it being free you end up having to do a little extra legwork.

What I'd like to see is a snowboy keyword spotter that runs in the browser. Then I could write a simple wrapper to make it and webkitSpeechRecognition work well together. The ball is in Kitt.ai's court right now; let's see what they say.

In the meantime, I'm going to see if I can write a keyword spotter that works in the browser, and then use webkitSpeechRecognition for streaming recognition. That way it'll be easy to drop in snowboy if/when they choose to provide browser support.
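The wrapper described above could be organised around a small interface, so that a home-grown spotter and, later, a browser build of snowboy are interchangeable. This is a sketch under assumptions: `KeywordSpotter`, `Recognizer` and `HotwordPipeline` are hypothetical names, and a `Recognizer` adapter would be where `webkitSpeechRecognition`'s `start()`, `onresult` and `onend` get wired up. It also stops the spotter before recognition starts, since (as noted earlier in the thread) two components recording at once can be a problem.

```typescript
// Hypothetical interfaces: anything that can spot a keyword, and anything
// that can transcribe one utterance (e.g. an adapter over webkitSpeechRecognition).
interface KeywordSpotter {
  start(onKeyword: (keyword: string) => void): void;
  stop(): void;
}

interface Recognizer {
  // Must call onDone exactly once: the transcript, or null on no result/error.
  recognizeOnce(onDone: (transcript: string | null) => void): void;
}

class HotwordPipeline {
  private running = false;

  constructor(
    private readonly spotter: KeywordSpotter,
    private readonly recognizer: Recognizer,
    private readonly onCommand: (keyword: string, transcript: string) => void,
  ) {}

  start(): void {
    if (this.running) return;
    this.running = true;
    this.listenForKeyword();
  }

  stop(): void {
    this.running = false;
    this.spotter.stop();
  }

  private listenForKeyword(): void {
    this.spotter.start((keyword) => {
      // Release the microphone before the recognizer opens it.
      this.spotter.stop();
      this.recognizer.recognizeOnce((transcript) => {
        if (transcript !== null) this.onCommand(keyword, transcript);
        if (this.running) this.listenForKeyword(); // back to keyword spotting
      });
    });
  }
}
```

Swapping spotters then only means passing a different `KeywordSpotter` implementation to the constructor; the recognition side does not change.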

@Nixellion

@evancohen You suggested I use JsSpeechRecognizer, and now I see that I made a huge mistake disregarding it: I assumed it was somehow related to pocketsphinx. Now that I've come back to it, I see it has just what I need: keywords that you train yourself, without any real speech recognition or phoneme modeling, which isn't really needed for the simple task of recognizing one or a few keywords.

We'll see how that works. Hopefully I'll be able to switch between it and Chrome's speech recognition.

Thanks!
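For readers wondering how keyword matching can work without phonemes: one classic approach is to record a few examples of each keyword and compare new audio against those templates with dynamic time warping (DTW), which tolerates the word being said faster or slower. This is a swapped-in illustration of the general idea, not a description of JsSpeechRecognizer's internals; the feature frames are assumed to be short-time spectral vectors computed elsewhere, and `threshold` has to be tuned on real recordings.

```typescript
// Illustrative DTW template matcher (not JsSpeechRecognizer's actual code).
// Each utterance is a sequence of feature frames computed elsewhere;
// all frames in a comparison must have the same length.

interface Template {
  label: string;
  frames: number[][];
}

// Euclidean distance between two feature frames.
function frameDistance(x: number[], y: number[]): number {
  if (x.length !== y.length) throw new Error("frame sizes differ");
  let sum = 0;
  for (let i = 0; i < x.length; i++) {
    const d = x[i] - y[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Dynamic time warping: cheapest monotonic alignment of a onto b,
// so the same word spoken faster or slower still lines up.
function dtwDistance(a: number[][], b: number[][]): number {
  const n = a.length;
  const m = b.length;
  if (n === 0 || m === 0) throw new Error("empty sequence");
  const cost: number[][] = Array.from({ length: n + 1 }, () =>
    new Array<number>(m + 1).fill(Number.POSITIVE_INFINITY),
  );
  cost[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const step = Math.min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]);
      cost[i][j] = frameDistance(a[i - 1], b[j - 1]) + step;
    }
  }
  // Normalise by n + m (roughly the longest possible alignment path) so
  // longer utterances are not penalised just for having more frames.
  return cost[n][m] / (n + m);
}

// Returns the closest template's label, or null if nothing is close enough.
function bestMatch(utterance: number[][], templates: Template[], threshold: number): string | null {
  let best: string | null = null;
  let bestScore = Number.POSITIVE_INFINITY;
  for (const t of templates) {
    const score = dtwDistance(utterance, t.frames);
    if (score < bestScore) {
      bestScore = score;
      best = t.label;
    }
  }
  return bestScore <= threshold ? best : null;
}
```

Raising `threshold` accepts more matches, and with them more false positives, much like raising a hotword detector's sensitivity.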

@timaschew
Contributor

@ghost I had the same idea. Have you changed your implementation in the meantime?

Do you have any code snippets you would like to share?

@timaschew
Contributor

@evancohen

> First, you can record multiple audio streams on the Pi with dsnoop - if I were you I would just use snowboy directly because you are already doing your streaming recognition in the browser, no reason to use Sonus in your scenario (although I wish I could get Sonus to the point that you could).

But isn't there a chance that you lose some of the audio?

Let's assume this is a timeline along the x axis:

speech:    random words until the hotword is triggered snowboy, what is a false friend
snowboy:   ++++++++++++ listening +++++++++++++++++++++++++++++++++++++++++++X
browser:   ------------- waiting --------------------------------------------++++++++

If snowboy needs some time (X) to realize that the hotword was spoken and the user keeps speaking (maybe very fast), then the browser will start listening too late and in this case would only get "friend" instead of "what is a false friend".
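To put a rough number on the loss: the audio that arrives during the detection delay is the sample rate $r$ times the bytes per sample $b$ times the channel count $c$ times the delay $X$. Assuming 16 kHz, 16-bit mono PCM (a common input format for hotword engines such as snowboy, but an assumption here) and a hypothetical delay of $X = 0.5$ s:

$$
B_{\text{lost}} = r \cdot b \cdot c \cdot X = 16000\,\tfrac{\text{samples}}{\text{s}} \times 2\,\tfrac{\text{bytes}}{\text{sample}} \times 1 \times 0.5\,\text{s} = 16\,000\ \text{bytes}
$$

That is half a second of speech, roughly a word or two at a normal pace, and it is the minimum amount of audio from before the detection that would have to be kept and handed to the browser to avoid the gap.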

@evancohen
Owner Author

@timaschew I've got an experimental implementation on the audio-buffer branch that uses a ring buffer for audio to address that issue; I'm assuming you could create a similar implementation in the browser.
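A ring buffer (circular buffer) keeps only the most recent N samples in a fixed-size array, overwriting the oldest as new audio arrives. When the hotword fires, you take a snapshot of the last second or two and feed it to the recognizer ahead of the live stream, so the words spoken while detection was still running are not lost. The sketch below is not the code on the audio-buffer branch; it assumes 16-bit PCM samples arriving in `Int16Array` chunks, and `AudioRingBuffer` is a hypothetical name. It also assumes a recognizer that accepts audio you push to it; the thread does not say whether webkitSpeechRecognition can be fed buffered audio this way.

```typescript
// Fixed-capacity circular buffer holding the most recent 16-bit PCM samples.
class AudioRingBuffer {
  private readonly buf: Int16Array;
  private writePos = 0; // index where the next sample will be written
  private filled = 0; // number of valid samples, never more than capacity

  constructor(capacitySamples: number) {
    if (!Number.isInteger(capacitySamples) || capacitySamples <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.buf = new Int16Array(capacitySamples);
  }

  // Append a chunk, overwriting the oldest samples once the buffer is full.
  write(chunk: Int16Array): void {
    const cap = this.buf.length;
    // If a single chunk is larger than the buffer, only its tail can survive.
    const src = chunk.length > cap ? chunk.subarray(chunk.length - cap) : chunk;
    const firstPart = Math.min(src.length, cap - this.writePos);
    this.buf.set(src.subarray(0, firstPart), this.writePos); // up to the end of the array
    this.buf.set(src.subarray(firstPart), 0); // wrap around to the start
    this.writePos = (this.writePos + src.length) % cap;
    this.filled = Math.min(cap, this.filled + src.length);
  }

  // Copy of the buffered samples, ordered oldest to newest.
  snapshot(): Int16Array {
    const cap = this.buf.length;
    const out = new Int16Array(this.filled);
    const start = (this.writePos - this.filled + cap) % cap;
    const firstPart = Math.min(this.filled, cap - start);
    out.set(this.buf.subarray(start, start + firstPart), 0);
    out.set(this.buf.subarray(0, this.filled - firstPart), firstPart);
    return out;
  }
}
```

For example, `new AudioRingBuffer(24000)` holds 1.5 s of 16 kHz mono audio; keep calling `write()` with every microphone chunk, and on a hotword send `snapshot()` to the recognizer before the live audio.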

Right now I'm in Cambodia with some rather limited resources, but I'd like to help you get Sonus working. Can you file a separate issue with some repro steps to where you're stuck?

@timaschew
Contributor

> I've got an experimental implementation that uses a ring buffer for audio

Ah nice.

> Can you file a separate issue with some repro steps to where you're stuck?

Actually, I didn't get stuck; those were just some concerns I had.

@evancohen
Owner Author

@timaschew happy to answer any questions (or concerns) you have! I am traveling at the moment, so I can't promise I'll respond instantly, but I will get back to you eventually 😄
