Contribution Request: Offline speech recognition #11

Open
patrickjquinn opened this issue Jan 28, 2017 · 28 comments

Comments

@patrickjquinn
Owner

Hi guys, I want to add open source offline speech recognition to this project, initially just for English. It might be worth investigating the work of XNOR.ai and, failing that, building a full, optimised model for PocketSphinx.

Longer term I'd like this model to be trained by interactions with the platform, and then have some sort of central repository for the model so it can be synced across all instances of the platform.

Anyone willing to help? Or have any ideas?

@h4ckd0tm3

Willing to do what I can to help! :)

@patrickjquinn
Owner Author

Excellent, thanks for the offer. What skills do you possess?

@h4ckd0tm3

PHP, HTML, JavaScript, Java, C#....

I graduated from a Secondary Technical College for Engineering in Austria. Working as a sysadmin. Advanced Linux skills.

Never worked with Node.js before, but I have basic knowledge.

@patrickjquinn
Owner Author

Okay that's perfect, have you ever worked with PocketSphinx or CMUSphinx before?

@Marak

Marak commented Jan 28, 2017

@patrickjquinn Which module is currently providing the speech recognition?

@h4ckd0tm3

h4ckd0tm3 commented Jan 28, 2017

@patrickjquinn No, but it looks interesting! And I'm willing to study this shit xD

@Marak
As far as I know, annyang

@patrickjquinn
Owner Author

Ah the person behind say! I'm using your fantastic module for the RasPi client!

At the moment, it's done using online APIs (Google Cloud Speech and Wit.ai) and node-record-lpcm16 to handle speech recognition on the clients.

I've experimented with PocketSphinx but found it to be...too unreliable.

Hence my desire to build something more fit for the task that can be trained dynamically and manually by the community. Open source STT would be a massive coup for the open source community working on projects such as this.

Think you might be able to help out?

@patrickjquinn
Owner Author

@developingUnicorn Excellent :) Well, I'd suggest you try to get https://github.com/cmusphinx/node-pocketsphinx or https://syl22-00.github.io/pocketsphinx.js/ (both JavaScript bindings for PocketSphinx) recognising speech locally. That should be all the research you'll need :) You can contact me via the project's Gitter: https://gitter.im/P-Brain/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

@patrickjquinn
Owner Author

In order to kick this off, I've started a new project, Luther https://github.com/patrickjquinn/Luther (i.e. Martin Luther King, i.e. free speech), initially just containing a giant text file of 450k+ English words. I'll expand this to a giant list of English sentences, popular musicians and slang words sourced from various databases of such information.

@patrickjquinn
Owner Author

Okay guys, so tomorrow I'm going to populate Luther with a setup guide for PocketSphinx and a precompiled English dictionary for it. I'll also create a basic Node module for recording raw input and isolating the frequencies of human speech, which should allow for easier extraction.

Also, when the time comes that we have a solid platform, I'll host it on a beefy box "in the cloud" with 20 or so cores (I have some spare Azure hosting credits) so everyone can access it via a simple API for their projects!

Can anyone who wants to help let me know so I can add them as admins to the Luther project?

@h4ckd0tm3

I'll help! Started playing around with PocketSphinx and I'm totally into this! Looking forward to being a part of this project!

@patrickjquinn
Owner Author

Excellent, I'll add you as an admin! Did you make any progress getting it recognising speech?

@h4ckd0tm3

h4ckd0tm3 commented Feb 1, 2017

Not yet. Here in Austria we have to do military service by the age of 18 and I have 5 months left, so my time is limited. But I hope I'll get it working by Friday!

@patrickjquinn
Owner Author

No rush! It's for your own benefit, not mine :) I'll have everything mentioned above committed by tonight.

@staberas

staberas commented Feb 12, 2017

Does XNOR.ai do image/video recognition too?
I also think offline should be the priority of this project.

@patrickjquinn
Owner Author

Yes indeed they do, but I don't believe they have released anything yet.

While I also believe it should be a priority, it's a gigantic task and way beyond the capabilities of any one or two people (especially if one of those people is me). Basically, it's not something I can handle alone. Hence this contribution request.

@staberas

staberas commented Feb 16, 2017

Looking around, I found this https://github.com/zzmp/juliusjs, which is a 'fork' of this https://github.com/julius-speech/julius speech recognition for Ubuntu.

@Marak

Marak commented Mar 2, 2017

Any updates on this? Getting local speech recognition to work right can be hard.

Will we have default support for MacOS?

Looking forward to project updates. This is awesome work being done here!

@timstableford
Collaborator

I think we'll almost certainly be using pocketsphinx for speech recognition unless we can find something better. I attempted to get pocketsphinx and nodejs talking to each other last week but nobody maintains the nodejs bindings anymore. To answer your question though, it's almost certain it will be cross-platform compatible as long as all the dependencies support it too.

@i-am-malaquias

Maybe there's some hope from Mozilla's "Deep Speech" engine project? They're claiming a 6.5% error rate at this point. https://github.com/mozilla/DeepSpeech

@patrickjquinn
Owner Author

Iiiinnnteresting... anyone want to attempt to write a Node wrapper for this?

If we can make this work then it makes the project (and some of the modular forks I’ve been working on behind closed doors) more viable vs other open source VAs and we can start more actively maintaining it.

Long term I’d love to see this or a variant of this as a proper open source Alexa competitor with an open skills ecosystem and companion apps.

@timstableford
Collaborator

It sounds like a great idea to me. I think DeepSpeech only processes chunks of audio though? We'd also need to extract those chunks from a stream, which is quite a big chunk of work to do right.

@patrickjquinn
Owner Author

patrickjquinn commented May 31, 2018 via email

@timstableford
Collaborator

I'm mainly thinking about detecting when a command ends. With the first one, does it work like Google's, where you tell it to start and then it automatically ends on silence? With the second, I think I read that the NodeJS bindings for DeepSpeech can accept an audio buffer, so we could at least cut out the filesystem.

@patrickjquinn
Owner Author

patrickjquinn commented May 31, 2018 via email

@timstableford
Collaborator

Of those two options I prefer the second, otherwise there'd be a large delay after small commands. I'd really like to do it like in this StackOverflow answer. The problem is that's a lot of work, and it may need some input normalisation or the silence threshold to be set dynamically, maybe based on an average calculation?
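
For illustration only, here is a rough sketch of what a dynamically set silence threshold could look like. It is not taken from the StackOverflow answer or from this project; it is written in Python just to keep it short, and the class name, the `margin`, `smoothing` and `floor` parameters and their defaults are all made up. It assumes frames of signed PCM samples and that recording starts in relative quiet.

```python
import math
from typing import Optional, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class AdaptiveSilenceThreshold:
    """Hypothetical helper: tracks a running average of background noise.

    A frame counts as silent when its energy is below
    max(floor, noise_level * margin). All parameters are assumptions.
    """

    def __init__(self, margin: float = 2.0, smoothing: float = 0.1,
                 floor: float = 50.0) -> None:
        self.margin = margin
        self.smoothing = smoothing
        self.floor = floor
        self.noise_level: Optional[float] = None

    def is_silent(self, frame: Sequence[float]) -> bool:
        energy = rms(frame)
        if self.noise_level is None:
            # Assumption: the very first frame is background noise.
            self.noise_level = energy
            return True
        threshold = max(self.floor, self.noise_level * self.margin)
        silent = energy < threshold
        if silent:
            # Exponential moving average, updated on silent frames only,
            # so speech does not drag the noise estimate upwards.
            self.noise_level += self.smoothing * (energy - self.noise_level)
        return silent
```

One known weakness of this sketch: because the estimate only moves on silent frames, a sudden jump in background noise above the threshold would be treated as speech until it drops again.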

@patrickjquinn
Owner Author

patrickjquinn commented May 31, 2018 via email

@i-am-malaquias

The best approach would probably be having a timer after detecting relative silence. Also including a key shortcut, tap detection or any such mechanism to let the user manually declare they're done could speed things up a bit.
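
A minimal sketch of that logic, assuming something upstream already labels each audio frame as silent or not. It is in Python purely for brevity and is not tied to any library mentioned in this thread; `EndOfCommandDetector`, `stop_now` and the 0.8 second default are hypothetical names and values.

```python
from typing import Optional


class EndOfCommandDetector:
    """Hypothetical helper: ends a command on a silence timer or manual stop."""

    def __init__(self, silence_timeout: float = 0.8) -> None:
        self.silence_timeout = silence_timeout  # seconds, an assumed default
        self.heard_speech = False
        self.silence_started_at: Optional[float] = None
        self.manual_stop = False

    def stop_now(self) -> None:
        """Call from a key shortcut or tap handler to end immediately."""
        self.manual_stop = True

    def update(self, is_silent: bool, now: float) -> bool:
        """Feed one frame's silence flag and a timestamp in seconds.

        Returns True once the command should be considered finished.
        """
        if self.manual_stop:
            return True
        if not is_silent:
            # Speech resets the silence timer.
            self.heard_speech = True
            self.silence_started_at = None
            return False
        if not self.heard_speech:
            # Leading silence: the user has not started talking yet.
            return False
        if self.silence_started_at is None:
            self.silence_started_at = now
        return now - self.silence_started_at >= self.silence_timeout
```

Passing the timestamp in, rather than reading the clock inside the class, keeps the timer easy to test and leaves the choice of clock to the caller.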


6 participants