Voice controlled keyboard and mouse that is lightweight (minimal dependencies), cross-platform, works offline, and is extensible. Check it out! Contributions welcome.

oeschsec/Sidekick---voice-controlled-keyboard-and-mouse

Sidekick (voice controlled keyboard and mouse)

To reduce hand/wrist pain from frequent computer use I decided to create a program that will convert voice to common keyboard/mouse actions in order to reduce clicks/keypresses. It is not intended to replace the keyboard and mouse, but rather to reduce their use, hence the name Sidekick. Sidekick handles transcription, mouse movement/click/wheel, and common commands.

Sidekick is still very much a work in progress, but the sky is the limit. One of the more fun challenges has been controlling the mouse with only voice commands (see user guide for details).

Why use Sidekick?

Sidekick was created with the goal of being general purpose, lightweight, extensible, and easy to use. It works offline and cross-platform. While some other tools (listed below) specialize in voice coding and entirely replacing the keyboard and mouse in specific application contexts, Sidekick is intended to help rather than replace and to work across all applications.

I use Sidekick every day, especially when surfing the web, reading, and writing smaller compositions. Just reducing my use of the mouse wheel and buttons helped me significantly. Let me know if Sidekick is helpful for you.

Contributing

If you would like to contribute, create an issue and let me know. I would be interested in seeing custom parsers, additional features, and improvements in speech recognition.

Install

Download the vosk-model-en-us-daanzu-20200905-lgraph (129M) model folder from https://alphacephei.com/vosk/models, place it in the same directory as sidekick.py, and rename the folder to 'model'.
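
A common slip at this step is leaving the folder under its original name or placing it somewhere else. The following is a minimal sketch of a check you could run before starting Sidekick; `find_model_dir` is a hypothetical helper, not part of Sidekick, and it only verifies that a non-empty `model` folder exists, not that its contents are a valid Vosk model.

```python
from pathlib import Path


def find_model_dir(script_dir):
    """Return the 'model' folder inside script_dir, or raise with a hint.

    Hypothetical helper: it checks the folder layout described above and
    does not validate the model files themselves.
    """
    model_dir = Path(script_dir) / "model"
    if not model_dir.is_dir():
        raise FileNotFoundError(
            f"Expected a Vosk model folder at {model_dir}; download it and "
            "rename the extracted folder to 'model'."
        )
    if not any(model_dir.iterdir()):
        raise FileNotFoundError(f"{model_dir} exists but is empty.")
    return model_dir


# Example (assumes the current directory is the one containing sidekick.py):
#     find_model_dir(".")
```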

On Mac

  • brew install portaudio
  • pip install numpy vosk pyautogui pyaudio

On Ubuntu

  • sudo apt-get install scrot python3-tk python3-dev (for pyautogui on Ubuntu)
  • sudo apt-get install portaudio19-dev python-all-dev python3-all-dev
  • sudo apt-get install python3-pyaudio
  • pip install numpy vosk pyautogui pyaudio

On Windows

  • install Python 3.8 64-bit (https://www.python.org/downloads/windows/) - 3.9 is not supported on Windows yet (see Vosk docs)
  • pip install numpy vosk pyautogui
  • make sure that vosk is at least version 0.3.18 - run pip install vosk --upgrade if not sure
  • pip install PyAudio-0.2.11-cp38-cp38-win_amd64.whl (download here) or appropriate version at time of install
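
If you are not sure which Vosk version is installed, a small script can compare it against the 0.3.18 minimum mentioned above. This is a sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper names `version_tuple` and `vosk_is_recent_enough` are made up for this example, and the comparison ignores any non-numeric suffix in the version string.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '0.3.18' into (0, 3, 18); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def vosk_is_recent_enough(minimum="0.3.18"):
    """Return True if the installed vosk package is at least `minimum`.

    Returns False when vosk is not installed at all.
    """
    try:
        installed = version("vosk")
    except PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)


# Example:
#     if not vosk_is_recent_enough():
#         print("Run: pip install vosk --upgrade")
```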

Usage

  • python3 sidekick.py
  • see the user guide for instructions on use

Approach

  • I wanted to use a speech recognition library that worked out-of-the-box and did not require retraining
  • I wanted it to work offline and be fully open source
  • I decided to use Vosk for speech-to-text because it is a) offline (unlike services such as Google's API), b) more accurate than offline alternatives such as CMU's PocketSphinx, and c) entirely open source, unlike Picovoice. I initially tried Python's SR library with Google (see the srmodule_old folder), but found it too slow and, as mentioned above, it requires an internet connection. I also tried Mozilla's DeepSpeech, but for this application it was less suitable than Vosk because of its resource consumption and because it lacked out-of-the-box API functionality that I required.
  • I optimized accuracy by using multiple Vosk models for different states, and then optimized the speed of switching between states.
  • I used the smaller Vosk model because it is faster, and responsiveness is important for this application. Also, the larger models do not currently support dynamic word mapping, which was important for increasing accuracy in my use case, though hybrid models are planned as far as I understand.
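
The idea of restricting the vocabulary per state can be sketched roughly as follows. This is not Sidekick's actual code: the state names and word lists are invented for illustration, and the sketch shares one `Model` across several `KaldiRecognizer` instances rather than loading separate models. It assumes a Vosk release whose `KaldiRecognizer` accepts a JSON list of phrases as its third argument (older releases may expect a different grammar format, so check this against your installed version).

```python
import json

# Hypothetical states and vocabularies, for illustration only.
STATE_WORDS = {
    "command": ["click", "scroll up", "scroll down", "text"],
    "text": ["hello", "world", "command"],
}


def grammar_for(words):
    """Build a JSON phrase list to pass to KaldiRecognizer as a grammar.

    "[unk]" lets the recognizer report out-of-vocabulary speech instead of
    forcing it onto one of the listed phrases.
    """
    return json.dumps(list(words) + ["[unk]"])


def build_recognizers(model_dir="model", sample_rate=16000):
    """Create one vocabulary-restricted recognizer per state, up front.

    Building them once means a state switch is only a dictionary lookup.
    """
    from vosk import KaldiRecognizer, Model  # imported lazily; needs vosk

    model = Model(model_dir)
    return {
        state: KaldiRecognizer(model, sample_rate, grammar_for(words))
        for state, words in STATE_WORDS.items()
    }


def recognized_text(result_json):
    """Extract the transcript from the JSON string returned by Result()."""
    return json.loads(result_json).get("text", "")
```

In a listening loop, you would feed each audio chunk to the recognizer for the current state with `AcceptWaveform(data)` and, when it returns true, read `recognized_text(rec.Result())` to decide on the action and the next state.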

Vosk Docs

Related Projects & Articles

I am aware of other interesting related projects. I highly recommend checking out each of the following resources if you're interested in this space.

Ideas / Notes

  • faster speech recognition would help significantly (the smaller Vosk model helped)
  • better audio hardware might help, such as an external USB sound card (Andrea PureAudio) and the Audix OM7 mic
