vxtron

A little application listens to my voice and converts to text and copies it to the clipboard. Built using Electron.

Why

I suffer from repetitive strain injury (osteoarthritis in the fingers, I guess), so, it helps a lot if I can type using my voice.
I am not a native English speaker — macOS’s dictation fails to accurately recognize my voice accent.
macOS’ Dictation does not have a public API that apps can use (not hackable).
Google Cloud Text to speech enhanced voice models are much more accurate than macOS’ Dictation and the free webkitSpeechRecognition API. It also comes with automatic punctuation insertion, which means it can automatically add full stops, commas, and question marks.

Supported speech recognizers

vx support pluggable speech recognizers. You can choose to use one of the following:

Google Chrome’s webkitSpeechRecognition (free, default)
- Can be used free of charge
- No time limit
- Requires opening a browser tab in background
- Recognition quality is not so great
- No automatic punctuation insertion
Google Speech-To-Text API (paid)
- Much better recognition accuracy for English language
- Automatically adds punctuation marks
- Does not require opening a browser tab in background
- Costs money
- Each session is limited to 60 seconds

Development setup

Clone this repository.
Install the dependencies for the Electron app:
```
yarn
```
Install the dependencies for the React app, located in vxgui foler:
```
(cd vxgui && yarn)
```
Build the React app:
```
(cd vxgui && yarn build)
```
Build an .app bundle:
```
yarn build
```

Configuration with Google Chrome

Google Chrome provides the webkitSpeechRecognition API which is available for free. However it can only be used inside Google Chrome (which means it is not available in other Chromium-based environment, including Electron). vx uses a hacky workaround by launching Google Chrome to a webpage which helps expose the webkitSpeechRecognition API to the Electron app via socket.io.

This is the default behavior; you don't need to configure anything to use this mode.

You can configure more options by creating ~/.vxrc.yml in the home directory with the following configuration:

speechProvider: chrome
speechProviderOptions:
  port: 5555
  openBrowser: false # default: true
  app: Google Chrome # see: https://www.npmjs.com/package/opn#app

Configuration with Google Cloud Speech-To-Text

Create a Google Cloud platform project and enable billing on it.
Go to Google Cloud API library and enable the Google Cloud Speech API.
To get access to enhanced voice models, turn on data logging.
Set up authentication with a service account. Download a service account file and save it to your computer.
Create ~/.vxrc.yml with the following configuration:
```
speechProvider: google-cloud
speechProviderOptions:
  serviceAccount: /path/to/service-account.json
  recordProgram: /usr/bin/rec
```
- serviceAccount is the full path to your service account file
- recordProgram is the full path to SoX’s rec executable.

Usage

Launch the Electron app at dist/mac/vxtron.app.
Press Cmd+Shift+L to dictate English text. Press it again to make it stop listening.
Press Cmd+Alt+Shift+L to dictate Thai text. Press it again to make it stop listening.
As soon as you finish speaking, the recognized text will be copied to the clipboard automatically.
The app remembers the past texts (not persistent), and you can use Cmd+Alt+Up and Cmd+Alt+Down to cycle through them. As you cycle through the history, the recalled text will also be copied to the clipboard automatically.

Development

Developing the GUI

A browser-based development environment is available. It is purely browser-based and doesn't use Electron APIs or Google Cloud Speech-To-Text. Instead, it uses the webkitSpeechRecognition API to recognize your voice.

This means it doesn't cost anything while development, but recognition accuracy will suffer, and automatic punctuation insertion will not be available.

Run yarn start in vxgui directory:
```
(cd vxgui && yarn start)
```
This will launch create-react-app development server. A browser should open to localhost:3000 automatically. Make sure you are using Google Chrome (otherwise the speech recognition API will not be available).
The key bindings are the same, except that you use Ctrl instead of Cmd key. For example, press Ctrl+Alt+L to listen to text.

The accelerator key is changed to prevent conflict between the development version and the electron version, which may be running at the same time.
The copy functionality will not work because a webapp may not copy stuff to the clipboard without a user interaction. However, this can be circumvented by exposing Chrome DevTool’s copy function into the webapp. You can do that by running the following command in the JavaScript console:
```
copy('...')
Object.assign(window, { copy })
```

Building the GUI as a static HTML file for electron

Once you finish developing, run yarn build in vxgui directory.

(cd vxgui && yarn build)

This will build the files into the vxgui/build directory.

Testing the development app in Electron

Sometimes, you really need to test some Electron-specific APIs, and having to rebuild a bundle every time we want to test it is not ideal.

Alternatively, with the development server running, you can run the Electron app with an environment variable VX_DEV=1 to make the Electron app load the app from localhost:3000 instead of the built files.

VX_DEV=1 yarn start

Architecture

There are two main components in this project:

The web application, built using React and TypeScript.

It contains the core application logic, such as how the transcript from the speech recognition service is handled.
It is designed to run both in Browser environment (for development) and Electron environment (for actual use).

Environment	Browser	Electron
Use case	For development	For real-world usage
Display	As a web app	As an overlay HUD
Activation	Only inside web app	Available system-wide
Speech recognition API	webkitSpeechRecognition API	Google Cloud Speech-To-Text
Recognition quality	Not so accurate for me	Very accurate
Automatic punctuation	Not supported	Supported
Cost of usage	Free	$0.048/min

The electron application

Provides the overlay GUI.
Provides access to global hotkeys
Provides access to Google Cloud Speech APIs.

Cost

I have to use the premium "video" voice model which is able to recognize my voice with acceptable accuracy (none of the other models can do this). The model is also much better at recognizing speech with a lot of technical terms, compared to the default model.

It costs USD 0.048 per minute to use. The first 60 minutes per month are free.

When the speech API is being used, vx keeps track of its usage log in ~/.vx-google-cloud-speech.log. It is a TSV file with 3 columns:

Timestamp
Usage in seconds, rounded up.
The pricing plan (1: normal speech recognition at $0.024/min, 2: enhanced video speech recognition at $0.048/min).

There is also a simple Ruby script that displays a summary of how much is spent on this API per day. You can run it using ruby cost.rb.

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
.vscode		.vscode
build		build
speech		speech
vxgui		vxgui
.gitignore		.gitignore
README.md		README.md
configuration.js		configuration.js
cost.rb		cost.rb
jsconfig.json		jsconfig.json
main.js		main.js
package.json		package.json
yarn.lock		yarn.lock

dtinth/vxtron

Folders and files

Latest commit

History

Repository files navigation

vxtron

Why

Supported speech recognizers

Development setup

Configuration with Google Chrome

Configuration with Google Cloud Speech-To-Text

Usage

Development

Developing the GUI

Building the GUI as a static HTML file for electron

Testing the development app in Electron

Architecture

Cost

About

Topics

Resources

Stars

Watchers

Forks

Languages