This repository has been archived by the owner on Dec 20, 2023. It is now read-only.

Triggers

René Kliment edited this page Dec 22, 2018 · 7 revisions

General configuration

Every trigger has these configuration options:

  • enabled - pretty self-explanatory; just make sure you don't have multiple voice triggers enabled at the same time
  • voice_confirm - play Alexa's yes (or a user-configured sound) after the trigger has fired; you very probably want this for voice triggers, so you don't have to look at LEDs, but it might not be necessary for button triggers
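For example, in your configuration file these options might look like this (a sketch assuming the YAML layout of AlexaPi's config.yaml — check your actual file for the exact trigger names and nesting):

```yaml
triggers:
  pocketsphinx:
    enabled: true        # only one voice trigger enabled at a time
    voice_confirm: true  # play a confirmation sound after triggering
  platform:
    enabled: true        # a button trigger can be enabled alongside a voice one
    voice_confirm: false # button + LED feedback may be enough here
```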

Voice triggers

This is the other tricky part of AlexaPi. To put it straight: doing this in AlexaPi is hard and there is no easy solution, mainly for two reasons:

  • Hardware. People have various audio hardware (soundcards, microphones), and probably each person has a setup nobody else has, so the input for the hotword detector differs from user to user. Microphone input levels are a factor too, and echo cancellation is hard: some people use an on-board mic, others a PS3 Eye on a 2 m cable positioned who knows how relative to the speakers, and it's just impossible to build one thing that works for even the majority of users. This is something people have to deal with themselves. Amazon's devices are all the same and well-defined, and that's where they have a significant advantage.
  • The recognition model. A universal model might not get results as good as a personal model, that's pretty obvious. If you want the best recognition for your hardware setup and your voice, you have to create your own model.

Apart from setting the right parameters of the voice trigger engine, note that your microphone quality and the input levels in alsamixer (and possibly in a PulseAudio companion) also affect the recognition rate.

Pocketsphinx

This is a nice and popular open-source engine.

Specific configuration options

  • phrase - the trigger phrase; it should not be multiple words (those are hard to detect), nor a single-syllable word (that causes false positives)

    from pocketsphinx's FAQ:

    For the best accuracy it is better to have keyphrase with 3-4 syllables. Too short phrases are easily confused.

  • threshold - a sort of a magic number

    from pocketsphinx's FAQ:

    Threshold must be tuned for every keyphrase on test data to get the right balance between missed detections and false alarms. You can try values like 1e-5 (fewer false positives, harder to recognise) to 1e-50 (more false positives, easier to recognise).
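Putting both options together, a Pocketsphinx trigger section might look like this (a sketch assuming the YAML layout of AlexaPi's config.yaml; the values shown are starting points, not recommendations):

```yaml
triggers:
  pocketsphinx:
    enabled: true
    voice_confirm: true
    phrase: "alexa"    # 3-4 syllables works best
    threshold: 1e-10   # tune between ~1e-5 and ~1e-50 on test data
```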


Snowboy

The installation has to be done manually for now:

# Install snowboy dependencies as described in https://github.com/Kitt-AI/snowboy#ubunturaspberry-pipine64nvidia-jetson-tx1nvidia-jetson-tx2
# If you're on Raspbian < Stretch or you otherwise have an old swig3.0, you need to install a newer version than the one from the repo.
sudo apt-get install swig3.0 libatlas-base-dev

# install snowboy
sudo pip3 install git+https://github.com/Kitt-AI/snowboy.git

Specific configuration options

  • model - a file with the trained model

    Basically, you can use a universal model (.umdl) like the default one, or train your own personal model (.pmdl) on snowboy's website. To get the best possible personal model, record your audio on the same device you run AlexaPi on and upload that to the website.

    Simply setting a wakeword as a text value is not possible here. The trained model file defines your wakeword.

  • sensitivity

    from snowboy's docs:

    Detection sensitivity controls how sensitive the detection is. It is a value between 0 and 1. Increasing the sensitivity value leads to a better detection rate, but also a higher false alarm rate. It is an important parameter that you should play with in your actual application.
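A snowboy trigger section might then look like this (again a sketch assuming the YAML layout of AlexaPi's config.yaml; the model path is a hypothetical example — point it at wherever your .umdl or .pmdl file actually lives):

```yaml
triggers:
  snowboy:
    enabled: true
    voice_confirm: true
    model: "resources/alexa.umdl"  # or your own .pmdl personal model
    sensitivity: 0.5               # 0..1; higher = more detections, more false alarms
```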

Head over to snowboy's docs, as everything is very well explained there.


Other triggers

platform

Specific configuration options

  • event_type - either oneshot-vad, continuous or continuous-vad; this highly depends on the underlying platform (oneshot is always supported, continuous and continuous-vad don't have to be)

    here is an example with GPIO button:

    if set to oneshot-vad: after you press the button, AlexaPi gets triggered and VAD (voice activity detection) kicks in and decides when to end the recording

    if set to continuous-vad: almost the same happens, but for as long as you hold the button, AlexaPi records your voice (the button forces it to record), so you can pause in your speech as you like; note that this doesn't disable VAD, so the recording may not end as soon as you release the button

    if set to continuous: AlexaPi records exactly as long as you hold the button
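The difference between the three modes boils down to who gets to end the recording: VAD, the button, or both. Here is a tiny illustrative sketch of that decision logic (not the actual AlexaPi implementation — the function and parameter names are made up for illustration):

```python
def recording_finished(event_type, button_held, vad_detects_silence):
    """Decide whether the current recording should end, given the
    trigger's event_type, whether the button is still held, and
    whether VAD has detected enough silence to cut off."""
    if event_type == "oneshot-vad":
        # The button press only starts the recording; VAD alone ends it.
        return vad_detects_silence
    if event_type == "continuous-vad":
        # Holding the button forces recording; once released, VAD decides.
        return (not button_held) and vad_detects_silence
    if event_type == "continuous":
        # Recording lasts exactly as long as the button is held.
        return not button_held
    raise ValueError("unknown event_type: %s" % event_type)
```

For example, in continuous-vad mode the recording keeps going while the button is held, even through silence, but in continuous mode releasing the button ends it immediately regardless of VAD.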