Certain words/phrases are detected inconsistently #200

Open
mhilbush opened this issue Jun 23, 2023 · 9 comments

Comments

@mhilbush

I have two devices running Willow built from a repo I cloned on June 11. Each device is in a completely separate part of the house (1st floor kitchen and lower level family/rec room).

There are several words I use frequently in my home that Willow often detects inconsistently. This occurs with both devices.

  • “pool table” is sometimes also detected as “pull table”
  • “rec room” is sometimes also detected as “wreck room”
  • “sun room” is sometimes also detected as “summer”, “sunroom” and “sub room”

Please see issue #199 for a description and photos of the environment where my devices are located.

@adamast0r

adamast0r commented Jun 23, 2023

I am having the same type of issue when mentioning the names of HA entities. For example:

  • "smart plug" being detected as "smart blur" or "smart blue"

@kristiankielhofner
Contributor

As a first pass at debugging this try updating your Willow Inference Server URL to our new implementation with a tweak:

https://wisng.tovera.io/api/asr?model=large&beam_size=5

In addition to using our new WIS implementation, this URL selects the highest-quality settings available for Whisper (the large model with a beam size of 5). Otherwise we default to the medium model with a beam size of 1.

@mhilbush
Author

mhilbush commented Jun 24, 2023

Thanks.

I set the WIS URL to what you indicated, built the image, and flashed my device.

It detects when I say the wake word (i.e. Alexa), but it's not showing any of the text spoken after the wake word.

This is what I'm seeing in the monitor (HTTP error 422).

I (06:39:36.230) WILLOW: Using WIS URL 'https://wisng.tovera.io/api/asr?model=large&beam_size=5'
I (06:39:36.240) WILLOW: WIS HTTP client starting stream, waiting for end of speech
I (06:39:39.044) WILLOW: AUDIO_REC_VAD_END
I (06:39:39.045) WILLOW: AUDIO_REC_WAKEUP_END
I (06:39:39.087) WILLOW: WIS HTTP client HTTP_STREAM_POST_REQUEST, write end chunked marker
I (06:39:39.175) WILLOW: WIS HTTP client HTTP_STREAM_FINISH_REQUEST
E (06:39:39.175) WILLOW: WIS returned HTTP error: 422
I (06:39:49.071) WILLOW: Wake LCD timeout, turning off LCD

Edit:
Note, I also changed the Wake Word Recognition Operating Mode to DET_MODE_2CH_95
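
A failure like the 422 above is easy to miss in a long monitor session. As a minimal sketch, assuming the monitor output has been saved as text and that error lines keep the exact format shown in the log above, the following picks out WIS HTTP errors. The function name `find_wis_errors` is hypothetical and not part of Willow.

```python
import re

# Matches lines in the format seen in the log above, for example:
#   E (06:39:39.175) WILLOW: WIS returned HTTP error: 422
# Assumption: the monitor always prints errors in this exact shape.
WIS_ERROR_RE = re.compile(
    r"^E \((?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\) WILLOW: "
    r"WIS returned HTTP error: (?P<status>\d{3})\s*$"
)


def find_wis_errors(log_text):
    """Return (timestamp, status_code) pairs for WIS HTTP errors in log_text."""
    errors = []
    for line in log_text.splitlines():
        match = WIS_ERROR_RE.match(line.strip())
        if match:
            errors.append((match.group("time"), int(match.group("status"))))
    return errors


if __name__ == "__main__":
    sample = (
        "I (06:39:39.175) WILLOW: WIS HTTP client HTTP_STREAM_FINISH_REQUEST\n"
        "E (06:39:39.175) WILLOW: WIS returned HTTP error: 422\n"
    )
    for timestamp, status in find_wis_errors(sample):
        print(f"{timestamp}: WIS returned HTTP {status}")
```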

@kristiankielhofner
Contributor

I feel terrible...

I gave you the wrong URL! Sorry, brain fart on my part. The URL you should use is actually:

https://wisng.tovera.io/api/willow?model=large&beam_size=5

I'm really sorry about that, I promise I don't want to waste your time!
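
Because the only differences between the two URLs are the path and the query string, building the URL from its parts can help avoid this kind of mix-up. The sketch below is illustrative only: `build_wis_url` is a hypothetical helper, and the only parameter values confirmed in this thread are `model=large` and `beam_size=5` on the `/api/willow` path.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_wis_url(base_url, model="large", beam_size=5):
    """Return base_url with the model and beam_size query parameters set.

    Any query string already present on base_url is replaced.
    """
    scheme, netloc, path, _old_query, fragment = urlsplit(base_url)
    query = urlencode({"model": model, "beam_size": beam_size})
    return urlunsplit((scheme, netloc, path, query, fragment))


if __name__ == "__main__":
    # Reproduces the corrected URL from the comment above.
    print(build_wis_url("https://wisng.tovera.io/api/willow"))
```

The resulting string is what goes into the Willow Inference Server URL setting before building and flashing.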

@mhilbush
Author

No worries.

@mhilbush
Author

Ok, it's working now. Thanks.

I'll spend some time with it tomorrow and get back to you.

@kristiankielhofner
Contributor

With the wisng endpoint (it's in beta) we have debug logging turned on so I was watching your sessions.

You exposed a bug in our production implementation. Long story short, this server has multiple GPUs and WIS wasn't pinned to the right one, so you were occasionally seeing ASR times of ~3s (load balancing across GPUs). That is fixed now, and you should consistently see response times in the 200-300ms range.

I'm really off my game!

@mhilbush
Author

It was getting a bit late last night, but it did seem like it was taking longer than what was typical. I didn't think much of it at the time knowing it wasn't the full production implementation. Much quicker this morning.

@stintel
Collaborator

stintel commented Sep 6, 2023

I too experience some issues with certain phrases.
My most problematic one seems to be "turn on desk light". Here are some wrong results:

  • Turn on this light.
  • Turn on the disc light.
  • Turn on the best glide.

This seems to happen on small, medium and large models.

My workaround is to use an alias that is less error-prone. I seem to have much better success with "workstation" as an alias for "desk light".
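
A related idea, sketched here purely as an illustration, is to map known mis-transcriptions back to the intended phrase before matching a command. Nothing in this thread indicates that Willow or WIS does this; the `CORRECTIONS` table and `normalize_transcript` function are hypothetical, and the entries are taken from the examples reported above. Ambiguous results such as "summer" or "this light" are left out on purpose because they are valid phrases in their own right.

```python
import re

# Hypothetical correction table built from mis-transcriptions reported in this thread.
CORRECTIONS = {
    "pull table": "pool table",
    "wreck room": "rec room",
    "sub room": "sun room",
    "sunroom": "sun room",
    "smart blur": "smart plug",
    "smart blue": "smart plug",
    "disc light": "desk light",
}


def normalize_transcript(text):
    """Lowercase text and replace known mis-transcriptions on word boundaries."""
    result = text.lower()
    for wrong, right in CORRECTIONS.items():
        result = re.sub(r"\b" + re.escape(wrong) + r"\b", right, result)
    return result


if __name__ == "__main__":
    print(normalize_transcript("Turn on the disc light."))
```

A table like this only covers errors that have already been observed, so an alias that is recognized reliably in the first place remains the simpler fix.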

4 participants