
Reduce Text-to-speech hallucination #322

Open
johnson-liang opened this issue Oct 13, 2023 · 1 comment


@johnson-liang
Contributor

johnson-liang commented Oct 13, 2023

From the 2023/10/11 meeting: https://g0v.hackmd.io/t9ypB87SQBuMjjW_PheZVg#Comm-AI-transcript

The current speech-to-text implementation (based on the Whisper API) suffers from hallucination. Some examples:

https://cofacts.tw/article/TvR6AosBAjOeMOklfe-g
So the training data comes from crowd-sourced subtitling communities, huh

No sound
https://cofacts.tw/article/JvRhAosBAjOeMOklpe-v

Honestly, I'd prefer it not to translate
even though its translation is OK
https://cofacts.tw/article/FPRXAosBAjOeMOklXO9y

The beginning is fine
then it goes haywire once the audio turns silent
https://cofacts.tw/article/m_S3AosBAjOeMOkls-_a

Screaming
https://cofacts.tw/article/jvSIBYsBAjOeMOklDvOv

Inexplicable
even though the narration is so clear
https://cofacts.tw/article/MvTSCosBAjOeMOklBvlJ

We should investigate:

  • Measures to reduce or eliminate hallucination, such as voice activity detection (VAD).
  • Experiment with a different prompt.
  • Experiment with replacing the Whisper API.
  • Assess the feasibility of the solutions above, including operating costs.
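Several examples above hallucinate on silent audio or on a silent tail, so VAD could act as a gate in front of the Whisper API call: skip the request when no speech is detected, or send only the detected speech spans. Below is a minimal energy-threshold VAD sketch in Python, assuming mono float samples in [-1, 1] already decoded at 16 kHz. `detect_speech`, `frame_rms` and their threshold and frame parameters are hypothetical and untuned; a real deployment would more likely use a trained VAD model such as WebRTC VAD or Silero VAD rather than a fixed energy threshold.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of each non-overlapping frame; a trailing partial frame is dropped."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms


def detect_speech(
    samples: Sequence[float],
    sample_rate: int = 16000,
    frame_ms: int = 30,
    threshold: float = 0.02,    # hypothetical, untuned energy threshold
    min_gap_frames: int = 10,   # silent frames needed to close a speech span
) -> List[Tuple[float, float]]:
    """Hypothetical energy-based VAD: return (start_sec, end_sec) speech spans.

    Frames whose RMS reaches `threshold` count as speech. A span closes only after
    `min_gap_frames` consecutive silent frames, so short pauses do not split it.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    rms = frame_rms(samples, frame_len)
    spans: List[Tuple[int, int]] = []
    seg_start = None
    silent_run = 0
    for i, value in enumerate(rms):
        if value >= threshold:
            if seg_start is None:
                seg_start = i
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_gap_frames:
                # Exclusive end = one frame past the last voiced frame.
                spans.append((seg_start, i - silent_run + 1))
                seg_start = None
                silent_run = 0
    if seg_start is not None:
        spans.append((seg_start, len(rms) - silent_run))
    # Convert frame indices to seconds.
    return [(s * frame_len / sample_rate, e * frame_len / sample_rate) for s, e in spans]


if __name__ == "__main__":
    sr = 16000
    clip = [0.0] * sr  # one second of digital silence
    if not detect_speech(clip, sample_rate=sr):
        print("No speech detected; skip the Whisper API call")
```

An all-silent clip yields an empty list, so the caller could skip the API request instead of receiving hallucinated text; for clips with a silent tail, trimming to the returned spans drops the part where hallucination tends to start.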

References

Previous research on Whisper and mitigations for its hallucination
https://g0v.hackmd.io/wkx286lmTDaFUpgRhnUawQ#Whisper

@MrOrz
Member

MrOrz commented Oct 25, 2023

Tried to remove apparent hallucination in #323
