Reduce Text-to-speech hallucination #322

johnson-liang · 2023-10-13T14:20:25Z

From 2023/10/11 meeting https://g0v.hackmd.io/t9ypB87SQBuMjjW_PheZVg#Comm-AI-transcript

The current implementation for speech-to-text (based on Whisper API) suffers from hallucination problems. Some of the examples are:

https://cofacts.tw/article/TvR6AosBAjOeMOklfe-g
So the training data turns out to come from a crowdsourced subtitle community.

Silent (no audio)
https://cofacts.tw/article/JvRhAosBAjOeMOklpe-v

I would actually prefer that it not translate,
even though the translation is fairly OK.
https://cofacts.tw/article/FPRXAosBAjOeMOklXO9y

The beginning is fine;
once the audio goes silent later on, it starts going haywire.
https://cofacts.tw/article/m_S3AosBAjOeMOkls-_a

Screaming
https://cofacts.tw/article/jvSIBYsBAjOeMOklDvOv

Can't explain this one;
the narration is clearly audible.
https://cofacts.tw/article/MvTSCosBAjOeMOklBvlJ

We should investigate:

Measures to reduce or cancel hallucination, such as voice activity detection (VAD).
Experiment with a different prompt.
Experiment with a replacement for the Whisper API.
Assess the feasibility of solutions above, including the operation costs.
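
To make the VAD idea above concrete, here is a minimal sketch of an energy-based detector that finds the non-silent spans of a clip, so that only those spans would be sent for transcription. This is an illustration, not the current implementation: the function names, the 30 ms frame size, and the 0.02 RMS threshold are all assumptions, and a production setup would more likely use a trained VAD model. Samples are assumed to be mono floats in [-1.0, 1.0].

```python
import math


def frame_rms(samples, frame_len):
    """Yield the RMS energy of each consecutive, non-overlapping frame."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        yield math.sqrt(sum(s * s for s in frame) / frame_len)


def speech_segments(samples, sample_rate, frame_ms=30, threshold=0.02):
    """Return (start_sec, end_sec) spans whose frame RMS reaches the threshold.

    frame_ms and threshold are illustrative defaults (assumptions), not
    tuned values; they would need calibration on real recordings.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    segments = []
    seg_start = None  # index of the first frame of the open segment
    n_frames = 0
    for i, rms in enumerate(frame_rms(samples, frame_len)):
        n_frames = i + 1
        if rms >= threshold and seg_start is None:
            seg_start = i  # a loud frame opens a segment
        elif rms < threshold and seg_start is not None:
            # a quiet frame closes the open segment
            segments.append((seg_start * frame_len / sample_rate,
                             i * frame_len / sample_rate))
            seg_start = None
    if seg_start is not None:
        # the clip ended while a segment was still open
        segments.append((seg_start * frame_len / sample_rate,
                         n_frames * frame_len / sample_rate))
    return segments
```

With something like this in place, a fully silent clip would yield an empty list and could skip transcription entirely, and trailing silence (where the "goes haywire" example above occurs) could be trimmed before the audio reaches Whisper. A simple energy threshold will not separate speech from loud non-speech such as screaming, so that case would still need a different measure.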

References

Previous research on Whisper and mitigations for its hallucination
https://g0v.hackmd.io/wkx286lmTDaFUpgRhnUawQ#Whisper

MrOrz · 2023-10-25T06:15:12Z

Tried to remove apparent hallucination in #323

johnson-liang added the "help wanted" and "Good first issue" labels on Oct 13, 2023
