-
Here is one approach to handling transcription with multiple languages (sample source code in the link; see also the sketch below).
Other possibilities to consider:
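As a minimal sketch of the chunk-and-detect idea referenced above (this is my own illustration under assumptions, not the code from the link): split the audio into 30-second windows, detect each window's dominant language, then transcribe each window with that language forced. It assumes the openai-whisper package.

```python
# Sketch only: chunk the audio into Whisper's native 30-second window,
# detect the dominant language per chunk, then transcribe each chunk
# with that language forced. Assumes the openai-whisper package.
import whisper

model = whisper.load_model("large-v3")

def transcribe_mixed(path: str):
    audio = whisper.load_audio(path)
    sr = whisper.audio.SAMPLE_RATE
    window = 30 * sr
    segments = []
    for start in range(0, len(audio), window):
        chunk = audio[start:start + window]
        mel = whisper.log_mel_spectrogram(
            whisper.pad_or_trim(chunk), n_mels=model.dims.n_mels
        ).to(model.device)
        _, probs = model.detect_language(mel)   # per-language probabilities
        lang = max(probs, key=probs.get)        # most likely language for this chunk
        result = model.transcribe(chunk, language=lang)
        segments.append((start / sr, lang, result["text"]))
    return segments
```

The obvious trade-off is that a fixed 30-second window can cut across a language switch mid-chunk, so the detected language is only the dominant one per window.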
-
I have indeed implemented a long post-processing script just to unify the time slots between the speaker diarization, transcription, and translation. It works very well, but it breaks in some cases (e.g. when 2 speakers are talking at the same time), since each model gives different timestamps! I am just afraid that chunking based on the speaker-diarization timestamps will reduce the accuracy for sure. Anyway, it looks like this is the best I can do! Regarding AssemblyAI, I am looking for an offline solution, so that's not going to help much. Thank you!
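For anyone reading along, a minimal sketch of what that unification step can look like (the segment tuple shapes are my assumption, not the actual script): reduce both outputs to (start, end, ...) tuples, then assign each transcribed segment to the diarization turn it overlaps most.

```python
# Sketch: merge diarization and transcription by temporal overlap.
# Assumed formats: diar_turns -> (start, end, speaker),
# asr_segments -> (start, end, text), all times in seconds.
def overlap(a_start, a_end, b_start, b_end):
    """Length of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def merge(diar_turns, asr_segments):
    merged = []
    for s_start, s_end, text in asr_segments:
        # Pick the speaker turn with the largest overlap with this segment.
        best = max(diar_turns, key=lambda t: overlap(s_start, s_end, t[0], t[1]))
        speaker = best[2] if overlap(s_start, s_end, best[0], best[1]) > 0 else "UNKNOWN"
        merged.append((s_start, s_end, speaker, text))
    return merged
```

When two speakers talk at the same time, the argmax-by-overlap still assigns a single speaker, which is exactly the ambiguity described above.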
-
I'm currently using Whisper Large V3 and I'm encountering two main issues with the pipeline shared on HuggingFace:

1. If the audio contains 2 languages, the pipeline sometimes processes them without issue, but other times it requires me to select a single language. To work around this, I need to transcribe the audio in each language separately and then do some post-processing, which means I first need a way to detect the languages present in the audio.
2. For certain languages like Persian and Urdu (and possibly others), I must explicitly specify the language.

I am using the pipeline here, but there is no way I can detect the language, and checking the transcribe function here, I can't find a way to explicitly specify the language. I am not sure what to do in this case!
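On the detection point: the HuggingFace pipeline doesn't expose language detection directly, but the openai-whisper package does, so one option (a sketch; the file name is a placeholder) is to probe the file in 30-second windows and collect the languages seen:

```python
# Sketch: probe 30-second windows and collect the set of detected
# languages. Assumes the openai-whisper package; "input.wav" is a
# placeholder path.
import whisper

model = whisper.load_model("large-v3")
audio = whisper.load_audio("input.wav")
sr = whisper.audio.SAMPLE_RATE

languages = set()
for start in range(0, len(audio), 30 * sr):
    chunk = whisper.pad_or_trim(audio[start:start + 30 * sr])
    mel = whisper.log_mel_spectrogram(chunk, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    languages.add(max(probs, key=probs.get))

print(languages)  # e.g. {'fa', 'ur'} for a Persian/Urdu recording
```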
{ "detail": "Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing language=... or make sure all input audio is of the same language." }