Fine-tuning Whisper in more than one language #1432
Replies: 2 comments 18 replies
-
I don't think you can fine-tune in one click, but you can do it sequentially, one language after another.
-
As long as your language is included in Whisper's supported languages, it will be correctly encoded and decoded, so yes, it is language-independent. Regarding self.processor.tokenizer.batch_decode: it is used when computing the metrics for the ASR task, so it is correct to skip special tokens (you only want to compute the metric on the transcribed text itself).
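To illustrate why skipping special tokens matters for metrics, here is a minimal, library-free sketch. It uses a hand-rolled WER function and a regex stand-in for `skip_special_tokens=True` (in practice you would pass that flag to `batch_decode`); the token strings mimic Whisper's actual prefix format, but the helpers `strip_special` and `wer` are illustrative, not part of any library:

```python
import re

def strip_special(s):
    # Remove Whisper-style special tokens such as <|startoftranscript|>,
    # <|es|>, <|transcribe|>, <|notimestamps|>, <|endoftext|>.
    return re.sub(r"<\|[^|]*\|>", "", s).strip()

def wer(ref, hyp):
    """Word error rate via word-level Levenshtein distance."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (r[i - 1] != h[j - 1]))
    return d[len(r)][len(h)] / max(len(r), 1)

raw = "<|startoftranscript|><|es|><|transcribe|><|notimestamps|>hola mundo<|endoftext|>"
ref = "hola mundo"

print(wer(ref, raw))                 # inflated: special tokens corrupt the words
print(wer(ref, strip_special(raw)))  # 0.0 once special tokens are removed
```

The point: if the decoded prediction keeps the prefix tokens, they glue onto the first and last words and every comparison fails, so the metric no longer reflects transcription quality.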
-
Suppose I have a dataset in two or more languages (one of them under-represented in Whisper's pre-trained models), and I want to fine-tune on those languages to obtain a multilingual model while avoiding catastrophic forgetting. Is such fine-tuning possible?
Can I define the tokenizer and the processor without specifying the language?
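One way to think about this: Whisper's decoder prompt begins with the special prefix `<|startoftranscript|><|lang|><|transcribe|><|notimestamps|>`, so a multilingual dataset can carry the language token per sample rather than fixing one language in the tokenizer. The sketch below builds such label strings in pure Python to show the idea; the helper `build_labels` is hypothetical, and in a real setup you would instead call the tokenizer's per-sample language configuration (e.g. `set_prefix_tokens` in Hugging Face's `WhisperTokenizer`) when preparing each batch:

```python
# Whisper-style special tokens (actual format used by the model's decoder prompt).
SOT = "<|startoftranscript|>"
TASK = "<|transcribe|>"
NO_TS = "<|notimestamps|>"
EOT = "<|endoftext|>"

def build_labels(text, lang):
    """Hypothetical helper: prepend a per-sample language token to a transcript,
    mimicking how multilingual fine-tuning labels vary the <|lang|> token."""
    return f"{SOT}<|{lang}|>{TASK}{NO_TS}{text}{EOT}"

# A mixed-language dataset: each sample keeps its own language code.
samples = [("hola mundo", "es"), ("hello world", "en")]
labels = [build_labels(text, lang) for text, lang in samples]
for lab in labels:
    print(lab)
```

Because the language information lives in the labels (one token per sample), the processor itself does not need a single fixed language, which is what makes training on several languages in one run conceptually possible.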