With the default method parameter (task=None), the task is taken from tokenizer.task. By that point, however, faster_whisper.tokenizer.Tokenizer has already mapped the task name to an integer token id, so the task ends up as the int 50359 instead of a string (translate=50358, transcribe=50359).
The problem appears when we change the tokenizer language for an existing model wrapper.
Temporary fix: pass the task explicitly:
model.transcribe(..., task="transcribe")
# or
model.transcribe(..., task="translate")
Desired fix:
Map the task id stored in the tokenizer back to its task name.
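A minimal sketch of such a reverse mapping, assuming the tokenizer offers a token_to_id lookup for the special tokens <|transcribe|> and <|translate|>. The helper name task_name and the stand-in vocabulary are hypothetical and are not part of whisperX or faster-whisper:

```python
from typing import Callable, Optional, Union

TASKS = ("transcribe", "translate")


def task_name(task: Union[str, int, None],
              token_to_id: Callable[[str], Optional[int]]) -> str:
    """Return the task as a string, whether given as a name, a token id, or None.

    Hypothetical helper: `token_to_id` is assumed to map a special token
    such as "<|transcribe|>" to its integer id.
    """
    if task is None:
        return "transcribe"  # assumed default task
    if isinstance(task, str):
        if task not in TASKS:
            raise ValueError(f"unknown task name: {task!r}")
        return task
    # Integer id: look up which task token it corresponds to.
    for name in TASKS:
        if token_to_id(f"<|{name}|>") == task:
            return name
    raise ValueError(f"unknown task id: {task!r}")


# Stand-in vocabulary for illustration only; real ids come from the tokenizer.
vocab = {"<|translate|>": 50358, "<|transcribe|>": 50359}
print(task_name(50359, vocab.get))
print(task_name("translate", vocab.get))
```

Reading the ids through a lookup instead of hard-coding them keeps the helper independent of the vocabulary of a particular model.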
Bug scenario
In the following scenario:
model = whisperx.load_model(..., language='es')  # model e.g. large-v2
model.transcribe(..., language='es')
model.align(..., language='es')
model.transcribe(..., language='en')
model.align(..., language='en')
we get an error.
Cause
This is a sub-case of a more general problem (default params etc.). The relevant lines of code are in whisperX/whisperx/asr.py, lines 201 to 205 at commit f2da2f8.
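The failure mode described in this issue can be illustrated without the real libraries. The sketch below uses simplified stand-ins (StubTokenizer and StubPipeline are hypothetical names, not the actual whisperX or faster-whisper classes) and assumes only what the issue states: the tokenizer stores the task as an int, and the wrapper reuses tokenizer.task when task is None.

```python
class StubTokenizer:
    """Simplified stand-in for a tokenizer that stores the task as a token id."""

    _TASK_IDS = {"translate": 50358, "transcribe": 50359}  # illustrative ids

    def __init__(self, task, language):
        if task not in self._TASK_IDS:
            raise ValueError(f"{task!r} is not a valid task")
        self.task = self._TASK_IDS[task]  # stored as int, not as the name
        self.language_code = language


class StubPipeline:
    """Simplified stand-in for a model wrapper that caches its tokenizer."""

    def __init__(self):
        self.tokenizer = None

    def transcribe(self, language=None, task=None):
        if self.tokenizer is None:
            task = task or "transcribe"
            self.tokenizer = StubTokenizer(task, language)
        else:
            language = language or self.tokenizer.language_code
            # With task=None the int id leaks in here instead of the name.
            task = task or self.tokenizer.task
            if task != self.tokenizer.task or language != self.tokenizer.language_code:
                self.tokenizer = StubTokenizer(task, language)
        return self.tokenizer


pipeline = StubPipeline()
pipeline.transcribe(language="es")      # first call builds the tokenizer
try:
    pipeline.transcribe(language="en")  # language change rebuilds it with task=50359
except ValueError as err:
    print("failed:", err)

# Passing the task explicitly avoids the fallback to the int id.
pipeline.transcribe(language="en", task="transcribe")
```

In this sketch the second call fails only because the language changed, which forces a tokenizer rebuild with the integer task; repeating the call with the same language would not trigger the rebuild.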
Other info about that problem:
#huggingface/transformers#22331