Transcription/translation language change -- '50358' / '50359' is not a valid task (accepted tasks: transcribe, translate) #784

grzegorz700 · 2024-04-19T10:05:17Z

Bug scenario

In the following scenario:

model = whisperx.load_model(..., language='es')  # model e.g. large-v2
model.transcribe(..., language='es')
model.align(..., language='es')
model.transcribe(..., language='en')
model.align(..., language='en')

We get the following error:

ValueError: '50359' is not a valid task (accepted tasks: transcribe, translate)

Cause

This is one sub-case of a broader problem with how default parameters are handled. It originates in the following lines of code:

whisperX/whisperx/asr.py

Lines 201 to 205 in f2da2f8

    
           task = task or self.tokenizer.task 
        
           if task != self.tokenizer.task or language != self.tokenizer.language_code: 
        
               self.tokenizer = faster_whisper.tokenizer.Tokenizer(self.model.hf_tokenizer, 
        
                                                                   self.model.model.is_multilingual, task=task, 
        
                                                                   language=language)

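With the default parameter (task=None), transcribe() falls back to self.tokenizer.task. However, faster_whisper.tokenizer.Tokenizer has already mapped the task name to its token id at construction time, so self.tokenizer.task is an int rather than a string. In the multilingual vocabulary used by large-v2, translate is 50358 and transcribe is 50359, which is why the default transcribe task shows up as '50359' in the error. When the tokenizer is rebuilt, that int is passed as task and rejected.

The problem only appears when the language changes on an existing model wrapper, because that is what triggers the tokenizer rebuild.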
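The mechanism can be reproduced without loading a model. The sketch below is not the whisperX or faster_whisper code: FakeTokenizer and transcribe are hypothetical stand-ins that only mimic the behaviour described above (task name converted to an id on construction, then reused as the default), and the ids are assumed to be the large-v2 values.

```python
# Hypothetical stand-in for faster_whisper.tokenizer.Tokenizer.
# Ids are assumed to match the multilingual (large-v2) vocabulary.
TASK_IDS = {"transcribe": 50359, "translate": 50358}


class FakeTokenizer:
    def __init__(self, task, language):
        if task not in TASK_IDS:
            raise ValueError(
                "'%s' is not a valid task (accepted tasks: %s)"
                % (task, ", ".join(TASK_IDS))
            )
        # The task is stored as an int token id, not as its name.
        self.task = TASK_IDS[task]
        self.language_code = language


def transcribe(tokenizer, language, task=None):
    """Mimics the tokenizer-refresh logic quoted from asr.py."""
    task = task or tokenizer.task  # default becomes an int id
    if task != tokenizer.task or language != tokenizer.language_code:
        # Rebuild: an int id is passed where a task name is expected.
        tokenizer = FakeTokenizer(task=task, language=language)
    return tokenizer


tok = FakeTokenizer(task="transcribe", language="es")
transcribe(tok, language="es")  # same language: no rebuild, no error
try:
    transcribe(tok, language="en")  # language change triggers the rebuild
except ValueError as exc:
    print(exc)
```

Calling transcribe with the same language never rebuilds the tokenizer, which is why the bug stays hidden until the language is switched.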

Temporary workaround: pass the task explicitly:

model.transcribe(..., task="transcribe")
# or
model.transcribe(..., task="translate")

Desired fix:

Map the task id stored in the tokenizer back to its task name before reusing it as the default.
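A minimal sketch of what such a reverse mapping could look like. resolve_task_name and TASK_IDS are hypothetical names, not part of whisperX or faster_whisper, and the ids are hard-coded here for large-v2 only; a real patch would presumably look them up from the tokenizer's vocabulary instead.

```python
# Assumed ids for the multilingual (large-v2) vocabulary; hypothetical table.
TASK_IDS = {"transcribe": 50359, "translate": 50358}


def resolve_task_name(tokenizer_task, task_ids=TASK_IDS):
    """Return the task name for a value that may be a name or a token id."""
    if isinstance(tokenizer_task, str):
        return tokenizer_task  # already a name, nothing to do
    id_to_name = {token_id: name for name, token_id in task_ids.items()}
    if tokenizer_task not in id_to_name:
        raise ValueError("unknown task id: %r" % (tokenizer_task,))
    return id_to_name[tokenizer_task]


# The default would then be derived from the name instead of the raw id:
#     task = task or resolve_task_name(self.tokenizer.task)
print(resolve_task_name(50359))
print(resolve_task_name("translate"))
```

Note that the comparison task != self.tokenizer.task in the quoted code would also need to compare like with like (name against name, or id against id), otherwise an explicit task name would always trigger a rebuild.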
