
Transcription/translation language change -- '50358' / '50359' is not a valid task (accepted tasks: transcribe, translate) #784

Open
grzegorz700 opened this issue Apr 19, 2024 · 0 comments

grzegorz700 commented Apr 19, 2024

Bug scenario

In the following scenario:

  1. model = whisperx.load_model(..., language='es') (model e.g. large-v2)
  2. model.transcribe(..., language='es')
  3. model.align(..., language='es')
  4. model.transcribe(..., language='en')
  5. model.align(..., language='en')

We get the following error:

ValueError: '50359' is not a valid task (accepted tasks: transcribe, translate)

Cause

This is one instance of a more general problem with default parameters. The relevant code is:

whisperX/whisperx/asr.py

Lines 201 to 205 in f2da2f8

task = task or self.tokenizer.task
if task != self.tokenizer.task or language != self.tokenizer.language_code:
    self.tokenizer = faster_whisper.tokenizer.Tokenizer(self.model.hf_tokenizer,
                                                        self.model.model.is_multilingual, task=task,
                                                        language=language)

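With the default parameter (task=None), task falls back to self.tokenizer.task. However, faster_whisper.tokenizer.Tokenizer has already converted the task name to its integer token id at construction time, so the fallback value is an int such as 50359 rather than a name. With the multilingual vocabulary used by large-v2, <|translate|> is 50358 and <|transcribe|> is 50359. When the tokenizer is rebuilt, that int is passed as task and rejected.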

The problem appears whenever the language is changed on an existing model wrapper, because the language change is what triggers the tokenizer rebuild.
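The mechanism can be reproduced without loading a model. The sketch below uses two hypothetical stand-ins, FakeTokenizer and FakePipeline, that mimic only the behaviour described above; they are not the real faster_whisper or whisperX classes, and the token ids are illustrative.

```python
# Hypothetical, simplified stand-ins: NOT the real faster_whisper / whisperX classes.
_TASKS = ("transcribe", "translate")
# Illustrative token ids; the real values depend on the model's vocabulary.
_TOKEN_IDS = {"<|translate|>": 50358, "<|transcribe|>": 50359}


class FakeTokenizer:
    def __init__(self, task, language):
        if task not in _TASKS:
            raise ValueError(
                "'%s' is not a valid task (accepted tasks: %s)"
                % (task, ", ".join(_TASKS))
            )
        # The task is stored as an integer token id, not as its name.
        self.task = _TOKEN_IDS["<|%s|>" % task]
        self.language_code = language


class FakePipeline:
    def __init__(self, language):
        self.tokenizer = FakeTokenizer("transcribe", language)

    def transcribe(self, language, task=None):
        # Same logic as the quoted lines: the fallback is the int id.
        task = task or self.tokenizer.task
        if task != self.tokenizer.task or language != self.tokenizer.language_code:
            self.tokenizer = FakeTokenizer(task, language)
        return task


pipeline = FakePipeline("es")
pipeline.transcribe("es")  # same language: no rebuild, no error

try:
    pipeline.transcribe("en")  # language change forces a rebuild with task=50359
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The first call succeeds only because neither condition triggers a rebuild; the second call rebuilds the tokenizer with the integer id as the task and fails.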

Temporary workaround: pass the task explicitly:

model.transcribe(...,  task="transcribe")
# or 
model.transcribe(...,  task="translate")

Desired fix:

Map the task id stored on the tokenizer back to its task name before reusing it as the default.
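A minimal sketch of what such a reverse mapping could look like. The helper name resolve_task and the toy vocabulary are hypothetical; it assumes a token-to-id lookup (for example the token_to_id method of the underlying Hugging Face tokenizer) is available at the call site.

```python
_TASKS = ("transcribe", "translate")


def resolve_task(task, stored_task, token_to_id):
    """Return a task name, mapping a stored token id back to its name if needed.

    task         -- the value passed by the caller (may be None)
    stored_task  -- the value held by the current tokenizer (int id or name)
    token_to_id  -- callable mapping a token string such as "<|transcribe|>" to its id
    """
    if task is not None:
        return task
    if stored_task in _TASKS:  # already a name, nothing to do
        return stored_task
    for name in _TASKS:
        if token_to_id("<|%s|>" % name) == stored_task:
            return name
    raise ValueError("unknown task id: %r" % (stored_task,))


# Toy vocabulary standing in for the real tokenizer lookup (illustrative ids).
vocab = {"<|translate|>": 50358, "<|transcribe|>": 50359}

print(resolve_task(None, 50359, vocab.get))
print(resolve_task("translate", 50359, vocab.get))
```

The resolved name could then be compared against the tokenizer's current task name and passed to the Tokenizer constructor, so the rebuild never receives an integer.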

Other info about that problem:

#huggingface/transformers#22331
