
Evaluate trainer on Code-Switched Speech fails with "ValueError: Multiple languages detected when trying to predict the most likely target language for transcription." #30654

Closed
3 of 4 tasks
sproocht opened this issue May 4, 2024 · 7 comments · Fixed by #30865


sproocht commented May 4, 2024

System Info

  • transformers version: 4.41.0.dev0
  • Platform: Linux-6.5.0-28-generic-x86_64-with-glibc2.35
  • Python version: 3.10.12
  • Huggingface_hub version: 0.23.0
  • Safetensors version: 0.4.3
  • Accelerate version: 0.30.1.dev0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.3.0+cu121 (True)
  • Tensorflow version (GPU?): 2.13.1 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@sanchit-gandhi
@ArthurZucker
@muellerzr

This issue is related to fine-tuning Whisper on datasets that contain switches from a base language to other languages, or simply low-resource languages for which language identification by the pre-trained model is not accurate enough. The issue can be reproduced, for example, by mixing a few French audio utterances into a German dataset and running trainer.evaluate() on it.

Up until transformers version 4.37.2, fine-tuning and evaluating on these types of datasets did not raise any issues, and the fine-tuning results were very acceptable. In more recent versions, starting with 4.38.0, model evaluation systematically fails on such datasets (in transformers/models/whisper/generation_whisper.py).

I can understand the idea of forcing a single language in a batch, but in real-life situations people use many languages concurrently in their daily interactions, and this is reflected in the datasets. This issue prohibits fine-tuning for languages such as Luxembourgish, where it is common to mix Luxembourgish with English, French or German in the same utterance. Many other cases concern Spanglish or Hinglish, or low-resource languages borrowing words or phrases from high-resource languages. So it could prevent using the transformers library to fine-tune for such languages.

The only workaround I have at the moment is to stick to version 4.37.2. Please have a look at this regression.

Thank you in advance!

Here is the full error traceback:

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_12853/1263219524.py in
1 # Get initial evaluation results
----> 2 trainer.evaluate()

~/.local/lib/python3.10/site-packages/transformers/trainer_seq2seq.py in evaluate(self, eval_dataset, ignore_keys, metric_key_prefix, **gen_kwargs)
178 self.gather_function = self.accelerator.gather
179 self._gen_kwargs = gen_kwargs
--> 180 return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)
181
182 def predict(

~/.local/lib/python3.10/site-packages/transformers/trainer.py in evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
3513
3514 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3515 output = eval_loop(
3516 eval_dataloader,
3517 description="Evaluation",

~/.local/lib/python3.10/site-packages/transformers/trainer.py in evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
3696
3697 # Prediction step
-> 3698 loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
3699 main_input_name = getattr(self.model, "main_input_name", "input_ids")
3700 inputs_decode = self._prepare_input(inputs[main_input_name]) if args.include_inputs_for_metrics else None

~/.local/lib/python3.10/site-packages/transformers/trainer_seq2seq.py in prediction_step(self, model, inputs, prediction_loss_only, ignore_keys, **gen_kwargs)
308 k: v for k, v in inputs.items() if k not in ("decoder_input_ids", "decoder_attention_mask")
309 }
--> 310 generated_tokens = self.model.generate(**generation_inputs, **gen_kwargs)
311
312 # Temporary hack to ensure the generation config is not initialized for each iteration of the evaluation loop

~/.local/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py in generate(self, input_features, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, return_timestamps, task, language, is_multilingual, prompt_ids, prompt_condition_type, condition_on_prev_tokens, temperature, compression_ratio_threshold, logprob_threshold, no_speech_threshold, num_segment_frames, attention_mask, time_precision, return_token_timestamps, return_segments, return_dict_in_generate, **kwargs)
528
529 # pass self.config for backward compatibility
--> 530 init_tokens = self._retrieve_init_tokens(
531 input_features,
532 generation_config=generation_config,

~/.local/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py in _retrieve_init_tokens(self, input_features, generation_config, config, num_segment_frames, kwargs)
1167
1168 if torch.unique(lang_ids).shape[0] > 1:
-> 1169 raise ValueError(
1170 "Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing language='...' or make sure all input audio is of the same language."
1171 )

ValueError: Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing language='...' or make sure all input audio is of the same language.
```

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Run trainer.evaluate() on a dataset containing a mix of languages.
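For reference, here is a minimal, hedged sketch of the same failure outside the Trainer. The model name and the two audio arrays are placeholders (real 16 kHz mono clips in two different languages are needed to actually trigger the error); with no language forced, generate() runs per-sample language detection and raises the ValueError above when the batch contains more than one detected language.

```python
# Minimal reproduction sketch (placeholders, not a verified script).
import numpy as np
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Placeholders: substitute two real 16 kHz mono recordings in different
# languages (e.g. one German and one French utterance from the dataset).
german_clip = np.zeros(16_000, dtype=np.float32)
french_clip = np.zeros(16_000, dtype=np.float32)

inputs = processor([german_clip, french_clip], sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    # No `language=...` is passed, so generate() detects the language of each
    # sample; on transformers >= 4.38, two different detections in one batch
    # raise "Multiple languages detected ...".
    model.generate(inputs.input_features)
```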

Expected behavior

Evaluation completes without error, as it does in transformers versions up to 4.37.2.

@sproocht sproocht changed the title Evaluate trainer on Code-Switching Speech fails with "ValueError: Multiple languages detected when trying to predict the most likely target language for transcription." Evaluate trainer on Code-Switched Speech fails with "ValueError: Multiple languages detected when trying to predict the most likely target language for transcription." May 4, 2024
@amyeroberts
Collaborator

cc @kamilakesbi

@kamilakesbi
Contributor

Hi @sproocht,

Thanks for sharing this error! It will be solved with PR #29688.

@sproocht
Author

Hi @kamilakesbi,
Perfect! Thank you for confirming and for working on this.
Best regards,

@sanchit-gandhi
Contributor

Hey @sproocht - thanks for reporting! This issue was in fact closed by #29938 for the Transformers example, and huggingface/blog#1944 for the blog post.

If you copy the latest example script and use the latest version of Transformers, you should be able to force the language token by setting the --language argument, which will bypass the automatic language detection.

Hope that helps!
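For anyone hitting this before upgrading, here is a rough sketch of that workaround outside the example script: force the language (and task) on the model's generation config, or pass them through evaluate(), so batched language detection is skipped. The model name and "german" are placeholders for your own setup.

```python
# Workaround sketch: pin the transcription language so generate() skips
# automatic language detection for the whole batch.
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Option 1: set it once on the generation config that Seq2SeqTrainer uses.
model.generation_config.language = "german"
model.generation_config.task = "transcribe"

# Option 2: pass generation kwargs through evaluate(); Seq2SeqTrainer forwards
# them to model.generate().
# trainer.evaluate(language="german", task="transcribe")
```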

@sanchit-gandhi
Contributor

Hey @sproocht - I battle-tested this a bit and found you're indeed correct: the generation config is still not correctly updated. This PR should fix it once and for all: #30865

@leophill

Hey @sanchit-gandhi,
That's great! Thank you for the updates. I look forward to testing the fix once the PR is merged.

@sproocht
Author

Hey @sanchit-gandhi,
Nice job! Thanks for confirming. I will definitely give it a try after the PR is merged.
Best regards,
