
Gibberish Outputs #825

Open
RohitMidha23 opened this issue May 8, 2024 · 3 comments

Comments


RohitMidha23 commented May 8, 2024

After converting a fine-tuned Hugging Face Whisper model to ctranslate2 format and running it with faster-whisper, I get extremely gibberish output.

I've tried various versions, but the output contains a lot of periods and dashes and doesn't make much sense.

The same audio files, when passed to the original model, perform exceptionally well, hence the question.

I am currently converting the model with ctranslate2 v4.1.0 and faster-whisper v1.0.1.

@trungkienbkhn can you please help?

@trungkienbkhn
Collaborator

@RohitMidha23, hello. Which HF model did you convert to ctranslate2 format? And could you show your conversion command?

@RohitMidha23
Author

@trungkienbkhn it is a model fine-tuned from whisper-large-v2.
The command I used is:

ct2-transformers-converter --model "model_path" \
    --output_dir "output_model_path" \
    --copy_files tokenizer_config.json preprocessor_config.json special_tokens_map.json generation_config.json \
    --quantization float16

@trungkienbkhn
Collaborator

@RohitMidha23 In fact, a few models end up with lower quality after conversion than the original model. You can try removing the option --quantization float16 from the conversion command. Alternatively, add the option condition_on_previous_text=False when transcribing. We had the same issue with the distil-large-v2 model conversion; you can refer to this comment.
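A minimal sketch of the two workarounds, assuming a converted model at a placeholder path `output_model_path` and a placeholder audio file `audio.wav` (the faster-whisper calls are shown commented out because they require the converted model on disk):

```python
# Workarounds for gibberish output after ctranslate2 conversion.
# Paths and file names below are placeholders, not from the issue.

transcribe_options = {
    # Disabling conditioning on previous text often stops the runs of
    # repeated periods/dashes caused by degenerate decoding context.
    "condition_on_previous_text": False,
}

# Workaround 1: re-convert without float16 quantization (run in a shell):
#   ct2-transformers-converter --model "model_path" \
#       --output_dir "output_model_path" \
#       --copy_files tokenizer_config.json preprocessor_config.json \
#           special_tokens_map.json generation_config.json

# Workaround 2: keep the quantized model but disable conditioning:
#   from faster_whisper import WhisperModel
#   model = WhisperModel("output_model_path", compute_type="float16")
#   segments, info = model.transcribe("audio.wav", **transcribe_options)
```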
