
Gibberish Outputs #825

Open
RohitMidha23 opened this issue May 8, 2024 · 3 comments

Comments


RohitMidha23 commented May 8, 2024

After converting a fine-tuned Hugging Face Whisper model to ctranslate2 format and running it with faster-whisper, I get extremely gibberish output.

I've tried various versions, but the output contains a lot of periods and dashes and doesn't make much sense.

The same audio files, when passed to the original model, perform exceptionally well, hence the question.

I am currently converting the model with ctranslate2 v4.1.0 and faster-whisper v1.0.1.

@trungkienbkhn can you please help?

@trungkienbkhn
Collaborator

@RohitMidha23, hello. Which HF model did you convert to ctranslate2 format? And could you show your conversion command?

@RohitMidha23
Author

@trungkienbkhn it is a model fine-tuned from whisper-large-v2.
The command I used is:

ct2-transformers-converter --model "model_path" \
    --output_dir "output_model_path" \
    --copy_files tokenizer_config.json preprocessor_config.json special_tokens_map.json generation_config.json \
    --quantization float16

@trungkienbkhn
Collaborator

@RohitMidha23 In fact, a few models end up with lower quality after conversion than the original model. You can try removing the option --quantization float16 from the conversion command. Alternatively, add the option condition_on_previous_text=False when transcribing. We had the same issue with the distil-large-v2 model conversion; you can refer to this comment.
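A minimal sketch of the two workarounds, assuming a converted model at a placeholder path `output_model_path` and a placeholder audio file `audio.wav` (the faster-whisper calls are shown commented out because they require the converted model on disk):

```python
# Workarounds for gibberish output after ctranslate2 conversion.
# Paths and file names below are placeholders, not from the issue.

transcribe_options = {
    # Disabling conditioning on previous text often stops the runs of
    # repeated periods/dashes caused by degenerate decoding context.
    "condition_on_previous_text": False,
}

# Workaround 1: re-convert without float16 quantization (run in a shell):
#   ct2-transformers-converter --model "model_path" \
#       --output_dir "output_model_path" \
#       --copy_files tokenizer_config.json preprocessor_config.json \
#           special_tokens_map.json generation_config.json

# Workaround 2: keep the quantized model but disable conditioning:
#   from faster_whisper import WhisperModel
#   model = WhisperModel("output_model_path", compute_type="float16")
#   segments, info = model.transcribe("audio.wav", **transcribe_options)
```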
