Duplicate words #811

Open
vincaslt opened this issue Apr 26, 2024 · 1 comment

Comments

@vincaslt

I'm transcribing a relatively long video and I often get a run of duplicated words at the same timestamp, e.g.:

 talk about the frameworks that entrepreneurs can use to think about how the ad value, and how it's the balance balance balance balance balance balance balance balance balance balance balance balance. what most entrepreneurs do wrong.

I use distil-large-v2 model with faster-whisper standalone executable. Here are the arguments I'm passing into faster-whisper.

  const whisperArgs = [
    `"${audioFilePath}"`,
    "--beep_off",
    "--model",
    "distil-large-v2",
    "--language",
    "en",
    "--word_timestamps",
    "True",
    "--output_format",
    "json",
    "--output_dir",
    `"${opts.resourceManager.tempDir}"`,
    "--model_dir",
    `"${opts.resourceManager.appDataDir}"`,
    "--beam_size",
    "1",
    "--one_word",
    "2",
    "--verbose",
    opts.verbose ? "True" : "False",
    opts.verbose ? "" : "--print_progress",
  ].filter(Boolean);

I saw a related discussion (#716), but the fix proposed there did not resolve the issue for me.

I made sure I'm on the latest version as of today. I also tried changing the beam_size setting, but it had no effect other than slower transcription. I need the one_word setting, though it might be causing the issue; I haven't tested that yet (I might later). The video I'm testing with is https://www.youtube.com/watch?v=q3xN1iYeTNI (downloaded with youtube-dl).
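
To make the symptom easier to spot in longer transcripts, here is a minimal sketch that scans a flat list of timed words for runs of the same word repeated back to back. The `TimedWord` shape (`word`, `start`, `end`), the `findRepeatedRuns` and `normalize` helpers, and the default threshold of 3 are all assumptions made for illustration; they are not the documented JSON schema of the executable, so the field names may need adjusting to the real output.

```typescript
// Hypothetical word shape; adjust to the actual JSON produced by the tool.
interface TimedWord {
  word: string;
  start: number;
  end: number;
}

interface RepeatedRun {
  word: string;       // normalized word that repeats
  count: number;      // how many times it appears in a row
  firstIndex: number; // index of the first occurrence in the input
  start: number;      // start time of the first occurrence
  end: number;        // end time of the last occurrence
}

// Lowercase, trim, and drop trailing punctuation so "balance." matches "balance".
function normalize(word: string): string {
  return word.trim().toLowerCase().replace(/[.,!?;:]+$/, "");
}

// Report every run of at least `minCount` identical consecutive words.
function findRepeatedRuns(words: TimedWord[], minCount = 3): RepeatedRun[] {
  const runs: RepeatedRun[] = [];
  let i = 0;
  while (i < words.length) {
    const key = normalize(words[i].word);
    let j = i + 1;
    while (j < words.length && normalize(words[j].word) === key) {
      j++;
    }
    const count = j - i;
    if (key !== "" && count >= minCount) {
      runs.push({
        word: key,
        count,
        firstIndex: i,
        start: words[i].start,
        end: words[j - 1].end,
      });
    }
    i = j;
  }
  return runs;
}
```

A run whose `start` and `end` are nearly identical would match the "same timestamp" pattern described above.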

@Purfview
Contributor

I use distil-large-v2 model with faster-whisper standalone executable.

Then you are posting in the wrong repo.
Try a standard model (medium or large-v2), or --hallucination_silence_threshold 2.
IMO, the distil models are not good for long-form transcription.

I need the one_word setting, though it might be causing the issue

It can't cause any issue: it's only an srt/vtt writing setting, and it has no effect in your example because the output there is json.
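
For illustration, the two suggestions above (a standard model and --hallucination_silence_threshold 2) could be applied to the argument list from the original post roughly as follows. This is a sketch, not a verified fix: the `BuildOpts` interface and `buildWhisperArgs` helper are hypothetical names, the flags are copied from this thread, and --one_word is left out only because the reply says it has no effect with json output.

```typescript
// Hypothetical options object standing in for the values used in the original post.
interface BuildOpts {
  audioFilePath: string;
  outputDir: string;
  modelDir: string;
  verbose: boolean;
}

// Builds the argument list with the two changes suggested in the reply.
function buildWhisperArgs(opts: BuildOpts): string[] {
  const args: string[] = [
    `"${opts.audioFilePath}"`,
    "--beep_off",
    "--model",
    "large-v2", // standard model instead of distil-large-v2
    "--language",
    "en",
    "--word_timestamps",
    "True",
    "--output_format",
    "json",
    "--output_dir",
    `"${opts.outputDir}"`,
    "--model_dir",
    `"${opts.modelDir}"`,
    "--beam_size",
    "1",
    // Flag and value taken from the reply above.
    "--hallucination_silence_threshold",
    "2",
    "--verbose",
    opts.verbose ? "True" : "False",
  ];
  // Same behavior as the original snippet: show progress only when not verbose.
  if (!opts.verbose) {
    args.push("--print_progress");
  }
  return args;
}
```

Whether this removes the repeated words for a given recording still has to be checked against the actual output.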
