Subtitle generation w/ Speaker Diarization using Whisper and pyannote.audio

dptools/WhisperNote

WhisperNote

A simple Python script that transcribes audio and performs speaker diarization using OpenAI's Whisper and pyannote.audio.

Based on Majdoddin's work discussed on GitHub and available as a Google Colab Notebook.
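To illustrate how a transcript and a diarization result can be combined into speaker-labelled subtitles, here is a minimal sketch in plain Python. It is not taken from the WhisperNote script: the `Turn` and `Segment` classes, the helper names, and the "largest time overlap" rule are all assumptions made for illustration. It only assumes that the transcription step yields timed text segments and that the diarization step yields timed speaker turns with labels such as `SPEAKER_00`.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A diarization turn (hypothetical container): who spoke, and when."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """A transcribed segment (hypothetical container): text with timestamps."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speaker(segment, turns):
    """Pick the speaker whose turn overlaps the segment the most."""
    best_speaker, best_overlap = "UNKNOWN", 0.0
    for turn in turns:
        shared = overlap(segment.start, segment.end, turn.start, turn.end)
        if shared > best_overlap:
            best_speaker, best_overlap = turn.speaker, shared
    return best_speaker


def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments, turns):
    """Render segments as SRT blocks, prefixing each line with its speaker."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        speaker = assign_speaker(seg, turns)
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"[{speaker}] {seg.text.strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up example data, not output from a real run.
    turns = [Turn(0.0, 1.5, "SPEAKER_00"), Turn(1.5, 4.0, "SPEAKER_01")]
    segments = [Segment(0.0, 1.4, " Hello there."), Segment(1.6, 3.9, " Hi!")]
    print(to_srt(segments, turns))
```

Matching by overlap keeps the two models independent: either step can be rerun or swapped out without touching the other. Segments that overlap no turn are labelled `UNKNOWN` rather than guessed.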

Running the script

This project was tested only on Linux, in both CPU-only and GPU configurations. It is expected to work on other platforms, but this is not guaranteed.

Benchmarks

Input file: 10 minutes of .mp3 audio of an interview between two people.

Transcription

CPU: 2.36 minutes
GPU: 2.05 minutes

Citations

pyannote/speaker-diarization, pyannote/segmentation

@inproceedings{Bredin2021,
  Title = {{End-to-end speaker segmentation for overlap-aware resegmentation}},
  Author = {{Bredin}, Herv{\'e} and {Laurent}, Antoine},
  Booktitle = {Proc. Interspeech 2021},
  Address = {Brno, Czech Republic},
  Month = {August},
  Year = {2021},
}
