Cog Whisper Diarization

Audio transcription and speaker diarization pipeline.

Models used

Whisper Large v3 (CTranslate2 version, via faster-whisper)
pyannote.audio 3.1.1

Usage

Used at Audiogest
Or try it on Replicate
Or deploy it yourself on Replicate (make sure to add your own Hugging Face API key and accept the terms of use of the pyannote models used)

Input

file_string: str: Either provide a Base64-encoded audio file.
file_url: str: Or provide a direct audio file URL.
file: Path: Or provide an audio file.
group_segments: bool: Group consecutive segments from the same speaker that are less than 2 seconds apart. Default is True.
num_speakers: int: Number of speakers. Leave empty to autodetect. Must be between 1 and 50.
language: str: Language of the spoken words as a language code like 'en'. Leave empty to autodetect the language.
prompt: str: Vocabulary: provide names, acronyms, and loanwords in a list. Use punctuation for best accuracy.
offset_seconds: int: Offset in seconds, used for chunked inputs. Default is 0.
transcript_output_format: str: Format of the transcript output: individual words with timestamps, full text of segments, or a combination of both.
- Default is both.
- Options are words_only, segments_only, both.
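The inputs above can be assembled into a request payload along these lines. This is a minimal sketch, not code from the repository: the helper name `build_input` is hypothetical, and it assumes the audio is sent through `file_string` rather than `file_url` or `file`. The range check and the allowed format values come from the parameter descriptions above.

```python
import base64

# Allowed values taken from the transcript_output_format description above.
ALLOWED_FORMATS = {"words_only", "segments_only", "both"}


def build_input(audio_bytes, num_speakers=None, language=None, prompt=None,
                group_segments=True, offset_seconds=0,
                transcript_output_format="both"):
    """Hypothetical helper: build an input dict for the model."""
    if num_speakers is not None and not 1 <= num_speakers <= 50:
        raise ValueError("num_speakers must be between 1 and 50")
    if transcript_output_format not in ALLOWED_FORMATS:
        raise ValueError(f"transcript_output_format must be one of {sorted(ALLOWED_FORMATS)}")

    payload = {
        # Base64-encode the raw audio bytes for the file_string input.
        "file_string": base64.b64encode(audio_bytes).decode("ascii"),
        "group_segments": group_segments,
        "offset_seconds": offset_seconds,
        "transcript_output_format": transcript_output_format,
    }
    # Optional fields are left out so the model can autodetect them.
    if num_speakers is not None:
        payload["num_speakers"] = num_speakers
    if language:
        payload["language"] = language
    if prompt:
        payload["prompt"] = prompt
    return payload
```

For a real file, read it in binary mode first, for example `build_input(open("meeting.mp3", "rb").read(), language="en")`, where `meeting.mp3` is a placeholder path.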

Output

segments: List[Dict]: List of segments with speaker, start and end time.
- Includes avg_logprob for each segment and probability for each word-level segment.
num_speakers: int: Number of speakers (detected, unless specified in input).
language: str: Language of the spoken words as a language code like 'en' (detected, unless specified in input).
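A short sketch of how the output might be consumed. The exact key names are assumptions: `speaker`, `start`, `end`, and `probability` follow the descriptions above, while `text`, `words`, and `word` are guesses at how segment text and word-level entries are stored. The function name `format_transcript` and the 0.5 threshold are illustrative only.

```python
def format_transcript(output, min_word_probability=0.5):
    """Render one line per segment and flag low-probability words.

    Key names ("text", "words", "word") are assumed, not confirmed.
    """
    lines = []
    for seg in output["segments"]:
        header = f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['speaker']}"
        text = seg.get("text", "").strip()
        # Collect words whose probability falls below the threshold.
        uncertain = [
            w["word"].strip()
            for w in seg.get("words", [])
            if w.get("probability", 1.0) < min_word_probability
        ]
        line = f"{header}: {text}"
        if uncertain:
            line += f" (low confidence: {', '.join(uncertain)})"
        lines.append(line)
    return lines


# Hand-written sample in the assumed shape, not real model output.
sample = {
    "segments": [
        {
            "speaker": "SPEAKER_00",
            "start": 0.0,
            "end": 1.5,
            "text": " Hello there.",
            "avg_logprob": -0.2,
            "words": [
                {"word": " Hello", "start": 0.0, "end": 0.6, "probability": 0.98},
                {"word": " there.", "start": 0.7, "end": 1.5, "probability": 0.41},
            ],
        }
    ],
    "num_speakers": 1,
    "language": "en",
}

for line in format_transcript(sample):
    print(line)
```

With transcript_output_format set to segments_only, word-level entries would presumably be absent, which is why the sketch reads them with `.get`.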

thomasmol/cog-whisper-diarization
