Generating subtitles using whisper on youtube and immediately using them #397

ganqqwerty · 2024-03-25T10:01:07Z

Is your feature request related to a problem? Please describe.
YouTube's auto-generated subtitles are really bad; Whisper's are more accurate. Right now I download the video, convert it to audio, run Whisper to produce an SRT file, and load it into asbplayer.

Describe the solution you'd like
asbplayer could manage the downloading, conversion, transcription, and loading of the SRT file itself. The Whisper subtitles would then be available in one click.
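For reference, the manual workflow described above could be scripted roughly as follows. This is only a sketch under assumptions: the issue does not say which tools are used, so it assumes the `yt-dlp`, `ffmpeg`, and `whisper` (openai-whisper) command-line tools are installed, and the intermediate file names are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_pipeline(url: str, workdir: str = ".", model: str = "small") -> list[list[str]]:
    """Return the three commands of the manual workflow, in order."""
    work = Path(workdir)
    raw = work / "audio.webm"  # hypothetical name; ffmpeg detects the real format from the content
    wav = work / "audio.wav"   # hypothetical name
    return [
        # 1. Download the best audio-only stream.
        ["yt-dlp", "-f", "bestaudio", "-o", str(raw), url],
        # 2. Convert to 16 kHz mono WAV.
        ["ffmpeg", "-y", "-i", str(raw), "-vn", "-ac", "1", "-ar", "16000", str(wav)],
        # 3. Transcribe to SRT; whisper names the output after the input file.
        ["whisper", str(wav), "--model", model,
         "--output_format", "srt", "--output_dir", str(work)],
    ]


def run_pipeline(url: str, workdir: str = ".", model: str = "small") -> Path:
    """Run each step, stopping at the first failure, and return the SRT path."""
    for cmd in build_pipeline(url, workdir, model):
        subprocess.run(cmd, check=True)
    return Path(workdir) / "audio.srt"
```

Calling `run_pipeline(url)` would leave an `audio.srt` in the working directory, which still has to be loaded into asbplayer by hand. That last step is the part this request asks asbplayer to take over.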

AlexCatDev · 2024-05-01T18:21:01Z

I'm doing something similar. I wrote this Python script that monitors my clipboard for YouTube links, then downloads the video's audio, converts it, and transcribes it with the ReazonSpeech model.

Script

from pathlib import Path
def sexagesimal(secs):
  mm, ss = divmod(secs, 60)
  hh, mm = divmod(mm, 60)
  return f'{hh:0>2.0f}:{mm:0>2.0f}:{ss:0>6.3f}'

import io
import subprocess
import numpy as np
def convert_webm_to_mp3(webm_buffer):
    # Despite the name, this decodes the input to raw 16 kHz mono PCM, not mp3
    process = subprocess.Popen([
        'ffmpeg',
        '-i', '-',                 # Input from stdin
        '-vn',                     # Disable video
        '-ac', '1', '-ar', '16k',  # Mono, 16 kHz sample rate
        '-acodec', 'pcm_s16le',    # Signed 16-bit little-endian PCM
        '-f', 's16le',             # Raw PCM output, no container
        '-'],                      # Output to stdout
        stdin=subprocess.PIPE,     # Feed the downloaded audio through stdin
        stdout=subprocess.PIPE,    # Capture stdout
        stderr=subprocess.PIPE     # Capture stderr
    )

    # Write the webm buffer to ffmpeg's stdin
    stdout, stderr = process.communicate(input=webm_buffer.read())

    # Check for errors
    if process.returncode != 0:
        raise RuntimeError(f'ffmpeg error: {stderr.decode()}')

    # Scale the int16 samples to float32 in [-1.0, 1.0) and wrap them for ReazonSpeech
    result = np.frombuffer(stdout, np.int16).astype(np.float32) / 32768.0
    return audio_from_numpy(result, 16000)

from reazonspeech.nemo.asr import load_model, transcribe, audio_from_path

import torch
print(f'Has Cuda? {torch.cuda.is_available()}')

print("Loading Model...")
model = load_model()
print("Finished")

import gc
gc.collect()

import time
from datetime import datetime
from pytube import YouTube

from reazonspeech.nemo.asr import audio_from_numpy

import pyperclip
import re


YOUTUBE_REGEX = r'(https?://)?(www\.)?(youtube\.com|youtu\.?be)/.+$'

def is_youtube_link(text):
    return re.match(YOUTUBE_REGEX, text) is not None

#Ignore whatever is currently in the clipboard when the program starts
video_url = pyperclip.paste()

while True:
  # video_url = input("Youtube link: ")
  print("Waiting for youtube url in clipboard...")

  while True:
    clipboard = pyperclip.paste()

    if clipboard != video_url:
      # Remember the new clipboard contents so the same text is not handled twice
      video_url = clipboard

      # Only proceed if it looks like a YouTube link
      if is_youtube_link(video_url):
        subprocess.run(['notify-send', '-u', 'normal', '-a', "AI Youtube Subtitles", "-t", "10000", f"Starting processing on {video_url}"])
        break

    #Poll every 100ms
    time.sleep(0.1)

  startTime = time.monotonic()

  buffer = io.BytesIO()
  print("Downloading audio..")
  yt = YouTube(video_url)
  yt.streams.filter(only_audio=True).order_by('abr')[-1].stream_to_buffer(buffer)
  print("Done")

  buffer.seek(0)  # Rewind so ffmpeg reads from the start

  print("Converting..")
  audio = convert_webm_to_mp3(buffer)
  print("Done")

  s = time.monotonic()
  transcription = transcribe(model, audio)
  e = time.monotonic()

  print('Transcription finished in {:0.2f}s'.format(e - s))

  r = 'WEBVTT\n\n' + '\n\n'.join([f"{sexagesimal(seg.start_seconds)} --> {sexagesimal(seg.end_seconds)}\n{seg.text}" for seg in transcription.segments])
  # print(r)
  with Path('out.vtt').open("w", encoding="utf8") as o:
    o.write(r)

  endTime = time.monotonic()
  print("Done: out.vtt")

  timeStr = '{:0.2f}s'.format(endTime - startTime)

  # Send a desktop notification (notify-send only works on Linux)
  subprocess.run(['notify-send', '-u', 'normal', '-a', "AI Youtube Subtitles", "-t", "20000", f"Finished {video_url} in {timeStr}"])
  gc.collect()

It writes the result to out.vtt. Is there an easy way to send the subtitle file directly to asbplayer, so I don't have to drag it in manually?
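One caveat about the timestamp helper in the script: formatting the seconds with `.3f` can round a value such as 59.9996 up to `60.000`, which gives an invalid cue time like `00:00:60.000`. A possible alternative, sketched here with a hypothetical function name, rounds to whole milliseconds first and only then splits into fields:

```python
def vtt_timestamp(secs: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    # Round once, up front, so a carry propagates into seconds and minutes.
    total_ms = round(secs * 1000)
    hh, rem = divmod(total_ms, 3_600_000)
    mm, rem = divmod(rem, 60_000)
    ss, ms = divmod(rem, 1000)
    return f"{hh:02d}:{mm:02d}:{ss:02d}.{ms:03d}"
```

It would be a drop-in replacement for `sexagesimal` in the line that builds the WEBVTT text.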

killergerbah · 2024-05-02T06:29:05Z

Loading subtitles could be an addition to asbplayer's web socket interface

AlexCatDev · 2024-05-02T13:08:29Z

> Loading subtitles could be an addition to asbplayer's web socket interface

It would be awesome if you added that.
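To make the idea concrete, a client-side message for such a feature might be built like this. Nothing in this thread specifies the format, so the command name, the field names, and the base64 encoding are all placeholders and should not be read as asbplayer's actual API.

```python
import base64
import json
import uuid


def build_load_subtitles_message(name: str, content: str) -> str:
    """Build a JSON message asking a player to load one subtitle file.

    Every key below is hypothetical, chosen only for illustration.
    """
    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
    message = {
        "command": "load-subtitles",      # hypothetical command name
        "messageId": str(uuid.uuid4()),   # lets a reply be matched to this request
        "body": {"files": [{"name": name, "base64": encoded}]},
    }
    return json.dumps(message)
```

The script above could then send the returned string over a WebSocket connection right after writing out.vtt, instead of requiring the file to be dragged into the player.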

killergerbah added the "enhancement" label (New feature or request) on Mar 27, 2024
