Generating subtitles using whisper on youtube and immediately using them #397

Open
ganqqwerty opened this issue Mar 25, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@ganqqwerty

Is your feature request related to a problem? Please describe.
YouTube's auto-generated subtitles are really bad; Whisper's are more accurate. Right now I download the video, convert it to audio, run Whisper to produce an SRT file, and load that file into asbplayer.

Describe the solution you'd like
asbplayer could manage the whole process: downloading, conversion, transcription, and loading the SRT file. The Whisper subtitles would then be available in one click.
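
For reference, the manual workflow described above could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the `yt-dlp` and `whisper` (openai-whisper) command-line tools are installed, and the helper names `build_commands` and `run_pipeline`, the `small` model choice, and the `audio.wav` / `audio.srt` file names are illustrative, not anything asbplayer provides.

```python
import subprocess
from pathlib import Path


def build_commands(url: str, workdir: Path, model: str = "small") -> list[list[str]]:
    """Build the two CLI invocations: audio download, then transcription to SRT."""
    # Assumption: yt-dlp extracts the audio track and converts it to WAV.
    download = [
        "yt-dlp", "-x", "--audio-format", "wav",
        "-o", str(workdir / "audio.%(ext)s"),
        url,
    ]
    # Assumption: the openai-whisper CLI writes <input basename>.srt into --output_dir.
    transcribe = [
        "whisper", str(workdir / "audio.wav"),
        "--model", model,
        "--output_format", "srt",
        "--output_dir", str(workdir),
    ]
    return [download, transcribe]


def run_pipeline(url: str, workdir: Path) -> Path:
    """Run both steps and return the expected path of the SRT file."""
    workdir.mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(url, workdir):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return workdir / "audio.srt"
```

The resulting SRT file would still have to be loaded into asbplayer by hand, which is the step this request is about.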

@killergerbah killergerbah added the enhancement New feature or request label Mar 27, 2024
@AlexCatDev

I'm doing something similar. I wrote this Python script that monitors my clipboard for YouTube links, then downloads the video's audio, converts it, and transcribes it with the ReazonSpeech model.

Script

from pathlib import Path
def sexagesimal(secs):
  mm, ss = divmod(secs, 60)
  hh, mm = divmod(mm, 60)
  return f'{hh:0>2.0f}:{mm:0>2.0f}:{ss:0>6.3f}'

import io
import subprocess
import numpy as np
def convert_webm_to_mp3(webm_buffer):
    # Run ffmpeg as a subprocess
    process = subprocess.Popen([
      'ffmpeg',
      '-i', '-',          # Input from stdin
      '-vn',              # Disable video
      '-ac', '1', '-ar', '16k',  # Mono, 16 kHz sample rate
      '-acodec', 'pcm_s16le',  # Encode as raw 16-bit little-endian PCM (not MP3, despite the function name)
      '-f', 's16le',        # Headerless s16le output format
      '-'],               # Output to stdout
      stdin=subprocess.PIPE,  # Redirect stdin to the webm buffer
      stdout=subprocess.PIPE, # Capture stdout
      stderr=subprocess.PIPE  # Capture stderr
   )

    # Write the webm buffer to ffmpeg's stdin
    stdout, stderr = process.communicate(input=webm_buffer.read())

    # Check for errors
    if process.returncode != 0:
        raise RuntimeError(f'ffmpeg error: {stderr.decode()}')

    # Convert the raw PCM bytes to float32 samples in [-1, 1) and wrap them for ReazonSpeech
    result = np.frombuffer(stdout, np.int16).astype(np.float32) / 32768.0
    return audio_from_numpy(result, 16000)

from reazonspeech.nemo.asr import load_model, transcribe, audio_from_path

import torch
print(f'Has Cuda? {torch.cuda.is_available()}')

print("Loading Model...")
model = load_model()
print("Finished")

import gc
gc.collect()

import time
from datetime import datetime
from pytube import YouTube

from reazonspeech.nemo.asr import audio_from_numpy

import pyperclip
import re


YOUTUBE_REGEX = r'(https?://)?(www\.)?(youtube\.com|youtu\.?be)/.+$'

def is_youtube_link(text):
    return re.match(YOUTUBE_REGEX, text) is not None

#Ignore whatever is currently in the clipboard when the program starts
video_url = pyperclip.paste()

while True:
  # video_url = input("Youtube link: ")
  print("Waiting for youtube url in clipboard...")

  while True:
    clipboard = pyperclip.paste()

    if clipboard != video_url:
      # check if video url is a valid youtube video_url
      video_url = clipboard

      if is_youtube_link(video_url):
        subprocess.run(['notify-send', '-u', 'normal', '-a', "AI Youtube Subtitles", "-t", "10000", f"Starting processing on {video_url}"])
        break

    #Poll every 100ms
    time.sleep(0.1)

  startTime = time.monotonic()

  buffer = io.BytesIO()
  print("Downloading audio..")
  yt = YouTube(video_url)
  yt.streams.filter(only_audio=True).order_by('abr')[-1].stream_to_buffer(buffer)
  print("Done")

  buffer.seek(0)

  print("Converting..")
  audio = convert_webm_to_mp3(buffer)
  print("Done")

  s = time.monotonic()
  transcription = transcribe(model, audio)
  e = time.monotonic()

  print('Finished transcription took {:0.2f}s'.format(e - s))

  r = 'WEBVTT\n\n' + '\n\n'.join([f"{sexagesimal(seg.start_seconds)} --> {sexagesimal(seg.end_seconds)}\n{seg.text}" for seg in transcription.segments])
  # print(r)
  with Path('out.vtt').open("w", encoding="utf8") as o:
    o.write(r)

  endTime = time.monotonic()
  print("Done: out.vtt")

  timeStr = '{:0.2f}s'.format(endTime - startTime)

  # Generate notification (notify-send only works on Linux)
  subprocess.run(['notify-send', '-u', 'normal', '-a', "AI Youtube Subtitles", "-t", "20000", f"Finished {video_url} in {timeStr}"])
  gc.collect()

It writes the subtitles to out.vtt. Is there an easy way to send the subtitle file directly to asbplayer, so I don't have to drag it in manually?
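
One detail worth noting about the timestamps in the script above: `sexagesimal` formats the seconds field with `.3f` after splitting off minutes, so a value such as 59.9996 is rendered as `00:00:60.000`. A possible variant, sketched here with hypothetical names (`vtt_timestamp`, `Segment`, `to_vtt`), rounds to whole milliseconds first so the carry propagates into the minutes field. The `Segment` class only mimics the `start_seconds`, `end_seconds`, and `text` attributes the script reads; it is not the ReazonSpeech type.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Stand-in for a transcription segment (hypothetical, for illustration only).
    start_seconds: float
    end_seconds: float
    text: str


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm, rounding to whole milliseconds first."""
    total_ms = round(seconds * 1000)
    hh, rem = divmod(total_ms, 3_600_000)
    mm, rem = divmod(rem, 60_000)
    ss, ms = divmod(rem, 1000)
    return f"{hh:02d}:{mm:02d}:{ss:02d}.{ms:03d}"


def to_vtt(segments: list[Segment]) -> str:
    """Serialize segments as a WebVTT document, one cue per segment."""
    cues = [
        f"{vtt_timestamp(s.start_seconds)} --> {vtt_timestamp(s.end_seconds)}\n{s.text}"
        for s in segments
    ]
    return "WEBVTT\n\n" + "\n\n".join(cues)
```

Swapping `sexagesimal` for `vtt_timestamp` in the script should leave the output identical except at those rounding boundaries.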

@killergerbah
Owner

Loading subtitles could be an addition to asbplayer's web socket interface
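
To make the idea concrete, a client-side message for such an addition might be built as in the sketch below. Everything about the message shape is an assumption made for illustration: the command name `load-subtitles`, the `messageId` and `body` fields, and the base64-encoded `files` list are all hypothetical, and nothing in this thread defines them.

```python
import base64
import json
import uuid


def build_load_subtitles_message(name: str, subtitle_text: str) -> str:
    """Build a JSON message carrying one subtitle file (hypothetical format)."""
    encoded = base64.b64encode(subtitle_text.encode("utf-8")).decode("ascii")
    payload = {
        "command": "load-subtitles",      # hypothetical command name
        "messageId": str(uuid.uuid4()),   # hypothetical correlation id
        "body": {
            # Hypothetical body: file name plus base64-encoded content.
            "files": [{"name": name, "base64": encoded}],
        },
    }
    return json.dumps(payload)
```

A script like the one above could then send this string over the web socket connection instead of writing out.vtt and dragging it into the player.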

@AlexCatDev

> Loading subtitles could be an addition to asbplayer's web socket interface

It would be awesome if you added that.

3 participants