[FEATURE REQUEST] Fix YouTube's autogenerated subtitles doubling #433

Open
ershovev opened this issue Mar 21, 2024 · 2 comments
Labels
enhancement New feature or request yt-dlp related Not a bug about the app but with the yt-dlp library

Comments

@ershovev

Is your feature request available in yt-dlp? Please describe.
Not available.

When you download automatic subtitles from YouTube, the result is a rolling subtitle: every time a new line is added, the previous one moves up a line, and if there are more than two lines, the first one disappears. Think of the Star Wars intro, but with only two lines.

A subtitle converted from VTT to SRT by yt-dlp looks something like this:

00:00 --> 00:03 
This is the first line

00:03 --> 00:10 
This is the first line
This is what happens when another line is added

00:10
This is what happens when another line is added
If a third one is added, the first one disappears and the second one shoots up

The problem with this is that it's really hard to read: you expect both lines to change, so the carried-over line becomes really distracting.
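To make the rolling behaviour concrete, here is a minimal Python sketch that collapses it. It assumes cues are already parsed into `(start, end, text)` tuples and keeps only the lines that were not shown in the preceding cue. The name `collapse_rolling` and the `00:15` end time of the third cue are made up for illustration; the example above leaves that end time out.

```python
from typing import List, Tuple

Cue = Tuple[str, str, str]  # (start, end, text); hypothetical shape for this sketch


def collapse_rolling(cues: List[Cue]) -> List[Cue]:
    """Keep, for every cue, only the lines that the previous cue did not show."""
    result: List[Cue] = []
    previous_lines: List[str] = []
    for start, end, text in cues:
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        fresh = [line for line in lines if line not in previous_lines]
        if fresh:
            result.append((start, end, "\n".join(fresh)))
        previous_lines = lines
    return result


example = [
    ("00:00", "00:03", "This is the first line"),
    ("00:03", "00:10",
     "This is the first line\nThis is what happens when another line is added"),
    # The end time below is an assumption, it is not given in the example above.
    ("00:10", "00:15",
     "This is what happens when another line is added\n"
     "If a third one is added, the first one disappears and the second one shoots up"),
]

for cue in collapse_rolling(example):
    print(cue)
```

One trade-off of this rule: if a speaker really does say the same line twice in a row, the second occurrence is dropped as well.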

Describe the solution you'd like
Maybe add a "Fix YouTube autogenerated subtitles doubling" flag in settings?

Users on GitHub and superuser.com suggest some fixes for ytdl:
1)


def fix_youtube_vtt(vtt_file_path) -> str:
    """Fix YouTube's autogenerated VTT subtitles and return an SRT-formatted string."""

    import webvtt  # third-party package (webvtt-py)

    pretty_subtitle = ''
    previous_caption_text = ''
    previous_caption_start = ''
    i = 1
    for caption in webvtt.read(vtt_file_path):
        text = caption.text.strip()
        lines = text.split("\n")

        if previous_caption_text and previous_caption_text == text:
            # The previous and current captions are identical: emit one SRT block that runs
            # from the previous caption's start to the current caption's end.
            # SRT uses a comma before the milliseconds, VTT uses a period.
            start = previous_caption_start.replace('.', ',')
            end = caption.end.replace('.', ',')
            pretty_subtitle += f"{i}\n{start} --> {end}\n{previous_caption_text}\n\n"
            i += 1

        elif len(lines) > 1 and previous_caption_text == lines[0]:
            # The current caption is multiline and its first line repeats the previous
            # caption: ignore the first line and move on with the second.
            previous_caption_text = lines[1]
            previous_caption_start = caption.start

        else:
            previous_caption_text = text
            previous_caption_start = caption.start

    return pretty_subtitle
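
The function above is meant to return SRT, and SRT timestamps differ slightly from VTT ones: SRT writes `HH:MM:SS,mmm` with a comma, while VTT uses a period and may omit the hours. A small sketch of that conversion follows; the helper names `vtt_to_srt_timestamp` and `srt_block` are made up for illustration and are not part of webvtt-py or yt-dlp.

```python
def vtt_to_srt_timestamp(timestamp: str) -> str:
    """Convert a VTT timestamp ('HH:MM:SS.mmm' or 'MM:SS.mmm') to SRT 'HH:MM:SS,mmm'."""
    clock, _, millis = timestamp.strip().partition(".")
    parts = clock.split(":")
    if len(parts) == 2:
        # VTT allows the hours field to be left out; SRT always has it.
        parts.insert(0, "00")
    hours, minutes, seconds = (int(part) for part in parts)
    # Right-pad the fraction with zeros so that '.5' becomes ',500'.
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:0<3}"


def srt_block(index: int, start: str, end: str, text: str) -> str:
    """Format one numbered SRT block, terminated by a blank line."""
    return (
        f"{index}\n"
        f"{vtt_to_srt_timestamp(start)} --> {vtt_to_srt_timestamp(end)}\n"
        f"{text}\n\n"
    )


print(srt_block(1, "00:00.000", "00:03.000", "This is the first line"), end="")
```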

2)

yt-dlp --embed-subs --merge-output-format mkv -f 'bv+ba' --write-auto-subs --sub-langs 'en' 'https://youtu.be/3_HG33-IYaY' --sub-format ttml --convert-subs srt --exec 'before_dl:fn=$(echo %(_filename)s| sed "s/%(ext)s/en.srt/g") && ffmpeg -fix_sub_duration -i "$fn" -c:s text "$fn".tmp.srt && mv "$fn".tmp.srt "$fn"'

3)

function cleanVttFile($fileName, $outputName) {

    $lines = file($fileName);
    $headers = ['WEBVTT', 'Kind: captions', 'Language: en'];
    $modified_lines = [];
    $prev_line = "";

    foreach ($lines as $line) {
        // Pass header lines through unchanged
        if (in_array(trim($line), $headers)) {
            $modified_lines[] = $line;
            continue;
        }

        // Pass timestamp lines and blank lines through unchanged
        if (preg_match('/\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}.*/', $line) || trim($line) == "") {
            $modified_lines[] = $line;
            continue;
        }

        // Strip inline tags (word timings such as <00:00:01.000> and <c>) before comparing
        $stripped_line = preg_replace('/<[^>]*>/', '', $line);

        // Keep the line only if its text differs from the previous text line
        if ($stripped_line != $prev_line || $prev_line == "") {
            $modified_lines[] = $line;
        }

        // Remember this text line for the next comparison
        $prev_line = $stripped_line;
    }

    file_put_contents($outputName, $modified_lines);
}
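
For comparison with the Python snippet in 1), here is a rough Python port of the same line-level idea as the PHP function above. It is a sketch under the same assumptions as the PHP version (English captions, `HH:MM:SS.mmm` timestamps); the name `clean_vtt_lines` and the sample text are made up for illustration, and the sample is hand-written rather than a real YouTube download.

```python
import re
from typing import Iterable, List

TIMING_RE = re.compile(r"\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}")
TAG_RE = re.compile(r"<[^>]*>")
HEADERS = {"WEBVTT", "Kind: captions", "Language: en"}


def clean_vtt_lines(lines: Iterable[str]) -> List[str]:
    """Drop caption text lines that repeat the previous text line once tags are stripped."""
    cleaned: List[str] = []
    previous_text = ""
    for line in lines:
        stripped = line.strip()
        # Headers, blank lines and timestamp lines pass through unchanged.
        if stripped in HEADERS or stripped == "" or TIMING_RE.search(line):
            cleaned.append(line)
            continue
        # Compare text with inline timing tags removed.
        text = TAG_RE.sub("", line)
        if text != previous_text or previous_text == "":
            cleaned.append(line)
        previous_text = text
    return cleaned


sample = """WEBVTT
Kind: captions
Language: en

00:00:00.080 --> 00:00:02.550 align:start position:0%
hello<00:00:00.320><c> and</c><00:00:00.480><c> welcome</c>

00:00:02.550 --> 00:00:02.560 align:start position:0%
hello and welcome

00:00:02.560 --> 00:00:05.000 align:start position:0%
hello and welcome
to<00:00:02.800><c> the</c><00:00:03.000><c> show</c>
""".splitlines(keepends=True)

print("".join(clean_vtt_lines(sample)), end="")
```

Like the PHP version, this leaves behind cues whose text was removed entirely (a timestamp line with nothing under it), so a player or converter may still need to tolerate empty cues.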
@ershovev ershovev added the enhancement New feature or request label Mar 21, 2024
@zaednasr
Collaborator

@ershovev you need to open this issue in the yt-dlp repository, not here. They will be able to handle this.
I don't code the yt-dlp core itself, just the Android app interface for it.

@zaednasr zaednasr added the yt-dlp related Not a bug about the app but with the yt-dlp library label Mar 21, 2024
@ershovev
Author

> @ershovev you need to make this issue to the yt-dlp repository, not here. They will be able to handle this. I dont code the core ytdlp itself, just the android app interface of it.

Got it, sorry.

According to these issues, it seems that they are not planning to fix it:

yt-dlp/yt-dlp#6274
yt-dlp/yt-dlp#1734
