
seems to struggle with "♪♪~" lines #33

Open
cyphar opened this issue May 17, 2021 · 5 comments

@cyphar

cyphar commented May 17, 2021

Japanese subtitles usually seem to be more like English "hard of hearing" subtitles, resulting in lots of "♪♪~" lines as well as descriptions of sounds that are present in the video but are not spoken (and thus are not present in the source "correctly timed" subtitle file).

In theory ALASS might be able to handle this (since it's kind of like a "mini" ad break), but it seems that ALASS instead shifts the spoken lines so that they end up with okay timing but offset by a couple of lines. A similar issue occurs when the source subtitles have more lines than the subtitles being retimed (such as OP/ED subtitles in English subtitles). My guess is that this is because the only way ALASS can handle this is by making more splits, but the split penalty won't allow a split to be made for a one-line subtitle.

Now, I have come up with workarounds for this myself (I have scripts that will strip out these lines before I run alass) but this seems suboptimal -- I'm mainly posting this issue to raise awareness in case there is a more clever solution. (But I find alass incredibly useful. When paired with correctly timed subtitles in another language it is second-to-none when it comes to retiming subtitles.)
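For illustration, a minimal sketch of this kind of pre-filter for SRT files. It assumes a cue counts as "non-speech" when nothing remains after removing music notes, tildes, and bracketed sound descriptions; the character sets are illustrative guesses rather than any standard, and `strip_non_speech_srt` / `is_non_speech` are hypothetical names. The filtered text would then be saved and passed to alass in place of the original file.

```python
import re

# Illustrative character sets (assumptions, tune per show): bracketed sound
# descriptions in ASCII, fullwidth, or lenticular brackets, plus music notes
# and tildes (ASCII tilde, fullwidth tilde, wave dash).
SOUND_DESCRIPTION = re.compile(r"[(（\[［【][^)）\]］】]*[)）\]］】]")
MUSIC_AND_SPACE = re.compile(r"[♪♫~～〜\s]")


def is_non_speech(text):
    """True if nothing is left once sound descriptions and music notes are removed."""
    remainder = SOUND_DESCRIPTION.sub("", text)
    return MUSIC_AND_SPACE.sub("", remainder) == ""


def strip_non_speech_srt(srt):
    """Drop non-speech cues from SRT text and renumber the remaining cues."""
    kept = []
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # malformed cue or cue without text; skip it
        # lines[0] is the cue number, lines[1] the timing, the rest is text.
        if not is_non_speech("\n".join(lines[2:])):
            kept.append(lines[1:])
    return "\n\n".join(
        "\n".join([str(i)] + cue) for i, cue in enumerate(kept, start=1)
    ) + "\n"
```

Note that this throws the non-speech cues away entirely, which matches the workaround described here; lines such as "♪ 君の声 ♪" (lyrics with notes) are kept, since text remains after stripping.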

@chrysn

chrysn commented Aug 13, 2021

Is there any annotation that could be added (possibly automatically, using the rules you now use to strip the lines) to non-text lines? Do hard-of-hearing subtitles commonly have such annotations? alass would then only need to discard them for matching, and should probably keep them positioned relative to the surrounding lines.

@kaegi
Copy link
Owner

kaegi commented Aug 13, 2021

In theory alass should be able to handle such cases. It tries to shift the whole subtitle data (with certain splits) so that it best matches the reference subtitle data globally. A few extra or missing lines should not influence what the global best fit looks like. In practice, unfortunately, the noise introduced by these extra subtitle lines might sometimes be large enough to throw off the global optimum...

The core principle behind alass is that no subtitle content is used (only the timestamps), to keep everything simple. Cutting out specific lines requires language and/or subtitle knowledge. I think this kind of preprocessing is best done with another tool that is specific to the given language/subtitles.
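To illustrate why a timestamp-only approach can tolerate a few extra lines, here is a deliberately simplified toy model: it only searches for one constant offset that maximizes the total overlap between two sets of (start, end) intervals in milliseconds. This is not alass's actual algorithm (which also considers splits, each with a penalty), and the interval values are made up.

```python
def overlap(a, b):
    """Length (ms) of the intersection of two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def score(reference, incorrect, offset):
    """Total overlap between reference lines and incorrect lines shifted by `offset` ms."""
    return sum(
        overlap(r, (start + offset, end + offset))
        for r in reference
        for (start, end) in incorrect
    )


def best_offset(reference, incorrect, candidates):
    """Brute-force the single constant offset with the highest total overlap."""
    return max(candidates, key=lambda d: score(reference, incorrect, d))


# Made-up example: the incorrect file is 500 ms late and has one extra "♪♪~" line
# (5000-5800) with no counterpart in the reference.
reference = [(1000, 2000), (3000, 4000), (6000, 7000)]
incorrect = [(1500, 2500), (3500, 4500), (5000, 5800), (6500, 7500)]
print(best_offset(reference, incorrect, range(-2000, 2001, 100)))  # -500
```

Here the extra line simply lands in a gap at the correct offset and contributes nothing, so the best offset is still -500 ms. With many extra lines, or extra lines that happen to overlap reference lines at some wrong offset, the overlap they add can tip the maximum elsewhere, which is the kind of noise described above.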

@chrysn

chrysn commented Aug 13, 2021

The language or subtitle knowledge could be done out of band -- it'd just be convenient if alass had a way for these tools to tell it that "this line is to be kept for output, but it's not expected to correspond to anything speech-like". SRT seems not to have any convention for stating this, unfortunately.
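One way to approximate this out of band without any change to alass: run alass on a copy that contains only the speech lines, then re-insert the flagged lines, shifting each one by the same amount as the nearest preceding speech line. The sketch below assumes the retimed output contains the same speech cues in the same order as the input (worth verifying), uses a hypothetical `speech` flag set by whatever external tool classifies the lines, and leaves out file parsing; times are in milliseconds.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: int    # milliseconds
    end: int      # milliseconds
    text: str
    speech: bool  # hypothetical flag: False for "♪♪~", sound descriptions, signs


def reinsert_non_speech(original, retimed_speech):
    """Merge non-speech cues back in after retiming only the speech cues.

    `original` holds every cue in its original order. `retimed_speech` holds the
    speech cues after retiming (assumed: same count and order as the speech cues
    in `original`). Each non-speech cue is shifted by the same delta as the
    nearest preceding speech cue, or the first speech cue if none precedes it.
    """
    speech_before = [c for c in original if c.speech]
    if len(speech_before) != len(retimed_speech):
        raise ValueError("retimed output must contain exactly the speech cues")
    if not speech_before:
        return list(original)  # nothing to anchor on; leave timings unchanged
    deltas = [r.start - o.start for o, r in zip(speech_before, retimed_speech)]

    merged, last = [], -1  # `last` indexes the most recent speech cue seen
    for cue in original:
        if cue.speech:
            last += 1
            merged.append(retimed_speech[last])
        else:
            d = deltas[max(last, 0)]
            merged.append(Cue(cue.start + d, cue.end + d, cue.text, False))
    return sorted(merged, key=lambda c: c.start)
```

Anchoring on the preceding speech line is one simple choice; interpolating between the preceding and following speech lines would be an alternative when a split falls between them.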

@cyphar
Author

cyphar commented Aug 15, 2021

I didn't mention this in the original issue because I wasn't sure if there was a nice solution to this, but there is also a secondary problem: alass seems to really struggle with ASS files that have "signs and songs" subtitle lines (think of most anime fansubs, which have subtitles for signs but also for the OP and ED). I have some basic scripts that strip out all non-dialogue lines, and I've found that alass does a much better job retiming subtitle lines once everything that is not dialogue has been stripped out. This does make sense, but I get the feeling alass should already be handling the OP/ED case, because conceptually it should be the same as an ad break (yet despite this, the subtitle lines end up mistimed by several lines).
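A sketch of the style-based stripping described here, assuming a standard ASS `[Events]` section whose `Format:` line includes a `Style` field. `keep_styles` is a hypothetical name, and the style names in any call (for example `keep_styles(text, {"Default", "Italics"})`) differ between fansub groups. Only `Dialogue:` lines are filtered; because `Text` is the last field and may itself contain commas, the split is capped at the number of fields.

```python
def keep_styles(ass_text, allowed):
    """Drop [Events] Dialogue lines whose Style is not in `allowed`.

    All other lines (script info, style definitions, Comment lines) pass through.
    """
    out, in_events, fields = [], False, None
    for line in ass_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_events = stripped.lower() == "[events]"
        elif in_events and stripped.startswith("Format:"):
            fields = [f.strip() for f in stripped[len("Format:"):].split(",")]
        elif in_events and fields and stripped.startswith("Dialogue:"):
            # Text is the last field and may contain commas, so cap the split.
            values = stripped[len("Dialogue:"):].split(",", len(fields) - 1)
            if values[fields.index("Style")].strip() not in allowed:
                continue  # e.g. signs, OP/ED karaoke
        out.append(line)
    return "\n".join(out) + "\n"
```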

Unfortunately, writing the scripts I use led me to conclude that there isn't really a nice way to automate this: for each show I need to fix ASS subtitles for, I look at the defined dialogue types (styles), check which ones correspond to actual dialogue, and then strip out all but those. I reckon you could use some heuristics to make this more automated (check for styles that are only ever used once, or that have lots of overlapping lines), but I doubt you'd be able to make it very accurate without spending a lot of time fine-tuning for every show you do this for.
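The heuristics suggested here could be prototyped roughly as follows. The rules and threshold (a style used only once, or a style where more than half of its lines overlap some other line) are assumptions for illustration, not tested values, and all function names are hypothetical; the output is a list of styles to review by hand rather than strip automatically.

```python
from collections import Counter


def ass_time_to_ms(t):
    """Parse an ASS timestamp such as '0:01:02.34' (h:mm:ss.cc) into milliseconds."""
    h, m, s = t.strip().split(":")
    return (int(h) * 3600 + int(m) * 60) * 1000 + round(float(s) * 1000)


def parse_events(ass_text):
    """Return (style, start_ms, end_ms) for every Dialogue line in [Events]."""
    events, fields, in_events = [], None, False
    for raw in ass_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            in_events = line.lower() == "[events]"
        elif in_events and line.startswith("Format:"):
            fields = [f.strip() for f in line[len("Format:"):].split(",")]
        elif in_events and fields and line.startswith("Dialogue:"):
            values = line[len("Dialogue:"):].split(",", len(fields) - 1)
            row = dict(zip(fields, (v.strip() for v in values)))
            events.append((row["Style"], ass_time_to_ms(row["Start"]),
                           ass_time_to_ms(row["End"])))
    return events


def suspicious_styles(events, overlap_threshold=0.5):
    """Flag styles that look like signs/songs rather than dialogue (hypothetical rules).

    A style is flagged if it is used only once, or if more than `overlap_threshold`
    of its lines overlap some other line. Quadratic in the number of lines, which
    is fine for a single episode.
    """
    overlaps = [
        any(j != i and s2 < e1 and s1 < e2 for j, (_, s2, e2) in enumerate(events))
        for i, (_, s1, e1) in enumerate(events)
    ]
    counts, overlapping = Counter(), Counter()
    for (style, _, _), is_overlapping in zip(events, overlaps):
        counts[style] += 1
        overlapping[style] += is_overlapping
    return sorted(
        style
        for style in counts
        if counts[style] == 1 or overlapping[style] / counts[style] > overlap_threshold
    )
```

Dialogue lines that overlap on-screen signs count as overlapping too, so a dialogue style in a sign-heavy episode could cross the threshold; that is one reason such rules would likely need per-show tuning.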

But I do get what you're saying @kaegi. The reason for opening this issue is that I don't understand why alass doesn't handle cases like OP/ED the way it treats ad breaks (the "♪♪~" problem seems pretty intractable without subtitle-specific logic). Does alass only ignore gaps, and not subtitles that are present in one source but not the other?

@Sparh4wk

Sparh4wk commented Aug 16, 2021

Maybe I have the same problem. I'm trying to sync about 160+ episodes of an anime show, but for some reason I'm unable to do so, even though the subtitle line counts are very similar. They have those ♪ lines as well.

I can't attach them here, but I uploaded them: https://easyupload.io/m/dqm3h6
I would be very happy if someone could help me with this one.

I didn't have any problems with TV shows or movies in English, but for some reason I can't sync these two.
I tried syncing against the video file as well, but it's in Chinese, so that doesn't work at all.

4 participants