Is it possible to detect when a correct alignment is not possible? #302

Open
zxul767 opened this issue Aug 8, 2023 · 0 comments

I'm exploring ways to gauge whether a transcription algorithm did a good job when no supervision is available (i.e., there is no annotated dataset).

It occurred to me that one way to do this might be to compute some kind of reconstruction score in the audio domain (when doing the forced alignment):

(audio) --> [transcribe] --> (text) --> [force-align] --> (alignment score)
 |                                        ^
 |                                        |
 +----------------------------------------+

I'm not very familiar with the implementation of aeneas, so I tested what happens when I pass a completely erroneous transcription. I saw no error in the output, nor anything in the resulting alignment that would let me detect automatically that the transcription was really bad.

After reading how the underlying algorithm works, I suspect this is because the alignment is constrained to a narrow band along the diagonal of the cost matrix, so even a completely erroneous transcription yields an alignment that looks reasonable (at least until a human inspects it and realizes the transcription is totally wrong).
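
To make the suspicion above concrete, here is a minimal sketch of DTW restricted to a diagonal band, written with plain NumPy. It is not aeneas's implementation: the function name `banded_dtw_cost`, the `margin` parameter, and the use of Euclidean distance on generic feature frames are all assumptions made for illustration. It shows that a path is always found inside the band, even for unrelated inputs, while the average per-step cost of that path can still differ a lot.

```python
import numpy as np


def banded_dtw_cost(a, b, margin):
    """Average per-step cost of the cheapest warping path inside a diagonal band.

    a, b   : arrays of shape (frames, features), e.g. MFCC-like frames (assumed).
    margin : half-width of the band around the scaled diagonal (hypothetical name).
    """
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)        # accumulated cost
    steps = np.zeros((n + 1, m + 1), dtype=int)  # length of the best path so far
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        center = int(round(i * m / n))           # diagonal position for row i
        lo = max(1, center - margin)
        hi = min(m, center + margin)
        for j in range(lo, hi + 1):              # cells outside the band stay at inf
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            candidates = [
                (acc[i - 1, j - 1], steps[i - 1, j - 1]),
                (acc[i - 1, j], steps[i - 1, j]),
                (acc[i, j - 1], steps[i, j - 1]),
            ]
            best_cost, best_steps = min(candidates, key=lambda t: t[0])
            if np.isfinite(best_cost):
                acc[i, j] = best_cost + d
                steps[i, j] = best_steps + 1
    if not np.isfinite(acc[n, m]):
        raise ValueError("band too narrow: no path reaches the end")
    return acc[n, m] / steps[n, m]


# Synthetic data only: a reference, a slightly noisy copy, and an unrelated sequence.
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 4))
near_copy = reference + 0.01 * rng.normal(size=(50, 4))
unrelated = rng.normal(size=(50, 4))

cost_match = banded_dtw_cost(reference, near_copy, margin=5)
cost_mismatch = banded_dtw_cost(reference, unrelated, margin=5)
```

Both calls return a finite value, which matches the observation that a wrong transcription raises no error. On this synthetic data the mismatched pair costs far more per step, so a normalized path cost might serve as a rough quality signal, although any threshold would have to be calibrated on real audio.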

Is there a simple way to modify the algorithm to detect this case? I suspect it might be possible if we somehow quantified how often the alignment path runs along the "fringe" of the diagonal's margin, but I'm not familiar enough with DTW to know whether this would actually be a good idea.
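
The "fringe" idea can be sketched as a small function that takes an already computed warping path and reports the fraction of its cells that sit on the edge of the band. Everything here is hypothetical: the name `fringe_fraction`, the representation of the path as 0-based `(i, j)` index pairs, and the `tolerance` parameter are assumptions for illustration, and whether aeneas exposes its path in this form is not something this sketch relies on.

```python
def fringe_fraction(path, n, m, margin, tolerance=0):
    """Fraction of path cells that lie on the edge of the diagonal band.

    path      : list of 0-based (i, j) cells of the warping path (assumed format).
    n, m      : number of frames in the two sequences.
    margin    : half-width of the band around the scaled diagonal.
    tolerance : cells within this distance of the edge also count as "fringe".
    """
    if not path:
        raise ValueError("path must not be empty")
    on_edge = 0
    for i, j in path:
        # Position of the diagonal in row i, scaled for unequal lengths.
        center = i * (m - 1) / (n - 1) if n > 1 else 0.0
        if abs(j - center) >= margin - tolerance:
            on_edge += 1
    return on_edge / len(path)


# A path that follows the diagonal versus one pushed against the band edge.
diagonal_path = [(k, k) for k in range(5)]
edge_path = [(0, 0), (0, 1), (0, 2), (1, 3), (2, 4), (3, 4), (4, 4)]

diagonal_score = fringe_fraction(diagonal_path, n=5, m=5, margin=2)
edge_score = fringe_fraction(edge_path, n=5, m=5, margin=2)
```

A score near zero means the path stayed well inside the band, while a high score suggests the band, rather than the audio, determined the alignment. A legitimately slow or fast speaker could also push the path toward the edge, so this would probably be more useful combined with the path cost than on its own.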

Your guidance and help are much appreciated.
