Getting phones out of diarized TextGrid input files #734

Open
stcoats opened this issue Jan 21, 2024 · 1 comment

Comments

stcoats commented Jan 21, 2024

I am trying to align files with two speakers so that I get phones as segments. If I put the audio file and a non-diarized transcript (a .txt file) in the corpus folder, the output TextGrid contains all the words in the .txt file and the corresponding phones.

If, however, I use the same audio file with a .TextGrid file that has two tiers, one per speaker, the output .TextGrid is missing many words. During alignment, this message is generated: "WARNING There were 24 utterances ignored due to an issue in feature generation, see the log file for full details or run mfa validate on the corpus."

I have tried --beam 400 --retry_beam 1000, to no avail. Is there a better way to make the aligner align all the words in the input file?

mmcauliffe (Member) commented

I would double-check that your tiers actually contain text corresponding to the transcript. The log file should list every utterance that was ignored; the cause is either a very short duration or no text.
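
One way to run that check before aligning is a small standalone script. The sketch below is not part of MFA: it assumes the input is a long-format Praat TextGrid with interval tiers, and the function name summarize_tiers and the min_duration threshold of 0.1 s are hypothetical choices for illustration, not the cutoff MFA actually applies. For each tier it counts the intervals that carry text and lists the labeled intervals shorter than the threshold.

```python
import re

# Patterns for the long ("verbose") Praat TextGrid text format.
ITEM_SPLIT = re.compile(r"item \[\d+\]:")
NAME = re.compile(r'name = "([^"]*)"')
INTERVAL = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = (\S+)\s*'
    r'xmax = (\S+)\s*'
    r'text = "((?:[^"]|"")*)"'
)


def summarize_tiers(textgrid_text, min_duration=0.1):
    """Return {tier: {"labeled": count, "too_short": [(start, end, text)]}}.

    min_duration is an illustrative threshold in seconds (an assumption).
    """
    summary = {}
    # The first chunk is the file header, so skip it.
    for chunk in ITEM_SPLIT.split(textgrid_text)[1:]:
        name_match = NAME.search(chunk)
        name = name_match.group(1) if name_match else "<unnamed>"
        labeled = 0
        too_short = []
        for m in INTERVAL.finditer(chunk):
            start, end = float(m.group(1)), float(m.group(2))
            # Praat escapes a double quote inside a label as two quotes.
            text = m.group(3).replace('""', '"').strip()
            if not text:
                continue  # empty intervals are silence, not utterances
            labeled += 1
            if end - start < min_duration:
                too_short.append((start, end, text))
        summary[name] = {"labeled": labeled, "too_short": too_short}
    return summary


# Tiny made-up TextGrid: one very short labeled interval, one empty tier.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 3
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "speaker_a"
        xmin = 0
        xmax = 3
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.05
            text = "hi"
        intervals [2]:
            xmin = 0.05
            xmax = 3
            text = ""
    item [2]:
        class = "IntervalTier"
        name = "speaker_b"
        xmin = 0
        xmax = 3
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 3
            text = ""
'''

for tier, info in summarize_tiers(SAMPLE).items():
    print(tier, info)
```

To use it on a real file, read the TextGrid into a string first and pass that to summarize_tiers. A tier that reports zero labeled intervals, or many entries under too_short, would be a candidate explanation for the ignored utterances, though the log file remains the authoritative list.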
