Getting phones out of diarized TextGrid input files #734

stcoats · 2024-01-21T20:05:28Z

I am trying to align files that have two speakers to get phones as segments. If I have the audio file and a non-diarized transcript, as a .txt file in the corpus folder, the output TextGrid contains all the words in the .txt file and the corresponding phones.

If, however, I use the same audio file with a .TextGrid transcript that has two tiers, one for each speaker, the output .TextGrid is missing many words. During alignment, the following warning is printed:

WARNING There were 24 utterances ignored due to an issue in feature generation, see the log file for full details or run mfa validate on the corpus.

I have tried using --beam 400 --retry_beam 1000, to no avail. Are there better ways of making the aligner align all the words in the input file?
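One way to see exactly which words are dropped is to compare the word tokens in the input TextGrid with those in the aligned output. The sketch below is a hypothetical helper, not an MFA feature. It assumes the long ("ooTextFile") TextGrid format with single-line `text` fields, and it assumes that the output word tiers end in `" - words"` (e.g. `speaker1 - words`). Check that suffix against your own output files. Stripping punctuation and lowercasing only roughly approximate MFA's own text normalization, so treat the result as a rough diagnostic.

```python
import re
import string
from collections import Counter


def words_by_tier(content: str) -> dict[str, list[str]]:
    """Map each tier name to its word tokens (long "ooTextFile" TextGrid format)."""
    tiers = {}
    # Each tier starts at a line like "    item [1]:"; chunk 0 is the file header.
    for block in re.split(r"^\s*item \[\d+\]:\s*$", content, flags=re.M)[1:]:
        name = re.search(r'^\s*name = "(.*)"\s*$', block, flags=re.M).group(1)
        texts = re.findall(r'^\s*text = "(.*)"\s*$', block, flags=re.M)
        # Praat escapes a literal double quote inside text as "".
        tiers[name] = [w for t in texts for w in t.replace('""', '"').split()]
    return tiers


def _normalize(word: str) -> str:
    # Rough stand-in for MFA's own text normalization (assumption).
    return word.strip(string.punctuation).lower()


def missing_words(input_tg: str, output_tg: str, words_suffix: str = " - words") -> Counter:
    """Word tokens in the input tiers that do not appear in the output's word tiers."""
    expected = Counter(
        _normalize(w) for words in words_by_tier(input_tg).values() for w in words
    )
    aligned = Counter(
        _normalize(w)
        for name, words in words_by_tier(output_tg).items()
        if name.endswith(words_suffix)  # skip the "... - phones" tiers
        for w in words
    )
    expected.pop("", None)  # drop tokens that were pure punctuation
    return expected - aligned  # multiset difference keeps only positive counts
```

Pass the file contents as strings, for example `missing_words(open("input.TextGrid", encoding="utf-8").read(), open("output.TextGrid", encoding="utf-8").read())`. The file names are placeholders. Praat can also save TextGrids as UTF-16, so adjust the encoding if needed.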

mmcauliffe · 2024-02-27T02:32:32Z

I would double-check that your tiers actually contain text corresponding to the transcript. The log file should list every utterance that was ignored. Utterances are skipped either because their duration is very short or because they have no text.
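A quick pre-check of the input TextGrid can catch both of the causes mentioned above before running the aligner. The sketch below is a hypothetical helper, not part of MFA. It flags tiers that contain no text at all and labelled intervals shorter than a cut-off. It handles only the long ("ooTextFile") TextGrid format with single-line `text` fields. The 0.1 s `min_duration` default is an assumed placeholder, not MFA's actual threshold. The MFA log file remains the authoritative list of ignored utterances.

```python
import re
from dataclasses import dataclass


@dataclass
class Interval:
    tier: str
    xmin: float
    xmax: float
    text: str


# Line patterns for the long ("ooTextFile") TextGrid format.
_NAME = re.compile(r'^\s*name = "(.*)"\s*$')
_INTERVAL = re.compile(r'^\s*intervals \[\d+\]:?\s*$')
_BOUND = re.compile(r'^\s*(xmin|xmax) = (\S+)\s*$')
_TEXT = re.compile(r'^\s*text = "(.*)"\s*$')


def parse_long_textgrid(content: str) -> list[Interval]:
    """Collect intervals from every interval tier (single-line text only)."""
    intervals = []
    tier = ""
    bounds = None  # dict of xmin/xmax while inside an "intervals [n]:" block
    for line in content.splitlines():
        if m := _NAME.match(line):
            tier, bounds = m.group(1), None  # a new tier starts
        elif _INTERVAL.match(line):
            bounds = {}
        elif bounds is not None and (m := _BOUND.match(line)):
            bounds[m.group(1)] = float(m.group(2))
        elif bounds is not None and (m := _TEXT.match(line)):
            text = m.group(1).replace('""', '"')  # Praat escapes " as ""
            intervals.append(Interval(tier, bounds["xmin"], bounds["xmax"], text))
            bounds = None
    return intervals


def check_tiers(intervals: list[Interval], min_duration: float = 0.1) -> dict[str, list[str]]:
    """Per tier, list labelled intervals that are very short, and note tiers with no text."""
    report = {}
    for iv in intervals:
        issues = report.setdefault(iv.tier, [])
        if iv.text.strip() and iv.xmax - iv.xmin < min_duration:
            issues.append(f"{iv.xmin:.3f}-{iv.xmax:.3f}s too short: {iv.text!r}")
    for tier, issues in report.items():
        if not any(iv.text.strip() for iv in intervals if iv.tier == tier):
            issues.append("tier contains no text")
    return report
```

For example, `check_tiers(parse_long_textgrid(open("input.TextGrid", encoding="utf-8").read()))` returns one list of issues per speaker tier. An empty list means nothing was flagged. The file name and UTF-8 encoding are assumptions.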