
No longer able to align using phonemes directly as inputs #804

Open
leandro-gracia-gil opened this issue May 4, 2024 · 4 comments

@leandro-gracia-gil

I have been using mfa align to generate alignments of audio with IPA phonemes as direct input instead of text. This was done with a handmade dictionary that simply maps each IPA phoneme to itself. The reason is that my use case requires me to run G2P separately in my own way, while ensuring that the produced phonemes are supported by the MFA acoustic model.
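For context, the identity dictionary described above can be sketched as follows. This is an illustrative sketch only: the phoneme list is a small subset (not the full japanese_mfa v3.0.0 phone set), and the file name and tab-separated "word then pronunciation" layout are assumptions about the usual MFA pronunciation dictionary format.

```python
# Sketch: build an identity pronunciation dictionary, where each IPA
# phoneme is treated as a "word" whose pronunciation is itself, so that
# mfa align can consume phoneme sequences directly as input "text".
# The phoneme list here is illustrative, not the full model phone set.
phonemes = ["s", "ɨ", "c", "i", "ɲ", "ɾ", "ɯ"]

with open("identity_dict.txt", "w", encoding="utf-8") as f:
    for p in phonemes:
        # One dictionary entry per line: word<TAB>pronunciation
        f.write(f"{p}\t{p}\n")
```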

However, after updating from version 2.x to 3.x (specifically 3.0.7), I'm seeing that mfa align now performs a text tokenization step that modifies my input IPA phonemes and affects the alignment results.

Here's an example with Japanese text (好きにする):

  • Input phonemes: s ɨ c i ɲ i s ɨ ɾ ɯ
  • Text returned by the tokenizer: s c i i s
  • Pronunciations: s c アイ アイ s

(I got these tokenizer results by stepping through tokenization/japanese.py in the installed MFA package code while debugging the issue.)

Is there any way to bypass the tokenizer and align using my input phonemes directly?

  1. Corpus structure
    • What language is the corpus in? Japanese
    • How many files/speakers? For now, this is just a single speaker test to check things work.
    • Are you using lab files or TextGrid files for input? Input text files with IPA phonemes directly.
  2. Dictionary
  • Are you using a dictionary from MFA? If so, which one? The current phonemes should come from japanese_mfa v3.0.0.
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
  • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? japanese_mfa v3.0.0
    • If it's a model you've trained, what data was it trained on?

Log file
No log files were generated, since the problem does not manifest as a runtime error.

@leandro-gracia-gil
Author

Note: this example is for Japanese, but I expect to do the same (feeding phonemes as input) in a few other Latin-script languages. I haven't checked yet whether these are also affected by the same issue.

@leandro-gracia-gil
Author

Also, here is one thing I had to fix while debugging; I can open a separate bug report if needed.

In file tokenization/japanese.py, line 19:

config_path = resource_dir.joinpath("japanese", "sudachi_config.json")

This fails later because config_path is a pathlib.Path object, which is not supported by sudachipy. It can easily be fixed by forcing a conversion to string:

config_path = str(resource_dir.joinpath("japanese", "sudachi_config.json"))
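A minimal self-contained sketch of the fix above, assuming (per the report) that sudachipy only accepts a plain string for its config path; the "resources" directory here is a hypothetical stand-in for MFA's actual resource_dir:

```python
from pathlib import Path

# Hypothetical stand-in for MFA's resource_dir (a pathlib.Path).
resource_dir = Path("resources")

# Before the fix: joinpath returns a pathlib.Path, which sudachipy
# reportedly rejects. Coercing to str resolves the incompatibility.
config_path = str(resource_dir.joinpath("japanese", "sudachi_config.json"))
```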

@mmcauliffe
Member

mmcauliffe commented May 4, 2024

You can download the old 2.0 Japanese model via mfa download acoustic japanese_mfa --version 2.0.1a --force (see https://mfa-models.readthedocs.io/en/latest/acoustic/Japanese/Japanese%20MFA%20acoustic%20model%20v2_0_1a.html). The 3.0 Japanese model uses sudachipy's tokenization for input text and assumes it is normal Japanese kana/kanji/romaji, which is why "i" is getting mapped to アイ and IPA-specific symbols are ignored.

@leandro-gracia-gil
Author

I see, thanks. Setting aside the tokenization issue, are there any other new features or quality improvements I would be missing by using the old 2.0.1a model instead of the 3.0.0 one?

Also, since the 3.0.0 model uses text plus tokenization, does it try to align with all possible pronunciations (as in different phoneme sequences with different probabilities for the same word in a dictionary) and pick the best match, or does it use some criterion to pick the most likely pronunciation first and then attempt to align with that?
