
No longer able to align using phonemes directly as inputs #804

Open
leandro-gracia-gil opened this issue May 4, 2024 · 4 comments

@leandro-gracia-gil

I have been using mfa align to generate alignments of audio with IPA phonemes as direct input instead of text. This was done with a handmade dictionary that simply maps each IPA phoneme to itself. The reason is that my use case requires me to run G2P separately in my own way, while ensuring that the produced phonemes are supported by the MFA acoustic model.
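For context, the identity dictionary described above can be sketched as follows. This is an illustrative sketch only: the phoneme list is a small subset (not the full japanese_mfa v3.0.0 phone set), and the file name and tab-separated "word then pronunciation" layout are assumptions about the usual MFA pronunciation dictionary format.

```python
# Sketch: build an identity pronunciation dictionary, where each IPA
# phoneme is treated as a "word" whose pronunciation is itself, so that
# mfa align can consume phoneme sequences directly as input "text".
# The phoneme list here is illustrative, not the full model phone set.
phonemes = ["s", "ɨ", "c", "i", "ɲ", "ɾ", "ɯ"]

with open("identity_dict.txt", "w", encoding="utf-8") as f:
    for p in phonemes:
        # One dictionary entry per line: word<TAB>pronunciation
        f.write(f"{p}\t{p}\n")
```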

However, after updating from version 2.x to 3.x (specifically 3.0.7), I'm seeing that mfa align now performs a text tokenization step that modifies my input IPA phonemes and affects the alignment results.

Here's an example with Japanese text (好きにする):

  • Input phonemes: s ɨ c i ɲ i s ɨ ɾ ɯ
  • Text returned by the tokenizer: s c i i s
  • Pronunciations: s c アイ アイ s

(I got these tokenizer results by stepping through tokenization/japanese.py in the installed MFA package code while debugging the issue.)

Is there any way to bypass the tokenizer and align using my input phonemes directly?

  1. Corpus structure
    • What language is the corpus in? Japanese
    • How many files/speakers? For now, this is just a single speaker test to check things work.
    • Are you using lab files or TextGrid files for input? Input text files with IPA phonemes directly.
  2. Dictionary
  • Are you using a dictionary from MFA? If so, which one? The current phonemes should come from japanese_mfa v3.0.0.
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
  • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? japanese_mfa v3.0.0
    • If it's a model you've trained, what data was it trained on?

Log file
No log files were generated, since the problem does not manifest as a runtime error.

@leandro-gracia-gil
Author

Note: this example is for Japanese, but I expect to do the same (feeding phonemes as input) in a few other Latin-script languages. I haven't checked yet whether these are also affected by the same issue.

@leandro-gracia-gil
Author

Also, here is one thing I had to fix while debugging; I can open a separate bug report if needed.

In file tokenization/japanese.py, line 19:

config_path = resource_dir.joinpath("japanese", "sudachi_config.json")

This fails later because config_path is a pathlib.Path object, which is not supported by sudachipy. It can easily be fixed by forcing a conversion to string:

config_path = str(resource_dir.joinpath("japanese", "sudachi_config.json"))
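A minimal self-contained sketch of the fix above, assuming (per the report) that sudachipy only accepts a plain string for its config path; the "resources" directory here is a hypothetical stand-in for MFA's actual resource_dir:

```python
from pathlib import Path

# Hypothetical stand-in for MFA's resource_dir (a pathlib.Path).
resource_dir = Path("resources")

# Before the fix: joinpath returns a pathlib.Path, which sudachipy
# reportedly rejects. Coercing to str resolves the incompatibility.
config_path = str(resource_dir.joinpath("japanese", "sudachi_config.json"))
```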

@mmcauliffe
Member

mmcauliffe commented May 4, 2024

You can download the old 2.0 Japanese model via mfa download acoustic japanese_mfa --version 2.0.1a --force (see https://mfa-models.readthedocs.io/en/latest/acoustic/Japanese/Japanese%20MFA%20acoustic%20model%20v2_0_1a.html). The 3.0 Japanese model uses sudachipy's tokenization for input text and assumes it is normal Japanese kana/kanji/romaji, which is why "i" is getting mapped to アイ and IPA-specific symbols are ignored.

@leandro-gracia-gil
Author

I see, thanks. Setting aside the tokenization issue, are there any other new features or quality improvements I would be missing by using the old 2.0.1a model instead of the 3.0.0 one?

Also, since the 3.0.0 model uses text plus tokenization, does it try to align with all possible pronunciations (as in different phoneme sequences with different probabilities for the same word in a dictionary) and pick the best match, or does it use some criterion to pick the most likely pronunciation first and then attempt to align with that?
