MFA align output change original text #805

Open
ponymhc opened this issue May 7, 2024 · 0 comments
ponymhc commented May 7, 2024

Debugging checklist

[x] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensure that your issue is not addressed there?
[x] Have you updated to latest MFA version (check https://montreal-forced-aligner.readthedocs.io/en/latest/changelog/changelog_3.0.html)? What is the output of mfa version? 3.0.7
[x] Have you tried rerunning the command with the --clean flag? yes

Describe the issue

When I checked the output of the MFA alignment, I found that some words in the output differ from the original text: their characters were changed from Simplified Chinese to Traditional Chinese. Here is the content of the .lab file:

霞浦县牙城镇乌岐,瓦窑村水位猛涨。

This is the output of the MFA alignment:

File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 4.150354
tiers? <exists>
size = 2
item []:
item [1]:
class = "IntervalTier"
name = "words"
xmin = 0
xmax = 4.150354
intervals: size = 8
intervals [1]:
xmin = 0.0
xmax = 0.64
text = "霞浦縣"
intervals [2]:
xmin = 0.64
xmax = 0.99
text = "牙城鎮"
intervals [3]:
xmin = 0.99
xmax = 2.08
text = "烏岐"
intervals [4]:
xmin = 2.08
xmax = 2.42
text = ""
intervals [5]:
xmin = 2.42
xmax = 3.09
text = "瓦窯村"
intervals [6]:
xmin = 3.09
xmax = 3.53
text = "水位"
intervals [7]:
xmin = 3.53
xmax = 4.12
text = "猛漲"
intervals [8]:
xmin = 4.12
xmax = 4.150354
text = ""
item [2]:
class = "IntervalTier"
name = "phones"
xmin = 0
xmax = 4.150354
intervals: size = 12
intervals [1]:
xmin = 0.0
xmax = 0.64
text = "spn"
intervals [2]:
xmin = 0.64
xmax = 0.99
text = "spn"
intervals [3]:
xmin = 0.99
xmax = 2.08
text = "spn"
intervals [4]:
xmin = 2.08
xmax = 2.42
text = ""
intervals [5]:
xmin = 2.42
xmax = 3.09
text = "spn"
intervals [6]:
xmin = 3.09
xmax = 3.22
text = "ʂ"
intervals [7]:
xmin = 3.22
xmax = 3.26
text = "w"
intervals [8]:
xmin = 3.26
xmax = 3.33
text = "ej˨˩˦"
intervals [9]:
xmin = 3.33
xmax = 3.41
text = "w"
intervals [10]:
xmin = 3.41
xmax = 3.53
text = "ej˥˩"
intervals [11]:
xmin = 3.53
xmax = 4.12
text = "spn"
intervals [12]:
xmin = 4.12
xmax = 4.150354
text = ""
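
To check a whole corpus for this kind of change, the words tier can be compared against the original transcript. The following is a minimal sketch, not part of MFA: the helper names `tier_texts` and `changed_words` are hypothetical, the parser assumes the long TextGrid format shown above, and it does not handle labels that contain escaped double quotes. The sample string is a shortened excerpt of the output above.

```python
import re


def tier_texts(textgrid: str, tier_name: str) -> list[str]:
    """Return the non-empty interval labels of one tier (long TextGrid format)."""
    # Isolate the tier body: from its name line up to the next "item [n]:"
    # header, or to the end of the text for the last tier.
    match = re.search(
        r'name = "' + re.escape(tier_name) + r'"(.*?)(?=item \[\d+\]:|\Z)',
        textgrid,
        flags=re.DOTALL,
    )
    if match is None:
        return []
    labels = re.findall(r'text = "(.*?)"', match.group(1))
    # Empty labels are silences, so they are skipped.
    return [label for label in labels if label]


def changed_words(original: str, words: list[str]) -> list[str]:
    """Return the aligned words that do not occur verbatim in the original text."""
    return [word for word in words if word not in original]


# Original .lab content from this report.
lab_text = "霞浦县牙城镇乌岐,瓦窑村水位猛涨。"

# Shortened excerpt of the TextGrid output from this report.
sample = '''item [1]:
class = "IntervalTier"
name = "words"
intervals [1]:
text = "霞浦縣"
intervals [2]:
text = ""
intervals [3]:
text = "水位"
item [2]:
class = "IntervalTier"
name = "phones"
intervals [1]:
text = "spn"
'''

print(changed_words(lab_text, tier_texts(sample, "words")))  # ['霞浦縣']
```

Run against the full output above, every word that was converted to Traditional Chinese would be reported, while unchanged words such as 水位 would not.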

For Reproducing your issue
Please fill out the following:

  1. Corpus structure
    • What language is the corpus in? Mandarin
    • How many files/speakers? Only 1
    • Are you using lab files or TextGrid files for input? .lab files
  2. Dictionary
    • Are you using a dictionary from MFA? If so, which one? I tried both mandarin_mfa and mandarin_china_mfa, but encountered the same issue.
    • If it's a custom dictionary, what is the phoneset?
  3. Acoustic model
    • If you're using an acoustic model, is it one downloaded through MFA? If so, which one? mandarin_mfa
    • If it's a model you've trained, what data was it trained on?

Log file
Please attach the log file for the run that encountered an error (by default these will be stored in ~/Documents/MFA).

Desktop (please complete the following information):

  • OS: [e.g. Windows, OSX, Linux] Linux
  • Version [e.g. MacOSX 10.15, Ubuntu 20.04, Windows 10, etc] Ubuntu 20.04 on WSL
  • Any other details about the setup (Cloud, Docker, etc)

Additional context
Add any other context about the problem here.
