IPA for Vietnamese #11

Open
TasseDeCafe opened this issue Nov 14, 2018 · 4 comments

TasseDeCafe commented Nov 14, 2018

Hi!

There is an excellent IPA converter that can convert a text using the Vietnamese script into IPA (for 3 different accents): https://github.com/kirbyj/vPhon

I tested it myself when I was learning Northern Vietnamese, and I didn't notice any inaccuracies when I compared its output to audio recordings from a native speaker. The only thing missing is a good Vietnamese dictionary. I'm going to try to find one, but you might already have one.

Edit: Okay, this should do the trick: https://www.informatik.uni-leipzig.de/~duc/software/misc/wordlist.html


dohliam commented Nov 15, 2018

@TasseDeCafe Thanks very much for sharing this! 👍 It would be great to add data for Vietnamese, and it looks like the links you found have everything we would need to get started.

I tested out vPhon and it seems to produce excellent results. We should probably convert the tone numbers to IPA tone letters to be consistent, though. This seems like it could be pretty straightforward using, for example, the chart here.
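
As a rough illustration of the tone conversion mentioned above, here is a minimal sketch. It assumes the tones appear in the transcription as Chao digits 1 to 5 (for example `ma33`) and maps each digit to the corresponding IPA tone letter. The function name `tone_numbers_to_letters` is hypothetical, and any other markers in the vPhon output (such as glottalization symbols) are left untouched, so the real output format should be checked before relying on this.

```python
import re

# Standard Chao tone digits mapped to IPA tone letters (1 = lowest, 5 = highest).
CHAO_TO_LETTER = {"1": "˩", "2": "˨", "3": "˧", "4": "˦", "5": "˥"}


def tone_numbers_to_letters(ipa: str) -> str:
    """Replace every digit 1-5 in the transcription with its IPA tone letter.

    Assumption: digits 1-5 occur only as tone marks in the input.
    """
    return re.sub(r"[1-5]", lambda m: CHAO_TO_LETTER[m.group(0)], ipa)


if __name__ == "__main__":
    print(tone_numbers_to_letters("ma33"))
    print(tone_numbers_to_letters("ma21"))
```

Characters outside the range 1 to 5 pass through unchanged, so the function can be applied to a whole line of output at once.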

Would you be interested in generating the pronunciations using vPhon and submitting a PR? If so I would be happy to merge it. Otherwise I can do this myself using the links you provided above.

TasseDeCafe commented

Okay, great! I will try to generate the dictionary myself; it should be fun. What format do you want it in?

I might be able to generate dictionaries for other languages as well, but let's see how it goes with this one first.


dohliam commented Nov 15, 2018

@TasseDeCafe Awesome!

The raw data format is pretty simple -- you can find a description here. Basically it's just a plain text file with the word and corresponding IPA separated by a tab.
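
For reference, a small sketch of what reading and writing that tab-separated layout could look like. The helper names `format_entry` and `parse_line` are hypothetical, and the sketch assumes exactly one word and one IPA string per line with a single tab between them; the linked description remains the authority on details such as multiple pronunciations per word.

```python
def format_entry(word: str, ipa: str) -> str:
    """Build one raw-data line: word, a tab, then the IPA transcription."""
    return f"{word}\t{ipa}"


def parse_line(line: str) -> tuple[str, str]:
    """Split one raw-data line back into (word, ipa) at the first tab."""
    word, ipa = line.rstrip("\n").split("\t", 1)
    return word, ipa


if __name__ == "__main__":
    line = format_entry("ma", "ma˧˧")
    print(repr(line))
    print(parse_line(line))
```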

The other formats (JSON, XML, etc) are automatically generated from the raw data when I update the releases.

Maybe we could generate three different dictionaries -- one each for North, Central, and South. Would that make sense?


dohliam commented Nov 22, 2018

@TasseDeCafe By the way, just wanted to let you know about an early application of this Vietnamese IPA data. It's still in beta and very experimental, but if you go into any of the stories in the link and click on the "ipa" button on the right hand side of the page you should see the corresponding IPA! (Words that didn't match anything in the dictionary -- mostly proper names -- are marked with @.) No audio yet, but we're working on it... 😄
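
To make the lookup behaviour described above concrete, here is a hedged sketch, not the actual code behind that application. It assumes the text is split on whitespace, that lookups are case-insensitive with surrounding punctuation stripped, and that an unmatched token is shown with `@` prefixed to the original spelling; the function name `annotate` and all of those choices are assumptions for illustration.

```python
def annotate(text: str, ipa_dict: dict[str, str]) -> str:
    """Replace each whitespace-separated token with its IPA, or mark it with '@'."""
    out = []
    for token in text.split():
        # Normalise the lookup key: drop surrounding punctuation, lowercase.
        key = token.strip(".,;:!?\"'()").lower()
        ipa = ipa_dict.get(key)
        # Unmatched tokens (often proper names) keep their spelling, prefixed with '@'.
        out.append(ipa if ipa is not None else "@" + token)
    return " ".join(out)


if __name__ == "__main__":
    print(annotate("Hà ma", {"ma": "ma˧˧"}))
```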

Anyway, thanks again for finding these sources and generating all the data!
