uniglot/korean-word-ipa-dictionary

Korean Word-IPA Dictionary

Notice 2 (28 Apr, 2023)

Sorry, I just realized that I'm a weary lazy procrastinator.

I restructured the codebase earlier, but I have no notable updates yet. I'll do the remaining things maybe... within September?

Notice (19 Aug, 2022)

I'm going to refactor the entire code soon and add a CI pipeline to keep the dictionary updated!

I'll make an effort to get it done within September.

1. Getting List of Word Entries

From the latest Kowiktionary dump, I got the list of every entry title in the main namespace. From this list, I filtered out all entries that are not written in Hangul and stored the remaining Korean word entries in the file kodict_entry.txt.
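The filtering step could be sketched as follows. This is a minimal illustration, not the script used for this repository: it assumes that "written in Hangul" means every character is a precomposed Hangul syllable (U+AC00 to U+D7A3), and the function names and sample titles are hypothetical.

```python
def is_hangul_word(title: str) -> bool:
    """Return True if the title is non-empty and every character is a
    precomposed Hangul syllable (U+AC00..U+D7A3). This range is an assumption."""
    return bool(title) and all("\uac00" <= ch <= "\ud7a3" for ch in title)


def filter_hangul(titles):
    """Keep only titles written entirely in Hangul syllables, preserving order."""
    return [t for t in titles if is_hangul_word(t)]


# Hypothetical sample of main-namespace titles, not taken from the real dump.
titles = ["사과", "apple", "학교", "漢字", "가다", "K팝"]
hangul_titles = filter_hangul(titles)
print(hangul_titles)
```

Mixed-script titles such as "K팝" are dropped under this assumption; a looser rule would be needed if such entries should be kept.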

2. Crawling

By running crawl.py simultaneously on 11 subsets of kodict_entry.txt, each of which consists of 6,000 words (except the last one), I extracted IPA information, forming a word-IPA dictionary for the Korean language. After all the crawling processes had completed, I merged the results in alphabetical order and deleted entries with no extracted IPA.
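The splitting and merging around the crawl might look roughly like this. It is only a sketch under assumptions: the helper names are hypothetical, each crawl result is assumed to be a plain word-to-IPA mapping, "alphabetical order" is approximated by Python's default string ordering, and the word count in the usage example is made up for illustration.

```python
def split_into_chunks(words, size=6000):
    """Split the word list into consecutive chunks of `size` words;
    only the last chunk may be shorter."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def merge_results(chunk_results):
    """Merge per-chunk {word: ipa} mappings, drop entries with no IPA,
    and return the entries sorted by word (code point order)."""
    merged = {}
    for result in chunk_results:
        for word, ipa in result.items():
            if ipa:  # skip entries where no IPA was extracted
                merged[word] = ipa
    return dict(sorted(merged.items()))


# Hypothetical word count, chosen only so that 11 chunks result.
words = [f"word{i}" for i in range(62500)]
chunks = split_into_chunks(words)
print(len(chunks), len(chunks[-1]))

# Hypothetical crawl output for two chunks.
merged = merge_results([{"나": "na", "다": ""}, {"가": "ka"}])
print(merged)
```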

3. Converting IPA to X-SAMPA

You can convert any word-IPA dictionary file into a word-X-SAMPA dictionary.

from convert import Converter

conv = Converter()
# <NAME_OF_DICT> is a placeholder for the name of your word-IPA dictionary file
conv.subst_dict(<NAME_OF_DICT>)
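To give an idea of what an IPA to X-SAMPA conversion involves, here is a small symbol-substitution sketch. It is not the Converter class from this repository, whose internals are not shown here: the table covers only a handful of symbols, and the function name and sample strings are hypothetical.

```python
# Partial IPA -> X-SAMPA table; a real converter needs many more symbols.
IPA_TO_XSAMPA = {
    "ɛ": "E",
    "ʌ": "V",
    "ɯ": "M",
    "ŋ": "N",
    "ɾ": "4",
    "ɕ": "s\\",
    "ʰ": "_h",
    "ː": ":",
    "ɡ": "g",  # IPA script g (U+0261) becomes ASCII g
}


def ipa_to_xsampa(ipa: str) -> str:
    """Replace each IPA character that has a table entry;
    characters without an entry are passed through unchanged."""
    return "".join(IPA_TO_XSAMPA.get(ch, ch) for ch in ipa)


# Hypothetical sample strings, not entries from the dictionary.
print(ipa_to_xsampa("tʰʌɾɯ"))
print(ipa_to_xsampa("ɕiŋ"))
```

Mapping one character at a time is enough for this table because every key is a single code point; multi-character sequences would need longest-match handling instead.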

4. Licenses

The outputs of the scripts (i.e., the .dict files and kodict_entry.txt) are available under CC BY-SA 4.0. The scripts themselves are available under the MIT License.

About

Dictionary of pairs of Korean word and IPA crawled from Wiktionary (Korean edition)

License

MIT (LICENSE), CC-BY-SA-4.0 (LICENSE-CC-BY-SA)
