[Feature Request] Support for phonetic reading of character based languages and abjads #71

Open
marknsikora opened this issue Jul 13, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@marknsikora

Summary

The current implementation works well for languages written in alphabets or abugidas, but I feel there is a gap for languages whose writing does not spell out the reading explicitly.

Example 1 would be Mandarin and Japanese. Hanzi and kanji don't indicate their pronunciation. For Chinese flash cards, it is common to convert the whole sentence into pinyin. For Japanese, annotating the kanji with furigana is the norm.

Example 2 would be Arabic. The Arabic script is an abjad, which means that for most words only the consonants and long vowels are written and the short vowels are inferred. Arabic has optional diacritics that mark those vowels explicitly.
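
To make the Arabic case concrete, here is a minimal Python sketch of how a list of fully vocalized words could double as a reading dictionary: strip the optional vowel marks to get the form that normally appears in text, and index the vocalized forms under it. The names `strip_harakat` and `build_reading_index` are hypothetical and not part of this project. The sketch assumes the marks of interest are the code points U+064B through U+0652.

```python
# Arabic vowel and related marks (harakat): U+064B FATHATAN .. U+0652 SUKUN.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_harakat(text: str) -> str:
    """Return the text as it is usually written, without the optional marks."""
    return "".join(ch for ch in text if ch not in HARAKAT)


def build_reading_index(vocalized_words: list[str]) -> dict[str, list[str]]:
    """Map each bare spelling to every vocalized form that shares it."""
    index: dict[str, list[str]] = {}
    for word in vocalized_words:
        index.setdefault(strip_harakat(word), []).append(word)
    return index


# Escapes are used so the example survives any encoding issues.
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "kataba" (he wrote)
KUTIBA = "\u0643\u064F\u062A\u0650\u0628\u064E"  # "kutiba" (it was written)
BARE = "\u0643\u062A\u0628"                      # the three consonants k-t-b

index = build_reading_index([KATABA, KUTIBA])
candidates = index[BARE]  # both vocalized forms share one bare spelling
```

When a bare spelling maps to more than one vocalized form, as in this example, the choice still has to be made from context or by the user.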

Proposal

First would be adding two new fields. For my cards I use SentencePinyin and Pinyin for the reading of the sentence and of the word. A more general name that applies to all languages would be better, but "pronunciation" is already in use.

Second is the more difficult part: getting the readings. Chinese is relatively simple. Most characters have a single reading, and where a character has alternate readings, the right one is usually clear from an adjacent character. From my understanding, Japanese is much more difficult, with each kanji having multiple readings depending on context. My limited understanding of Arabic is that most words have a single reading.
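
As a rough illustration of the Chinese case, the sketch below picks readings by greedy longest match: multi-character words are tried before single characters, so a neighbouring character can select the alternate reading. The table `READINGS` and the function `to_pinyin` are hypothetical, and the table holds only a few hand-picked entries. A real implementation would load a full dictionary.

```python
# Hypothetical, hand-picked entries. The word entry for 银行 overrides the
# default single-character reading of 行 (xíng) with háng.
READINGS = {
    "我": "wǒ",
    "去": "qù",
    "银": "yín",
    "行": "xíng",
    "银行": "yín háng",
}


def to_pinyin(sentence: str, readings: dict[str, str], max_len: int = 4) -> str:
    """Convert a sentence to pinyin, preferring the longest dictionary match."""
    parts = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink down to one character.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            chunk = sentence[i:i + length]
            if chunk in readings:
                parts.append(readings[chunk])
                i += length
                break
        else:
            # Unknown character (punctuation, Latin text): pass it through.
            parts.append(sentence[i])
            i += 1
    return " ".join(parts)


print(to_pinyin("我去银行", READINGS))  # wǒ qù yín háng
print(to_pinyin("行", READINGS))        # xíng
```

Greedy matching is only one possible strategy. It does not handle readings that depend on grammar rather than on the surrounding word.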

From a technical perspective, though, the second part would likely require a separate library and a local dictionary for lookups. Fortunately, I believe the only widely spoken languages that need this support are Chinese, Japanese, Arabic, and Hebrew. That would hopefully mean just four small dependencies, with users downloading the supporting dictionary/binary files themselves.
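
One way the "user downloads the dictionary" part could look is sketched below. It assumes a plain UTF-8 file with one `word<TAB>reading` pair per line. That format, and the names `parse_reading_lines` and `load_reading_dictionary`, are assumptions made for illustration and are not an existing format or API of this project. A missing file yields an empty dictionary, so the feature stays optional.

```python
from pathlib import Path
from typing import Iterable


def parse_reading_lines(lines: Iterable[str]) -> dict[str, str]:
    """Parse 'word<TAB>reading' lines, skipping blanks, comments, and bad rows."""
    readings: dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        word, sep, reading = line.partition("\t")
        if sep and word and reading:
            readings[word] = reading
    return readings


def load_reading_dictionary(path: str) -> dict[str, str]:
    """Load a user-supplied reading file, or return {} if it is not there."""
    file = Path(path)
    if not file.is_file():
        return {}
    with file.open(encoding="utf-8") as handle:
        return parse_reading_lines(handle)


sample = ["# word<TAB>reading", "", "银行\tyín háng", "malformed line"]
parsed = parse_reading_lines(sample)  # only the well-formed row is kept
```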

I will have a look at Chinese support, but we're talking about a timeline of a few months here. To test any of the other languages, we'd need speakers or learners of those languages.

@marknsikora marknsikora added the enhancement New feature or request label Jul 13, 2023
@1over137
Contributor

My preferred way to handle this would be to include pronunciation lookups as a kind of dictionary that is handled in a special way. It will probably be a while until I have time to implement the changes that make that possible. Even then, a good source of pronunciations in a usable format is needed, so actually making this work might be harder than expected. My suggestion for now is to either find or create your own dictionary that includes the pronunciation somewhere in the definition.

@1over137
Contributor

I intend to implement this by extending the current "Source" system to include lemmatizers, reading dictionaries, and tag dictionaries (which can be used for gender, noun classes, etc.), though it will probably be a while before that can be done.
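
For readers following along, the idea of one "Source" abstraction covering several kinds of lookup might be sketched as below. Every name here (`SourceKind`, `Source`, `build_note`, the field names) is hypothetical and is not the project's actual API. It only illustrates lemmatizing first and then querying the reading and tag sources with the lemma.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class SourceKind(Enum):
    DEFINITION = "definition"
    LEMMATIZER = "lemmatizer"
    READING = "reading"
    TAG = "tag"


@dataclass
class Source:
    name: str
    kind: SourceKind
    lookup: Callable[[str], Optional[str]]  # returns None when nothing is found


def build_note(word: str, sources: list[Source]) -> dict[str, str]:
    """Lemmatize the word, then fill one field per non-lemmatizer source."""
    lemma = word
    for source in sources:
        if source.kind is SourceKind.LEMMATIZER:
            lemma = source.lookup(lemma) or lemma
    fields = {"Word": lemma}
    for source in sources:
        if source.kind is SourceKind.LEMMATIZER:
            continue
        result = source.lookup(lemma)
        if result is not None:
            fields[source.kind.value] = result
    return fields


# Toy in-memory sources; real ones would wrap dictionary files or libraries.
sources = [
    Source("toy-lemmas", SourceKind.LEMMATIZER, {"Häuser": "Haus"}.get),
    Source("toy-readings", SourceKind.READING, {"Haus": "haʊs"}.get),
    Source("toy-genders", SourceKind.TAG, {"Haus": "neuter"}.get),
]
note = build_note("Häuser", sources)
```

In this sketch, a source that has no entry for a word simply contributes no field, so missing dictionaries degrade gracefully.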
