[Feature Request] Support for phonetic reading of character based languages and abjads #71

marknsikora · 2023-07-13T16:16:11Z

Summary

The current implementation works well for languages written with alphabets or abugidas, but there is a gap for languages whose script does not make the reading explicit.

Example 1: Mandarin and Japanese. Hanzi and kanji do not show their pronunciation. For Chinese flash cards it is common to convert the whole sentence into pinyin; for Japanese, the norm is to annotate the kanji with furigana.
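For illustration, one common way to display furigana is HTML `<ruby>` markup, where the reading sits in an `<rt>` element above the base text. The following is a minimal sketch that assumes the sentence has already been segmented into (text, reading) pairs by some analyzer. `to_ruby` is a hypothetical helper name and is not part of this project.

```python
from html import escape
from typing import Optional


def to_ruby(segments: list[tuple[str, Optional[str]]]) -> str:
    """Render (text, reading) pairs as HTML ruby; segments without a reading are emitted as plain text."""
    parts = []
    for text, reading in segments:
        if reading:
            parts.append(f"<ruby>{escape(text)}<rt>{escape(reading)}</rt></ruby>")
        else:
            parts.append(escape(text))
    return "".join(parts)


# Hypothetical pre-segmented input for 日本語を勉強する ("to study Japanese")
sentence = [("日本語", "にほんご"), ("を", None), ("勉強", "べんきょう"), ("する", None)]
print(to_ruby(sentence))
```

Only the kanji segments carry a reading, so the kana (を, する) pass through unannotated.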

Example 2: Arabic. Arabic is an abjad, which means that for most words only the consonants are written and the vowels are inferred. Optional diacritics can be added to mark the vowels explicitly.
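An Arabic reading dictionary could be keyed on the bare consonantal spelling. That way, unvocalized text from a book or website can be matched to a fully vocalized form. The following is a minimal sketch. It assumes that the harakat in the range U+064B to U+0652 (tanwin, short vowels, shadda, sukun) are the only marks to strip; other marks, such as superscript alef, are ignored here. One unvocalized spelling can correspond to several readings, so each key maps to a list. The names `strip_harakat` and `build_index` are hypothetical.

```python
# Arabic harakat U+064B..U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun
HARAKAT = frozenset(chr(cp) for cp in range(0x064B, 0x0653))


def strip_harakat(word: str) -> str:
    """Drop the diacritics, leaving the unvocalized spelling used as the lookup key."""
    return "".join(ch for ch in word if ch not in HARAKAT)


def build_index(vocalized: list[str]) -> dict[str, list[str]]:
    """Map each unvocalized spelling to every vocalized form that reduces to it."""
    index: dict[str, list[str]] = {}
    for word in vocalized:
        index.setdefault(strip_harakat(word), []).append(word)
    return index


# Written with escapes so the diacritics are unambiguous:
KATABA = "\u0643\u064e\u062a\u064e\u0628\u064e"  # kataba, "he wrote"
KUTUB = "\u0643\u064f\u062a\u064f\u0628"  # kutub, "books"
index = build_index([KATABA, KUTUB])
print(index["\u0643\u062a\u0628"])  # both vocalized forms of the unvocalized k-t-b
```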

Proposal

First, add two new fields. On my cards I use SentencePinyin and Pinyin for the readings of the sentence and of the word. A more general name that applies to all languages would be better, but pronunciation is already taken.

Second, and more difficult, is getting the readings. Chinese is relatively simple: most characters have a single reading, and when a character has alternate readings the right one is usually clear from an adjacent character. From my understanding, Japanese is much more difficult, since each kanji can have multiple readings depending on context. My limited understanding of Arabic is that most words have a single reading.
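For Chinese, disambiguation by adjacent characters can be approximated with greedy longest-match lookup. Multi-character words are tried before single characters. As a result, 行 is read háng inside 银行 ("bank") but falls back to its default reading xíng on its own. The following is a minimal sketch with a hypothetical five-entry dictionary. Forward maximum matching is a simple stand-in chosen here for illustration; real segmenters also handle cases where the greedy match is wrong.

```python
# Hypothetical toy dictionary; a real one would be loaded from a downloaded file.
READINGS = {
    "我": "wǒ",
    "去": "qù",
    "银": "yín",
    "行": "xíng",  # default reading of 行
    "银行": "yín háng",  # in 银行 ("bank") 行 is read háng
}
MAX_LEN = max(len(key) for key in READINGS)


def to_pinyin(sentence: str) -> str:
    """Convert a sentence to pinyin, preferring the longest dictionary match at each position."""
    out = []
    i = 0
    while i < len(sentence):
        for n in range(min(MAX_LEN, len(sentence) - i), 0, -1):
            chunk = sentence[i:i + n]
            if chunk in READINGS:
                out.append(READINGS[chunk])
                i += n
                break
        else:
            out.append(sentence[i])  # unknown character: pass it through unchanged
            i += 1
    return " ".join(out)


print(to_pinyin("我去银行"))
```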

From a technical perspective, though, the second part would likely require a separate library and a local dictionary for lookups. Fortunately, I believe the only widely spoken languages that need this support are Chinese, Japanese, Arabic, and Hebrew. Hopefully that means just four small dependencies, with users downloading the supporting dictionary/binary files themselves.

I will look into Chinese support, but we're talking about a timeline of a few months. Testing any of the other languages would need speakers/learners of those languages.

1over137 · 2023-07-16T13:11:07Z

My preferred approach would be to treat pronunciation lookups as a special kind of dictionary. However, it will probably be a while before I have time to make the changes needed for that. Even then, it needs a good source of pronunciations in a usable format, so making this work may be harder than expected. For now, my suggestion is to find or create a dictionary that includes the pronunciation somewhere in the definition.
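CC-CEDICT is one existing source in a usable format. It is a freely available Chinese-English dictionary that stores one entry per line as `Traditional Simplified [pin1 yin1] /gloss/gloss/`, with comment lines starting with `#`. The following is a minimal sketch that assumes that line format. It parses such a file and puts the pinyin at the front of the definition text, in line with the interim suggestion above. `parse_cedict`, `as_definition` and `Entry` are hypothetical names.

```python
import re
from typing import Iterable, NamedTuple

# Assumed CC-CEDICT entry layout: Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str
    glosses: list[str]


def parse_cedict(lines: Iterable[str]) -> dict[str, list[Entry]]:
    """Index entries by simplified headword; comment lines (#) and malformed lines are skipped."""
    entries: dict[str, list[Entry]] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        match = CEDICT_LINE.match(line)
        if match is None:
            continue
        trad, simp, pinyin, glosses = match.groups()
        entries.setdefault(simp, []).append(Entry(trad, simp, pinyin, glosses.split("/")))
    return entries


def as_definition(entry: Entry) -> str:
    """Put the reading at the front of the definition text."""
    return f"[{entry.pinyin}] " + "; ".join(entry.glosses)


sample = [
    "# CC-CEDICT header comment",
    "銀行 银行 [yin2 hang2] /bank/",
]
entries = parse_cedict(sample)
print(as_definition(entries["银行"][0]))
```

With a downloaded copy, `parse_cedict(open(path, encoding="utf-8"))` would index the whole file, since a text file iterates line by line.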

1over137 · 2024-03-23T15:16:40Z

I intend to implement this by extending the current "Source" system to include lemmatizers, reading dictionaries, and tag dictionaries (which can be used for gender, noun classes, etc.), though it will probably be a while before that can be done.
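The three new kinds of source could be modelled as small interfaces that the lookup code queries in turn. The following is a minimal sketch. All names in it (`Lemmatizer`, `ReadingDictionary`, `TagDictionary`, `TableSource`, `lookup`) are hypothetical and are not this project's actual API. The surface form is tried first, then its lemmas.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Lemmatizer(Protocol):
    def lemmatize(self, word: str) -> list[str]: ...


class ReadingDictionary(Protocol):
    def readings(self, word: str) -> list[str]: ...


class TagDictionary(Protocol):
    def tags(self, word: str) -> list[str]: ...


@dataclass
class TableSource:
    """One dict-backed class that satisfies all three protocols, for illustration only."""
    lemmas: dict[str, list[str]] = field(default_factory=dict)
    reading_table: dict[str, list[str]] = field(default_factory=dict)
    tag_table: dict[str, list[str]] = field(default_factory=dict)

    def lemmatize(self, word: str) -> list[str]:
        return self.lemmas.get(word, [])

    def readings(self, word: str) -> list[str]:
        return self.reading_table.get(word, [])

    def tags(self, word: str) -> list[str]:
        return self.tag_table.get(word, [])


def first_hit(candidates: list[str], fetch: Callable[[str], list[str]]) -> list[str]:
    """Return the first non-empty result, trying candidates in order."""
    for candidate in candidates:
        result = fetch(candidate)
        if result:
            return result
    return []


def lookup(word: str, lemmatizer: Lemmatizer, reader: ReadingDictionary,
           tagger: TagDictionary) -> dict[str, list[str]]:
    """Query each source with the surface form first, then with its lemmas."""
    candidates = [word] + lemmatizer.lemmatize(word)
    return {
        "readings": first_hit(candidates, reader.readings),
        "tags": first_hit(candidates, tagger.tags),
    }


src = TableSource(
    lemmas={"食べた": ["食べる"]},
    reading_table={"食べる": ["たべる"]},
    tag_table={"食べる": ["ichidan verb"]},
)
print(lookup("食べた", src, src, src))
```

Here the reading returned is that of the lemma 食べる, because the inflected form 食べた has no entry of its own.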

marknsikora added the enhancement New feature or request label Jul 13, 2023
