Unicode issue with Greek accented vowels in prosody #1191

Open
JoshuaCCampbell opened this issue Nov 6, 2022 · 1 comment

JoshuaCCampbell commented Nov 6, 2022

Unicode has two code points for each acute-accented vowel: one with tonos in the Greek and Coptic block and one with oxia in the Greek Extended block (for omicron, U+03CC and U+1F79). The list of accented vowels only accounts for the code points in the Greek and Coptic block, so vowels encoded in the Greek Extended block are not scanned properly.

>>> from cltk.prosody.grc import Scansion
>>> text_string = "πότνια, θῦμον"
>>> Scansion()._make_syllables(text_string)
[[['πότνι', 'α'], ['θῦ', 'μον']]]

Expected behavior

>>> from cltk.prosody.grc import Scansion
>>> text_string = "πότνια, θῦμον"
>>> Scansion()._make_syllables(text_string)
[[['πο', 'τνι', 'α'], ['θῦ', 'μον']]]

Desktop

  • macOS 13.0
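
The two omicron code points named in the report can be told apart with the standard library alone. The snippet below is a minimal sketch that does not touch CLTK; the variable names `tonos`, `oxia`, and `same` are illustrative only.

```python
import unicodedata

# The two code points cited in the report for accented omicron.
tonos = "\u03cc"  # Greek and Coptic block
oxia = "\u1f79"   # Greek Extended block

# Print each code point with its official Unicode character name.
for ch in (tonos, oxia):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The glyphs usually render identically, but the strings are not equal,
# so a lookup table containing only one of them misses the other.
same = tonos == oxia
print(same)
```

Because string comparison works on code points, a vowel list built from only one block will not match text typed with the other.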

pharos-alexandria (Contributor) commented

When working with Greek, I normalize everything to NFC (unicodedata.normalize) before any further processing. NFC maps U+1F79 to U+03CC (https://www.unicode.org/charts/normalization/).
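
The normalization step described in this comment might look like the sketch below. `normalize_greek` is a hypothetical helper name, not part of CLTK, and whether normalizing the input is enough to fix the scansion output has not been verified here.

```python
import unicodedata


def normalize_greek(text: str) -> str:
    """Return text in NFC (hypothetical helper, not a CLTK function)."""
    return unicodedata.normalize("NFC", text)


# Input spelled with U+1F79 from the Greek Extended block.
raw = "π\u1f79τνια"
clean = normalize_greek(raw)

# After NFC the accented omicron is U+03CC from the Greek and Coptic block.
print([f"U+{ord(ch):04X}" for ch in clean])
```

Applying this once at the point where text enters the pipeline keeps later steps from having to handle both encodings.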
