Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect syllabification in Greek prosody #1246

Closed
sjhuskey opened this issue Jan 25, 2024 · 8 comments
Closed

Incorrect syllabification in Greek prosody #1246

sjhuskey opened this issue Jan 25, 2024 · 8 comments
Labels

Comments

@sjhuskey
Copy link

The Scansion()._make_syllables() method does not syllabify words as expected.

To Reproduce
Steps to reproduce the behavior:

  1. Install Python version 3.9.18
  2. Install CLTK version 1.2.1.
  3. In a script or REPL, run the following code
from cltk.prosody.grc import Scansion
text_string = "ἀνδρῶν ἀρίστων, οἳ τὸ πάγχρυσον δέρας."
print(Scansion()._make_syllables(text_string))
print(Scansion().scan_text(text_string))

Expected behavior
For example, this line:

ἀνδρῶν ἀρίστων, οἳ τὸ πάγχρυσον δέρας

Should be broken into these syllables:

ἀν δρῶ νἀ ρίσ των οἳ τὸ πάγ χρυ σον δέ ρας

But Scansion()._make_syllables() produces this:

[[['α', 'νδρων'], ['α', 'ρι', 'στων'], ['οι'], ['το'], ['πα', 'γχρυ', 'σον'], ['δε', 'ρας']]]

That has ramifications for the scansion. It should be this:

––˘–|––˘–|––˘–

But Scansion().scan_text() on that line produces this:

¯¯¯¯¯¯˘˘˘¯˘x

It looks like the syllabification process is not accounting well for double consonants.

Desktop (please complete the following information):

  • OS and version: Mac OS 13.6.3
@sjhuskey sjhuskey added the bug label Jan 25, 2024
@clemsciences
Copy link
Member

Hello @sjhuskey,

Thank you for the report, I'm not the on that developed the syllabifier for Ancient Greek, but I can look at how this can be fixed. The syllabification can be either fixed here in the prosody module for Ancient Greek or done with a specific process.
I'll try solutions when I have time.

@SDCLA
Copy link
Contributor

SDCLA commented Mar 5, 2024

Hi @sjhuskey,
I've been having similar problems. Here are my non-expert thoughts from testing.

The syllabifier doesn't syllabify the way you might want, but tries to end a "syllable" with a vowel. Therefore the unusual splitting of syllables, while perhaps undesirable from an academic perspective, seems to be the behaviour desired by the author as described in the comments on the program.

When testing this sentence, the first α of ἀρίστων is falling prey to the problem of line 288 (next_syll = sentence[sentence.index(syllable) + 1]) which finds the index of the syllable. It should return an index of 2, and instead returns [0] because it is returning the index of the first syllable 'α' in the sentence. This means it is returning True to _long_by_position case 1.

The πα of πάγχρυσον should return True to case 1 (since it is followed by two consonants and not a mute + liquid combination). The phrasing of lines 290—291, however, is:

        if (next_syll[0] in self.sing_cons and next_syll[1] in self.sing_cons) and (
            next_syll[0] not in self.stops and next_syll[1] not in self.liquids)

I think that:
if (next_syll[0] in self.sing_cons and next_syll[1] in self.sing_cons) and (
next_syll[0] not in self.stops or next_syll[1] not in self.liquids)

should be correct, since the lack of either of these ought to allow for a return True, and it works in this case. I can't think of any undesirable outcomes, but I will look more closely and consult the method in McCabe 1981.

As for the γχρυ of πάγχρυσον, I can't think of any way to recognize the length (which ought to be long) without an implementation of a similar dictionary-based macronizer to that in the Latin scansion module. I don't know if a morpheus solution for Greek would work, but something similar would be excellent. The scansion model as Kirby wrote it asks for fully macronized texts to begin with.

All best.

@SDCLA
Copy link
Contributor

SDCLA commented Mar 5, 2024

The solution I am currently trying out for my own research adds an enumerator to prevent the index function fetching the wrong syllable, so changing:

line 266 to: def _long_by_position(self, sentence_index, syllable: str, sentence: list[str]) -> bool:

288 to: next_syll = sentence[sentence_index + 1]
using the variable sentence_index to locate the syllable in question as a part of the function _long_by_position. The variable is set in the _scansion function.

324 to: for i, syllable in enumerate(sentence):
325 to: if self._long_by_position(i, syllable, sentence) or self._long_by_nature(

These fairly minimal changes (with the change to the logic of lines 290–291) make the line ἀνδρῶν ἀρίστων, οἳ τὸ πάγχρυσον δέρας scan ['¯¯˘¯¯¯˘¯˘¯˘x'], and have removed repeated syllable errors in other texts I have tried.

@kylepjohnson
Copy link
Member

Hi @SDCLA @sjhuskey I don't have the time to help on this, but would gladly accept a working pull request that incorporates your changes. I only ask that you be mindful not to make any unnecessary changes.

@SDCLA
Copy link
Contributor

SDCLA commented Apr 8, 2024

@kylepjohnson sure I'll sort that, I just need to get around to a little more testing in real text examples.

kylepjohnson pushed a commit that referenced this issue Apr 9, 2024
* Update grc.py

This is a solution to two bugs on the grc prosody function. The first changes the logic of the long by position so that the lack of either a mute OR a liquid allows for a long rather than the lack of them both. 
The second adds an enumerator to the scansion function so that locating the position of a syllable in a sentence is not reliant on the "index" function and therefore ought to handle reduplicated syllables better.

* Update grc.py

space error
@SDCLA
Copy link
Contributor

SDCLA commented Apr 9, 2024

Hi @sjhuskey my changes have been integrated. I would love to know what you think if you get a chance to test it! I am planning to look at the possibilities of a dictionary lookup based macronizer for those edge cases that just won't work any other way

@sjhuskey
Copy link
Author

Thanks for your work @SDCLA! I'll check it out sometime this week.

@clemsciences
Copy link
Member

Fixed, reopen this issue if the @kylepjohnson's fix is not sufficient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants