Another root error with Collatinus.Decliner #1221

Open
SnakyGrain opened this issue May 5, 2023 · 1 comment

SnakyGrain commented May 5, 2023

CollatinusDecliner produces incorrect results for words such as 'omnis' and 'parens'. This is a separate problem from the one reported in #1127 (problems with 'puer'), and it is not solved by the changes to lat.py recommended there.

Python version: 3.9.13
CLTK version: CLTK 1.1.6
Windows 10

  1. Running the following script:
    from cltk.morphology.lat import CollatinusDecliner
    decliner = CollatinusDecliner()
    print(decliner.decline("omnis",False,False))

We see the following erroneous output:
[('omnis', '--s---mn-'), ('omnis', '--s---mv-'), ('omnem', '--s---ma-'), ('omniem', '--s---ma-'), ('omnis', '--s---mg-'), ('omniis', '--s---mg-') etc.]

There is no such form as 'omniem' or 'omniis'. CollatinusDecliner has created both "omn-" and "omni-" as roots for the same root_id.

We should expect to see (and Collatinus gets this right):
[('omnis', '--s---mn-'), ('omnis', '--s---mv-'), ('omnem', '--s---ma-'), ('omnis', '--s---mg-'), etc.]

  2. Running the following script:
    from cltk.morphology.lat import CollatinusDecliner
    decliner = CollatinusDecliner()
    print(decliner.decline("parens",False,False))

We see the following output:
[('parens', '--s---mn-'), ('parens', '--s---mv-'), ('pareiem', '--s---ma-'), ('parentem', '--s---ma-'), ('pareiis', '--s---mg-'), ('parentis', '--s---mg-')...

There are no forms in 'parei-'.

We would expect to see:
[('parens', '--s---mn-'), ('parens', '--s---mv-'), ('parentem', '--s---ma-'), ('parentis', '--s---mg-')

Once again, CollatinusDecliner has created both "parent-" and "parei-" as roots.
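A quick way to spot this kind of duplication is to group the returned pairs by tag and list the tags that received more than one form. The sketch below uses only the pairs quoted above, so it runs without CLTK; the helper names `forms_by_tag` and `ambiguous_tags` are made up for illustration. A tag with several forms is not always wrong, since Latin has genuine variants, so this only narrows down where to look.

```python
from collections import defaultdict

# Pairs copied from the 'parens' output quoted in this issue (the list is truncated there).
observed = [
    ("parens", "--s---mn-"), ("parens", "--s---mv-"),
    ("pareiem", "--s---ma-"), ("parentem", "--s---ma-"),
    ("pareiis", "--s---mg-"), ("parentis", "--s---mg-"),
]


def forms_by_tag(pairs):
    """Group declined forms by their tag, keeping first-seen order."""
    grouped = defaultdict(list)
    for form, tag in pairs:
        if form not in grouped[tag]:
            grouped[tag].append(form)
    return dict(grouped)


def ambiguous_tags(pairs):
    """Return only the tags that received more than one distinct form."""
    return {tag: forms for tag, forms in forms_by_tag(pairs).items() if len(forms) > 1}


suspects = ambiguous_tags(observed)
for tag, forms in suspects.items():
    print(tag, forms)
```

The same helper can be fed the real return value of `decliner.decline(...)`, since that is already a list of `(form, tag)` tuples.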

In the case of 'parens', at least, there seems to be a problem in the "cltk_data\lat\model\lat_models_cltk\lemmata\collatinus\collected.json" file. The model for 'parens' is given as 'infans', and the models section for 'infans' contains the following root info:

"infans": {"R": {"0": ["2", ""], "1": ["2", "i"], "2": ["2", "issim"], "4": ["K", null], "5": ["2", "i"]}

This cannot be right, as there is no circumstance in which we would remove two letters and replace them with an 'i' (which is what has happened here).

Oddly, we find the same root info for 'fortis', which is the model for 'omnis' (and 'fortis' also declines incorrectly):
"fortis": {"R": {"0": ["2", ""], "1": ["2", "i"], "2": ["2", "issim"], "4": ["K", null], "5": ["2", "i"]}
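To make the effect of this root info concrete, here is a minimal sketch of how I read each pair: the first element is the number of trailing characters to drop and the second is the string to append. It also assumes that "K" means "keep the canonical form unchanged". `apply_root_rule` is a hypothetical helper, not a function from lat.py.

```python
def apply_root_rule(lemma, rule):
    """Apply one [strip, suffix] root rule, as interpreted in this issue.

    strip  -- "K" keeps the lemma as-is (assumption); otherwise the number
              of trailing characters to remove
    suffix -- string appended after stripping (None or "" appends nothing)
    """
    strip, suffix = rule
    if strip == "K":
        return lemma
    n = int(strip)
    stem = lemma[:-n] if n else lemma
    return stem + (suffix or "")


# Root info shared by the 'infans' and 'fortis' models (JSON null written as None).
model_roots = {
    "0": ["2", ""],
    "1": ["2", "i"],
    "2": ["2", "issim"],
    "4": ["K", None],
    "5": ["2", "i"],
}

for lemma in ("parens", "omnis"):
    derived = {rid: apply_root_rule(lemma, rule) for rid, rule in model_roots.items()}
    print(lemma, derived)
```

Under this reading, root 1 comes out as "parei" for 'parens' and "omni" for 'omnis', which matches the stems of the spurious forms shown above.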

@SnakyGrain SnakyGrain added the bug label May 5, 2023

SnakyGrain commented May 6, 2023

I've been trying to work out exactly how the decliner works (and how it works in Collatinus itself) and I may have got this wrong - but it seems to me that:

a] if there is an entry in the lemma-entry for geninf, this should be assigned to root_id 1
b] if there is an entry in the lemma-entry for perf, this should be assigned to root_id 2
c] and in both cases these should replace anything in the root data from the model.

I don't know quite what is happening in lines 126-129 of lat.py:

        if model_root_id in original_roots:
            returned_roots[model_root_id].extend(original_roots[model_root_id])
        returned_roots[model_root_id] = list(set(returned_roots[model_root_id]))
    original_roots.update(returned_roots)

but it looks as if we end up with multiple options for various root_ids (i.e. original_roots[1] = "parent, parei"). Replacing the above lines with the following seems to fix this, though I don't know whether it breaks something else (note that the first line is at the same indent level as 'original_roots.update(returned_roots)' in lat.py):

    for model_root_id in returned_roots:
        if model_root_id not in original_roots:
            original_roots[model_root_id] = returned_roots[model_root_id]
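The difference between the two behaviours can be tried out in isolation. The sketch below is not the lat.py code: `merge_current` and `merge_proposed` are stand-in functions written for this comment, and the dictionary contents (including the use of integer root ids) are assumptions reconstructed from the 'parens' example.

```python
def merge_current(original_roots, returned_roots):
    """Mimic the quoted lines: union the model roots with the lemma's roots."""
    merged = {rid: list(roots) for rid, roots in original_roots.items()}
    for rid, roots in returned_roots.items():
        combined = list(roots) + merged.get(rid, [])
        # sorted() only makes the printed result deterministic;
        # the quoted code uses list(set(...)).
        merged[rid] = sorted(set(combined))
    return merged


def merge_proposed(original_roots, returned_roots):
    """Proposed change: model roots only fill root ids the lemma does not define."""
    merged = {rid: list(roots) for rid, roots in original_roots.items()}
    for rid, roots in returned_roots.items():
        if rid not in merged:
            merged[rid] = list(roots)
    return merged


# Assumed inputs: the lemma entry supplies root 1, the model derives the others.
lemma_roots = {1: ["parent"]}
model_roots = {0: ["pare"], 1: ["parei"], 4: ["parens"]}

print(merge_current(lemma_roots, model_roots))   # root 1 holds both "parei" and "parent"
print(merge_proposed(lemma_roots, model_roots))  # root 1 holds only "parent"
```

With these inputs the first merge leaves two candidates under root 1, which is where forms like 'pareiem' would come from, while the second keeps only the root supplied by the lemma entry.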

However, there also seems to be a problem in the collected.json data: under the model for 'infans', the ending data for the neuter singular nominative, vocative, and accusative (pos 37, 38, 39) is given as root ID 1, ending "-ns". The ending should be the same as for the masc/fem nom/voc singular, correctly given at pos 13/14 and 25/26 as root ID 4 and no ending (i.e. the canonical form). As a result, the output for the neuter form is either *parentns or *pareins instead of the expected parens.

This error seems to be in Collatinus itself: in the modeles.la file, the entry for infans reads:
modele:infans
pere:fortis
des:37-39:1:ns
des+:18,30,42:1:ē
des+:22,34,46:1:ŭm3

In Collatinus, des 37-39 refers to the neuter singular nom/voc/acc. Either the root id should be 0 (remove two letters, then add ns) or 37-39 should be 4:K (I think).
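To show how that line leads to the bad neuter forms, here is a rough sketch that splits a `des:` line into positions, root id and ending, then attaches the ending to the roots. It only handles the single-ending case quoted above and is not the Collatinus or CLTK parser; `parse_positions` and `parse_des` are hypothetical names, and the `roots` dictionary is filled in by hand from the duplicated roots reported earlier.

```python
def parse_positions(spec):
    """Expand '37-39' or '18,30,42' into a list of integer positions."""
    positions = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            positions.extend(range(int(start), int(end) + 1))
        else:
            positions.append(int(part))
    return positions


def parse_des(line):
    """Split a 'des:POSITIONS:ROOT_ID:ENDING' (or 'des+:') line into its fields."""
    keyword, spec, root_id, ending = line.split(":", 3)
    if keyword.rstrip("+") != "des":
        raise ValueError(f"not a des line: {line!r}")
    return parse_positions(spec), root_id, ending


positions, root_id, ending = parse_des("des:37-39:1:ns")

# Assumed state of root 1 for 'parens' after the merge problem described above.
roots = {"1": ["parent", "parei"]}
neuter_forms = [root + ending for root in roots[root_id]]
print(positions, neuter_forms)
```

Attaching "ns" to root 1 reproduces *parentns and *pareins, whereas pointing positions 37-39 at the canonical form would give plain 'parens'.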

EDIT: in the most up-to-date branch of Collatinus (the Medieval one), this error with infans has been corrected.
