Regnal names/epithets #802

Closed
aryamanarora opened this issue Jul 29, 2021 · 19 comments

@aryamanarora
Member

aryamanarora commented Jul 29, 2021

@AdamFarris and I have been annotating some Aśokan Prakrit texts over in the UD_Prakrit-DIPI repo. (These were inscriptions commissioned by the Mauryan king Aśoka a long time ago and represent the earliest written stage of Middle Indo-Aryan after Sanskrit.)

One issue that has come up is how to deal with Aśoka's regnal names: Devānaṃ-priyena Priya-dasinā rāña "beloved-of-the-gods looking-with-kindness King" (note that each nominal here is in instrumental case). Sanskrit nominal compounds like this are always headed by the last nominal, so currently we have this (using Compound=Yes for non-declined parts of compounds like priya, like UD_Sanskrit-UFAL does per #539):

UD table

ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
4-5 Devānaṃpriyena _ _ _ _ _ _ _ _
4 devānaṃ deva PROPN _ Case=Gen|Gender=Masc|Number=Plur 5 nmod:poss _ Gloss=of-the-Gods
5 priyena priya PROPN _ Case=Ins|Gender=Masc|Number=Sing 8 appos _ Gloss=Beloved
6-7 Priyadasinā _ _ _ _ _ _ _ _
6 priya priya PROPN _ Compound=Yes|Gender=Masc 7 compound _ Gloss=friendly
7 dasinā dasin PROPN _ Case=Ins|Gender=Masc|Number=Sing 8 compound _ Gloss=looking
8 rāña rājan PROPN _ Case=Ins|Gender=Masc|Number=Sing 9 obl:agent _ Gloss=by-king
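
To make the attachments in the table easier to see, here is a minimal Python sketch that groups the word rows by their head. The tuple layout and the helper name `dependents_by_head` are only illustrative assumptions, not part of any UD tooling.

```python
from collections import defaultdict

# (ID, FORM, HEAD, DEPREL) for the word rows of the table above.
# The multiword-token ranges 4-5 and 6-7 carry no syntax and are left out.
TOKENS = [
    (4, "devānaṃ", 5, "nmod:poss"),
    (5, "priyena", 8, "appos"),
    (6, "priya", 7, "compound"),
    (7, "dasinā", 8, "compound"),
    (8, "rāña", 9, "obl:agent"),
]


def dependents_by_head(tokens):
    """Group (form, deprel) pairs under the ID of their head (hypothetical helper)."""
    grouped = defaultdict(list)
    for _tid, form, head, deprel in tokens:
        grouped[head].append((form, deprel))
    return dict(grouped)


tree = dependents_by_head(TOKENS)
print(tree[8])  # the two titles hanging off rāña (token 8)
```

Both titles attach to rāña, which is the head-final analysis described above.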

The issue is that a couple of different dependency relations between the various elements are possible here:

  • appos (I'm sure the title orders could be flipped here and it's fine as long as rāña "king" is last, so that fits this)
  • compound
  • flat:name (but these aren't really part of his name, they're special titles that have also been used by other kings, e.g. Devanampiya Tissa of Anuradhapura).

Uncertain about which one is best. (Also don't think this issue is Aśokan-specific, hence put it here.)

@nschneid
Contributor

Titles and other miscellaneous nominal constructions are a sore spot in general, e.g. #757.

@aryamanarora
Member Author

Well that's messy...

@dan-zeman dan-zeman added this to the v2.9 milestone Aug 6, 2021
@dan-zeman
Member

Using flat would mean that the first part (Devānaṃ-priyena, or its head if it is tokenized to multiple tokens) is the head of the whole, and the other parts (here, 2. Priya-dasinā, and 3. rāña) are attached to it via flat relations. It does not have to be subtyped as flat:name, so it is probably fine that the first two parts are not name parts. (Just like President is not a name part in President Barack Obama.) I think I would go with flat.
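
As a rough illustration of what this re-attachment would look like, the Python sketch below rewrites the original rows so that the head of the first part governs the other parts via flat. The function name `reattach_flat` is hypothetical, and the sketch assumes that the external relation (obl:agent to token 9) moves from rāña to the new head; the internal structure of each part is left untouched.

```python
# (ID, FORM, HEAD, DEPREL) rows from the original annotation in this issue.
TOKENS = [
    (4, "devānaṃ", 5, "nmod:poss"),
    (5, "priyena", 8, "appos"),
    (6, "priya", 7, "compound"),
    (7, "dasinā", 8, "compound"),
    (8, "rāña", 9, "obl:agent"),
]


def reattach_flat(tokens, part_heads):
    """Make the first part's head govern the other parts via `flat`.

    `part_heads` lists the head token of each part in surface order.
    Assumption: the external attachment currently sits on the last part
    and is handed over to the first part.
    """
    first, rest, last = part_heads[0], set(part_heads[1:]), part_heads[-1]
    ext_head, ext_rel = next((h, r) for tid, _f, h, r in tokens if tid == last)
    result = []
    for tid, form, head, deprel in tokens:
        if tid == first:
            result.append((tid, form, ext_head, ext_rel))
        elif tid in rest:
            result.append((tid, form, first, "flat"))
        else:
            result.append((tid, form, head, deprel))  # part-internal, unchanged
    return result


FLAT = reattach_flat(TOKENS, part_heads=[5, 7, 8])
for row in FLAT:
    print(row)
```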

@manning
Contributor

manning commented Aug 16, 2021

While having no particular language expertise, using appos as you have looks acceptable to me, and close to a canonical use of apposition (a reorderable second description). Since these appear to be quite transparent syntactic constructions (and not quite a canonical proper name), I think you could use relations as you have suggested. However, I can also see the wisdom and simplicity of @dan-zeman's view: You could just regard it all as a long and complex title, and then do it with flat the same as is done for President Biden or Vice-President Kamala Harris.

@Stormur
Contributor

Stormur commented Sep 6, 2021

Among the three proposals, as already expressed in other discussions over the past months I think that compound should not be considered (ever).

The first problem I see in understanding the case is that UPOS here is uniformly PROPN, but I don't think this makes sense: these are all "common" words which just combine to create a regnal name. The PROPN labels here (apart from the elusive definition of this UPOS) contribute to completely obscuring the internal structure. However, if I am to base myself on your translation 'beloved-of-the-gods looking-with-kindness King' and the glosses, then I'd propose:

UD table

ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
4-5 Devānaṃpriyena _ _ _ _ _ _ _ _
4 devānaṃ deva NOUN _ Case=Gen|Gender=Masc|Number=Plur 5 nmod:poss _ Gloss=of-the-Gods
5 priyena priya VERB? _ Case=Ins|Gender=Masc|Number=Sing 8 acl/amod _ Gloss=Beloved
6-7 Priyadasinā _ _ _ _ _ _ _ _
6 priya priya ADV? _ Compound=Yes|Gender=Masc 7 advmod? _ Gloss=friendly
7 dasinā dasin VERB? _ Case=Ins|Gender=Masc|Number=Sing 8 acl/amod _ Gloss=looking
8 rāña rājan NOUN _ Case=Ins|Gender=Masc|Number=Sing 9 obl:agent _ Gloss=by-king

That is, independently of the chosen UPOS tags, these attributes are realised as expressions with a verbal (participial) or adjectival head, or the like, in any case attributively. The agreement in instrumental, masculine and singular seems to signal that. I do not see any particular "horizontal" relation like flat (as in King Aśoka) or appos (as in Aśoka, the great king, ordered that...). Of course, this is all subject to my limited understanding of Prakrit and these kinds of texts.

By the way, out of curiosity, how is a regnal name defined in the literature? In general, I would take "epithet" as a cover term for a kind of attributive (*mod/appos) expression.

@aryamanarora
Member Author

Right, this is exactly the issue I was thinking of but had not been able to articulate @Stormur, thanks for the input. Good point on the POS tagging/deprels of the morphemes, I think that is the better way to do it rather than pure PROPN. On the deprel of the nominals: Ashokan Prakrit, like Sanskrit, hardly differentiates the categories of adjective and noun, and so a list of nominals like this syntactically behaves the same as an adjective + adjective + noun construction--all elements agree in case/number/gender with the final element. So it certainly cannot be a headless construction as necessitated by flat.

I agree now that compound just doesn't work here. The three nominals all refer to one entity, there is not a specific thing that is a Priyadasinā rāña.

The argument for appos: "Good tests include to ask whether the two halves are full nominals, whether the two halves can be swapped or not, and whether there is case or agreement concord (in a language with rich morphology)." In the Ashokan we have multiple full nominals which have case and agreement concord with the last nominal. I don't think they're swappable though, at least not without changing the head.
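
The concord part of that test can be checked mechanically. Below is a minimal Python sketch, assuming FEATS strings in the usual `Key=Value|Key=Value` form; the helper names `parse_feats` and `agrees` are hypothetical, and Case is written with UD's label Ins throughout.

```python
def parse_feats(feats):
    """Turn a FEATS string such as 'Case=Ins|Number=Sing' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def agrees(dep_feats, head_feats, keys=("Case", "Gender", "Number")):
    """True if the two tokens carry identical values for every key."""
    dep, head = parse_feats(dep_feats), parse_feats(head_feats)
    return all(dep.get(k) == head.get(k) for k in keys)


# FEATS of the relevant tokens in the example sentence.
PRIYENA = "Case=Ins|Gender=Masc|Number=Sing"
DASINA = "Case=Ins|Gender=Masc|Number=Sing"
RANA = "Case=Ins|Gender=Masc|Number=Sing"
DEVANAM = "Case=Gen|Gender=Masc|Number=Plur"

print(agrees(PRIYENA, RANA), agrees(DASINA, RANA), agrees(DEVANAM, RANA))
```

The two titles agree with rāña, while the genitive devānaṃ inside the first title does not, as expected for a possessor.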

In that sense maybe you are in the right direction with the amod relation. Given the ambiguity between adjectives/nominals in Middle Indo-Aryan (like Sanskrit), all we can concretely say is that the last element is UD NOUN. Furthermore, priya is usually an adjective (it can decline in any of the genders) and dasin can be one as well. I would like amod if it wasn't that these are also epithets which can be nominal. It's not just "beloved-of-the-Gods", it's "the Beloved-of-the-Gods". We could drop rāña and have it still be grammatical.

@nschneid and @amir-zeldes's recent paper (https://arxiv.org/pdf/2108.12928.pdf) has some discussion on titles for English which are relevant here. I like the suggestion of nmod:title in there since it (1) preserves the correct headedness, and (2) treats the different parts as nominals.

There's no good linguistic discussion of epithets/regnal names in Sanskrit or MIA that I can find, I just picked a useful descriptive name for this phenomenon. Generally only Indologists study this kind of stuff and they are more interested in the semantics or history of these terms rather than the syntax. (And admittedly, no one is really working on synchronic Middle Indo-Aryan linguistics as of the past decade.)

@amir-zeldes
Contributor

I like the suggestion of nmod:title

The draft compares a lot of different options, but my favorite would be to fold this under the suggested nmod:desc which would be more general and also capture things that are not strictly titles, including bare NP descriptive modifiers, such as @nschneid's example "French actor/nmod:desc Gaspard Ulliel"

For the ADJ/NOUN issue, we've run into this in several languages, and as best as I can see the only way to maintain the distinction is to talk about nominals with inherent gender (so maybe there is "priestess" and "priest", but those are distinguished by derivation, not inflection, and each has its own inflection), vs. flexible gender, which is the ADJ class.

In some languages, the "flexible" words are understood to be nominalized into NOUNs in context, and only attributive and predicative contexts are treated as ADJ, while argument filler instances are treated as NOUN. In languages where such conversions are rare, like English, the UD guidelines to keep them as ADJ are applied more liberally (e.g. "the poor", which also stands out by not having a plural inflection).
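
A rough Python sketch of the inherent-versus-flexible gender heuristic, purely as an illustration: the function name `guess_upos` is hypothetical, and the observations are made-up examples in the spirit of this thread, not corpus counts.

```python
from collections import defaultdict


def guess_upos(observations):
    """Guess ADJ for lemmas attested in more than one gender, NOUN otherwise.

    `observations` is an iterable of (lemma, gender) pairs. This is only
    the flexible-gender heuristic; it ignores context entirely.
    """
    genders = defaultdict(set)
    for lemma, gender in observations:
        genders[lemma].add(gender)
    return {
        lemma: ("ADJ" if len(seen) > 1 else "NOUN")
        for lemma, seen in genders.items()
    }


# Illustrative attestations (assumed, not taken from a treebank).
OBSERVED = [
    ("priya", "Masc"),
    ("priya", "Fem"),
    ("priya", "Neut"),
    ("rājan", "Masc"),
]

GUESS = guess_upos(OBSERVED)
print(GUESS)
```

With sparse data a true adjective may be attested in only one gender, so the result is best read as a lower bound on the ADJ class.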

@Stormur
Contributor

Stormur commented Sep 13, 2021

In that sense maybe you are in the right direction with the amod relation. Given the ambiguity between adjectives/nominals in Middle Indo-Aryan (like Sanskrit), all we can concretely say is that the last element is UD NOUN. Furthermore, priya is usually an adjective (it can decline in any of the genders) and dasin can be one as well. I would like amod if it wasn't that these are also epithets which can be nominal. It's not just "beloved-of-the-Gods", it's "the Beloved-of-the-Gods". We could drop rāña and have it still be grammatical.

This issue comes up very often in Latin (and not only there), too, and touches upon some fundamental questions of annotation. I'll try to explain my point of view very briefly:

  • the distinction between an ADJ and a NOUN cannot rest entirely on morphological grounds, because, as you notice, in a language like Sanskrit/Latin/... both categories share the same nominal paradigms: this makes them nearly indistinguishable from a purely morphological point of view;
  • in such languages, nor can it be purely syntactic, because they allow adjectival elements to be heads of a noun phrase. I am opposed to a contextual annotation where an adjectival element "gains" the NOUN tag if it happens to be the head: this is a loss of information, because it becomes a mechanical variation and no longer tells us anything about this lexeme. The fact that an ADJ "works as a noun" is instead expressed by the syntactic relation, where you'll find an nsubj pointing directly towards it, and so on.

So, if you say that priya is usually an adjective, I think it should stay like that, especially if it keeps a referential function (in this case to rāña or a similar intended entity). Full substantivisation is a further step and entails e.g. that the adjectival element becomes fixated with the same gender independently from context. It is not what is happening here, at least not in this expression. That, in my opinion, rules out nmod.

@aryamanarora
Member Author

Thanks for the input @amir-zeldes and @Stormur, really clears up the problem on ADJ/NOUN differentiation in this sort of language. Both priya and dasin are of flexible gender (even if they can be substantivised) unlike nouns so ADJ and the amod relation probably are the best fit. This is compatible with nmod for nominal titles in other languages too, so seems like the best decision.

Would a subtype amod:desc make sense, given the name-like behaviour of the title here?

@amir-zeldes
Contributor

In the interest of not proliferating too many labels, we've only advocated for subtyping the nmod case.

But actually are you sure in this specific case you don't want priya- to be compound to dasinā? I mean, my Sanskrit is pretty rusty, but isn't this the compound forming stem form, without a case ending, equivalent of Greco-Latin stem modifiers in -o (like "Greco" actually?)

If so, then I think compound could be good, since these things are not really independent word forms, but only appear as a modifier within a compound, which then carries case inflection and behaves like a whole adjective or noun. At least, that's how I think of stems like "Greco-" or "anthropo-" when they appear in a word like "anthropocentric" - they are neither amod nor nmod, they're just a modifier part of a compound word.

@aryamanarora
Member Author

But actually are you sure in this specific case you don't want priya- to be compound to dasinā?

@amir-zeldes well, not necessarily. It is true that the caseless stem form only occurs in non-final morphemes in compounds but the UD deprel compound is far too restrictive to apply in every instance of this. The UD_Sanskrit-UFAL corpus has a lot of instances of caseless adj + noun compounding with the deprel amod on the adjective, e.g. pūrva-patni "former wife", tatʰya-vacana "true word". (Grew query) What they do is stick a morphological feature Compound=Yes on caseless items but still analyse the dependency structure.

The Sanskrit grammatical tradition has a very comprehensive analysis of compound types (in fact, it would be pretty neat to try to map that to UD relations.) Ashokan priya-dasin (< Skt. priya-darśin) is a karmadʰāraya compound, i.e. it can be rephrased as a nominative case modifier priyo dasin (Skt. priyaḣ darśin) "loving looker" where the adjective is pretty clearly describing the noun. ADJ + NOUN karmadʰāraya-s seem to fit under amod while a NOUN + NOUN one would probably be compound (e.g. Skt. nara-siṁhaḣ = naraḣ siṁhaḣ "man-lion"). I'm not really sure if that is the best treatment.
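
The tentative split for karmadʰāraya-s suggested above could be written as a small lookup. This is only a sketch of that suggestion: the table `INTERNAL_DEPREL` and the function `internal_deprel` are hypothetical names, and no other compound types are covered.

```python
# (modifier UPOS, head UPOS) -> deprel, for karmadʰāraya compounds only,
# following the tentative treatment proposed in this comment.
INTERNAL_DEPREL = {
    ("ADJ", "NOUN"): "amod",       # e.g. priya-dasin
    ("NOUN", "NOUN"): "compound",  # e.g. nara-siṁhaḣ "man-lion"
}


def internal_deprel(modifier_upos, head_upos):
    """Return the proposed deprel, or None if the pair is not covered."""
    return INTERNAL_DEPREL.get((modifier_upos, head_upos))


print(internal_deprel("ADJ", "NOUN"))
print(internal_deprel("NOUN", "NOUN"))
```

Returning None for uncovered pairs keeps the open question visible instead of silently defaulting to compound.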

I think there's a bit of computational work on Sanskrit compound parsing by Amba Kulkarni but it has not been explored in UD, only kāraka-style dependency formalisms which were popular in Indian comp ling for a while.

@amir-zeldes
Contributor

OK, so it sounds like "yes, they are the typical Indo-European compounding stems", but there is a tradition that sub-types them based on the category of the stem and the semantic roles, right?

I think in Germanic languages the tradition has been to call these "compounds" regardless of the constituent properties (so in German, we speak of A+N compounds, but still they are called compound). Recently this was discussed for English "hot dog":

#756

And then continued here:

#761
#757

I think if normal adjectival modification looks distinct from the stem+noun construction (unlike in English), it maybe makes sense not to call this amod in Sanskrit, since that would be the relation in priyodarśin (or priyah darśin % sandhi), and it's "compound-y" in that the result behaves like one word and we can't have phrasal modifiers on priya-. But as you can see from the issues linked above, there are lots of language-specific decisions, and of course staying consistent with existing Sanskrit data is important, so if that overrides this consideration I certainly can't argue with that...

As a side note, Latin and Greek UD TBs do not tokenize such modifiers at all, so there is more inconsistency there.

@aryamanarora
Member Author

Thinking about this a bit more, @Stormur noted that in Latin an ADJ can be nominalised and head an NP, which seems to be what is happening here as well with each title. So perhaps nmod (or a subtype thereof) for the titles is actually acceptable here. (And that fits better with the discussion in English and I imagine other languages with titles.)

OK, so it sounds like "yes, they are the typical Indo-European compounding stems", but there is a tradition that sub-types them based on the category of the stem and the semantic roles, right?

Right, exactly, and this seems to be the convention retained by the Sanskrit dependency corpora so it would make sense for Ashokan Prakrit (and other future Middle Indo-Aryan) corpora to stick to that. But I'm not sure how far this convention should go, since it seems to start infringing into semantic territory with e.g. deva-putra "son of a God" which would be deprel'd with nmod:poss or nr̩pa-hata "killed by the king" with nsubj(:pass).

It seems compound has a lot of variation across corpora, and on top of it it's not clear where to draw a line as to how far we should break down words. Not really certain of how to use it now.

@Stormur
Contributor

Stormur commented Sep 14, 2021

Would a subtype amod:desc make sense, given the name-like behaviour of the title here?

This doubling seems a bit redundant to me, in that an amod is usually some kind of "description". But maybe some specific label for regal titles might be envisioned?

But actually are you sure in this specific case you don't want priya- to be compound to dasinā? I mean, my Sanskrit is pretty rusty, but isn't this the compound forming stem form, without a case ending, equivalent of Greco-Latin stem modifiers in -o (like "Greco" actually?)

This feature is already covered at the morphological level with the combination Compound=Yes. But otherwise, the relation between the elements is quite transparent, and, since the word has already been split during tokenisation, it deserves to be annotated like that: the reduced morphology is "accidental", as it were, and particular to the Sanskrit system.
Anyway, the relation compound, apart from being controversial and poorly defined (as discussed in #761), has as one of its major disadvantages that it obscures these kinds of relationships. I'd see it best as a (still redundant, but maybe useful) subrelation, in this case for example amod:compound.

@Stormur
Contributor

Stormur commented Sep 14, 2021

As a side note, Latin and Greek UD TBs do not tokenize such modifiers at all, so there is more inconsistency there.

This is in fact an issue that will need to be discussed (and that I have in my notebook). At the moment, Latin is using the Compound feature, but in a slightly different way than discussed here, i.e. to point out that the marked word has arisen as the combination of two stems. Some examples: princeps 'foremost' < primus 'first' + capio 'to seize'; sicut 'as (comparative)' < sic 'so' + ut 'as'; animadverto 'to give attention' < animum adverto 'I turn my mind towards'.

Now, we see different cases here. While it is more or less obvious that it does not make sense to split princeps (morphologically and lexically) or sicut (functionally), even if we can pursue their etymological iter, a case can be made for animadverto. But Latin is not very satisfying in this sense, as there are no clear, large-scale compounding strategies as in Greek, Sanskrit or German. There, the argument for splitting is much stronger than for animadverto: you have systematic methods and morphological instruments that are regularly applied and can simply be seen as "fusive counterparts" of more analytic strategies applied by other languages. So, if we have e.g. γεωλογία in Greek, Latin would refer to it as scientia terrae, and so on. Morphologically different, but syntactically equivalent.

@Stormur
Contributor

Stormur commented Sep 14, 2021

Thinking about this a bit more, @Stormur noted that in Latin an ADJ can be nominalised and head an NP, which seems to be what is happening here as well with each title. So perhaps nmod (or a subtype thereof) for the titles is actually acceptable here. (And that fits better with the discussion in English and I imagine other languages with titles.)

Maybe I repeat myself, but I'd like to point out that what we call "nominalisation" or "substantivisation" of an adjective most of the time actually just refers to a possible variation in syntactic function and does not really imply a change at the "lexeme level" (pardon me if my expressions here are not very technical, or vague). So I would even refrain from saying that an adjective is "nominalised", and just note that it can be the head of an NP (whereas a language like English usually requires a dummy element like one), where however some head is implied.

Now, I was asking about definitions of regnal names, or epithets, because I think I see a small confusion: some attributes associated with the names of kings & co. become standardised and then can have a sort of life of their own; this probably makes us reanalyse them as independent, thus favouring an annotation as NOUN. Notwithstanding that this may indeed happen sometimes in the long run, usually such attributes just stay the same from a syntactic point of view. Then, I'd see an nmod justified in a case like "the will of the beloved by the Gods", with nmod(will,beloved) while beloved stays ADJ (or VERB in a nominal form). But if we fill the implied "blank" with "the beloved-by-the-Gods king X", we see the attribute again "stepping back" in an amod/acl role.

@amir-zeldes
Contributor

I think the tension in these analyses is about what's more important: the compound relation stresses the fact that the modifier components are less than a complete word in some sense, and that the result of the compounding process behaves like a "word".

The individual relations analysis (modifiers as amod, obj, etc. based on semantics) stresses the benefit of sub-typing the modifiers, since there are many types of compounds. This is something like a "syntax below zero" approach in theoretical morphology, where we would say that argument structure compounds (a.k.a. synthetic compounds) still have something like nsubj or obj dependents 'below the word level'.

For languages like English, where compounds are spelled apart and there is little formal difference between a compound modifier and an independent word, I can see how it is more tempting to choose the second option and use multiple types (currently, compound is used mainly for N-N compounds, and other types like A-N actually use amod).

For languages in which there is a clear difference between compound modifier and independent word modifiers, this is less tempting, because using the normal relation (amod, obj etc.) clashes with the expected form of a word in these roles (e.g. accusative marking on an obj). The way to identify that something is an obj inside such a compound without marking is primarily semantic, not syntactic. It also creates a problem when the modifier has very ad hoc semantics, such as Downing's (1977) "apple juice seat" meaning "seat in front of which a glass of apple-juice had been placed".

Finally if the language has a tradition of spelling compounds together, like German or Greek, then many TBs will simply not tokenize the compound components, and avoid having to make this decision.

I'm not sure if "Universal" UD guidelines will be helpful here, since traditions clearly differ across languages, and it's probably not realistic to revise data to be consistent across so many datasets... But I am curious what @dan-zeman and other people who are involved in guidelines across languages think about this tension.

@aryamanarora
Member Author

aryamanarora commented Sep 14, 2021

Notwithstanding that this may indeed happen sometimes in the long run, usually such attributes just stay the same from a syntactic point of view.

I looked around the corpus and there are instances where "king" in the epithet gets dropped, e.g. in the edition of this sentence in the edict at Kalsi: iyaṁ dʰaṁma-lipi Devānaṁpiyēna Piyadas[i]nā [lēkʰit]ā (some parts hard to read but this is the consensus reading). That makes me want to treat the titles as nominals with nmod(:desc) rather than attributive amod.

@dan-zeman dan-zeman modified the milestones: v2.9, v2.11 Jun 13, 2022
@dan-zeman dan-zeman modified the milestones: v2.11, v2.13 May 25, 2023
@dan-zeman
Member

But I am curious what @dan-zeman and other people who are involved in guidelines across languages think about this tension.

It is difficult to say anything sufficiently general and cross-linguistically valid. I might prefer compound over amod if the adjectival part has a form different from normal adjectival modifiers, but it clearly depends on the language.

For languages like German, a similar decision may have to be made in the enhanced representation under the emerging proposal from Dagstuhl on optional compound splitting.
