sayarghoshroy/Bengali_Anaphora_Resolution_Challenges

Bengali Anaphora Resolution Challenges

Outline of a Rule Based Anaphora Resolution System for Bengali

We present an anaphora resolution system for Bengali that uses linguistic features to disambiguate the possible antecedents of an anaphor. The system accepts a sentence in UTF-8 Bengali script as input, identifies the pronominal anaphors, and detects all possible antecedents for each anaphor. Using a set of hierarchical rules, it then selects the best contextual antecedent from that set. The system is evaluated on a manually tagged dataset.
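The first two stages described above (finding pronominal anaphors and collecting their possible antecedents) can be sketched roughly as follows. This is a minimal illustration and not the repository's implementation: the tag names `PRP`, `NN`, `NNP`, `VM`, and `CC`, the function `find_candidates`, and the choice to consider only nouns that precede the pronoun are all assumptions made for the example, and the input is assumed to be already POS-tagged.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, POS tag)

# Hypothetical tag names; the actual tagset used by the system may differ.
PRONOUN_TAGS = {"PRP"}
NOUN_TAGS = {"NN", "NNP"}


def find_candidates(tokens: List[Token]) -> List[Tuple[int, List[int]]]:
    """Return (pronoun index, candidate antecedent indices) pairs.

    Assumption: every noun that precedes a pronoun in the sentence
    counts as a possible antecedent for it.
    """
    results = []
    for i, (_, tag) in enumerate(tokens):
        if tag in PRONOUN_TAGS:
            candidates = [j for j in range(i) if tokens[j][1] in NOUN_TAGS]
            results.append((i, candidates))
    return results


# Hand-tagged example sentence: "Ram said that he will come."
sentence = [
    ("রাম", "NNP"),
    ("বলল", "VM"),
    ("যে", "CC"),
    ("সে", "PRP"),
    ("আসবে", "VM"),
]

pairs = find_candidates(sentence)
for pronoun_idx, candidate_idxs in pairs:
    print(sentence[pronoun_idx][0], [sentence[j][0] for j in candidate_idxs])
```

Working with token indices rather than strings keeps repeated words distinct, which matters once several mentions of the same noun appear in one sentence.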

This study shows how far a system built on linguistic rules can go, and illustrates the point beyond which the use of core knowledge becomes paramount. For a simple rule-based system, it performs reasonably well, often narrowing a set of possible antecedents down to the most obvious and correct choice.

Disambiguation Features

  1. Part-Of-Speech information: The POS tag not only identifies pronouns and nouns, but also helps us recognize named entities in text.

  2. Number: The pronoun and its antecedent must agree in number, i.e., a singular pronoun must refer to a singular antecedent and a plural pronoun to a plural one.

  3. Person: Agreement of person helps rank one antecedent as more probable than another when their other features are equivalent.

  4. Status: The honorifics used give us clues as to which particular person the pronoun actually refers to.

  5. Morphology: Morphological features are used to identify number in nouns and the type of referent in the case of pronouns.
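To make the role of these features concrete, here is a small sketch of how they might be combined hierarchically. It is an illustration under stated assumptions, not the rule set from the full report: the `Mention` class, the `resolve` function, the rule ordering (number as a hard filter, then person, then honorific status, then recency as a tie-breaker), and the hand-assigned feature values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    text: str
    position: int    # token index in the sentence
    number: str      # "sg" or "pl"
    person: int      # 1, 2, or 3
    honorific: bool  # status feature


def resolve(pronoun: Mention, candidates: List[Mention]) -> Optional[Mention]:
    """Pick one antecedent using an assumed rule hierarchy."""
    # Hard constraint: the antecedent precedes the pronoun and agrees in number.
    pool = [
        c for c in candidates
        if c.position < pronoun.position and c.number == pronoun.number
    ]
    if not pool:
        return None
    # Soft preferences, applied in order; a rule is skipped if it
    # would eliminate every remaining candidate.
    preferences = (
        lambda c: c.person == pronoun.person,
        lambda c: c.honorific == pronoun.honorific,
    )
    for matches in preferences:
        narrowed = [c for c in pool if matches(c)]
        if narrowed:
            pool = narrowed
    # Tie-breaker (assumption): prefer the candidate closest to the pronoun.
    return max(pool, key=lambda c: c.position)


# Illustrative, hand-assigned features.
teacher = Mention("শিক্ষক", 0, "sg", 3, True)    # "teacher"
boys = Mention("ছেলেরা", 1, "pl", 3, False)      # "boys"
boy = Mention("ছেলে", 2, "sg", 3, False)         # "boy"
pronoun = Mention("তিনি", 5, "sg", 3, True)      # honorific third-person pronoun

best = resolve(pronoun, [teacher, boys, boy])
print(best.text if best else None)
```

In this example the number filter removes the plural candidate, person agreement does not separate the remaining two, and the honorific feature selects the teacher even though the boy is closer to the pronoun. Treating person and status as preferences rather than hard filters is one design choice among several; a stricter system could reject candidates outright on any mismatch.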

For details, refer to the full report.