Quote attributions to Character Ids #3

Open: 27 commits to merge into base: master

NikhilPr95

Added a function that assigns character Ids up front, in the BookNLP process() function, rather than at the end during printing.

For option "d", printQuotes now attaches two extra attributes, sentenceID and characterId, to each quote and prints them to file. This widens the scope of quote attribution considerably: every 'he' and 'she' that a quote is attributed to is mapped to a character Id, so many more quotes can be traced to the character who actually said them.
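
As a rough illustration of the idea (not the code in this PR), the sketch below assigns character Ids to tokens up front and then looks up the Id of the speaker token when a quote is printed. The `Token` and `Quote` types, the `mentionToCharacter` map, and the tab-separated output are all hypothetical stand-ins chosen for the example; BookNLP's own classes and output format differ.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch only: these types are hypothetical, not BookNLP's own classes.
class QuoteAttributionSketch {

    static final int NO_CHARACTER = -1;

    /** Hypothetical token carrying a sentence id and a character id. */
    static final class Token {
        final String word;
        final int sentenceId;
        int characterId = NO_CHARACTER;

        Token(String word, int sentenceId) {
            this.word = word;
            this.sentenceId = sentenceId;
        }
    }

    /** Hypothetical quote: index of its first token and of its speaker mention. */
    static final class Quote {
        final int startToken;
        final int speakerToken; // -1 when no speaker mention was found

        Quote(int startToken, int speakerToken) {
            this.startToken = startToken;
            this.speakerToken = speakerToken;
        }
    }

    /**
     * Sets the character id of every token before any printing happens.
     * mentionToCharacter maps a token index (e.g. a resolved "he" or "she")
     * to a character id; tokens without an entry get NO_CHARACTER.
     */
    static void setCharacterIds(List<Token> tokens, Map<Integer, Integer> mentionToCharacter) {
        for (int i = 0; i < tokens.size(); i++) {
            tokens.get(i).characterId = mentionToCharacter.getOrDefault(i, NO_CHARACTER);
        }
    }

    /** Formats the two extra attributes for one quote: sentence id, then character id. */
    static String formatQuote(List<Token> tokens, Quote quote) {
        int sentenceId = tokens.get(quote.startToken).sentenceId;
        int characterId = quote.speakerToken >= 0
                ? tokens.get(quote.speakerToken).characterId
                : NO_CHARACTER;
        return sentenceId + "\t" + characterId;
    }
}
```

Because the Ids are already on the tokens, the printing step only reads them; a quote attributed to a pronoun gets the same Id as the character the pronoun was resolved to.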

New function setCharacterIDs contains code moved (cut and pasted) from PrintUtils.PrintTokens. It sets the characterId of each token beforehand, which makes the Ids easier to access and results in more accurate tokens during processing. Previously, the characterIds were set only during printing, leaving them inaccessible until that point.
Added sentenceID and the attributed character Id to each quote that is printed. This makes it convenient to attribute quotes to characters even when the speakers are referred to only by pronouns such as 'he' and 'she'.
Calling new function setCharacterIds, which sets character Ids beforehand rather than only during printing.
In each 'para' containing multiple quotes, all of the quotes are taken to be spoken by the first speaker, except when the parser itself makes an error. The added code reflects this.
Switched the order of the calls to printWithLinksAndCorefAndQuotes and dumpForAnnotation, as the former uses information from the latter.
Used an alternate method for extracting phrase names that does not rely on book.animateEntities. This was mostly done because the quote attribution method required a quote-attributed name to be valid only if it came from a phrase put into animateEntities. The improvements I made to quote attribution stood in contradiction with this, as the name I extracted for quote attribution did not always stay in animateEntities. I could have added the 'phrase' containing the name to animateEntities instead, but that would have added phrases that the phrase-generating code did not deem legitimate, and I did not want to meddle with the phrase-extraction code unnecessarily, so I decided to bypass the requirement itself. (I don't know if 'requirement' is the correct word: I saw the code a few weeks ago, and all I can say is that all the quote-attributed names happen to come from phrases in animateEntities. I don't remember whether the name dictates whether the phrase is added or the other way round.) As my program does not need this requirement anyway, I skipped it and used an alternate method that requires the printHTML option to be processed first, as can be seen in my changes to BookNLP.java.
Added a condition for ner being 'PERSON', as well as a new feature for it, isPerson.
Includes checking whether the quote is in the same para as the last quote and, if so, assigning the last quote's attribution to it.
Now using Stanford CoreNLP (latest version) for parsing, including dependency parsing (formerly done by MaltParser), with the 'depparse' option for faster parsing using neural networks. This uses Universal Dependencies rather than Stanford Dependencies, as CoreNLP itself does. Universal Dependencies, however, sometimes produce dependency structures with loops in them because multiple relations are found, so the output is a graph rather than a tree. This is dealt with here by choosing the best link and extracting a tree from the graph.
New weights generated as a result of adding the new feature 'isPerson'.
New feature isPerson added
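
The paragraph-level rule mentioned in the commits above (a quote in the same 'para' as the previous quote takes the previous quote's attribution) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: quotes are represented by two hypothetical parallel arrays, `UNKNOWN` marks a quote with no attribution, and a quote keeps its own attribution when the previous quote in the same paragraph has none.

```java
// Illustrative sketch only: the array-based representation is an assumption
// made for the example, not how BookNLP stores quotes.
class ParagraphAttributionSketch {

    static final int UNKNOWN = -1;

    /**
     * paragraphIds[i] is the paragraph of quote i (quotes in document order);
     * speakerIds[i] is the character id first predicted for quote i, or UNKNOWN.
     * Returns a new array in which each quote that shares a paragraph with the
     * previous quote inherits that quote's (already propagated) attribution.
     */
    static int[] propagateWithinParagraph(int[] paragraphIds, int[] speakerIds) {
        if (paragraphIds.length != speakerIds.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int[] result = speakerIds.clone();
        for (int i = 1; i < result.length; i++) {
            boolean sameParagraph = paragraphIds[i] == paragraphIds[i - 1];
            // Assumption: only overwrite when the previous quote has a known speaker.
            if (sameParagraph && result[i - 1] != UNKNOWN) {
                result[i] = result[i - 1];
            }
        }
        return result;
    }
}
```

Since each quote reads the already updated value of its predecessor, the first attributed speaker in a paragraph carries through to every later quote in that paragraph.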