Configurable list of classes or labels #102

Open
kwalcock opened this issue Mar 29, 2022 · 3 comments

@kwalcock
Member

Mihai said,

Add functionality to print a list of classes (or labels?) that is configurable
This might simplify the job too much…

@kwalcock
Member Author

Right now a PrintVariables object is read from the configuration file.

case class PrintVariables(@BeanProperty var mentionLabel: String, @BeanProperty var mentionType: String, @BeanProperty var mentionExtractor: String) {

The JSON printers (but not the TSV printer) use the mentionLabel to select which mentions to include. The mentionType and mentionLabel are then used to look in the arguments and get a variable and a value pair. Both are assumed to be present because the Mention has the mentionLabel.
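The filtering and lookup described above can be sketched roughly as follows. This is a minimal sketch, not the project's code: `SimpleMention` is a hypothetical stand-in for the real Mention, and it assumes, based on the example tuples later in this thread such as ("Belief", "believer", "belief"), that mentionType and mentionExtractor hold the argument names for the variable and the value.

```scala
object PrintVariablesSketch {
  // Hypothetical, simplified stand-in for a Mention: labels, text, and named arguments.
  case class SimpleMention(
    labels: Seq[String],
    text: String,
    arguments: Map[String, Seq[SimpleMention]] = Map.empty
  )

  // Same three fields as the case class quoted above, without the bean annotations.
  case class PrintVariables(mentionLabel: String, mentionType: String, mentionExtractor: String)

  // Keep mentions that carry the label, then take the first mention under each argument name.
  // Assumption: mentionType names the variable argument and mentionExtractor the value argument.
  def variableValuePairs(mentions: Seq[SimpleMention], pv: PrintVariables): Seq[(String, String)] =
    mentions
      .filter(_.labels.contains(pv.mentionLabel))
      .flatMap { mention =>
        val pair = for {
          variable <- mention.arguments.get(pv.mentionType).flatMap(_.headOption)
          value <- mention.arguments.get(pv.mentionExtractor).flatMap(_.headOption)
        } yield (variable.text, value.text)
        // A mention missing either argument is skipped rather than causing an exception.
        pair.toList
      }
}
```

Unlike the behavior described above, this sketch does not assume the arguments are present; a mention with the right label but a missing argument is silently dropped.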

@maxaalexeeva, what needs to be added to this logic in order to do the neat things that each of those readers has to do?

@maxaalexeeva
Contributor

@kwalcock, I was thinking along the lines of making it possible for every reader to have multiple PrintVariables, with the writer looping through them? For example, if there are two possible sets of PrintVariables for beliefs, ("Belief", "believer", "belief") and ("Belief", "beliefTheme", "belief"), the writer would do the writing for each of these in turn.
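A rough sketch of that first idea, with the writer looping over several sets of PrintVariables, might look like this. The `SimpleMention` type, the `writeAll` name, and the tab-separated line layout are all hypothetical and only illustrate the loop.

```scala
object MultiPrintVariablesSketch {
  // Hypothetical, simplified stand-in for a Mention.
  case class SimpleMention(
    label: String,
    text: String,
    arguments: Map[String, Seq[SimpleMention]] = Map.empty
  )

  case class PrintVariables(mentionLabel: String, mentionType: String, mentionExtractor: String)

  // One output line per (PrintVariables, mention) pair for which both arguments exist.
  // The first column records which variable argument produced the line.
  def writeAll(mentions: Seq[SimpleMention], printVariables: Seq[PrintVariables]): Seq[String] =
    for {
      pv <- printVariables
      mention <- mentions if mention.label == pv.mentionLabel
      variable <- mention.arguments.getOrElse(pv.mentionType, Seq.empty).take(1)
      value <- mention.arguments.getOrElse(pv.mentionExtractor, Seq.empty).take(1)
    } yield Seq(pv.mentionType, variable.text, value.text).mkString("\t")
}
```

With the two belief sets from the comment, a single Belief mention that has believer, beliefTheme, and belief arguments would yield two lines, one per set.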

Although this still does not address the possibility of having non-binary events (we don't have them now, of course, but still). What if we do not have PrintVariables at all and just do "for every argument in mention, print "? And to make sure that the columns in the TSV are consistent, sort the args by name before looping through them to print?
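The second idea, printing every argument after sorting by name, could be sketched like this. `SimpleMention` and `toTsvRow` are hypothetical names, and taking only the first mention per argument is an assumption carried over from the current behavior.

```scala
object SortedArgumentsSketch {
  // Hypothetical, simplified stand-in for a Mention.
  case class SimpleMention(text: String, arguments: Map[String, Seq[SimpleMention]] = Map.empty)

  // One TSV row per mention: argument texts ordered alphabetically by argument name.
  def toTsvRow(mention: SimpleMention): String =
    mention.arguments.toSeq
      .sortBy { case (name, _) => name }
      .map { case (_, mentions) => mentions.headOption.map(_.text).getOrElse("") }
      .mkString("\t")
}
```

Sorting keeps the columns consistent only when every mention has the same set of argument names; a mention with an extra or missing argument would shift the columns.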

I can elaborate if that does not make sense.

@kwalcock
Member Author

So we're printing some things from a kind of Attachment called Context, which isn't involved here, and other things from Mention.arguments, which are involved. Those arguments have been called variable and value for some time. The information collected from the arguments is presently (in the newest PR) stored in a case class

case class ArgumentInfo(variableText: String, valueText: String, valueNorm: String)

Right now the information is not symmetric. It's different for the variable and value so it is necessary to know which is which. If that isn't necessary and we can print the variableNorm and it is calculated the same way as the valueNorm, then your second idea sounds good to me. It would be more up to the reader of the output to decide what to look at and the printer would be relieved of the job.

Especially for the TSV representation, there might need to be a third column that tells the name of the argument. The Printer will not know the names of all the arguments for all the files in advance in order to print them in a header line (which seems to be missing now anyway). The printer can keep track of which argument is in which column and keep all the like values vertically aligned, though, if that is desired.
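Both TSV options mentioned here can be sketched side by side. This is only an illustration under assumptions: `Row` is a hypothetical flattened form (argument name to printed text), and the aligned variant buffers all rows before writing, which is one way around not knowing the argument names in advance.

```scala
object AlignedTsvSketch {
  // Hypothetical flattened row: argument name -> text to print.
  type Row = Map[String, String]

  // Aligned layout: the header is the sorted union of all argument names seen,
  // and a missing argument becomes an empty cell so like values stay in one column.
  def toTsv(rows: Seq[Row]): Seq[String] = {
    val columns = rows.flatMap(_.keys).distinct.sorted
    val header = columns.mkString("\t")
    val body = rows.map(row => columns.map(name => row.getOrElse(name, "")).mkString("\t"))
    header +: body
  }

  // Long layout: row index, argument name, text. The extra name column means
  // no header and no advance knowledge of the argument names are needed.
  def toLongTsv(rows: Seq[Row]): Seq[String] =
    for {
      (row, index) <- rows.zipWithIndex
      (name, text) <- row.toSeq.sortBy(_._1)
    } yield Seq(index.toString, name, text).mkString("\t")
}
```

The long layout can be streamed one mention at a time, while the aligned layout trades that for output that is easier to read in a spreadsheet.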

For the JSON representation, the keys might be based on the argument name, like belief_text: "" and belief_norm: "", or it could be something more complicated like

"belief": {
    "text": "",
    "norm": ""
}

This is assuming that there are no name clashes with other things being printed like sentenceText, inputFilename, and anything coming from the Context. It's a reasonably controlled vocabulary so that it's not a terrible assumption.
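The two JSON layouts and the name-clash check could be sketched as below. The code builds strings by hand only to stay self-contained; a real printer would presumably use a JSON library. `ArgumentText`, `flat`, `nested`, and `clashes` are hypothetical names, and the reserved set contains just the two field names mentioned above.

```scala
object ArgumentJsonSketch {
  // Hypothetical holder for the two strings printed per argument.
  case class ArgumentText(text: String, norm: String)

  // Minimal escaping of backslashes and double quotes, enough for this sketch only.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Flat layout: keys such as belief_text and belief_norm.
  def flat(name: String, arg: ArgumentText): String =
    quote(name + "_text") + ": " + quote(arg.text) + ", " +
      quote(name + "_norm") + ": " + quote(arg.norm)

  // Nested layout: one object per argument with text and norm keys.
  def nested(name: String, arg: ArgumentText): String =
    quote(name) + ": {" + quote("text") + ": " + quote(arg.text) + ", " +
      quote("norm") + ": " + quote(arg.norm) + "}"

  // Field names already printed elsewhere, taken from the examples in the comment.
  val reserved: Set[String] = Set("sentenceText", "inputFilename")

  // Argument names that would collide with an already printed field.
  def clashes(argumentNames: Iterable[String]): Set[String] =
    argumentNames.toSet.intersect(reserved)
}
```

The nested layout only needs the bare argument name to be free of clashes, whereas the flat layout would clash only if a suffixed key such as belief_text were already in use.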

The arguments are Map[String, Seq[Mention]], so there could be multiple variables and values, but we are always just taking the head. That might be reasonable, but heads up. It could be accounted for in printing.
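As a sketch of how printing could account for several mentions under one argument name, the following contrasts the head-only behavior with one possible alternative. `SimpleMention`, both function names, and the "; " separator are assumptions made for illustration.

```scala
object MultiValueArgumentsSketch {
  // Hypothetical, simplified stand-in for a Mention.
  case class SimpleMention(text: String)

  // Behavior as described: only the first mention under each argument name is kept.
  // Argument names with an empty sequence are dropped.
  def headOnly(arguments: Map[String, Seq[SimpleMention]]): Map[String, String] =
    arguments.collect { case (name, first +: _) => name -> first.text }

  // Alternative: keep every mention, joined with a separator chosen for this sketch.
  def allJoined(
    arguments: Map[String, Seq[SimpleMention]],
    separator: String = "; "
  ): Map[String, String] =
    arguments.map { case (name, mentions) => name -> mentions.map(_.text).mkString(separator) }
}
```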


2 participants