Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

yield increase extraction issue #157

Open
IkeKobby opened this issue Jul 22, 2022 · 16 comments
Open

yield increase extraction issue #157

IkeKobby opened this issue Jul 22, 2022 · 16 comments

Comments

@IkeKobby
Copy link
Collaborator

yield increase being extracted as a total yield currently in habitus; eg ..... yield increased for about 1.5 t ha-1 in the ... season after the application of XYZ fertilizer. How do we best handle this yield increase from being extracted as total yield from sentences? @maxaalexeeva

@maxaalexeeva
Copy link
Contributor

@IkeKobby try to make a dependency (syntax-based) rule with increase trigger with a new label eg YieldIncrease, variable:Yield and value:Quantity. If doesn't work, we can try it together in our Monday meeting.

@IkeKobby
Copy link
Collaborator Author

Alright sure.

@IkeKobby
Copy link
Collaborator Author

I think this issue can be closed now @maxaalexeeva right?

@maxaalexeeva
Copy link
Contributor

@IkeKobby let's keep it open for a bit while I try to debug the issue with losing a yield amount and until we have this as a pr.

@maxaalexeeva
Copy link
Contributor

@MihaiSurdeanu @kwalcock a quick question: other than habitus actions, is there anything in the extraction system that could filter out mentions (e.g., based on overlap of two mentions)? The issue in a nutshell: I modify a rule a bit (adding a negative lookbehind), and one mention in a sentence with two mentions stops showing up. When it's a chunk of text without the other mention, the extraction is extracted correctly.

@IkeKobby
Copy link
Collaborator Author

@IkeKobby let's keep it open for a bit while I try to debug the issue with losing a yield amount and until we have this as a pr.

Alright sure.

@MihaiSurdeanu
Copy link
Contributor

Not that I recall. But I think we added such an action a while ago (that removes overlapping mentions) to habitus. Do you see it?

@kwalcock
Copy link
Member

This code sounds suspicious:

  /** Keeps a belief mention only if it is not contained in another */
  def keepLongestMentions(mentions: Seq[Mention]): Seq[Mention] = {

removeRedundantVariableMentions could filter out mentions.

@maxaalexeeva
Copy link
Contributor

@MihaiSurdeanu @kwalcock thanks! I tried to disable those but the test in question still fails (described below)---although the extraction seems fine in the shell.

Here's some more details on the issue:

In this rule, when I add the negative lookbehind at the beginning, this test fails (the yields > 8 t ha-1 part stops being extracted as a mention---note the plural in yields, the value should not be attached to the first yield---that yield variable and the 8 t ha-1 value are too far apart and will get filtered out).

The negative lookbehind there is to make sure this test passes - there should only be a yield increase mention there, not yield amount.

@kwalcock
Copy link
Member

I can trace through the code and try to find out what is happening, but probably not until this afternoon. Thanks for the detailed links to specific lines of code in specific commits. It's greatly appreciated.

@maxaalexeeva
Copy link
Contributor

@kwalcock thank you! It might be clear from the commits, but this is the name of the branch context-based-eval

@kwalcock
Copy link
Member

I haven't been able to find the problem yet. The sentence is

"The highest yield ( 9.3 t ha-1 ) is obtained by Brodt et al. ( 2014 ) in California with only 170 kg N ha-1 ; followed by Xu et al. ( 2020 ) and Zhang et al. ( 2021 ) , both with yields > 8 t ha-1 in Hubei Province ( China ) .",

With the simplified rule

    @variable:Yield [!lemma = /^(increase|decrease)$/]+? @value:Quantity+ ([]*? (","|"and")? @value:Quantity)* (?! [word="more"])

two mentions are found.
When the rule starts with "highest"

    [word = /(highest)/] @variable:Yield [!lemma = /^(increase|decrease)$/]+? @value:Quantity+ ([]*? (","|"and")? @value:Quantity)* (?! [word="more"])

just the first is found.
When the rule starts with "with"

    [word = /(with)/] @variable:Yield [!lemma = /^(increase|decrease)$/]+? @value:Quantity+ ([]*? (","|"and")? @value:Quantity)* (?! [word="more"])

the second is found. When it can start with either,

    [word = /(highest|with)/] @variable:Yield [!lemma = /^(increase|decrease)$/]+? @value:Quantity+ ([]*? (","|"and")? @value:Quantity)* (?! [word="more"])

only the first is found again instead of both.

@maxaalexeeva
Copy link
Contributor

@kwalcock hm... I experimented with adding things before yield (neg look behind, token with lemma not equal increase/decrease, and just any token) and all of those eliminated the second extraction. I thought it had to be something to do with some overlap I create by adding another token to the mention, but a) I can't find where it matters and b) negative look behind doesn't really add a token ..

@kwalcock
Copy link
Member

I haven't been able to find the problem, either. I don't think it's anything in the actions. I've done things like replace most words with "fill" and change the sentence length. The lookbehind at the beginning seems to work if that ([]*? (","|"and")? @value:Quantity)* is removed, but having both is a problem. It seems like this would require some backtracking, maybe multiple levels of it, and maybe something goes wrong. I'd like to hear if Marco has an opinion before looking further. It seems like a good knowledge of the inner workings of Odin is necessary.

@maxaalexeeva
Copy link
Contributor

@kwalcock, ah, those (replace most words with "fill" and change the sentence length) are good tests. Shall we tag Marco? Or shall I email?

@kwalcock
Copy link
Member

kwalcock commented Jul 31, 2022

@marcovzla, with all your knowledge of the inner workings of Odin, do you have a guess as to why this might not be working as intended? I edited the test file (TestVariableReaderLabelBased.scala) in the branch kwalcock/debugRule so that only the failing test runs, in case that helps. I'm about desperate enough to try to hot wire org.clulab.odin.impl.Result (or something similar) to include a record of how matches are found so that it can be inspected afterwards. I have a feeling you can point out something to which we'll have to say, "Well, duh, that should have been obvious."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants