Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

variable reader tests #117

Open
IkeKobby opened this issue Apr 26, 2022 · 21 comments
Open

variable reader tests #117

IkeKobby opened this issue Apr 26, 2022 · 21 comments

Comments

@IkeKobby
Copy link
Collaborator

IkeKobby commented Apr 26, 2022

Hi @maxaalexeeva , @MihaiSurdeanu here are the links to the variables tests files TestVariableReaderLabelBased and TestVariableReaderEntities.

@MihaiSurdeanu
Copy link
Contributor

Thanks @IkeKobby !

@IkeKobby
Copy link
Collaborator Author

You're welcome @MihaiSurdeanu

@MihaiSurdeanu
Copy link
Contributor

A couple of comments:

  • I think the TestVariableReaderEntities should be reorganized similarly to the other one.
  • Can you please count: a) total number of tests to be written based on Vijaya's annotations, and (b) time to completion based on a realistic pace for test writing.
    Thanks!

@IkeKobby
Copy link
Collaborator Author

  • Okay, concerning the TestVariableReaderEntities, I will check with @maxaalexeeva the best way to do that.
  • Also, about the second comment,I will give you feedback on it. There were 7 papers and two have been worked on so far.

@IkeKobby
Copy link
Collaborator Author

A couple of comments:

  • I think the TestVariableReaderEntities should be reorganized similarly to the other one.
  • Can you please count: a) total number of tests to be written based on Vijaya's annotations, and (b) time to completion based on a realistic pace for test writing. Response: Hi @MihaiSurdeanu, apparently, there are 49 sentences(from 8 papers) according to the annotations from VIjaya, however it is subject to change, maybe more or less due to the differences in sentence length(some are overly long and some too title of tables).
    Thanks!

@IkeKobby
Copy link
Collaborator Author

Hi @MihaiSurdeanu, @maxaalexeeva please, TestVariableReaderLabelBased is updated.

  • I have shared the entire sentences and also some specific ones I have some issues with. I am hoping to meet with @maxaalexeeva to discuss further into it.

@maxaalexeeva
Copy link
Contributor

@IkeKobby @MihaiSurdeanu, I have two main comments based on this set of event tests:

  1. I think we should get away from Entities with a label "Variable" (especially since we decided to mainly test the variable argument based on the label). E.g., in the test below, the label Variable is not very informative about what is extracted as the variable argument; the label for seeding could be something like "Planting" instead (would need a rule to extract words /plant|seed|sow/ as entities with label Planting).
    VariableTest(
      "sent15", "Seeding dates ranged from 22 August to 26 September in 2011WS, from 29 February to 1 April in the 2012DS, and from 5 to 23 March in the 2013DS",
      Seq(
        ("Variable", Seq(("from 29 February to 1 April", "XXXX-02-29 -- XXXX-04-01"))),
        ("Variable", Seq(("from 5 to 23 March", "XXXX-03-05 -- XXXX-03-23"))),
        ("Variable", Seq(("from 22 August to 26 September", "XXXX-08-22 -- XXXX-09-26")))
      )
    ),
  1. Do we want to keep extracting cultivars as events?
 VariableTest(
      "sent21", "seeds of the rice variety Sahel 108 are sown, a short- cycle variety (around 125 days) ",
      Seq(
        ("Variable", Seq(("Sahel 108", ""))),
      )
    )

I see two options for handling these:

  1. have them added to the lexicon if they are not there yet and just extract them as entities (label Crop) to be used as variable args in Assignment events
  2. for those that are not in the lexicon, we extract them as an assignment event (hopefully there is some good syntax in the sentence between the word /variety|cultivar|crop/ and the name of it, e.g., Sahel 108), but change them into a Crop TextBoundMention using an action applied to the rule (+ maybe a filter to get rid of duplicates in case some cultivar was extracted both as a lexicon entity and with a rule+action).

@MihaiSurdeanu
Copy link
Contributor

  • I think we need to move from the generic labels we used so far for both entities (Variable, Value) and events (Assignment) to labels that capture the semantics of what we're extracting. This way we impose better type constraints than what we can do now. You had more examples in the google doc.
  • I agree with your take on cultivars.

@IkeKobby
Copy link
Collaborator Author

IkeKobby commented May 1, 2022

Dear @maxaalexeeva, please kindly have a look at this

@maxaalexeeva
Copy link
Contributor

@IkeKobby if I understand correctly, you only allow one mention of each type in test. I wonder if it would be better to have a structure like this:

val desired = Map(
      Label1 -> Seq(Value1, Value 2), // e.g., here, this could be two values with a label crop
      Label2 -> Seq(Value3), // and this one could be one Yield mention 
      Label3 -> Seq(Value4, Value5, Value6) // and three fertilizer (Imaging this is a veeeery long sentence :)  ) 
      )

I think I have seen this structure in some other projects. And you could go through key, value pairs in the map and check if they are present in extracted mentions (filter extracted mentions by the label, check size (should match the length of the sequence with values for that label) and then check the overlap in text between extracted and desired values. There could be issues with this idea, but I have not tried it.

@kwalcock
Copy link
Member

kwalcock commented May 1, 2022

BTW the class and file names (entitiesTest -> EntitiesTest) need to be uppercase. Git with Windows with case issues leads to problems :-( so we have to be extra careful about it.

@IkeKobby
Copy link
Collaborator Author

IkeKobby commented May 1, 2022

Okay @maxaalexeeva. Please I think we should discuss this on call tomorrow if that is okay?

@IkeKobby
Copy link
Collaborator Author

IkeKobby commented May 1, 2022

Okay @kwalcock. I will do that please.

@kwalcock
Copy link
Member

kwalcock commented May 1, 2022

I'm not quite sure what this means, but Masha can almost surely help with it if you two talk.

        // I think I would need a better way to map ONLY extracted values. For now, here is what I am able to do.
        // I do not know how to use "append" like in python, I could have append ONLY extracted
        // entities to `targetMentionsLabelTexts` from the `for loop above` and check for their existence in the last line.

@IkeKobby
Copy link
Collaborator Author

IkeKobby commented May 1, 2022

I'm not quite sure what this means, but Masha can almost surely help with it if you two talk.

        // I think I would need a better way to map ONLY extracted values. For now, here is what I am able to do.
        // I do not know how to use "append" like in python, I could have append ONLY extracted
        // entities to `targetMentionsLabelTexts` from the `for loop above` and check for their existence in the last line.

Okay sure.

@maxaalexeeva
Copy link
Contributor

@IkeKobby yes, I messaged you about my availability, and @kwalcock yes, I can help.

@maxaalexeeva
Copy link
Contributor

@MihaiSurdeanu before I (or someone else) go changing rules to stop using generic Assignment label, do these look like reasonable labels/todos to you for the old set of tests (starting line 64)? 86c1fee

@MihaiSurdeanu
Copy link
Contributor

  • I think SowingDate and PlantingDate are the same thing, so they should be merged.
  • I propose to no longer use the Assignment label, because it is too ambiguous, and we can't add semantic constraints on argument types. For example, we could use CropAssignment for events that pick up crops.

@IkeKobby
Copy link
Collaborator Author

IkeKobby commented May 2, 2022 via email

@maxaalexeeva
Copy link
Contributor

@MihaiSurdeanu @IkeKobby I will adjust these.

@IkeKobby
Copy link
Collaborator Author

IkeKobby commented May 2, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants