variable reader tests #117

IkeKobby · 2022-04-26T20:28:20Z

Hi @maxaalexeeva , @MihaiSurdeanu here are the links to the variables tests files TestVariableReaderLabelBased and TestVariableReaderEntities.

MihaiSurdeanu · 2022-04-26T20:58:49Z

Thanks @IkeKobby !

IkeKobby · 2022-04-26T21:04:23Z

You're welcome @MihaiSurdeanu

MihaiSurdeanu · 2022-04-26T21:07:00Z

A couple of comments:

I think the TestVariableReaderEntities should be reorganized similarly to the other one.
Can you please count: a) total number of tests to be written based on Vijaya's annotations, and (b) time to completion based on a realistic pace for test writing.
Thanks!

IkeKobby · 2022-04-26T21:11:35Z

Okay, concerning the TestVariableReaderEntities, I will check with @maxaalexeeva the best way to do that.
Also, about the second comment,I will give you feedback on it. There were 7 papers and two have been worked on so far.

IkeKobby · 2022-04-28T09:08:09Z

A couple of comments:

I think the TestVariableReaderEntities should be reorganized similarly to the other one.

Can you please count: a) total number of tests to be written based on Vijaya's annotations, and (b) time to completion based on a realistic pace for test writing. Response: Hi @MihaiSurdeanu, apparently, there are 49 sentences(from 8 papers) according to the annotations from VIjaya, however it is subject to change, maybe more or less due to the differences in sentence length(some are overly long and some too title of tables).
Thanks!

IkeKobby · 2022-04-29T16:19:13Z

Hi @MihaiSurdeanu, @maxaalexeeva please, TestVariableReaderLabelBased is updated.

I have shared the entire sentences and also some specific ones I have some issues with. I am hoping to meet with @maxaalexeeva to discuss further into it.

maxaalexeeva · 2022-04-29T18:02:04Z

@IkeKobby @MihaiSurdeanu, I have two main comments based on this set of event tests:

I think we should get away from Entities with a label "Variable" (especially since we decided to mainly test the variable argument based on the label). E.g., in the test below, the label Variable is not very informative about what is extracted as the variable argument; the label for seeding could be something like "Planting" instead (would need a rule to extract words /plant|seed|sow/ as entities with label Planting).

    VariableTest(
      "sent15", "Seeding dates ranged from 22 August to 26 September in 2011WS, from 29 February to 1 April in the 2012DS, and from 5 to 23 March in the 2013DS",
      Seq(
        ("Variable", Seq(("from 29 February to 1 April", "XXXX-02-29 -- XXXX-04-01"))),
        ("Variable", Seq(("from 5 to 23 March", "XXXX-03-05 -- XXXX-03-23"))),
        ("Variable", Seq(("from 22 August to 26 September", "XXXX-08-22 -- XXXX-09-26")))
      )
    ),

Do we want to keep extracting cultivars as events?

 VariableTest(
      "sent21", "seeds of the rice variety Sahel 108 are sown, a short- cycle variety (around 125 days) ",
      Seq(
        ("Variable", Seq(("Sahel 108", ""))),
      )
    )

I see two options for handling these:

have them added to the lexicon if they are not there yet and just extract them as entities (label Crop) to be used as variable args in Assignment events
for those that are not in the lexicon, we extract them as an assignment event (hopefully there is some good syntax in the sentence between the word /variety|cultivar|crop/ and the name of it, e.g., Sahel 108), but change them into a Crop TextBoundMention using an action applied to the rule (+ maybe a filter to get rid of duplicates in case some cultivar was extracted both as a lexicon entity and with a rule+action).

MihaiSurdeanu · 2022-04-29T21:42:30Z

I think we need to move from the generic labels we used so far for both entities (Variable, Value) and events (Assignment) to labels that capture the semantics of what we're extracting. This way we impose better type constraints than what we can do now. You had more examples in the google doc.
I agree with your take on cultivars.

IkeKobby · 2022-05-01T13:11:57Z

Dear @maxaalexeeva, please kindly have a look at this

maxaalexeeva · 2022-05-01T19:26:19Z

@IkeKobby if I understand correctly, you only allow one mention of each type in test. I wonder if it would be better to have a structure like this:

val desired = Map(
      Label1 -> Seq(Value1, Value 2), // e.g., here, this could be two values with a label crop
      Label2 -> Seq(Value3), // and this one could be one Yield mention 
      Label3 -> Seq(Value4, Value5, Value6) // and three fertilizer (Imaging this is a veeeery long sentence :)  ) 
      )

I think I have seen this structure in some other projects. And you could go through key, value pairs in the map and check if they are present in extracted mentions (filter extracted mentions by the label, check size (should match the length of the sequence with values for that label) and then check the overlap in text between extracted and desired values. There could be issues with this idea, but I have not tried it.

kwalcock · 2022-05-01T19:43:54Z

BTW the class and file names (entitiesTest -> EntitiesTest) need to be uppercase. Git with Windows with case issues leads to problems :-( so we have to be extra careful about it.

IkeKobby · 2022-05-01T19:45:19Z

Okay @maxaalexeeva. Please I think we should discuss this on call tomorrow if that is okay?

IkeKobby · 2022-05-01T19:45:48Z

Okay @kwalcock. I will do that please.

kwalcock · 2022-05-01T19:53:28Z

I'm not quite sure what this means, but Masha can almost surely help with it if you two talk.

        // I think I would need a better way to map ONLY extracted values. For now, here is what I am able to do.
        // I do not know how to use "append" like in python, I could have append ONLY extracted
        // entities to `targetMentionsLabelTexts` from the `for loop above` and check for their existence in the last line.

IkeKobby · 2022-05-01T19:55:00Z

I'm not quite sure what this means, but Masha can almost surely help with it if you two talk.

        // I think I would need a better way to map ONLY extracted values. For now, here is what I am able to do.
        // I do not know how to use "append" like in python, I could have append ONLY extracted
        // entities to `targetMentionsLabelTexts` from the `for loop above` and check for their existence in the last line.

Okay sure.

maxaalexeeva · 2022-05-02T02:54:01Z

@IkeKobby yes, I messaged you about my availability, and @kwalcock yes, I can help.

maxaalexeeva · 2022-05-02T05:48:28Z

@MihaiSurdeanu before I (or someone else) go changing rules to stop using generic Assignment label, do these look like reasonable labels/todos to you for the old set of tests (starting line 64)? 86c1fee

MihaiSurdeanu · 2022-05-02T17:40:36Z

I think SowingDate and PlantingDate are the same thing, so they should be merged.
I propose to no longer use the Assignment label, because it is too ambiguous, and we can't add semantic constraints on argument types. For example, we could use CropAssignment for events that pick up crops.

IkeKobby · 2022-05-02T18:42:42Z

Okay. Well noted.

maxaalexeeva · 2022-05-02T21:14:57Z

@MihaiSurdeanu @IkeKobby I will adjust these.

IkeKobby · 2022-05-02T21:15:49Z

Okay @maxaalexeeva

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

variable reader tests #117

variable reader tests #117

IkeKobby commented Apr 26, 2022 •

edited

MihaiSurdeanu commented Apr 26, 2022

IkeKobby commented Apr 26, 2022

MihaiSurdeanu commented Apr 26, 2022

IkeKobby commented Apr 26, 2022

IkeKobby commented Apr 28, 2022

IkeKobby commented Apr 29, 2022

maxaalexeeva commented Apr 29, 2022

MihaiSurdeanu commented Apr 29, 2022

IkeKobby commented May 1, 2022

maxaalexeeva commented May 1, 2022

kwalcock commented May 1, 2022

IkeKobby commented May 1, 2022

IkeKobby commented May 1, 2022

kwalcock commented May 1, 2022

IkeKobby commented May 1, 2022

maxaalexeeva commented May 2, 2022

maxaalexeeva commented May 2, 2022

MihaiSurdeanu commented May 2, 2022

IkeKobby commented May 2, 2022 via email •

edited

maxaalexeeva commented May 2, 2022

IkeKobby commented May 2, 2022 via email •

edited

variable reader tests #117

variable reader tests #117

Comments

IkeKobby commented Apr 26, 2022 • edited

MihaiSurdeanu commented Apr 26, 2022

IkeKobby commented Apr 26, 2022

MihaiSurdeanu commented Apr 26, 2022

IkeKobby commented Apr 26, 2022

IkeKobby commented Apr 28, 2022

IkeKobby commented Apr 29, 2022

maxaalexeeva commented Apr 29, 2022

MihaiSurdeanu commented Apr 29, 2022

IkeKobby commented May 1, 2022

maxaalexeeva commented May 1, 2022

kwalcock commented May 1, 2022

IkeKobby commented May 1, 2022

IkeKobby commented May 1, 2022

kwalcock commented May 1, 2022

IkeKobby commented May 1, 2022

maxaalexeeva commented May 2, 2022

maxaalexeeva commented May 2, 2022

MihaiSurdeanu commented May 2, 2022

IkeKobby commented May 2, 2022 via email • edited

maxaalexeeva commented May 2, 2022

IkeKobby commented May 2, 2022 via email • edited

IkeKobby commented Apr 26, 2022 •

edited

IkeKobby commented May 2, 2022 via email •

edited

IkeKobby commented May 2, 2022 via email •

edited