context-related issues #142

maxaalexeeva · 2022-06-15T05:04:54Z

These are two related issues.

1. We have a rudimentary process "classifier" here that is currently based on string matching and does not account for conflicting key terms within one sentence (though it has not been crucial, or used by anyone, in a while). Any suggestions on how to improve it?
2. We are trying to extract planting areas (PlantingArea). Planting areas are not always explicitly marked (they are often referred to simply as areas). Other types of areas, on the other hand, frequently have indicators, e.g., the irrigation and weed areas below.

- Four scores , corresponding to four classes of surface area coverage by weeds , were distinguished : 0 for none ; 1 for weak ( less than 10 % of surface area coverage by weeds ) ; 2 for strong ( between 10 and 30 % ) ; 3 for very strong ( more than 30 % of surface area coverage by weeds ) .
- This type of irrigation scheme ( or perimeter ) , with an area of below 50 ha and cultivated by farmers from a single village , covers about 25 % of the irrigated area on the two banks of the Senegal River ( SAED , 1997 ; SONADER , 1998 ) .

One thing we could try is to extract all available areas (with a label like AreaAssignment or something similar) and distinguish between them using the context attachment. Thoughts?

Make a separate class for processing lemmas, as recommended by @kwalcock:

  case class ProcessToLemmas(process: String, lemmas: Set[String]) {
    def this(processAndLemmas: (String, Set[String])) = this(processAndLemmas._1, processAndLemmas._2)
  }

  // These are prioritized highest to lowest because there can be multiple matches.
  val processToLemmasMap = Seq(
    "planting"         -> Set("plant", "sow", "cultivate", "cultivation", "grow"),
    "harvest"          -> Set("harvest", "yield"),
    "credit"           -> Set("credit", "finance", "value"),
    "irrigation"       -> Set("irrigation", "irrigate"),
    "weeds"            -> Set("weed"),
    "natural_disaster" -> Set("flood", "bird", "attack")
  ).map(new ProcessToLemmas(_)) // just for convenience

  def getProcess(mention: Mention): String = {
    val lemmas = mention.sentenceObj.lemmas.get
    val process = processToLemmasMap
        .find { processToLemmas =>
          lemmas.exists(processToLemmas.lemmas)
        }
        .map(_.process)
        .getOrElse("UNK")

    process
  }

maxaalexeeva · 2022-06-15T05:05:36Z

@IkeKobby fyi

MihaiSurdeanu · 2022-06-15T05:51:56Z

Why do we need the process classifier again?
On planting areas: to handle the ambiguous phrases, we could extract generic "area" in the grammar, and then have an action that checks if some meaningful keywords appear anywhere in the sentence, e.g., "sow*", "cultivate*", etc.
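
A minimal sketch of such an action, written against plain lemma sequences rather than Odin mentions. The names `AreaMention` and `labelArea` and both keyword lists are hypothetical, and checking the competing indicators (irrigation, weeds) before the planting keywords is an assumption, added because a sentence about an irrigated area can also contain "cultivate".

```scala
// Hypothetical names throughout: AreaMention, labelArea, and the keyword lists are
// illustrative stand-ins, not part of the project's grammar or of Odin's API.
object AreaLabeler {
  // A generic area extracted by the grammar, together with the lemmas of its sentence.
  case class AreaMention(text: String, sentenceLemmas: Seq[String], label: String = "Area")

  // Lemma prefixes standing in for wildcards such as "sow*" and "cultivate*".
  val plantingPrefixes: Seq[String] = Seq("sow", "cultivat", "plant")
  // Indicators of other area types, checked first so those areas are not mislabeled.
  val otherAreaPrefixes: Seq[String] = Seq("irrigat", "weed")

  private def hasPrefix(lemmas: Seq[String], prefixes: Seq[String]): Boolean =
    lemmas.exists(lemma => prefixes.exists(prefix => lemma.toLowerCase.startsWith(prefix)))

  // Relabel to PlantingArea only when a planting keyword appears anywhere in the
  // sentence and no competing indicator does; otherwise keep the generic label.
  def labelArea(mention: AreaMention): AreaMention =
    if (hasPrefix(mention.sentenceLemmas, otherAreaPrefixes)) mention
    else if (hasPrefix(mention.sentenceLemmas, plantingPrefixes)) mention.copy(label = "PlantingArea")
    else mention
}
```

In the real pipeline the same check would live in an action attached to the generic area rule and read the lemmas from the mention's sentence.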

maxaalexeeva · 2022-06-15T06:00:12Z

@MihaiSurdeanu Allegra needed it at some point to distinguish between events/sentences related to harvesting, planting, etc. Not sure how crucial it is now.

> On planting areas: to handle the ambiguous phrases, we could extract generic "area" in the grammar, and then have an action that checks if some meaningful keywords appear anywhere in the sentence, e.g., "sow*", "cultivate*", etc.

ok

kwalcock · 2022-06-15T15:55:18Z

I can imagine wanting some other algorithm entirely, but for getProcess() it should be easy enough to count how many matches there are for each key and use the key of the maximum value. The result would be dependent on the ordering only in the case of ties.

import scala.collection.immutable.ListMap

  val processToLemmas = ListMap(
    "planting" -> Seq("plant", "sow", "cover", "cultivate", "grow"),
    "harvesting" -> Seq("harvest", "yield"),
    "credit" -> Seq("credit", "finance", "value", "correspond"),
    "natural_disaster" -> Seq("flood", "bird", "attack")
  )

  def getProcess(mention: Mention): String = {
    val lemmas = mention.sentenceObj.lemmas.get
    val process = processToLemmas
        .mapValues(_.count(lemmas.contains))
        .maxBy(_._2)
        ._1

    process
  }
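
One thing to watch with the counting approach: when no lemma matches, every count is 0 and `maxBy` still returns one of the keys instead of "UNK". Below is a sketch of a variant that keeps the "UNK" fallback and makes the tie-break explicit. It takes the sentence lemmas directly rather than a `Mention` (an assumption, so that the snippet stands alone), and the object name `ProcessCounter` is hypothetical.

```scala
import scala.collection.immutable.ListMap

// ProcessCounter is a hypothetical name; the lemma lists are trimmed for illustration.
object ProcessCounter {
  // Insertion order of the ListMap doubles as the tie-break priority.
  val processToLemmas: ListMap[String, Seq[String]] = ListMap(
    "planting" -> Seq("plant", "sow", "cultivate", "grow"),
    "harvesting" -> Seq("harvest", "yield"),
    "credit" -> Seq("credit", "finance", "value"),
    "natural_disaster" -> Seq("flood", "bird", "attack")
  )

  // Takes sentence lemmas directly (assumption: the caller extracts them from the mention).
  def getProcess(lemmas: Seq[String]): String = {
    val lemmaSet = lemmas.toSet
    // Count how many key lemmas of each process occur in the sentence, in map order.
    val counts = processToLemmas.toSeq.map { case (process, keys) =>
      process -> keys.count(lemmaSet.contains)
    }
    val maxCount = counts.map(_._2).max
    // No match at all falls back to "UNK"; otherwise the first process with the
    // maximum count wins, so ordering matters only for ties.
    if (maxCount == 0) "UNK"
    else counts.find(_._2 == maxCount).map(_._1).getOrElse("UNK")
  }
}
```

With this variant, a sentence with two planting lemmas and one harvesting lemma is labeled "planting" because of the counts, not because of the ordering.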

maxaalexeeva · 2022-06-15T18:32:34Z

@kwalcock I like that! Thank you!

kwalcock · 2022-07-18T23:20:09Z

There are two versions of getProcess above. I guess the algorithm is in flux. In the top one there is a helper class that I decided was not necessary and I also forgot about ListMap. The code could look more like the bottom one.

import scala.collection.immutable.ListMap

  // These are prioritized highest to lowest.
  val processToLemmas = ListMap(
    "planting"         -> Set("plant", "sow", "cultivate", "cultivation", "grow"),
    "harvesting"       -> Set("harvest", "yield"),
    "credit"           -> Set("credit", "finance", "value"),
    "irrigation"       -> Set("irrigation", "irrigate"),
    "weeds"            -> Set("weed"),
    "natural_disaster" -> Set("flood", "bird", "attack")
  )

  def getProcess(mention: Mention): String = {
    val sentenceLemmas = mention.sentenceObj.lemmas.get
    val process = processToLemmas
        .find { case (_, processLemmas) =>
          sentenceLemmas.exists(processLemmas)
        }
        .map { case (process, _) => process }
        .getOrElse("UNK")

    process
  }

maxaalexeeva · 2022-07-19T03:51:35Z

@kwalcock thanks!

maxaalexeeva assigned MihaiSurdeanu and kwalcock Jun 15, 2022

maxaalexeeva mentioned this issue Jul 18, 2022

Fixing var reader issues #154

Merged
