context-related issues #142
Comments
@IkeKobby fyi
@MihaiSurdeanu Allegra needed it at some point to distinguish between events/sentences related to harvesting, planting, etc. Not sure how crucial it is now.
ok
I can imagine wanting some other algorithm entirely, but for `getProcess()` it should be easy enough to count how many matches there are for each key and use the key of the maximum value. The result would be dependent on the ordering only in the case of ties.

    import scala.collection.immutable.ListMap

    val processToLemmas = ListMap(
      "planting" -> Seq("plant", "sow", "cover", "cultivate", "grow"),
      "harvesting" -> Seq("harvest", "yield"),
      "credit" -> Seq("credit", "finance", "value", "correspond"),
      "natural_disaster" -> Seq("flood", "bird", "attack")
    )

    def getProcess(mention: Mention): String = {
      val lemmas = mention.sentenceObj.lemmas.get
      val process = processToLemmas
        .mapValues(_.count(lemmas.contains))
        .maxBy(_._2)
        ._1
      process
    }
@kwalcock I like that! Thank you!
There are two versions of `getProcess` above. I guess the algorithm is in flux. In the top one there is a helper class that I decided was not necessary, and I also forgot about `ListMap`. The code could look more like the bottom one.

    import scala.collection.immutable.ListMap

    // These are prioritized highest to lowest.
    val processToLemmas = ListMap(
      "planting" -> Set("plant", "sow", "cultivate", "cultivation", "grow"),
      "harvesting" -> Set("harvest", "yield"),
      "credit" -> Set("credit", "finance", "value"),
      "irrigation" -> Set("irrigation", "irrigate"),
      "weeds" -> Set("weed"),
      "natural_disaster" -> Set("flood", "bird", "attack")
    )

    def getProcess(mention: Mention): String = {
      val sentenceLemmas = mention.sentenceObj.lemmas.get
      val process = processToLemmas
        .find { case (_, processLemmas) =>
          sentenceLemmas.exists(processLemmas)
        }
        .map { case (process, _) => process }
        .getOrElse("UNK")
      process
    }
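To make the difference between the two strategies in this thread concrete, here is a minimal, self-contained sketch that runs both on a plain sequence of lemmas instead of a `Mention`. The object name `ProcessSketch` and the function names `byPriority` and `byCount` are made up for illustration, and the `"UNK"` fallback in the counting variant is an assumption (the counting snippet in the thread has no such fallback).

```scala
import scala.collection.immutable.ListMap

// Hypothetical names; only the lemma table is taken from the thread.
object ProcessSketch {
  // Keys are listed from highest to lowest priority.
  val processToLemmas: ListMap[String, Set[String]] = ListMap(
    "planting" -> Set("plant", "sow", "cultivate", "cultivation", "grow"),
    "harvesting" -> Set("harvest", "yield"),
    "credit" -> Set("credit", "finance", "value"),
    "irrigation" -> Set("irrigation", "irrigate"),
    "weeds" -> Set("weed"),
    "natural_disaster" -> Set("flood", "bird", "attack")
  )

  // Priority strategy: the first process with any matching lemma wins.
  def byPriority(sentenceLemmas: Seq[String]): String =
    processToLemmas
      .collectFirst {
        case (process, lemmas) if sentenceLemmas.exists(lemmas) => process
      }
      .getOrElse("UNK")

  // Counting strategy: the process with the most key lemmas present in the
  // sentence wins. Ties keep the earlier (higher-priority) key, and zero
  // matches fall back to "UNK" (an assumption, see above).
  def byCount(sentenceLemmas: Seq[String]): String = {
    val counts = processToLemmas.toSeq.map { case (process, lemmas) =>
      process -> lemmas.count(lemma => sentenceLemmas.contains(lemma))
    }
    counts.foldLeft(("UNK", 0)) { (best, candidate) =>
      if (candidate._2 > best._2) candidate else best
    }._1
  }
}
```

For a sentence whose lemmas include `flood`, `attack`, `bird`, and `plant`, `byPriority` picks `planting` because it comes first in the map, while `byCount` picks `natural_disaster` because three of its key lemmas are present. Which behavior is preferable for conflicting key terms is still an open question here.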
@kwalcock thanks!
These are two issues related to each other.

1. We have a rudimentary process "classifier" here, which is currently based on string matching and does not account for conflicting key terms within one sentence (though this has not been crucial or used by anyone in a while). Any suggestions on how to improve that?
2. We are trying to extract planting areas (`PlantingArea`). Planting areas are not always explicitly marked (they are just referred to as `area`s). Other types of areas, on the other hand, frequently have indicators, e.g., irrigation and weed areas below.

One thing we could try is to extract any available areas (with the label `AreaAssignment` or something like that) and distinguish between them using the context attachment. Thoughts?
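As a rough illustration of that last idea, the sketch below relabels a generic area using the process attached to it as context. Everything here is hypothetical: `AreaMention` is a stand-in case class rather than the project's `Mention` type, `IrrigationArea` and `WeedArea` are invented label names, and the process strings simply reuse the keys from the `getProcess` snippets in this thread.

```scala
// Hypothetical sketch; none of these types or labels come from the codebase
// except PlantingArea and AreaAssignment, which are named in this issue.
object AreaContextSketch {
  // Stand-in for an extracted area with its context attachment.
  final case class AreaMention(text: String, label: String, process: String)

  // Assumed mapping from a context process to a more specific area label.
  val processToAreaLabel: Map[String, String] = Map(
    "planting" -> "PlantingArea",
    "irrigation" -> "IrrigationArea",
    "weeds" -> "WeedArea"
  )

  // Refine only generic areas; leave the label alone if the context
  // process does not map to a known area type.
  def refine(mention: AreaMention): AreaMention =
    if (mention.label == "AreaAssignment")
      mention.copy(label = processToAreaLabel.getOrElse(mention.process, mention.label))
    else
      mention
}
```

With this shape, an `AreaAssignment` found in a sentence whose context is `planting` would become a `PlantingArea`, while one with an `UNK` context would stay generic.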