Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Balaur tests need to pass before it can be merged into master. #767

Open
kwalcock opened this issue Dec 12, 2023 · 7 comments
Open

Balaur tests need to pass before it can be merged into master. #767

kwalcock opened this issue Dec 12, 2023 · 7 comments

Comments

@kwalcock
Copy link
Member

See also #766. The tests are...

@kwalcock
Copy link
Member Author

  • TestUniversalEnhancedDependencies
    • should parse some basic sentences correctly
      • doc.sentences.head.universalBasicDependencies.get.hasEdge(2, 0, "nsubj") should be(true)
    • should propagate subjects and objects in conjoined verbs
      • doc.sentences.head.universalEnhancedDependencies.get.hasEdge(6, 0, "nsubj") should be(true)
    • should create xsubj dependencies
      • doc.sentences.head.universalEnhancedDependencies.get.hasEdge(0, 15, "nsubj:xsubj") should be(true)
  • TestNumericEntityRecognition
    • should recognize shared units
      • norms(i) should be(goldNorm)
    • should not recognize preposition in as inch
      • norms(i) should be(goldNorm)
  • TestCluProcessor (later renamed TestProcessor)
    • should parse MWEs correctly
      • doc.sentences.head.universalEnhancedDependencies.get.hasEdge(1, 5, "advmod_due_to") should be (true)

@MihaiSurdeanu
Copy link
Contributor

Thanks! Which branch did you use for this?

@kwalcock
Copy link
Member Author

That which will result when #766 is merged.

@MihaiSurdeanu
Copy link
Contributor

@kwalcock: I "fixed" all unit tests except this:

TestNumericEntityRecognition
should not recognize preposition in as inch
norms(i) should be(goldNorm)

I remember we added code somewhere to control for using "in" as a unit if it's followed by a location or date (or NNP?). But I can't find it now. Can you please dig into and see where it is? Did I remove it by mistake?

Thank you!

Also, the parser is making a couple of dumb mistakes, which is disappointing. I will look into more tricks to get it to behave better...

@kwalcock
Copy link
Member Author

Does this code look familiar?

removeOneEntityBeforeAnother(doc, "B-LOC", "MEASUREMENT-LENGTH")
}
def removeOneEntityBeforeAnother(doc: Document, triggerEntity: String, toBeRemovedShortened: String): Unit = {
// removes entities and norms for unallowable entity sequences, e.g., don't extract 'in' as 'inch' before B-LOC in '... Sahal 108 in Senegal'
// toBeRemovedShortened is entity without BIO-
for(s <- doc.sentences) {
val zippedEntities = s.entities.get.zipWithIndex
for ((e, i) <- zippedEntities) {
if (i > 0 && e == triggerEntity && s.entities.get(i-1).endsWith(toBeRemovedShortened)) {
s.entities.get(i - 1) = "O"
// go in reverse replacing indices and norms in the immediate preceding mention
breakable {
for ((en, j) <- zippedEntities.slice(0, i ).reverse) {
if (en.endsWith(toBeRemovedShortened)) {
s.entities.get(j) = "O"
s.norms.get(j) = ""
} else break()
}
}
}
}
}
}

The Senegal test,

ensure(sentence = "released as Sahel 108 in Senegal in 1994", Interval(3,5), goldEntity="O", goldNorm="")

is passing, though, and it's

ensure(sentence = "92% grew Sahel 108 in 2012DS", Interval(3,5), goldEntity="O", goldNorm="")

that is not.

The tests and some code were added around 2b935c6 and 66ee8a4.

@MihaiSurdeanu
Copy link
Contributor

It is not familiar but it does look like code I would write :)
Thanks @kwalcock ! I will take it from here.

@kwalcock
Copy link
Member Author

I think Masha wrote that. If it turns out not to be the right place, I can look further, of course.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants