TestLemmatizer failing with BalaurProcessor #741

MihaiSurdeanu · 2023-08-23T20:51:25Z

In the balaur branch, TestLemmatizer fails as follows:

sbt 'testOnly org.clulab.processors.TestLemmatizer'
...
[info] - should not crash when processing this weird file *** FAILED ***
[info]   java.lang.ArrayIndexOutOfBoundsException: -2
[info]   at org.clulab.struct.DirectedGraph$.$anonfun$mkOutgoing$1(DirectedGraph.scala:241)
[info]   at scala.collection.immutable.List.foreach(List.scala:431)
[info]   at org.clulab.struct.DirectedGraph$.mkOutgoing(DirectedGraph.scala:239)
[info]   at org.clulab.struct.DirectedGraph.<init>(DirectedGraph.scala:34)
[info]   at org.clulab.processors.clu.BalaurProcessor.assignDependencyLabels(BalaurProcessor.scala:276)
[info]   at org.clulab.processors.clu.BalaurProcessor.$anonfun$annotate$1(BalaurProcessor.scala:139)
[info]   at org.clulab.processors.clu.BalaurProcessor.$anonfun$annotate$1$adapted(BalaurProcessor.scala:131)
[info]   at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
[info]   at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
[info]   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
[info]   ...
[info] Run completed in 2 minutes, 56 seconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
[info] *** 1 TEST FAILED ***
[error] Failed tests:
[error]         org.clulab.processors.TestLemmatizer

@kwalcock: can you please look into this? Thank you!

The text was updated successfully, but these errors were encountered:

MihaiSurdeanu · 2023-08-23T20:58:07Z

Most likely unrelated, this also fails:

sbt 'corenlp/testOnly org.clulab.processors.TestParallel'
...
[info] - should match processing documents serially *** FAILED ***
[info]   java.util.NoSuchElementException: None.get
[info]   at scala.None$.get(Option.scala:529)
[info]   at scala.None$.get(Option.scala:527)
[info]   at org.clulab.utils.ToEnhancedDependencies$.$anonfun$collapsePrepositionsUniversalDueTo$1(ToEnhancedDependencies.scala:195)
[info]   at org.clulab.utils.ToEnhancedDependencies$.$anonfun$collapsePrepositionsUniversalDueTo$1$adapted(ToEnhancedDependencies.scala:194)
[info]   at scala.collection.immutable.List.foreach(List.scala:431)
[info]   at scala.collection.generic.TraversableForwarder.foreach(TraversableForwarder.scala:38)
[info]   at scala.collection.generic.TraversableForwarder.foreach$(TraversableForwarder.scala:38)
[info]   at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:47)
[info]   at org.clulab.utils.ToEnhancedDependencies$.collapsePrepositionsUniversalDueTo(ToEnhancedDependencies.scala:194)
[info]   at org.clulab.utils.ToEnhancedDependencies$.collapsePrepositionsUniversal(ToEnhancedDependencies.scala:145)
[info]   ...
[info] Run completed in 5 seconds, 968 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 0, failed 1, canceled 0, ignored 0, pending 0
[info] *** 1 TEST FAILED ***
[error] Failed tests:
[error]         org.clulab.processors.TestParallel

kwalcock · 2023-08-24T00:45:50Z

@MihaiSurdeanu, the processors branch balaur is not compiling for me. Is there a commit that hasn't made it to github? The latest there is 48a754 and it looks incomplete.

MihaiSurdeanu · 2023-08-24T04:00:42Z

Sorry! Yes, the headAndLabels branch in scala-transformers must be published locally first.

kwalcock · 2023-08-24T06:59:48Z

I had done that, but the problem seems to be in the processors code :-(

MihaiSurdeanu · 2023-08-24T15:53:58Z

Hmm. Then I'm not sure. Can you please paste in the error?

kwalcock · 2023-08-24T18:32:55Z

I've seen this on two separate computers now, so I don't think it's a fluke. The problem might be that there are two Eisner files and they are getting mixed up.

[error] C:\Users\kwa\MyData\Projects\clulab\processors-project\processors\main\src\main\scala\org\clulab\processors\clu\CluProcessor.scala:452:34: value ensembleParser is not a member of org.clulab.processors.clu.Eisner
[error]     val headsWithLabels = eisner.ensembleParser(
[error]
[warn] C:\Users\kwa\MyData\Projects\clulab\processors-project\processors\main\src\main\scala\org\clulab\processors\clu\CluProcessor.scala:19:93: imported `Eisner` is permanently hidden by definition of object Eisner in package clu
[warn] import org.clulab.dynet.{AnnotatedSentence, ConstEmbeddingParameters, ConstEmbeddingsGlove, Eisner, Metal, ModifierHeadPair}
[warn]                                                                                             ^
[warn] C:\Users\kwa\MyData\Projects\clulab\processors-project\processors\main\src\main\scala\org\clulab\processors\clu\CluProcessor.scala:4:93: imported `Eisner` is permanently hidden by definition of object Eisner in package clu
[warn] import org.clulab.dynet.{AnnotatedSentence, ConstEmbeddingParameters, ConstEmbeddingsGlove, Eisner, Metal, ModifierHeadPair, Utils}
[warn]                                                                                             ^
[warn] two warnings found

MihaiSurdeanu · 2023-08-25T03:01:09Z

Interesting. I don't see this error on my Mac or a linux box.
In any case, I renamed one of the classes to avoid the conflict.
Can you please pull the branch and try again?
Thanks!

kwalcock · 2023-08-25T17:25:19Z

On the first one, there is an edge from -2 to 83. In EisnerEnsembleParser.generateOutput there is a case where Eisner fails, so there is a reversion to the greedy inference. A bestDep of -47 is found when i = 45. head ends up being -2. The programming here probably needs to be more defensive. Let me know if I should mess with the code.

kwalcock · 2023-08-25T17:36:37Z

On the second problem, it looks like ToEnhancedDependencies.collapsePrepositionsUniversalDueTo is failing because it needs lemmas and lemmatize has apparently not yet been run on the document.

MihaiSurdeanu · 2023-08-25T19:12:31Z

Please go ahead and "mess with the code" :)
The greedy inference is supposed to choose only valid heads that are either -1 (root of the tree) or between 0 and sentence length:

https://github.com/clulab/processors/blob/balaur/main/src/main/scala/org/clulab/processors/clu/EisnerEnsembleParser.scala#L256

kwalcock · 2023-08-25T20:46:25Z

OK. That line 25 is just before depHead - 1, which results in the -2.

kwalcock · 2023-08-25T23:30:48Z

I'm trying to understand the nearby code. In PostProcessor.parserPostProcessing it looks like you are trying to account for "due to" and wanting to change the headsWithLabels(i + 1) to (i, "mwe"), but only if that's not already the case. If so, the condition at

processors/main/src/main/scala/org/clulab/processors/clu/PostProcessor.scala

Lines 42 to 43 in 23436a4

    
           headsWithLabels(i + 1)._1 != i && 
        
           headsWithLabels(i + 1)._2 != "mwe") {

might need to have an || instead:

(headsWithLabels(i + 1)._1 != i || headsWithLabels(i + 1)._2 != "mwe")

MihaiSurdeanu · 2023-08-26T00:05:00Z

I think you're right. Nice catch!
Of course, we need parens around the disjunction.

kwalcock · 2023-08-28T23:17:57Z

The code is complicated enough to be hiding problems. For example, the value

processors/main/src/main/scala/org/clulab/processors/clu/EisnerEnsembleParser.scala

Line 291 in 23436a4

val relHead = sortedHeads(j).toInt

is already an int not needing conversion and the exception handler

processors/main/src/main/scala/org/clulab/processors/clu/EisnerEnsembleParser.scala

Line 304 in 23436a4

    
           case _: NumberFormatException => // Ok to skip these, since the parser may predict <START> and <STOP> labels

should probably be guarding the line all the way over here:

processors/main/src/main/scala/org/clulab/processors/clu/BalaurProcessor.scala

Line 233 in 23436a4

val relHead = headLabels(i)(j)._1.toInt

kwalcock · 2023-08-28T23:31:32Z

@MihaiSurdeanu, when BalaurProcessor.assignDependencyLabels gets called, are the original headLabels coming out of the neural network using absolute or relative values? It looks like relative because of

processors/main/src/main/scala/org/clulab/processors/clu/BalaurProcessor.scala

Line 233 in 23436a4

val relHead = headLabels(i)(j)._1.toInt

but then absolute because of

processors/main/src/main/scala/org/clulab/processors/clu/EisnerEnsembleParser.scala

Line 234 in 23436a4

if (dep.head == 0) 0 else dep.head - dep.mod

MihaiSurdeanu · 2023-08-29T00:49:48Z

Yes, the ML predicts relative head positions (the classifier generalizes much better with relative positions). The code in the processor then converts them to absolute values, and then construct a DependencyGraph.

MihaiSurdeanu · 2023-08-29T00:50:22Z

The code is complicated enough to be hiding problems. For example, the value

processors/main/src/main/scala/org/clulab/processors/clu/EisnerEnsembleParser.scala

Line 291 in 23436a4

val relHead = sortedHeads(j).toInt

is already an int not needing conversion and the exception handler

processors/main/src/main/scala/org/clulab/processors/clu/EisnerEnsembleParser.scala

Line 304 in 23436a4

case _: NumberFormatException => // Ok to skip these, since the parser may predict <START> and <STOP> labels

should probably be guarding the line all the way over here:

processors/main/src/main/scala/org/clulab/processors/clu/BalaurProcessor.scala

Line 233 in 23436a4

val relHead = headLabels(i)(j)._1.toInt

I think you're right. Thanks for catching it!

kwalcock · 2023-08-29T01:22:20Z

@MihaiSurdeanu, the model 0.0.4 should be available now. Next time I will be daring and number it 0.1.0!

MihaiSurdeanu · 2023-08-29T01:42:38Z

Thank you!

kwalcock · 2023-08-29T05:59:45Z

I hope to ask tomorrow about some of this code, because there are too many words (for me) for different stages of the same numbers. Here's a code question, though. In

processors/main/src/main/scala/org/clulab/processors/clu/EisnerEnsembleParser.scala

Lines 294 to 295 in 23436a4

    
           val head = if (relHead == 0) 0 else mod + relHead // +1 offset from mod propagates in the head here 
        
           if (head >= 0 && head < length) { // we may predict a head outside of sentence boundaries

relHead being 0 has special meaning, because something can't be its own head. However, mod + relHead can also be zero, I think, and that doesn't mean quite the same thing: mod + relHead == 0 => i + 1 + relHead == 0 => i + relHead == -1 => absHead == -1 which is invalid. It might be OK to say that in that case it goes to root just like an actual relHead of 0 would. However, then if mod + relHead == -1 => absHead == -2, the same going to root option is not taken because the condition in the second line is not satisfied.

If that is the case, I have an urge to rearrange some things so as not to get into that situation.

MihaiSurdeanu · 2023-08-29T16:34:44Z

Let's discuss in our meeting today!

kwalcock · 2023-08-29T23:16:29Z

Yet another question... AFAICT, the data coming into assignDependencyLabels for the heads and labels have heads all originating from the tasks.3.labels file. These are 160 values ranging from -127 to +77 with some gaps. There are no duplicates. That seems like a constant. I don't think there should be any issue converting any of the heads to integers so that the exception handling is not necessary. If the values are distinct, then code like

processors/main/src/main/scala/org/clulab/processors/clu/EisnerEnsembleParser.scala

Line 296 in 23436a4

    
           if(dependencies(mod)(head) == null || dependencies(mod)(head).score < score) { // there might be a dependency already with a higher score

would not seem necessary, either. For this particular task it looks like the scores are already delivered sorted so that

processors/main/src/main/scala/org/clulab/processors/clu/EisnerEnsembleParser.scala

Line 283 in 23436a4

    
           val sortedPreds = scores(i).sortBy(- _._3) // sort in descending order of scores

is similarly unnecessary. I'll see if it works without these things.

MihaiSurdeanu · 2023-08-29T23:54:20Z

The older classifier based on dynet could generate some strings labels. But this is no longer true with transformers. So that exception is no longer needed.
You're probably correct on the sortBy as well. At some point I wasn't sorting in scala_transformers. But now we do.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TestLemmatizer failing with BalaurProcessor #741

TestLemmatizer failing with BalaurProcessor #741

MihaiSurdeanu commented Aug 23, 2023

MihaiSurdeanu commented Aug 23, 2023

kwalcock commented Aug 24, 2023

MihaiSurdeanu commented Aug 24, 2023

kwalcock commented Aug 24, 2023

MihaiSurdeanu commented Aug 24, 2023

kwalcock commented Aug 24, 2023

MihaiSurdeanu commented Aug 25, 2023

kwalcock commented Aug 25, 2023

kwalcock commented Aug 25, 2023

MihaiSurdeanu commented Aug 25, 2023

kwalcock commented Aug 25, 2023

kwalcock commented Aug 25, 2023

MihaiSurdeanu commented Aug 26, 2023

kwalcock commented Aug 28, 2023

kwalcock commented Aug 28, 2023

MihaiSurdeanu commented Aug 29, 2023

MihaiSurdeanu commented Aug 29, 2023

kwalcock commented Aug 29, 2023

MihaiSurdeanu commented Aug 29, 2023

kwalcock commented Aug 29, 2023

MihaiSurdeanu commented Aug 29, 2023

kwalcock commented Aug 29, 2023

MihaiSurdeanu commented Aug 29, 2023

TestLemmatizer failing with BalaurProcessor #741

TestLemmatizer failing with BalaurProcessor #741

Comments

MihaiSurdeanu commented Aug 23, 2023

MihaiSurdeanu commented Aug 23, 2023

kwalcock commented Aug 24, 2023

MihaiSurdeanu commented Aug 24, 2023

kwalcock commented Aug 24, 2023

MihaiSurdeanu commented Aug 24, 2023

kwalcock commented Aug 24, 2023

MihaiSurdeanu commented Aug 25, 2023

kwalcock commented Aug 25, 2023

kwalcock commented Aug 25, 2023

MihaiSurdeanu commented Aug 25, 2023

kwalcock commented Aug 25, 2023

kwalcock commented Aug 25, 2023

MihaiSurdeanu commented Aug 26, 2023

kwalcock commented Aug 28, 2023

kwalcock commented Aug 28, 2023

MihaiSurdeanu commented Aug 29, 2023

MihaiSurdeanu commented Aug 29, 2023

kwalcock commented Aug 29, 2023

MihaiSurdeanu commented Aug 29, 2023

kwalcock commented Aug 29, 2023

MihaiSurdeanu commented Aug 29, 2023

kwalcock commented Aug 29, 2023

MihaiSurdeanu commented Aug 29, 2023