It should work with new versions of corenlp #639

Open
kwalcock opened this issue Jun 9, 2022 · 8 comments

Comments

@kwalcock
Member

kwalcock commented Jun 9, 2022

It is failing some tests in TestCoreNLPProcessor and TestFastNLPProcessor. Details will be pasted below.

@kwalcock
Member Author

kwalcock commented Jun 9, 2022

Below is the test output for TestCoreNLPProcessor. Search for "*** FAILED ***" to find the failures.

sbt:root> testOnly org.clulab.processors.TestCoreNLPProcessor
[info] compiling 1 Scala source to D:\Users\kwa\Documents\MyData\Projects\clulab\processors-project\processors\main\target\scala-2.12\test-classes ...
[info] Passed: Total 0, Failed 0, Errors 0, Passed 0
[info] No tests to run for Test / testOnly
[info] Passed: Total 0, Failed 0, Errors 0, Passed 0
[info] No tests to run for main / Test / testOnly
[info] Passed: Total 0, Failed 0, Errors 0, Passed 0
[info] No tests to run for openie / Test / testOnly
Picked up _JAVA_OPTIONS: -Xmx20g -Dfile.encoding=UTF-8
[info] TestCoreNLPProcessor:
[info] CoreNLPProcessor
12:33:57.270 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
12:33:59.829 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [2.5 sec].
12:34:00.002 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
12:34:00.298 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
12:34:00.418 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - maxAdditionalKnownLCWords=0
12:34:04.495 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [4.0 sec].
12:34:06.263 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.8 sec].
12:34:07.929 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.7 sec].
12:34:07.939 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
12:34:08.963 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
12:34:11.493 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] DEBUG edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor - Ignoring inactive rule: null
12:34:11.494 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] DEBUG edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor - Ignoring inactive rule: temporal-composite-8:ranges
12:34:17.291 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580705 unique entries out of 581864 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns.
12:34:17.329 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4867 unique entries out of 4867 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns.
12:34:17.330 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585572 unique entries from 2 files
12:34:24.515 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.pipeline.NERCombinerAnnotator - numeric classifiers: true; SUTime: true [no docDate]; fine grained: true
12:34:26.829 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.parser.lexparser.LexicalizedParser - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [1.6 sec].
12:34:28.307 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - featureCountThresh=10
12:34:28.307 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - macro=true
12:34:28.307 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - featureFactory=org.clulab.processors.corenlp.chunker.ChunkingFeatureFactory
12:34:31.656 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator natlog
12:34:31.753 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator openie
12:34:31.872 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.naturalli.ClauseSplitter - Loading clause splitter from edu/stanford/nlp/models/naturalli/clauseSearcherModel.ser.gz ... done [0.094 seconds]
12:34:39.278 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator dcoref
12:34:49.331 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.pipeline.CorefMentionAnnotator - Using mention detector type: dependency
[info] - should extract relations correctly with OpenIE
[info] CoreNLPProcessor
[info] - should tokenize raw text correctly
[info] - should tokenize a list of sentences correctly
[info] - should tokenize sequences of tokens correctly
[info] - should POS tag correctly *** FAILED ***
[info]   "[IN]" was not equal to "[TO]" (TestCoreNLPProcessor.scala:126)
[info] - should lemmatize correctly
Universal dependencies for the sentence "John Doe went to China":
roots: 2
outgoing:
        0:
        1: (0,compound)
        2: (1,nsubj) (4,obl)
        3:
        4: (3,case)
incoming:
        0: (1,compound)
        1: (2,nsubj)
        2:
        3: (4,case)
        4: (2,obl)

[info] - should run the constituent parser correctly *** FAILED ***
[info]   false was not true (TestCoreNLPProcessor.scala:167)
[info] - should run the coreference resolver correctly
[info] - should assign head words to constituent phrases correctly
[info] - should create document text correctly
12:34:51.152 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] ERROR edu.stanford.nlp.pipeline.StanfordCoreNLP - Attempted to fetch annotator "parse" but the annotator pool does not store any such type!
12:34:52.186 [pool-1-thread-1-ScalaTest-running-TestCoreNLPProcessor] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ... done [1.0 sec].
[info] - should handle colons in dependencies
roots: 20,10
outgoing:
        0:
        1:
        2: (0,det) (1,compound) (7,nmod)
        3:
        4:
        5:
        6:
        7: (3,case) (4,nummod) (5,amod) (6,amod)
        8:
        9:
        10: (2,nsubj:pass) (8,aux:pass) (9,advmod) (13,obl)
        11:
        12:
        13: (11,case) (12,det) (19,nmod)
        14:
        15:
        16:
        17:
        18: (17,compound)
        19: (14,case) (15,amod) (16,amod) (18,compound)
        20:
incoming:
        0: (2,det)
        1: (2,compound)
        2: (10,nsubj:pass)
        3: (7,case)
        4: (7,nummod)
        5: (7,amod)
        6: (7,amod)
        7: (2,nmod)
        8: (10,aux:pass)
        9: (10,advmod)
        10:
        11: (13,case)
        12: (13,det)
        13: (10,obl)
        14: (19,case)
        15: (19,amod)
        16: (19,amod)
        17: (18,compound)
        18: (19,compound)
        19: (13,nmod)
        20:

[info] - should run the constituent parser correctly on texts with parentheses *** FAILED ***
[info]   false was not true (TestCoreNLPProcessor.scala:255)
[info] Run completed in 1 minute, 4 seconds.
[info] Total number of tests run: 12
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 9, failed 3, canceled 0, ignored 0, pending 0
[info] *** 3 TESTS FAILED ***
[error] Failed tests:
[error]         org.clulab.processors.TestCoreNLPProcessor
[error] (corenlp / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 72 s (01:12), completed Jun 9, 2022, 12:34:56 PM
sbt:root>
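The dependency printouts in these logs list, for each token index, its outgoing edges as (dependent, label) and its incoming edges as (head, label). The incoming table is just the outgoing one inverted, and in every printout here the listed roots are exactly the tokens with no incoming edge. The minimal Scala sketch below reproduces that relationship from the edges printed above for "John Doe went to China". The object and method names are hypothetical; this is not the processors DirectedGraph API.

```scala
object DepPrintoutSketch {
  // (token index, relation label), as in the "(1,nsubj)" entries of the printout
  type Edge = (Int, String)

  // Outgoing edges (head -> dependents) for "John Doe went to China", copied from the log
  val outgoing: Map[Int, Seq[Edge]] = Map(
    0 -> Seq(),
    1 -> Seq((0, "compound")),
    2 -> Seq((1, "nsubj"), (4, "obl")),
    3 -> Seq(),
    4 -> Seq((3, "case"))
  )

  // Invert head -> (dependent, label) into dependent -> (head, label)
  def invert(out: Map[Int, Seq[Edge]]): Map[Int, Seq[Edge]] = {
    val pairs: Seq[(Int, Edge)] = for {
      (head, edges) <- out.toSeq
      (dep, label) <- edges
    } yield (dep, (head, label))
    out.keys.map { tok =>
      tok -> pairs.collect { case (`tok`, edge) => edge }
    }.toMap
  }

  // Tokens without any incoming edge; in the printouts above these match the listed roots
  def rootsOf(out: Map[Int, Seq[Edge]]): Set[Int] =
    invert(out).collect { case (tok, in) if in.isEmpty => tok }.toSet

  def main(args: Array[String]): Unit = {
    val in = invert(outgoing)
    in.toSeq.sortBy(_._1).foreach { case (tok, edges) =>
      println(s"$tok: " + edges.map { case (h, l) => s"($h,$l)" }.mkString(" "))
    }
    println(s"roots: ${rootsOf(outgoing).mkString(",")}")
  }
}
```

Running main should print the same incoming lines as the log for that sentence, followed by the root.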

@kwalcock
Member Author

kwalcock commented Jun 9, 2022

Below is the test output for TestFastNLPProcessor. Search for "*** FAILED ***" to find the failures.

sbt:root> testOnly org.clulab.processors.TestFastNLPProcessor
[info] Passed: Total 0, Failed 0, Errors 0, Passed 0
[info] No tests to run for main / Test / testOnly
[info] Passed: Total 0, Failed 0, Errors 0, Passed 0
[info] No tests to run for Test / testOnly
[info] Passed: Total 0, Failed 0, Errors 0, Passed 0
[info] No tests to run for openie / Test / testOnly
Picked up _JAVA_OPTIONS: -Xmx20g -Dfile.encoding=UTF-8
12:40:50.478 [pool-1-thread-1] DEBUG org.clulab.dynet.Utils - Initializing DyNet...
[dynet] Checking D:\Users\kwa\Documents\MyData\Projects\clulab\processors-project\processors\corenlp for dynet_swig.dll...
[dynet] Checking C:\Users\kwa for dynet_swig.dll...
[dynet] Extracting resource dynet_swig.dll to E:\Users\kwa\tmp\dynet_swig-10529609992309681928.dll...
[dynet] Loading DyNet from E:\Users\kwa\tmp\dynet_swig-10529609992309681928.dll...
[dynet] random seed: 2522620396
[dynet] allocating memory: 512MB
[dynet] memory allocation done.
12:40:51.145 [pool-1-thread-1] DEBUG org.clulab.dynet.Utils - DyNet initialization complete.
[info] TestFastNLPProcessor:
[info] FastNLPProcessor
12:40:52.623 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator pos
12:40:55.040 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.tagger.maxent.MaxentTagger - Loading POS tagger from edu/stanford/nlp/models/pos-tagger/english-left3words-distsim.tagger ... done [2.3 sec].
12:40:55.220 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
12:40:55.478 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
12:40:55.582 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - maxAdditionalKnownLCWords=0
12:40:59.117 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [3.5 sec].
12:41:01.005 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [1.9 sec].
12:41:02.808 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.ie.AbstractSequenceClassifier - Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [1.8 sec].
12:41:02.819 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.time.JollyDayHolidays - Initializing JollyDayHoliday for SUTime from classpath edu/stanford/nlp/models/sutime/jollyday/Holidays_sutime.xml as sutime.binder.1.
12:41:04.166 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.time.TimeExpressionExtractorImpl - Using following SUTime rules: edu/stanford/nlp/models/sutime/defs.sutime.txt,edu/stanford/nlp/models/sutime/english.sutime.txt,edu/stanford/nlp/models/sutime/english.holidays.sutime.txt
12:41:06.459 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor - Ignoring inactive rule: null
12:41:06.460 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor - Ignoring inactive rule: temporal-composite-8:ranges
12:41:12.745 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 580705 unique entries out of 581864 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_caseless.tab, 0 TokensRegex patterns.
12:41:12.789 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 4867 unique entries out of 4867 from edu/stanford/nlp/models/kbp/english/gazetteers/regexner_cased.tab, 0 TokensRegex patterns.
12:41:12.790 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.pipeline.TokensRegexNERAnnotator - ner.fine.regexner: Read 585572 unique entries from 2 files
12:41:20.161 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.pipeline.NERCombinerAnnotator - numeric classifiers: true; SUTime: true [no docDate]; fine grained: true
12:41:24.628 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG edu.stanford.nlp.parser.nndep.DependencyParser - Read in dep parse matrices:
12:41:24.629 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG edu.stanford.nlp.parser.nndep.DependencyParser -    E: 3865050
12:41:24.629 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG edu.stanford.nlp.parser.nndep.DependencyParser -   b1: 1000
12:41:24.629 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG edu.stanford.nlp.parser.nndep.DependencyParser -   W1: 2400000
12:41:24.629 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG edu.stanford.nlp.parser.nndep.DependencyParser -   W2: 83000
12:41:24.666 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Loading depparse model: edu/stanford/nlp/models/parser/nndep/english_UD.gz ... Time elapsed: 3.6 sec
12:41:31.108 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.parser.nndep.Classifier - PreComputed 20000 vectors, elapsed Time: 6.44 sec
12:41:31.108 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.parser.nndep.DependencyParser - Initializing dependency parser ... done [10.0 sec].
12:41:31.477 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - featureCountThresh=10
12:41:31.478 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - macro=true
12:41:31.479 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.sequences.SeqClassifierFlags - featureFactory=org.clulab.processors.corenlp.chunker.ChunkingFeatureFactory
12:41:35.197 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator natlog
12:41:35.313 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator openie
12:41:35.459 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] INFO edu.stanford.nlp.naturalli.ClauseSplitter - Loading clause splitter from edu/stanford/nlp/models/naturalli/clauseSearcherModel.ser.gz ... done [0.116 seconds]
12:41:54.583 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG org.clulab.dynet.Utils - DyNet re-initialization skipped.
12:41:54.622 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG org.clulab.dynet.Metal - Loading MTL model from org/clulab/processors/clu/mtl-en-srla-avg-e3e5e2e6e4...
12:42:04.404 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG org.clulab.dynet.Metal - Loading MTL model from org/clulab/processors/clu/mtl-en-srla-avg-e3e5e2e6e4 complete.
12:42:04.410 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG org.clulab.dynet.Metal - Loading MTL model from org/clulab/processors/clu/mtl-en-pos-chunk-srlp-avg-e6e1e11e5e7...
12:42:15.227 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG org.clulab.dynet.Metal - Loading MTL model from org/clulab/processors/clu/mtl-en-pos-chunk-srlp-avg-e6e1e11e5e7 complete.
12:42:15.390 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG org.clulab.dynet.Metal - Loading MTL model from org/clulab/processors/clu/mtl-en-ner-avg-e10e12e8e3e2...
12:42:20.229 [pool-1-thread-1-ScalaTest-running-TestFastNLPProcessor] DEBUG org.clulab.dynet.Metal - Loading MTL model from org/clulab/processors/clu/mtl-en-ner-avg-e10e12e8e3e2 complete.
[info] - should generate correct dependencies in test sentence 1 *** FAILED ***
[info]   false was not true (TestFastNLPProcessor.scala:22)
[info] FastNLPProcessor
[info] - should generate correct dependencies in test sentence 2 *** FAILED ***
[info]   false was not true (TestFastNLPProcessor.scala:38)
[info] FastNLPProcessor
[info] - should have NER unaffected by state
roots: 10
outgoing:
        0:
        1:
        2: (0,det) (1,compound) (4,nmod) (6,appos)
        3:
        4: (3,case)
        5:
        6: (5,punct) (7,punct)
        7:
        8:
        9:
        10: (2,nsubj:pass) (8,aux:pass) (9,advmod) (13,obl) (20,punct)
        11:
        12:
        13: (11,case) (12,det) (16,nmod)
        14:
        15:
        16: (14,case) (15,compound) (18,appos)
        17:
        18: (17,punct) (19,punct)
        19:
        20:
incoming:
        0: (2,det)
        1: (2,compound)
        2: (10,nsubj:pass)
        3: (4,case)
        4: (2,nmod)
        5: (6,punct)
        6: (2,appos)
        7: (6,punct)
        8: (10,aux:pass)
        9: (10,advmod)
        10:
        11: (13,case)
        12: (13,det)
        13: (10,obl)
        14: (16,case)
        15: (16,compound)
        16: (13,nmod)
        17: (18,punct)
        18: (16,appos)
        19: (18,punct)
        20: (10,punct)

[info] - should run the dependency parser correctly on texts with parentheses *** FAILED ***
[info]   false was not true (TestFastNLPProcessor.scala:81)
[info] - should recognize semantic roles correctly
[info] - should create semantic dependencies of the correct length
[info] Run completed in 1 minute, 41 seconds.
[info] Total number of tests run: 6
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 3, failed 3, canceled 0, ignored 0, pending 0
[info] *** 3 TESTS FAILED ***
[error] Failed tests:
[error]         org.clulab.processors.TestFastNLPProcessor
[error] (corenlp / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 103 s (01:43), completed Jun 9, 2022, 12:42:28 PM
sbt:root>

@MihaiSurdeanu
Contributor

Thanks!
These are not too bad. I'll look into these.

@MihaiSurdeanu
Contributor

@kwalcock: which branch should I use for these?
Thanks!

@kwalcock
Member Author

This goes with #640; I'm not sure why I split them up. It is now on the modernize branch.

@MihaiSurdeanu
Contributor

I pushed some changes to the unit tests, which should make them pass.
The changes are pretty easy and seem deterministic. In particular:

  • "dobj" becomes "obj"
  • "nmod*" becomes "obl*"
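These renames follow the UD v1 to UD v2 relation changes. For anyone updating similar expectations, a rough Scala sketch of that mapping follows. It is an illustration under assumptions, not the change that was pushed: the headIsPredicate flag is hypothetical (UD v2 splits the old nmod into obl for dependents of predicates and nmod for dependents of nouns, which is why nmod still appears in some of the parses above), and the extra passive renames are the UD v2 ones visible as nsubj:pass and aux:pass in the logs. The list is not exhaustive.

```scala
object UdV2Labels {
  // Straight renames UD v2 made to relation names (not an exhaustive list)
  private val renames: Map[String, String] = Map(
    "dobj" -> "obj",
    "nsubjpass" -> "nsubj:pass",
    "csubjpass" -> "csubj:pass",
    "auxpass" -> "aux:pass"
  )

  // Map a UD v1 label to its UD v2 counterpart.
  // headIsPredicate is a hypothetical flag: nominal modifiers of predicates become
  // "obl" (keeping any subtype, e.g. "nmod:to" -> "obl:to"); those of nouns stay "nmod".
  def toV2(label: String, headIsPredicate: Boolean): String =
    if (label == "nmod" || label.startsWith("nmod:")) {
      if (headIsPredicate) "obl" + label.stripPrefix("nmod") else label
    } else {
      renames.getOrElse(label, label)
    }
}
```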

For the record, the CoreNLP POS tagger does not seem to follow the UD v2 tags, and neither does the constituent parser. The only component that seems to follow them consistently is the dependency parser wrapped in FastNLPProcessor.
I think this is ready for a PR?

@kwalcock
Member Author

The updated PR is #643 and is under test now. A decision has to be made about the version numbering. Additionally, it might be useful for the Scala code to detect which version of stanford-corenlp it is being "linked" against, so that the only differences between the processors v8 and v9 code would be version.sbt and the one line in build.sbt that defines corenlpV. The adjusted tests would then look more like

if (stanford.major < 4)
    doc.sentences(0).tags.get(3) should be ("TO")
else
    doc.sentences(0).tags.get(3) should be ("IN")

Then one might not need to go back and forth between branches or commits to see what has changed, keep things in sync, etc. I'm not sure how long that would be sustainable, but it doesn't seem difficult to drop if it doesn't work out. If the Stanford library does not expose its version number anywhere, sbt could supply it via the sbt-buildinfo plugin.
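For the buildinfo route, here is a sketch of what the sbt side might look like, assuming the com.eed3si9n sbt-buildinfo plugin and that corenlpV is a plain String val in the corenlp subproject's build.sbt. The plugin version, the CoreNLP version, the package name, and the corenlpVersion key are placeholders, not the project's actual settings. In a multi-project build, enablePlugins would go on the corenlp project definition instead of at the top level.

```scala
// project/plugins.sbt (placeholder version; check the current sbt-buildinfo release)
addSbtPlugin("com.eed3si9n" % "sbt-buildinfo" % "0.11.0")

// build.sbt for the corenlp subproject (hypothetical layout)
val corenlpV = "4.4.0" // placeholder: the one line that would differ between v8 and v9

libraryDependencies += "edu.stanford.nlp" % "stanford-corenlp" % corenlpV

enablePlugins(BuildInfoPlugin)

// Generates an object BuildInfo with a field corenlpVersion: String
buildInfoKeys := Seq[BuildInfoKey]("corenlpVersion" -> corenlpV)
buildInfoPackage := "org.clulab.processors.corenlp" // hypothetical package name
```

Code in that package could then read BuildInfo.corenlpVersion and branch on its major version.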

@kwalcock
Member Author

I finally noticed the comment you left:

// TODO: this used to be "TO" in older CoreNLP versions (< 4)

and it made me think even more that this is the way to go. The PR #644 shows how it might work. The change would also be made to the old version 8 branch.
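A minimal sketch of a version check the tests could share, assuming the CoreNLP version is available as a string such as "4.4.0" (for example from a buildinfo-generated object, as suggested above; that wiring is not shown). Version, parse, and expectedTagForToken3 are hypothetical names. The "TO"/"IN" split is the one from the TestCoreNLPProcessor.scala:126 failure and the TODO comment.

```scala
object CoreNlpVersionSketch {
  // Minimal parsed version; only major/minor matter for the test branches discussed here
  final case class Version(major: Int, minor: Int)

  // Parse strings such as "4.4.0" or "3.9.2"; returns None for anything unrecognized
  def parse(s: String): Option[Version] = {
    val Pattern = """(\d+)\.(\d+).*""".r
    s.trim match {
      case Pattern(major, minor) => Some(Version(major.toInt, minor.toInt))
      case _ => None
    }
  }

  // Expected tag for token 3 of the sentence checked at TestCoreNLPProcessor.scala:126:
  // "TO" before CoreNLP 4, "IN" from 4 on (per the failure and TODO comment in this thread)
  def expectedTagForToken3(v: Version): String =
    if (v.major < 4) "TO" else "IN"
}
```

A test could then write `doc.sentences(0).tags.get(3) should be (expectedTagForToken3(v))` instead of repeating the if/else in every adjusted assertion.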
