Hi, thank you for making this great library!
When parsing long documents, the syntactic_head_ID will sometimes reference a token in the previous sentence. For example, in the parsing output in the attached file (dKDD.csv):
0 2 0 28 she she 122 125 PRON PRP nsubj 29 O
0 2 1 29 's be 126 128 AUX VBZ ROOT 29 O
0 2 2 30 not not 129 132 PART RB neg 29 O
0 2 3 31 the the 133 136 DET DT det 32 O
0 2 4 32 one one 137 140 NOUN NN attr 29 O
0 2 5 33 to to 141 143 PART TO aux 34 O
0 2 6 34 write write 144 149 VERB VB relcl 32 O
0 2 7 35 . . 150 151 PUNCT . punct 29 O
0 3 0 36 Yeah yeah 152 156 INTJ UH intj 35 O
0 3 1 37 . . 157 158 PUNCT . punct 36 O
The syntactic_head_ID of token 36 (in sentence 3) is token 35 (sentence 2), which doesn't seem to make sense.
The same happens with tokens 62, 68, 91, 202, 276, 327, 328, 344, 376, 378, 385, 387, 433, 434, 499, 503, 516, 550, 556, 557, 558, 566, 589, 725, 751, 755, 813, 818, 843, 845, 853, 876, 880, 1450, 1502, 1563, 1756, 1881, 1882, 1902, 1926, 1972, 1993, 2054, 2058, 2059, 2086, 2097, 2103, 2488, 2489, 2511.
Is there a way to fix this?
Attachments: dKDD.csv, dKDD.txt
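For anyone who wants to reproduce the list of affected tokens, here is a minimal sketch of the check described above. It flags every token whose head sits in a different sentence. The column positions are an assumption read off the sample rows (second column as the sentence ID, fourth as the document-level token ID, twelfth as syntactic_head_ID), and the function name is made up for illustration; it is not part of the library.

```python
# Sample rows copied from the report above.
SAMPLE = """\
0 2 0 28 she she 122 125 PRON PRP nsubj 29 O
0 2 1 29 's be 126 128 AUX VBZ ROOT 29 O
0 2 2 30 not not 129 132 PART RB neg 29 O
0 2 3 31 the the 133 136 DET DT det 32 O
0 2 4 32 one one 137 140 NOUN NN attr 29 O
0 2 5 33 to to 141 143 PART TO aux 34 O
0 2 6 34 write write 144 149 VERB VB relcl 32 O
0 2 7 35 . . 150 151 PUNCT . punct 29 O
0 3 0 36 Yeah yeah 152 156 INTJ UH intj 35 O
0 3 1 37 . . 157 158 PUNCT . punct 36 O
"""

# Assumed column positions (0-based); adjust if the real file differs.
SENTENCE_COL = 1  # sentence ID
TOKEN_COL = 3     # token ID within the document
HEAD_COL = 11     # syntactic_head_ID


def find_cross_sentence_heads(rows):
    """Return (token_id, head_id) pairs whose head is in another sentence."""
    # Map every document-level token ID to the sentence it belongs to.
    sentence_of = {int(r[TOKEN_COL]): int(r[SENTENCE_COL]) for r in rows}
    flagged = []
    for r in rows:
        token_id = int(r[TOKEN_COL])
        head_id = int(r[HEAD_COL])
        head_sentence = sentence_of.get(head_id)
        # Heads that are not in the table at all are skipped, not flagged.
        if head_sentence is not None and head_sentence != int(r[SENTENCE_COL]):
            flagged.append((token_id, head_id))
    return flagged


# The sample is whitespace-separated; a real file should be split on its
# actual delimiter instead.
rows = [line.split() for line in SAMPLE.splitlines() if line.strip()]
print(find_cross_sentence_heads(rows))
```

On the ten sample rows this flags only token 36, whose head (token 35) belongs to sentence 2.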
Thanks for the note! Yes, that does seem weird. If this is with a version earlier than 1.0.7, try upgrading and see if it still happens (I'm running 1.0.7 with the "big" model and am not seeing that issue with dKDD.txt).
I tried again and discovered that this bug only occurs when using en_core_web_lg as the parsing model (which I prefer as it appears to give more accurate results). Any idea why this is happening?