Twokenize runs into NullPointerException for conll output format, with provided example (casual.txt) #23

Open
vatsan opened this issue Aug 13, 2013 · 1 comment


vatsan commented Aug 13, 2013

me$ java -jar ark-tweet-nlp-0.3.2.jar --output-format conll --just-tokenize /tmp/casual.txt
Detected text input format
Exception in thread "main" java.lang.NullPointerException
    at cmu.arktweetnlp.RunTagger.outputJustTagging(RunTagger.java:245)
    at cmu.arktweetnlp.RunTagger.runTagger(RunTagger.java:130)
    at cmu.arktweetnlp.RunTagger.main(RunTagger.java:364)

The tagger works on the same input, though (same command without --just-tokenize):

me$ java -jar ark-tweet-nlp-0.3.2.jar --output-format conll /tmp/casual.txt
Detected text input format
@Cwallll @ 0.9989
@diddy_dance @ 0.9986
ikr ! 0.8143
smh G 0.9406
he O 0.9963
asked V 0.9979
fir P 0.5545
yo D 0.6272
last A 0.9871
name N 0.9998
so P 0.9838
he O 0.9981
can V 0.9997
add V 0.9997
u O 0.9978
on P 0.9426
fb ^ 0.9453
lololol ! 0.9664

:o E 0.9387
:/ E 0.9983
:'( E 0.9975

:o E 0.9964
(: E 0.9994
:) E 0.9997
.< E 0.9952
XD E 0.9938
-__- E 0.9956
o.O E 0.9899
;D E 0.9995
:-) E 0.9992
@_@ E 0.9964
:P E 0.9996
8D E 0.9961
: E 0.6925
1 $ 0.9194
:( E 0.9715
:D E 0.9996
=| E 0.9963
" , 0.6125
) , 0.9078
: , 0.6272
E 0.4920
.... , 0.8882

@brendano
Owner

Ah, the conll output format isn't very well tested, I'm afraid...
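
For anyone hitting the same trace: the exception is thrown in outputJustTagging, and one plausible cause (an assumption, not checked against RunTagger.java) is that the conll writer reads tag data that doesn't exist when --just-tokenize is set. The sketch below is not the project's code; `ConllSketch` and `formatConll` are hypothetical names, and it only illustrates how a conll-style writer could guard against missing tags.

```java
import java.util.Arrays;
import java.util.List;

public class ConllSketch {
    // Hypothetical helper, NOT part of ark-tweet-nlp.
    // Writes one token per line, followed by a blank line ending the sentence.
    // `tags` may be null, which models a tokenize-only run where no tagging happened.
    public static String formatConll(List<String> tokens, List<String> tags) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            sb.append(tokens.get(i));
            // Only emit the tag column when tags are actually available.
            if (tags != null) {
                sb.append('\t').append(tags.get(i));
            }
            sb.append('\n');
        }
        sb.append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("ikr", "smh");
        // Tagged run: token and tag columns.
        System.out.print(formatConll(tokens, Arrays.asList("!", "G")));
        // Tokenize-only run: token column only, no NullPointerException.
        System.out.print(formatConll(tokens, null));
    }
}
```

Without the null check, the `tags.get(i)` call would throw a NullPointerException on the tokenize-only call, which is the kind of failure the trace above would be consistent with.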
