-segment_numbers option #510

qutie75 · 2018-02-01T01:52:03Z

Hello!

I have a question about the -segment_numbers option.

If I pass this option when tokenizing, should I see its effect in the output file?

This is my command:

th tools/tokenize.lua -case_feature true -segment_case true -segment_numbers true -joiner_annotate true < input_test_en.txt > test.tok

The output looks like this:

the￨C convention￨L in￨L 1912￨N led￨L to￨L a￨L split￨L republican￨C party￨C ￭.￨N
I expected 1912 to be segmented into 1 9 1 2, but nothing changed.

Please help me.
Thank you.

jsenellart · 2018-02-01T07:59:53Z

Hi @qutie75 - yes, this is a known issue. -segment_numbers only works with -mode aggressive, so you can use that for the moment. We will fix this, or block use of the option in non-aggressive mode, because it is more in the spirit of "aggressive" than "conservative" tokenization.
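
For readers following along, the workaround amounts to adding -mode aggressive to the command in the question. The Python sketch below only illustrates the digit segmentation the question expected; it is not the OpenNMT implementation. The function name segment_numbers and the joiner placement (a trailing ￭ on every digit except the last) are assumptions made for illustration, and the tokenizer's real output may differ.

```python
JOINER = "￭"  # joiner marker, the same character that appears in the output above


def segment_numbers(tokens, joiner=JOINER):
    """Split every multi-digit numeric token into single digits.

    Assumption (not taken from the tokenizer source): each digit except the
    last gets a trailing joiner so the number can be glued back together
    when detokenizing.
    """
    out = []
    for tok in tokens:
        if tok.isascii() and tok.isdigit() and len(tok) > 1:
            out.extend(digit + joiner for digit in tok[:-1])
            out.append(tok[-1])
        else:
            # Words, punctuation and single digits pass through unchanged.
            out.append(tok)
    return out


tokens = "the convention in 1912 led to a split".split()
segmented = segment_numbers(tokens)
print(" ".join(segmented))
```

Case features (￨C, ￨L, ￨N) are left out of the sketch to keep it short; only the splitting of 1912 into separate digits is shown.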
