This repository has been archived by the owner on Jun 10, 2021. It is now read-only.

-segment_numbers option #510

Open
qutie75 opened this issue Feb 1, 2018 · 1 comment

Comments

@qutie75

qutie75 commented Feb 1, 2018

Hello!

I have a question about the -segment_numbers option.

If I use this option when tokenizing, should I be able to see its effect in my output file?

This is my command:

th tools/tokenize.lua -case_feature true -segment_case true -segment_numbers true -joiner_annotate true < input_test_en.txt > test.tok
The output looks like this:

the│C convention│L in│L 1912│N led│L to│L a│L split│L republican│C party│C ■.│N
I expected 1912 to be segmented into 1 9 1 2, but nothing changed.

Please help me.
Thank you.
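To make the expected behavior concrete, here is a minimal Python sketch of digit segmentation applied to the number token from the output above. It is not the OpenNMT implementation. The helper name `segment_number`, the choice to prefix each continuation digit with the joiner `■`, and tagging each digit with the `│N` feature are illustrative assumptions. The real tokenizer may place joiners differently.

```python
JOINER = "■"


def segment_number(token: str) -> list[str]:
    """Split an all-digit token into one token per digit (hypothetical helper).

    Every digit after the first is prefixed with the joiner so the original
    number can be rebuilt during detokenization. Tokens that are not made
    entirely of digits are returned unchanged.
    """
    if not token.isdigit():
        return [token]
    return [token[0]] + [JOINER + d for d in token[1:]]


# Attach the same "N" case feature to every digit, as in the output line above.
print(" ".join(piece + "│N" for piece in segment_number("1912")))
```

With segmentation active, `1912│N` would become four tokens, `1│N ■9│N ■1│N ■2│N`, in this sketch.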

@jsenellart
Contributor

Hi @qutie75, yes, this is a known issue. -segment_numbers currently only works with -mode aggressive, so you can use that for the moment. We will fix this, or block the option in non-aggressive mode, since it is more in the spirit of "aggressive" than "conservative" tokenization.
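The behavior described in this reply can be modeled with a small Python sketch. In the sketch, number segmentation is applied only when the mode is `"aggressive"`. The function `tokenize` and its parameters are hypothetical and only mirror the option names. OpenNMT's actual tokenizer also handles punctuation, case features, and joiner placement, and it does more beyond that. In practice, the workaround is to add `-mode aggressive` to the original command.

```python
JOINER = "■"


def tokenize(text: str, mode: str = "conservative",
             segment_numbers: bool = False) -> list[str]:
    """Whitespace tokenizer mimicking the reported option gating (hypothetical).

    Digit runs are split into single digits only when segment_numbers is set
    AND mode == "aggressive"; in any other mode the option is silently
    ignored, which matches the behavior reported in this issue.
    """
    tokens = text.split()
    if not (segment_numbers and mode == "aggressive"):
        return tokens
    out: list[str] = []
    for tok in tokens:
        if tok.isdigit():
            # Keep the first digit bare; mark the rest as joined to it.
            out.append(tok[0])
            out.extend(JOINER + d for d in tok[1:])
        else:
            out.append(tok)
    return out


print(tokenize("in 1912 led", segment_numbers=True))
print(tokenize("in 1912 led", mode="aggressive", segment_numbers=True))
```

The first call leaves `1912` intact because the mode is not aggressive. The second call splits it into `1`, `■9`, `■1`, `■2`.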
