This repository has been archived by the owner on Jun 10, 2021. It is now read-only.
Hello! I want to ask about the -segment_numbers option.
If I use this option when tokenizing, can I see its effect in my output file?
This is my command:
th tools/tokenize.lua -case_feature true -segment_case true -segment_numbers true -joiner_annotate true < input_test_en.txt > test.tok
The output looks like this:
the│C convention│L in│L 1912│N led│L to│L a│L split│L republican│C party│C ■.│N
I expected 1912 to be segmented as 1 9 1 2, but it is unchanged.
Please help me.
Thank you.
Hi @qutie75, yes, this is a known issue: -segment_numbers only works with -mode aggressive, so you can use that for the moment. We will fix that, or block use of the option in non-aggressive mode, since it is more in the spirit of "aggressive" than "conservative" tokenization.
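Following the workaround in the reply, the original command would presumably only need -mode aggressive added. This is a sketch based on that reply, not verified against a particular version of tools/tokenize.lua; the input and output file names are the ones from the question.

```shell
# Same options as the original command, plus -mode aggressive,
# which the reply says is required for -segment_numbers to take effect.
th tools/tokenize.lua -mode aggressive -case_feature true -segment_case true \
  -segment_numbers true -joiner_annotate true < input_test_en.txt > test.tok
```

Keep in mind that aggressive mode is a different tokenization mode overall, so tokens other than numbers may also be split differently than in the output shown in the question.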