When calling KyTea with a raw input file, KyTea stops processing at the first newline character.
Some sentences (particularly headlines) are delimited by a newline rather than by punctuation. If you remove the newline following a headline in a blog post or a newspaper article, the headline will run into the first sentence of the article.
On the other hand, KyTea seems to quit without doing any processing if the raw input file doesn't contain any newline character.
Moreover, KyTea doesn't seem to do word segmentation when input is tokenised ($ kytea infile.tok -in tok), seemingly making the -nows flag redundant.
Thanks for the report. I can't confirm this behavior.
Can you give me more information about the version of KyTea you are using, the exact command you ran, the environment you are running it in, and an example file that causes the problem?
Thanks for the quick reply. I run OSX El Capitan 10.11.3 and Ubuntu 12.10 both with kytea-0.4.7 compiled from source.
I did some more tests and it appears that I over-interpreted my initial test case. KyTea doesn't stop processing at the first newline character, but at the last newline character.
If you create a file in Unix (with echo and the > operator on the command line, or with nano), a newline is automatically appended at the end of the file, but files generated with Python, Perl or Sublime Text don't necessarily have a newline at the end of the file, and this is how I stumbled upon the problem.
$ echo "社長兼業務部長。" > infile.txt
$ cat infile.txt
社長兼業務部長。
$ kytea infile.txt
社長/名詞/しゃちょう 兼/名詞/けん 業務/名詞/ぎょうむ 部長/名詞/ぶちょう 。/補助記号/。
$ perl -pi -e 'chomp if eof' infile.txt  # deletes the trailing \n from infile.txt
$ cat infile.txt
社長兼業務部長。%
$ kytea infile.txt  # hangs for a while, then gives no output
$
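In case it helps anyone hitting the same thing, here is a minimal sketch of a workaround, assuming the missing trailing newline really is the trigger as the transcript above suggests. The function name ensure_trailing_newline is my own and is not part of KyTea; it only touches the input file and makes no assumptions about KyTea's options.

```shell
# Hypothetical helper (not part of KyTea): append a newline to a file
# only if its last byte is not already a newline.
ensure_trailing_newline() {
    file="$1"
    # Command substitution strips a trailing newline, so the result is
    # empty when the last byte is "\n". The -s test skips empty files.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
}
```

Running `ensure_trailing_newline infile.txt` before `kytea infile.txt` should leave files that already end in a newline untouched, so it is safe to apply to every input file.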