If I want to use a specific tokenizer (for example, for CJK languages) to build a corpus with part-of-speech annotation,
I think I should implement my own tokenstream, assign it to a CorpusData object,
and call the encode method to encode the corpus.
Then, with the help of the decode function in https://github.com/PolMine/polmineR,
I can run CQP queries on my own corpus.
(In that case, only cwbtools and polmineR need to be installed, with nothing required from http://cwb.sourceforge.net/devs.php.)
Is this correct?
Also, can the lexer used to parse CQP queries match the “pos” values defined by my own tokenizer?