Build my own corpus for a specific language with its part of speech #41

svjack · 2021-07-12T09:39:23Z

I think that if I want to use a specific tokenizer (for processing a language such as CJK) to build a corpus with part-of-speech annotation,
I should implement my own tokenstream, set it on a CorpusData object,
and call the encode method to format it.
Then, with the help of the decode function in https://github.com/PolMine/polmineR,
I can run CQP queries on my own corpus.
(That way, I would only need to install cwbtools and polmineR, without the tools from
http://cwb.sourceforge.net/devs.php.)

Am I right?

Also, can the lexer used to parse CQP queries match the “pos” values defined by my own tokenizer?
