This comes from quanteda/quanteda.sentiment#11, which raises a more general question: how can a function return the original tokens that match a dictionary lookup, not just keep them as tokens_select() does, but return each match along with its key?
In the issue referred to above, a data.frame output was requested, although this could of course be a list, or a list by document.
Here's a workaround I cobbled together, but it would be more efficient to build this in as a function.
library("quanteda")
## Package version: 2.9.9000
## Unicode version: 10.0
## ICU version: 61.1
## Parallel computing: 12 of 12 threads used.
## See https://quanteda.io for tutorials and examples.
dict <- dictionary(list(
  positive = c("good", "not bad"),
  negative = "not good"
))
toks <- tokens(c(
  d1 = "The good test was not good",
  d2 = "It's not good to be not bad"
))
toks2 <- toks %>%
  tokens_replace(rep(names(dict), lengths(dict)), unlist(dict, use.names = FALSE)) %>%
  tokens_select(dict) %>%
  # join multi-word matches with a space, so they print as "not good" below
  tokens_compound(dict, concatenator = " ")
data.frame(
  key = as.character(tokens_lookup(toks2, dict, nested_scope = "dictionary")),
  token = as.character(toks2)
)
##        key    token
## 1 positive     good
## 2 negative not good
## 3 negative not good
## 4 positive  not bad
Created on 2021-02-21 by the reprex package (v1.0.0)
That's a good and quick ("kwic"? 😄) solution! But how would we deal with the nested dictionary issue, so that in d1, we don't match "not good" as "pattern = positive, keyword = good"?
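One way to think about the nested case is longest-match-first scanning: try the multi-word values before the single-word ones, and consume the whole match so that "good" inside "not good" cannot be matched again. Below is a minimal base R sketch of that idea, not quanteda's implementation. The helper name `match_dictionary()` is hypothetical, and it assumes one document's tokens as a plain character vector and a dictionary given as a named list with space-separated multi-word values.

```r
# Hypothetical helper, not part of quanteda: longest-match-first lookup.
# `tokens`: character vector of one document's tokens (assumed input shape).
# `dict`: named list of character vectors; multi-word values use single spaces.
match_dictionary <- function(tokens, dict) {
  # Flatten the dictionary into parallel vectors of keys and split patterns.
  keys <- rep(names(dict), lengths(dict))
  patterns <- strsplit(unlist(dict, use.names = FALSE), " ", fixed = TRUE)

  # Try longer patterns first so "not good" wins over the nested "good".
  ord <- order(lengths(patterns), decreasing = TRUE)
  keys <- keys[ord]
  patterns <- patterns[ord]

  toks_lower <- tolower(tokens)
  out_key <- character(0)
  out_token <- character(0)
  i <- 1
  while (i <= length(tokens)) {
    matched <- FALSE
    for (j in seq_along(patterns)) {
      last <- i + length(patterns[[j]]) - 1
      if (last <= length(tokens) &&
          all(toks_lower[i:last] == tolower(patterns[[j]]))) {
        out_key <- c(out_key, keys[j])
        out_token <- c(out_token, paste(tokens[i:last], collapse = " "))
        i <- last + 1  # consume the whole match so nested values cannot re-match
        matched <- TRUE
        break
      }
    }
    if (!matched) i <- i + 1
  }
  data.frame(key = out_key, token = out_token, stringsAsFactors = FALSE)
}

# d1 from the example above, tokenized by hand for this sketch.
dict <- list(positive = c("good", "not bad"), negative = "not good")
d1 <- c("The", "good", "test", "was", "not", "good")
res <- match_dictionary(d1, dict)
print(res)
```

For d1 this returns two rows: "good" under positive and "not good" under negative, so the trailing "not good" is not also reported as a positive "good". A built-in version would presumably need to decide whether this longest-match rule should be the default or an option, much as nested_scope does for tokens_lookup().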