Integrating fcm value weighting #2149
Comments
I have the methods and they work efficiently. I think they should be integrated instead of me creating a new package! |
Can you provide an example so we have a clearer idea of what these do and what sort of output is generated? We have some efficient association methods already used in |
Hi, basically I am talking about optimized implementations of these methods: https://tm4ss.github.io/docs/Tutorial_5_Co-occurrence.html (I'm one of the authors). The visualization is pretty much the same as in Quanteda, and I recently started to use Quanteda's method, so that part can be ignored. You can also read about the measures in https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.471.5863&rep=rep1&type=pdf What you get is an association between words based on the log-likelihood, PMI, DICE and Poisson significance measures or weighting schemes. They are not all strict significance measures, but they are very helpful for finding relevant associations between words. Additionally, they work well in situations where Chi^2 is problematic (rare cases etc.). |
I wrote a small function to compute PMI using FCM a while ago. Do you want to add something like this?
> toks <- tokens(c("a b c", "a b d e"))
> fcmt <- fcm(toks)
>
> fcm_pmi <- function(x) {
+ m <- x@meta$object$margin
+ x <- as(x, "dgTMatrix")
+ x@x <- log(x@x / (m[x@i + 1] * m[x@j + 1]) * sum(m))
+ x@x[x@x < 0] <- 0
+ as.fcm(x)
+ }
>
> fcmt
Feature co-occurrence matrix of: 5 by 5 features.
features
features a b c d e
a 0 2 1 1 1
b 0 0 1 1 1
c 0 0 0 0 0
d 0 0 0 0 1
e 0 0 0 0 0
> fcm_pmi(fcmt)
Feature co-occurrence matrix of: 5 by 5 features.
features
features a b c d e
a 0 1.252763 1.252763 1.252763 1.252763
b 0 0.000000 1.252763 1.252763 1.252763
c 0 0 0.000000 0 0
d 0 0 0 0.000000 1.945910
e 0 0 0 0 0.000000
You are welcome to issue a pull request! |
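For readers checking the numbers, the weighting that `fcm_pmi()` applies to each non-zero cell can be written out as follows. This is a sketch under one assumption that the thread does not state explicitly: that the `margin` slot holds each feature's total frequency (here a = 2, b = 2, c = d = e = 1, so N = 7). The symbols c_ij, m_i and N are introduced here for illustration only. The assumption is consistent with the printed output.

```latex
% c_{ij}: co-occurrence count of features i and j (non-zero cells only)
% m_i:    margin (assumed: total frequency) of feature i
% N:      sum of all margins
\[
  w_{ij} \;=\; \max\!\left(0,\; \log \frac{c_{ij}\, N}{m_i\, m_j}\right)
\]
% Worked check against the output above, with N = 7:
\[
  w_{a,b} = \log\frac{2 \cdot 7}{2 \cdot 2} = \log 3.5 \approx 1.2528,
  \qquad
  w_{d,e} = \log\frac{1 \cdot 7}{1 \cdot 1} = \log 7 \approx 1.9459
\]
```

Because negative values are set to zero, the result is the positive variant of PMI; cells with a zero count are never touched and stay zero.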
Yes, this is exactly what I'm proposing, but I want to add more association measures, since they all have different properties w.r.t. research questions and researcher requirements. Since I'm using them all the time, I thought integrating them into Quanteda would be nice for the whole community. I have a background in CSS and a PhD in Computer Science. The proposed measures are our standard repertoire when it comes to semantic interpretation of text resources. I work in the Computational Humanities group at Leipzig University. |
Why don't you start a branch to add a new function called |
Nice, will do! |
Probably better in quanteda.textstats since that's where the association statistics code already lives, and since this is a textual statistic. |
I wrote |
Basically, I analyse networks, search for synonyms, mine for semantic changes and that sort of thing. |
Requested feature
I want to integrate significance calculations based on Log-Likelihood, PMI, DICE and Poisson into the fcm object.
Use case
Co-occurrences can be weighted by statistical significance, which gives semantically more meaningful representations.