Integrating fcm value weighting #2149

eisioriginal · 2021-11-03T13:26:48Z

Requested feature

I want to integrate significance calculations based on Log-Likelihood, PMI, DICE and Poisson into the fcm object.

Use case

Co-occurrences can be weighted by statistical significance, which gives more semantically meaningful representations.

eisioriginal · 2021-11-03T13:27:43Z

I have the methods and they work efficiently. I think they should be integrated instead of me creating a new package!

kbenoit · 2021-11-03T14:34:28Z

Can you provide an example so we have a clearer idea of what these do and what sort of output is generated? We have some efficient association methods already used in quanteda.textstats::textstat_keyness() and might be able to adapt these if we knew exactly what sort of association statistics you are interested in generating.

eisioriginal · 2021-11-03T16:13:44Z

Hi, basically I am talking about optimized implementations of these methods: https://tm4ss.github.io/docs/Tutorial_5_Co-occurrence.html (I'm one of the authors). The visualization is pretty much the same as in Quanteda, and I recently started to use Quanteda's method, so this part can be ignored.

You can read about them also in https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.471.5863&rep=rep1&type=pdf

What you get is an association between words based on log-likelihood, PMI, DICE and Poisson significance or weighting schemes. They are not all strict significance measures, but they are very helpful for finding relevant associations between words. Additionally, they work well in situations where Chi^2 is problematic (rare cases etc.).
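To make the measures above concrete, here is a minimal base R sketch of three of them for a single word pair. It assumes the common textbook definitions (PMI, the Dice coefficient, and Dunning's log-likelihood ratio on a 2x2 contingency table), which may differ in detail from the linked tutorial. The helper names `pmi()`, `dice()` and `loglik()` are hypothetical and not part of quanteda. The Poisson measure is left out because its exact form depends on the reference used.

```r
# Hypothetical helpers (illustrative names, base R only).
# k: number of contexts containing both words
# a, b: number of contexts containing word A / word B
# n: total number of contexts (e.g. documents or sentences)

# Pointwise mutual information: log of observed over expected co-occurrence
pmi <- function(k, a, b, n) log(k * n / (a * b))

# Dice coefficient: overlap relative to the two marginal counts
dice <- function(k, a, b) 2 * k / (a + b)

# Log-likelihood ratio (G^2) over the 2x2 contingency table of A and B
loglik <- function(k, a, b, n) {
  obs <- c(k, a - k, b - k, n - a - b + k)  # both, A only, B only, neither
  expd <- c(a * b, a * (n - b), (n - a) * b, (n - a) * (n - b)) / n
  # Cells with zero observations contribute nothing (0 * log(0) := 0)
  2 * sum(ifelse(obs > 0, obs * log(obs / expd), 0))
}

# Example: 100 contexts, A in 10, B in 20, both in 8
pmi(8, 10, 20, 100)     # log(4)
dice(8, 10, 20)         # 16 / 30
loglik(8, 10, 20, 100)  # positive; 0 would mean exact independence
```

Under these definitions PMI has no upper bound and tends to favour rare pairs, Dice is bounded between 0 and 1, and the log-likelihood ratio grows with the amount of evidence, which is one reason the measures suit different research questions.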

koheiw · 2021-11-04T02:22:55Z

I wrote a small function to compute PMI using FCM a while ago. Do you want to add something like this?

> toks <- tokens(c("a b c", "a b d e"))
> fcmt <- fcm(toks)
> 
> fcm_pmi <- function(x) {
+   m <- x@meta$object$margin
+   x <- as(x, "dgTMatrix")
+   x@x <- log(x@x / (m[x@i + 1] * m[x@j + 1]) * sum(m))
+   x@x[x@x < 0] <- 0
+   as.fcm(x)
+ }
> 
> fcmt
Feature co-occurrence matrix of: 5 by 5 features.
        features
features a b c d e
       a 0 2 1 1 1
       b 0 0 1 1 1
       c 0 0 0 0 0
       d 0 0 0 0 1
       e 0 0 0 0 0
> fcm_pmi(fcmt)
Feature co-occurrence matrix of: 5 by 5 features.
        features
features a        b        c        d        e
       a 0 1.252763 1.252763 1.252763 1.252763
       b 0 0.000000 1.252763 1.252763 1.252763
       c 0 0        0.000000 0        0       
       d 0 0        0        0.000000 1.945910
       e 0 0        0        0        0.000000

You are welcome to issue a pull request!

eisioriginal · 2021-11-04T09:59:33Z

Yes, this is exactly what I'm proposing, but I want to add more association measures since they all have different properties with respect to research questions and researcher requirements. Since I'm using them all the time, I thought an integration into Quanteda would be nice for the whole community.

I have a background in CSS and a PhD in Computer Science. The proposed measures are our standard repertoire when it comes to semantic interpretation of text resources. I work in the Computational Humanities group in Leipzig University.

koheiw · 2021-11-04T12:00:08Z

Why don't you start a branch to add a new function called fcm_weight() with additional measures? I am happy to assist.
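As a rough illustration of what a multi-measure weighting function could look like, here is a base R sketch on a plain count matrix. It is not the proposed fcm_weight(): the name `weight_cooc()`, its arguments, and the choice to pass feature counts (`freq`) and the number of contexts (`n`) explicitly are all assumptions made for this example, and it works on an ordinary dense matrix rather than an fcm object.

```r
# Hypothetical sketch, base R only; not a quanteda function.
# x:    symmetric co-occurrence count matrix (features x features)
# freq: number of contexts each feature occurs in, same order as rows of x
# n:    total number of contexts
weight_cooc <- function(x, freq, n, scheme = c("pmi", "dice")) {
  scheme <- match.arg(scheme)
  stopifnot(is.matrix(x), nrow(x) == ncol(x), length(freq) == nrow(x))
  out <- switch(scheme,
    pmi  = log(x * n / outer(freq, freq)),   # observed vs. expected
    dice = 2 * x / outer(freq, freq, "+")    # overlap vs. marginals
  )
  out[x == 0] <- 0   # unobserved pairs stay at zero (log(0) is -Inf)
  out[out < 0] <- 0  # clip negative values, as fcm_pmi() in this thread does
  dimnames(out) <- dimnames(x)
  out
}

# Document-level counts for the two texts "a b c" and "a b d e"
feats <- c("a", "b", "c", "d", "e")
x <- matrix(0, 5, 5, dimnames = list(feats, feats))
x["a", c("b", "c", "d", "e")] <- c(2, 1, 1, 1)
x["b", c("c", "d", "e")] <- 1
x["d", "e"] <- 1
x <- x + t(x)                                 # make the matrix symmetric
freq <- c(a = 2, b = 2, c = 1, d = 1, e = 1)  # document frequencies

weight_cooc(x, freq, n = 2, scheme = "dice")
weight_cooc(x, freq, n = 2, scheme = "pmi")
```

Selecting the measure through a single `scheme` argument keeps the interface to one function, so further measures could be added as extra `switch()` branches. A real implementation would presumably operate on the sparse representation directly, as fcm_pmi() above does.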

eisioriginal · 2021-11-04T12:50:16Z

Nice, will do!

kbenoit · 2021-11-04T12:53:16Z

Probably better in quanteda.textstats since that's where the association statistics code already lives, and since this is a textual statistic.

koheiw · 2021-11-04T23:33:15Z

I wrote fcm_pmi() as a pre-processing step for SVD, so I thought it should be in the main package. If it is for network analysis, textstats would be a better place. @eisioriginal how do you want to use the output?

eisioriginal · 2021-11-08T07:57:33Z

Basically I analyse networks, search for synonyms, mine for semantic changes, and all that sort of thing.
