Reimplement createJSON() from LDAvis #233

Open
dselivanov opened this issue Jan 9, 2018 · 3 comments

Comments

@dselivanov
Owner

It seems that the LDAvis package is not actively maintained and won't be updated on CRAN in the near future. In particular, we need an option to not reorder topics, and fixes for NaN in jensenShannon (see cpsievert/LDAvis#56):

  1. fix issue #56 cpsievert/LDAvis#77
  2. Do division in log space cpsievert/LDAvis#80
@manuelbickel
Contributor

manuelbickel commented Jan 16, 2018

With respect to the Jensen-Shannon divergence, I think the fix proposed by Maren-Eckhoff, which is pending as an open pull request, already solves the problem. See the adapted function and test below.

There was one last comment in the above-mentioned issue 56 about still getting NaN, but it did not provide an example. To my understanding, there should be no NaNs as long as the input data is fine, which it should be at this point (please correct me if I am wrong).

#adapted jensenShannon
jensenShannon <- function(x, y) {
    m <- 0.5*(x + y)
    #introduced fix proposed by Maren-Eckhoff to avoid log(0)
    #https://github.com/cpsievert/LDAvis/issues/56
    0.5*(sum(ifelse(x==0,0,x*log(x/m)))+sum(ifelse(y==0,0,y*log(y/m))))
}
#create phi for testing
p <-     c(0.25,   0, 0.25, 0,0.5)
q <-     c(   0,0.25, 0.25, 0,0.5)
zeros <- c(   0,   0,    0, 0,  0) #not a valid distribution (rows should sum to one); included only for demo
phi <- rbind(p, q, qrev = rev(q), prev = rev(p), zeros)
#       [,1] [,2] [,3] [,4] [,5]
# p     0.25 0.00 0.25 0.00 0.50
# q     0.00 0.25 0.25 0.00 0.50
# qrev  0.50 0.00 0.25 0.25 0.00
# prev  0.50 0.00 0.25 0.00 0.25
# zeros 0.00 0.00 0.00 0.00 0.00
dist.mat <- proxy::dist(x = phi, method = jensenShannon)
pca.fit <- stats::cmdscale(dist.mat, k = 2)
#                [,1]       [,2]
# p      4.600278e-02 -0.1037688
# q      2.600304e-01 -0.0176260
# qrev  -2.600304e-01 -0.0176260
# prev  -4.600278e-02 -0.1037688
# zeros  2.073058e-16  0.2427896
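
To make the effect of the zero guard concrete, the following base-R sketch compares the adapted function with an unguarded variant on the same test rows (leaving out the all-zero demo row). The unguarded variant is written here for illustration and is only assumed to resemble the version that produces NaN. The names js_naive, js_guarded and pairwise are hypothetical, and a plain loop replaces proxy::dist so that no extra package is needed.

```r
# Unguarded variant, for illustration only: 0 * log(0 / m) evaluates to NaN.
js_naive <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * sum(x * log(x / m)) + 0.5 * sum(y * log(y / m))
}

# Guarded variant, as adapted above: zero entries contribute 0 to the sum.
js_guarded <- function(x, y) {
  m <- 0.5 * (x + y)
  0.5 * (sum(ifelse(x == 0, 0, x * log(x / m))) +
         sum(ifelse(y == 0, 0, y * log(y / m))))
}

# Pairwise matrix over the rows of `mat`, using base R only.
pairwise <- function(mat, f) {
  n <- nrow(mat)
  out <- matrix(0, n, n, dimnames = list(rownames(mat), rownames(mat)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      out[i, j] <- f(mat[i, ], mat[j, ])
    }
  }
  out
}

p <- c(0.25, 0, 0.25, 0, 0.5)
q <- c(0, 0.25, 0.25, 0, 0.5)
phi <- rbind(p, q, qrev = rev(q), prev = rev(p))

d_naive   <- pairwise(phi, js_naive)
d_guarded <- pairwise(phi, js_guarded)

anyNA(d_naive)     # TRUE: zero entries in the rows produce NaN
anyNA(d_guarded)   # FALSE
```

With the guard in place, the diagonal is exactly zero and the matrix is symmetric, so it can be passed on to stats::cmdscale as in the test above.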

@dselivanov
Owner Author

True, but

  1. The PR has not been merged yet
  2. I doubt the maintainer will upload an updated package to CRAN in the near future

@manuelbickel
Contributor

manuelbickel commented Jan 17, 2018

Maybe my comment was misleading, sorry. I agree that LDAvis will have to be reimplemented; I just wanted to confirm that the fix works for this purpose. Hence, as a first step, a modified copy of createJSON might quickly solve the issues raised above in terms of creating the data for visualization. Another matter is, of course, the potential reimplementation of the visualization itself.
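
As a rough illustration of the "do not reorder topics" option, a modified copy of createJSON could gate its ordering step behind a flag. The sketch below is not LDAvis code: the helper name topic_order, its arguments, and the assumption that topics are otherwise ordered by decreasing token-weighted proportion are all hypothetical choices made for illustration.

```r
# Hypothetical helper, not part of LDAvis: returns the topic order to use.
# theta:          documents x topics matrix of topic proportions
# doc.length:     number of tokens per document
# reorder.topics: if FALSE, keep the topics in their input order
topic_order <- function(theta, doc.length, reorder.topics = TRUE) {
  stopifnot(nrow(theta) == length(doc.length))
  # Token-weighted share of each topic across the corpus.
  topic.frequency <- colSums(theta * doc.length)
  topic.proportion <- topic.frequency / sum(topic.frequency)
  if (reorder.topics) {
    order(topic.proportion, decreasing = TRUE)
  } else {
    seq_along(topic.proportion)
  }
}

theta <- rbind(c(0.1, 0.6, 0.3),
               c(0.2, 0.5, 0.3))
doc.length <- c(10, 30)

topic_order(theta, doc.length)                          # 2 3 1
topic_order(theta, doc.length, reorder.topics = FALSE)  # 1 2 3
```

Keeping the input order means topic numbers in the visualization match the indices of the fitted model, which seems to be the point of the requested option.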
