stm/prevalence issue #272

Open
yuanyuan0105 opened this issue May 19, 2022 · 5 comments

Comments

@yuanyuan0105

I tried to run the stm() function as shown below, but got this error message:
"Error in stm(documents = out$documents, vocab = out$vocab, K = 0, data = out$meta, : number of observations in content covariate (1) prevalence covariate (20263) and documents (20263) are not all equal."

The code I have is:

stmfit <- stm(documents = out$documents, vocab = out$vocab,
              K = 0, data = out$meta, prevalence =~ timenum,
              max.em.its = 75, seed = 24601,
              init.type = "Spectral", verbose = FALSE,
              control <- list(tSNE_init.dims = 80))

I did not specify the "content =" argument in my code, since some examples I have seen use only "prevalence".
What causes this error, and how can I solve it?

Many thanks
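
A possible explanation for the "content covariate (1)" part of the message, offered as a sketch rather than a confirmed diagnosis: inside a function call, `control <- list(...)` is an assignment expression, not a named argument. R passes its value positionally, so it lands on the first formal that has not already been matched by name. Assuming stm() lists `content` before `control` among its arguments (as its documentation suggests), the list of length 1 would be read as the content covariate. The function `fake_stm` below is a hypothetical stand-in that copies only that argument order; it does not fit a model.

```r
# Hypothetical stand-in that mirrors the assumed order of stm()'s arguments.
# It only reports which formal each supplied value was matched to.
fake_stm <- function(documents, vocab, K, prevalence = NULL, content = NULL,
                     data = NULL, control = list()) {
  list(content_length = length(content),
       control_length = length(control))
}

# `control <- list(...)` is evaluated as an assignment; its value is passed
# as an unnamed argument and fills the first free formal, here `content`.
res_arrow <- fake_stm(documents = 1:3, vocab = letters, K = 0,
                      data = NULL, prevalence = ~ timenum,
                      control <- list(tSNE_init.dims = 80))

# `control = list(...)` names the argument, so `content` stays NULL.
res_equal <- fake_stm(documents = 1:3, vocab = letters, K = 0,
                      data = NULL, prevalence = ~ timenum,
                      control = list(tSNE_init.dims = 80))

print(res_arrow)
print(res_equal)
```

If this reading is right, writing `control = list(tSNE_init.dims = 80)` in the stm() call would keep the list out of the `content` slot.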

@santoroma

santoroma commented May 19, 2022 via email

@yuanyuan0105
Author

Hi @santoroma,

I have attached the dataset and my code below:

https://docs.google.com/spreadsheets/d/1eStIhewnnMxmYG0MEDgYz3euJThRsjPV4YELpldatlk/edit?usp=sharing

library(stm)
processed <- textProcessor(data_english$text, metadata = data_english)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)
First_STM <- stm(documents = out$documents, vocab = out$vocab,
                 K = 0, data = out$meta, prevalence =~ s(timenum),
                 init.type = "Spectral", verbose = FALSE,
                 control <- list(tSNE_init.dims = 80))

Thanks very much in advance for your help!

@bfisseler

bfisseler commented Aug 25, 2022

It's very likely that you have missing values in your covariates. STM currently cannot handle missing values: "Note that the model does not permit estimation when there are variables used in the model that have missing values. As such, it can be helpful to subset data to observations that do not have missing values for metadata that will be used in the STM model."

Roberts, M. E., Stewart, B. M. & Tingley, D. (2019). stm: An R Package for Structural Topic Models. Journal of Statistical Software, 91, 1–40. https://doi.org/10.18637/jss.v091.i02
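
The subsetting step described in that quotation can be sketched in base R as follows. The data frame `meta` is a toy stand-in for `data_english` with invented values; only the column names `text` and `timenum` are taken from the thread.

```r
# Toy metadata standing in for data_english; the values are invented.
meta <- data.frame(
  text = c("first document", "second document", "third document"),
  timenum = c(1, NA, 3),
  stringsAsFactors = FALSE
)

# Count missing values in the covariate used in the prevalence formula.
n_missing <- sum(is.na(meta$timenum))

# Keep only the rows where that covariate is observed.
meta_complete <- meta[!is.na(meta$timenum), , drop = FALSE]

print(n_missing)
print(nrow(meta_complete))
```

The complete-case frame would then be passed on in place of the original, for example `textProcessor(meta_complete$text, metadata = meta_complete)`, so that texts and metadata stay aligned row for row.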

@JvH13

JvH13 commented Jun 6, 2023

I am having the same issue. I followed the instructions in #144, but I don't have missing values. What puzzles me is why it throws an error about the content covariate (1), while I do not have a content covariate in my model. The prevalence covariate and document covariate have equal lengths and no missing values.

@vandytripp

> I am having the same issue. I followed the instructions in #144, but I don't have missing values. What puzzles me is why it throws an error about the content covariate (1), while I do not have a content covariate in my model. The prevalence covariate and document covariate have equal lengths and no missing values.

I am having the same issue.
