Warning message in convert(., to = "stm") wrong #2346

Open
AdaemmerP opened this issue Feb 13, 2024 · 1 comment

Describe the bug

convert(., to = "stm") correctly drops empty documents, but the warning message reports the total number of documents instead of the number of empty ones, which suggests that all documents were dropped.

Reproducible code

library(quanteda)

# three documents, the first of which is empty
docs <- c("",
          "not empty",
          "also not empty")

# tokens -> dfm -> convert to stm format
docs |>
  tokens() |>
  dfm() |>
  convert(to = "stm")


> Warning message:
> In dfm2stm(x, docvars, omit_empty = TRUE) : Dropped 3 empty document(s)

Expected behavior

The warning should report that 1 empty document was dropped, not 3.
It might be even better to throw an error, so that empty documents have to be removed before conversion.

System information

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] quanteda_3.3.1

@AdaemmerP
Author

empty_docs <- rowSums(x) == 0

warning("Dropped ", format(length(empty_docs), big.mark = ","),

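I assume length(empty_docs) has to be changed to sum(empty_docs)? empty_docs is a logical vector with one entry per document, so length() returns the total number of documents, whereas sum() counts only the TRUE entries, i.e. the empty documents. If so, the same holds for length(empty_feats) in line 328.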
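
The difference between the two calls can be checked without quanteda. The following is only a minimal base R sketch: the toy matrix `x`, the names `n_total`, `n_empty` and `msg`, and the completed message are assumptions for illustration, not the package's actual code.

```r
# Toy document-feature matrix (hypothetical data): 3 documents x 2 features.
# The first row is all zeros, i.e. an empty document.
x <- matrix(c(0, 0,
              1, 1,
              2, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("text1", "text2", "text3"),
                            c("not", "empty")))

empty_docs <- rowSums(x) == 0   # logical vector, one entry per document

n_total <- length(empty_docs)   # counts every document, empty or not
n_empty <- sum(empty_docs)      # counts only the TRUE entries

# Assumed shape of a corrected message, reusing the wording of the warning
msg <- paste0("Dropped ", format(n_empty, big.mark = ","),
              " empty document(s)")

print(c(n_total = n_total, n_empty = n_empty))
print(msg)
```

With this input, `length(empty_docs)` gives 3 and `sum(empty_docs)` gives 1, which matches the mismatch between the reported and the expected count in the warning.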
