frq() problem with NA value and weight with percent calculation #128

Open
caayala opened this issue Jan 7, 2020 · 3 comments

caayala commented Jan 7, 2020

This issue could be a regression from #77.

df <- data.frame(val = c(1, 2, NA),
                 wgt = c(1, 1, 1))

sjmisc::frq(df, val)
#> 
#> val <numeric>
#> # total N=3  valid N=2  mean=1.50  sd=0.71
#> 
#>  val frq raw.prc valid.prc cum.prc
#>    1   1   33.33        50      50
#>    2   1   33.33        50     100
#>   NA   1   33.33        NA      NA
sjmisc::frq(df, val, weights = wgt)
#> 
#> val <numeric>
#> # total N=2  valid N=2  mean=1.50  sd=0.71
#> 
#>  val frq raw.prc valid.prc cum.prc
#>    1   1      50        50      50
#>    2   1      50        50     100
#>   NA   0       0        NA      NA

Created on 2020-01-07 by the reprex package (v0.3.0)

I expected both frequency tables to be identical, since every weight is 1.

  • sjmisc [2.8.2]
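
For comparison, here is a minimal base-R sketch of the expected result. It is not how sjmisc computes the table; it only assumes that NA should be treated as its own category so that its weight is counted instead of dropped. The names `val_f`, `w_frq`, and `raw_prc` are illustrative.

```r
df <- data.frame(val = c(1, 2, NA),
                 wgt = c(1, 1, 1))

# Make NA an explicit factor level so its rows form their own group
val_f <- addNA(factor(df$val))

# Weighted frequency: sum of weights within each level (including NA)
w_frq <- tapply(df$wgt, val_f, sum)

# Raw percentages relative to the total weight
raw_prc <- 100 * w_frq / sum(w_frq)

print(round(raw_prc, 2))
```

With all weights equal to 1, each of the three levels gets a weighted frequency of 1, which matches the unweighted table.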
@frankcsliu

I have the same question. I expected the number of NA values in the weighted output to stay consistent with the unweighted output. However, the help page says that "the weight will be applied to weight all observations," which seems to imply that observations labelled NA may be moved to other categories?

My replicable dataset is here: https://github.com/frankcsliu/R4surveyresearch/blob/master/tscs2013.rda
I compared these two results and was confused:
frq(tscs2013$v65r)
frq(tscs2013$v65r, weights = tscs2013$wr)

Thank you.

@strengejacke
Owner

I'm not sure how to do this consistently. When there's just one NA, the weighting would work. But with multiple NA values, possibly with missing weights as well, it no longer works. How should those cases best be treated?

df <- data.frame(val = c(1, 2, NA, 5),
                 wgt = c(.7, 1.2, 1.1, NA))

df
#>   val wgt
#> 1   1 0.7
#> 2   2 1.2
#> 3  NA 1.1
#> 4   5  NA
xtabs(wgt~val, data = df, addNA = TRUE)
#> val
#>    1    2    5 <NA> 
#>  0.7  1.2       1.1


df <- data.frame(val = c(1, 2, NA, 5, NA),
                 wgt = c(.7, 1.2, 1.1, NA, NA))

df
#>   val wgt
#> 1   1 0.7
#> 2   2 1.2
#> 3  NA 1.1
#> 4   5  NA
#> 5  NA  NA
xtabs(wgt~val, data = df, addNA = TRUE)
#> val
#>    1    2    5 <NA> 
#>  0.7  1.2

Created on 2021-11-27 by the reprex package (v2.0.1)
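
The blank cells in the `xtabs()` output above are most likely NA sums, since tables print NA as an empty string by default. A small sketch with `tapply()` makes that visible and contrasts it with one possible alternative, dropping missing weights inside each group. This is an illustration of the two behaviours, not a proposal for what `frq()` does; `val_f`, `strict`, and `lenient` are made-up names.

```r
df <- data.frame(val = c(1, 2, NA, 5, NA),
                 wgt = c(.7, 1.2, 1.1, NA, NA))

# Make NA an explicit level so rows with a missing value are grouped together
val_f <- addNA(factor(df$val))

# A single NA weight turns the whole group sum into NA
strict <- tapply(df$wgt, val_f, sum)

# With na.rm = TRUE, missing weights contribute nothing to the group sum
lenient <- tapply(df$wgt, val_f, sum, na.rm = TRUE)

print(strict)
print(lenient)
```

In the strict version both the `5` group and the NA group are NA. In the lenient version the `5` group has weight 0 and the NA group keeps the 1.1 from row 3.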

@strengejacke
Owner

Maybe only consider complete cases.
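
One possible reading of the complete-cases idea, sketched in base R: keep only rows where both the value and the weight are observed, then sum the weights per value. This assumes "complete" means complete in both columns; the names `cc`, `w_frq`, and `valid_prc` are illustrative and not part of sjmisc.

```r
df <- data.frame(val = c(1, 2, NA, 5, NA),
                 wgt = c(.7, 1.2, 1.1, NA, NA))

# Keep rows where neither the value nor the weight is missing
cc <- df[complete.cases(df$val, df$wgt), ]

# Weighted frequency per remaining value
w_frq <- tapply(cc$wgt, factor(cc$val), sum)

# Percentages relative to the total weight of the complete cases
valid_prc <- 100 * w_frq / sum(w_frq)

print(w_frq)
print(round(valid_prc, 2))
```

Under this reading only rows 1 and 2 remain, so the NA category drops out of the weighted table entirely, which differs from the unweighted output shown in the original report.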
