row_means() proportion of datapoints #144

florisvanvugt · 2021-01-22T20:00:18Z

row_means() has an argument n which allows us to specify the proportion of values required per row to return a mean. For example, n=.75 in my understanding is supposed to return a mean only if at least 75% of values in that row are non-NA. The following behaviour is therefore contrary to what I expected:

> df<-data.frame(q1=c(1,2),q2=c(2,NA),q3=c(1,1))
> df
  q1 q2 q3
1  1  2  1
2  2 NA  1
> sjmisc::row_means(df,n=.75)
  q1 q2 q3 rowmeans
1  1  2  1 1.333333
2  2 NA  1 1.500000

I had expected the second entry of the rowmeans column to be NA, because only 2 out of 3 values in that row are non-NA, i.e. 67%, which is less than 75%. I realize I might be missing something about the intended behaviour of this function.

> packageVersion('sjmisc')
[1] ‘2.8.6’

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

strengejacke · 2021-11-26T10:19:14Z

> 3 * .75
[1] 2.25

0.75 of 3 columns is 2.25, i.e. approx. 2 columns. So 75% of 3 columns is closer to 2 than to 3 columns.
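To make the arithmetic concrete, here is a small base R sketch of how the same product turns into a required column count under different rules. It assumes, as the reply above suggests, that the proportion is multiplied by the number of columns and then rounded; the variable names are only illustrative and are not taken from the sjmisc source.

```r
# Illustrative values from the example above (names are hypothetical)
n_cols <- 3      # number of columns in df
prop   <- 0.75   # value passed as n

raw_count     <- n_cols * prop      # 2.25
rounded_count <- round(raw_count)   # 2: rounding to the nearest integer
floor_count   <- floor(raw_count)   # 2: rounding down
ceiling_count <- ceiling(raw_count) # 3: rounding up

print(c(raw = raw_count, round = rounded_count,
        floor = floor_count, ceiling = ceiling_count))
```

With 3 columns, only rounding up yields a requirement of 3 valid values; both rounding to nearest and rounding down accept rows with 2 valid values.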

florisvanvugt · 2021-11-26T13:07:13Z

Hi strengejacke, thanks for your reply on this issue. I see the logic of rounding, but I think in a scientific context we need flooring. When calculating the average of, say, questionnaire responses in psychology, we want to have a value only if at least X% of the responses are valid. So the current function does not allow that, if I understand correctly. I'm wondering if we could add an argument to the function that allows the user to choose between rounding and flooring behaviour when calculating the number of valid responses.
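Until such an argument exists, a strict "at least X% valid" rule can be sketched in base R. The helper below, `row_means_strict()`, is hypothetical and not part of sjmisc; it compares each row's proportion of non-NA values directly against the threshold, which is equivalent to rounding the required count up.

```r
# Hypothetical helper (not part of sjmisc): return a row mean only if the
# proportion of non-NA values in that row is at least `min_prop`.
row_means_strict <- function(df, min_prop) {
  valid_prop <- rowMeans(!is.na(df))       # share of non-NA values per row
  means <- rowMeans(df, na.rm = TRUE)      # mean of the available values
  means[valid_prop < min_prop] <- NA       # blank out rows below the threshold
  means
}

# Same data as in the original report
df <- data.frame(q1 = c(1, 2), q2 = c(2, NA), q3 = c(1, 1))
res <- row_means_strict(df, min_prop = 0.75)
print(res)
```

In this example the second row has 2 of 3 valid values (about 67%), so its mean is set to NA, while the first row keeps its mean of 4/3.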
