row_means() proportion of datapoints #144

Open
florisvanvugt opened this issue Jan 22, 2021 · 2 comments

Comments

@florisvanvugt

florisvanvugt commented Jan 22, 2021

row_means() has an argument n which allows us to specify the proportion of values required per row to return a mean. For example, n=.75 in my understanding is supposed to return a mean only if at least 75% of values in that row are non-NA. The following behaviour is therefore contrary to what I expected:

> df<-data.frame(q1=c(1,2),q2=c(2,NA),q3=c(1,1))
> df
  q1 q2 q3
1  1  2  1
2  2 NA  1
> sjmisc::row_means(df,n=.75)
  q1 q2 q3 rowmeans
1  1  2  1 1.333333
2  2 NA  1 1.500000

I had expected the second entry of the rowmeans column to be NA, because only 2 out of 3 values in that row are non-NA, i.e. 67%, which is less than 75%. I realize I might be missing something about the intended behaviour of this function.
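A minimal base-R sketch of the behaviour expected here (an illustration, not sjmisc's implementation): compute each row's proportion of non-missing values and return the mean only where that proportion is at least 0.75.

```r
# Data from the example above
df <- data.frame(q1 = c(1, 2), q2 = c(2, NA), q3 = c(1, 1))

# Proportion of non-missing values in each row: 1 for row 1, 2/3 for row 2
prop_valid <- rowMeans(!is.na(df))

# Return the row mean only where at least 75% of the values are present
expected <- ifelse(prop_valid >= 0.75, rowMeans(df, na.rm = TRUE), NA)
expected
```

Under this rule row 1 keeps its mean of about 1.33, while row 2 (67% valid) becomes NA.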

> packageVersion('sjmisc')
[1] ‘2.8.6’

> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS
@strengejacke
Owner

strengejacke commented Nov 26, 2021

> 3 * .75
[1] 2.25

0.75 of 3 columns is 2.25 columns, which rounds to 2. So 75% of 3 columns is closer to 2 columns than to 3.
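The arithmetic in this reply can be checked in base R. This sketch assumes, as the reply describes, that the proportion `n` is turned into a required count of non-missing values with `round()`; it does not reproduce sjmisc's actual source.

```r
# Assumption: required count = round(number of columns * n), per the reply
n_cols <- 3
n_prop <- 0.75
required <- round(n_cols * n_prop)  # 2.25 rounds to 2

# Row 2 of the example has two non-missing values out of three
valid_row2 <- sum(!is.na(c(2, NA, 1)))
valid_row2 >= required  # TRUE, so a mean is returned for row 2

# Side note: R's round() sends an exact .5 to the even integer
round(0.5 * 5)  # 2, not 3
```

Because rounding can go down as well as up, a row can qualify with fewer valid values than the nominal proportion implies.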

@florisvanvugt
Author

Hi strengejacke, thanks for your reply on this issue. I see the logic of rounding, but I think in a scientific context we need to round up (ceiling). When calculating the average of, say, questionnaire responses in psychology, we want a value only if at least X% of the responses are valid. With 3 items and n=.75, that means all 3 must be valid, because 2 out of 3 is only 67%. If I understand correctly, the current function does not allow that. Could we add an argument that lets the user choose between rounding and rounding up when calculating the required number of valid responses?
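An argument like the one proposed could look roughly like this base-R sketch. `row_means_threshold()` and its `rounding` argument are hypothetical and not part of sjmisc. The sketch contrasts rounding to the nearest count with rounding up (ceiling), because only rounding up guarantees that at least the requested proportion of values is present.

```r
# Hypothetical helper, not sjmisc's API: row means, returned only when
# enough values in the row are non-missing.
row_means_threshold <- function(x, n, rounding = c("round", "ceiling")) {
  rounding <- match.arg(rounding)
  x <- as.matrix(x)
  required <- switch(rounding,
    # nearest integer, as described for row_means()
    round = round(ncol(x) * n),
    # round up; the small tolerance keeps a product such as 0.07 * 100
    # (slightly above 7 in floating point) from being pushed up to 8
    ceiling = ceiling(ncol(x) * n - 1e-9)
  )
  n_valid <- rowSums(!is.na(x))
  means <- rowMeans(x, na.rm = TRUE)
  means[n_valid < required] <- NA
  means
}

df <- data.frame(q1 = c(1, 2), q2 = c(2, NA), q3 = c(1, 1))
row_means_threshold(df, n = 0.75, rounding = "round")    # 1.333333 1.500000
row_means_threshold(df, n = 0.75, rounding = "ceiling")  # 1.333333 NA
```

With `rounding = "ceiling"`, 3 * 0.75 = 2.25 becomes a requirement of 3 valid values, so row 2 returns NA as the original report expected.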
