Suggestion of new function: `describe_missing()` #454

rempsyc · 2023-09-02T16:43:23Z

When writing (psychology) scientific papers, great care must be taken in reporting the state of item-level missing data for each psychological questionnaire. For example, Parent (2013) writes:

> I recommend that authors (a) state their tolerance level for missing data by scale or subscale (e.g., “We calculated means for all subscales on which participants gave at least 75% complete data”) and then (b) report the individual missingness rates by scale per data point (i.e., the number of missing values out of all data points on that scale for all participants) and the maximum by participant (e.g., “For Attachment Anxiety, a total of 4 missing data points out of 100 were observed, with no participant missing more than a single data point”).
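
To make part (a) of that recommendation concrete, here is a minimal base R sketch of a tolerance rule for scale means. The helper `scale_mean()`, its `min_complete` argument, and the `anx_*` toy columns are hypothetical names used only for illustration; they are not part of `rempsyc` or `datawizard`.

```r
# Hypothetical helper (not part of rempsyc or datawizard): per-participant
# mean of a scale's items, kept only when the participant answered at least
# `min_complete` (a proportion) of those items.
scale_mean <- function(data, items, min_complete = 0.75) {
  x <- data[, items, drop = FALSE]
  prop_complete <- rowMeans(!is.na(x))      # share of answered items per row
  out <- rowMeans(x, na.rm = TRUE)          # mean of the answered items
  out[prop_complete < min_complete] <- NA   # below tolerance: no score
  out
}

# Toy data: four participants, four items
toy <- data.frame(
  anx_1 = c(1, 2, NA, NA),
  anx_2 = c(2, NA, NA, 4),
  anx_3 = c(3, 4, 5, 4),
  anx_4 = c(4, 6, 1, 4)
)

# The third participant answered only 2 of 4 items and gets NA
scale_mean(toy, c("anx_1", "anx_2", "anx_3", "anx_4"))
```

The 75% threshold mirrors the example in the quote; the appropriate tolerance is a judgment call for each study.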

To comply with this recommendation, I have developed the function `nice_na()`, which summarizes NA values according to those guidelines. The function reports both absolute and percentage values for specified lists of columns and supports specifying scales through regex. Reprex:

library(rempsyc)

# If the questionnaire items start with the same name, e.g.,
set.seed(15)
fun <- function() {
  c(sample(c(NA, 1:10), replace = TRUE), NA, NA, NA)
}
df <- data.frame(
  ID = c("idz", NA),
  open_1 = fun(), open_2 = fun(), open_3 = fun(),
  extrovert_1 = fun(), extrovert_2 = fun(), extrovert_3 = fun(),
  agreeable_1 = fun(), agreeable_2 = fun(), agreeable_3 = fun()
)

head(df, 3)
#>     ID open_1 open_2 open_3 extrovert_1 extrovert_2 extrovert_3 agreeable_1
#> 1  idz      4     NA      1           5           6           1           7
#> 2 <NA>      9      4      3           1          10          NA           7
#> 3  idz      1      4      1           9           2          NA           8
#>   agreeable_2 agreeable_3
#> 1           7           9
#> 2           7           2
#> 3           7           8

# One can list the scale names directly:
nice_na(df, scales = c("ID", "open", "extrovert", "agreeable"))
#>                       var items na cells na_percent na_max na_max_percent
#> 1                   ID:ID     1  7    14      50.00      1            100
#> 2           open_1:open_3     3 11    42      26.19      3            100
#> 3 extrovert_1:extrovert_3     3 17    42      40.48      3            100
#> 4 agreeable_1:agreeable_3     3 10    42      23.81      3            100
#> 5                   Total    10 45   140      32.14     10            100
#>   all_na
#> 1      7
#> 2      3
#> 3      3
#> 4      3
#> 5      2

Created on 2023-09-02 with reprex v2.0.2
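
For readers who want to see how such columns could be derived, the following is a rough base R sketch, not the actual `nice_na()` implementation. The column meanings are inferred from the output above (for example, `cells` as rows times items and `all_na` as the number of participants missing every item of the scale), and the helper name `summarise_na()` and the toy data are hypothetical.

```r
# Hypothetical helper, not the rempsyc implementation: summarise missingness
# for the columns whose names match a regular expression.
summarise_na <- function(data, pattern) {
  x <- data[, grepl(pattern, names(data)), drop = FALSE]
  na_per_row <- rowSums(is.na(x))   # missing items per participant
  n_cells <- nrow(x) * ncol(x)
  data.frame(
    items = ncol(x),
    na = sum(na_per_row),
    cells = n_cells,
    na_percent = round(100 * sum(na_per_row) / n_cells, 2),
    na_max = max(na_per_row),
    na_max_percent = round(100 * max(na_per_row) / ncol(x), 2),
    all_na = sum(na_per_row == ncol(x))  # participants missing every item
  )
}

# Toy data: four participants, three items of one scale
toy <- data.frame(
  open_1 = c(1, NA, 3, NA),
  open_2 = c(2, 2, NA, NA),
  open_3 = c(3, 3, 3, NA)
)

summarise_na(toy, "^open")
```

Selecting columns with `grepl()` is one simple way to support regex-based scale names; the real function may resolve scales differently.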

Would you like this function to migrate from `rempsyc` to `datawizard`?

For the name, I was thinking `data_missing_items` or just `data_missing`, since it also works without scale items and is similar to our other `data_` functions like `data_duplicated`. It could also be `describe_missing`, in line with `describe_distribution` (actually, I think that one makes more sense).


DominiqueMakowski · 2023-09-03T09:38:47Z

`describe_missing()` is good, I think. A `report()` method in `report` to produce a text version would also be neat.
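
As a rough illustration of what a text version might look like, here is a small sketch that turns one summary row into a sentence modelled on Parent's (2013) example. The function `na_sentence()` is hypothetical and is not the API of the `report` package; the numbers passed to it are taken from the `open` row of the reprex output.

```r
# Hypothetical formatter; not an existing report() method.
na_sentence <- function(scale, na, cells, na_max) {
  sprintf(
    paste0(
      "For %s, a total of %d missing data points out of %d were observed, ",
      "with no participant missing more than %d data point%s."
    ),
    scale, na, cells, na_max,
    if (na_max == 1) "" else "s"   # singular vs. plural
  )
}

na_sentence("open", 11L, 42L, 3L)
```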

rempsyc added the feature idea 🔥 label Sep 2, 2023
