Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cutoff argument in remove_empty() is being implemented confusingly #568

Open
matanhakim opened this issue Feb 26, 2024 · 2 comments · May be fixed by #569
Open

cutoff argument in remove_empty() is being implemented confusingly #568

matanhakim opened this issue Feb 26, 2024 · 2 comments · May be fixed by #569

Comments

@matanhakim
Copy link
Contributor

In remove_empty(), the documentation of the cutoff argument states:

What fraction (>0 to <=1) of rows or columns must be empty to be removed?

The bold on "empty" is mine. This suggests that the higher cutoff is, more rows are supposed to be kept, since the condition to remove them is stricter.

Nevertheless, when you use different values of cutoff, the opposite behavior is observed; the higher cutoff is, less rows are kept. Please see the reprex for evidence.

I see two optional solutions:

  1. Change the behavior of cutoff, so that it would behave as described in the documentation. This has the major drawback of being a drastically breaking change. The source for this behavior lies in rows 60 and 74 of remove_empty.R, and a possible solution might be to drop ! from the calculation.
  2. Change the documentation of cutoff, so it would be clear what behavior is expected.

Happy to hear your thoughts on this!

library(tidyr)
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test

billboard |>
  nrow()
#> [1] 317

billboard |> 
  janitor::remove_empty("rows", cutoff = 0.9) |> 
  nrow()
#> [1] 0

billboard |> 
  janitor::remove_empty("rows", cutoff = 0.1) |> 
  nrow()
#> [1] 293

Created on 2024-02-26 with reprex v2.0.2

@billdenney
Copy link
Collaborator

Thanks for the report, and I see what you mean. In general, we have a strong preference to avoid breaking changes. When documentation is out of line with behavior, let's fix the documentation (unless it's a honest-to-goodness bug or the behavior is catastrophically bad or ...).

Can you please suggest revised wording that would be clearer? (Feel free to either do it here in the issue or via PR.)

matanhakim added a commit to matanhakim/janitor that referenced this issue Mar 2, 2024
Fixes sfirke#568 

Feel free to edit or change, it is a bit difficult to write a clear description.
Added a clarification for lower and higher values.
@matanhakim matanhakim linked a pull request Mar 2, 2024 that will close this issue
@matanhakim
Copy link
Contributor Author

Thanks, I added a PR #569
I hope it's clear enough now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants