tabyl: percent vs share #300

Open
cstepper opened this issue May 17, 2019 · 2 comments

Comments

@cstepper

Hi,

I'm excited about discovering the janitor package - especially for the tabyl function.

I just have one remark, though I'm not sure whether you can or want to implement it:

Feature request

When calling tabyl on one variable, it returns a data.frame with columns

  • "var"
  • n
  • percent
  • valid_percent (if any NA are present)

In my opinion, the percent and valid_percent columns do not show percent values, as they sum to 1 rather than 100. Instead they show shares (which I prefer over percent).

WRT consistent naming, IMO these variables should be named something like share and valid_share.

It's not a big deal to rename these afterwards, but it is annoying to do it again and again. It would be fantastic if you would consider changing the names.
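Until the names change, the repeated renaming can be wrapped in a small helper. The following is only a sketch: `rename_percent_to_share()` is a hypothetical function that is not part of janitor, and the target names `share` and `valid_share` are simply the ones suggested above. It uses base R only, so it works on any data frame and leaves other columns untouched.

```r
# Hypothetical helper (not part of janitor): rename the proportion columns
# that tabyl() returns for a one-way table.
rename_percent_to_share <- function(tab) {
  # Old name -> new name; the new names are the ones proposed in this issue.
  lookup <- c(percent = "share", valid_percent = "valid_share")
  hits <- names(tab) %in% names(lookup)
  names(tab)[hits] <- unname(lookup[names(tab)[hits]])
  tab
}

# Usage with the example from this issue (runs only if the packages are installed).
if (requireNamespace("janitor", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  tab_hc <- rename_percent_to_share(janitor::tabyl(dplyr::starwars, hair_color))
  print(names(tab_hc))
}
```

Because only `names()` is modified, the object keeps its class and attributes, so it should still behave like a tabyl in later steps.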

library(tidyverse)
#> Registered S3 method overwritten by 'rvest':
#>   method            from
#>   read_xml.response xml2
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test

tab_hc = dplyr::starwars %>% 
  tabyl(hair_color)

tab_hc %>% select(percent, valid_percent) %>% colSums(na.rm = TRUE)
#>       percent valid_percent 
#>             1             1

Created on 2019-05-17 by the reprex package (v0.2.1.9000)

@sfirke
Owner

sfirke commented May 17, 2019

It's issue 300!

Someone else had previously lamented that percent was not technically a correct name. I think it was a Twitter conversation I was only an observer in. Maybe she suggested proportion or prop as a better name.

I don't disagree, but at this point I believe the cost of making this change - moderate cost to me of updating code, potentially large annoyance to current users as their existing code breaks - outweighs the benefit of a potentially more-clear name.

If anyone ever experiences the problem that a reader incorrectly interprets percent = 0.37 as 0.37% instead of the correct 37%, as a result of this naming in tabyl, I deeply apologize for the inconvenience 😔

Glad you like the package otherwise!
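For readers worried about the 0.37 versus 37% ambiguity, one workaround is to turn the proportions into explicit percent labels before showing the table. A minimal base R sketch follows; `as_percent_label()` is a hypothetical helper name, and janitor's own `adorn_pct_formatting()` is likely the more convenient choice when working with a tabyl.

```r
# Hypothetical helper: turn proportions (0 to 1) into labelled percent strings.
as_percent_label <- function(p, digits = 1) {
  fmt <- paste0("%.", digits, "f%%")
  # Keep missing values missing instead of printing "NA%".
  ifelse(is.na(p), NA_character_, sprintf(fmt, 100 * p))
}

as_percent_label(0.37)
#> [1] "37.0%"
```

The proportion itself is unchanged; only the displayed label makes the unit explicit.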

@sfirke
Owner

sfirke commented Mar 2, 2022

I opened a discussion re: the merits of renaming here: #474
