[ratings dataset] Ambiguity in the "category" column description. #35

Open
OmaymaS opened this issue Oct 31, 2018 · 1 comment

@OmaymaS

OmaymaS commented Oct 31, 2018

I think that the description of the category column in the ratings dataset might be ambiguous.

> levels(ratings$category)
 [1] "Aged 18-29"         "Aged 30-44"         "Aged 45+"           "Aged under 18"      "Females"           
 [6] "Females Aged 18-29" "Females Aged 30-44" "Females Aged 45+"   "Females under 18"   "IMDb staff"        
[11] "IMDb users"         "Males"              "Males Aged 18-29"   "Males Aged 30-44"   "Males Aged 45+"    
[16] "Males under 18"     "Non-US users"       "Top 1000 voters"    "US users"    

Because there could be questions like:

  • Is "Males under 18" a subset of "Males", and if not, how do the two categories differ?
  • Is there any intersection between the categories?
  • If the number of respondents in 'Females Aged 18-29' + 'Females Aged 30-44' + 'Females Aged 45+' + 'Females under 18' is less than the number of respondents in the 'Females' category, is the gap due to respondents with unknown age?

I checked an example on IMDb, but I am not sure how the numbers add up in the dataset.
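
The third question could be checked with a comparison along these lines. This is only a sketch on a toy data frame with made-up numbers: the `respondents` column name is an assumption about the real `ratings` data, and `toy`, `age_groups`, and `gap` are hypothetical names used for illustration.

```r
# Toy stand-in for one film at one point in time.
# The counts are made up; `respondents` is an assumed column name.
toy <- data.frame(
  category = c("Females", "Females under 18", "Females Aged 18-29",
               "Females Aged 30-44", "Females Aged 45+"),
  respondents = c(100, 5, 40, 30, 20),
  stringsAsFactors = FALSE
)

# The four age-specific female categories from levels(ratings$category)
age_groups <- c("Females under 18", "Females Aged 18-29",
                "Females Aged 30-44", "Females Aged 45+")

# Respondents in the overall "Females" category
total <- toy$respondents[toy$category == "Females"]

# Respondents summed across the age-specific categories
by_age <- sum(toy$respondents[toy$category %in% age_groups])

# A positive gap would be consistent with respondents of unknown age
gap <- total - by_age
print(gap)
```

On the real data the same comparison would probably need to be made separately for each snapshot of the ratings, since totals taken at different times are not comparable.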

[Image: demo]

@rudeboybert
Owner

Hey @OmaymaS, thanks for the heads up. fivethirtyeight::ratings is simply a repackaging of the original data shared by 538 on their GitHub data page here, so I think it would make more sense to get this addressed upstream first, and then update the downstream package accordingly. Could you create an issue on fivethirtyeight/data and tag me @rudeboybert?
