Dummy variable coefficients for ordered factors come out wonky in regression tables #28

Open
rudeboybert opened this issue Apr 4, 2018 · 0 comments
rudeboybert commented Apr 4, 2018

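Might need to convert all ordered = TRUE factors to unordered before fitting: with an ordered factor, lm() defaults to polynomial contrasts (contr.poly), so the table shows .L, .Q, .C and ^4 terms instead of one dummy variable per level.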
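To see where the odd term names come from, here is a small base R sketch that only inspects the contrast matrices. It assumes R's default options("contrasts") and rebuilds the five levels by hand rather than loading bechdel; the object names lv and ord are just for illustration.

```r
# Levels copied from the clean_test output in this issue
lv  <- c("nowomen", "notalk", "men", "dubious", "ok")
ord <- factor(lv, levels = lv, ordered = TRUE)

# Ordered factor: default contrasts are orthogonal polynomials (contr.poly)
poly_terms <- colnames(contrasts(ord))

# Same factor with the ordering dropped: treatment (dummy) contrasts
dummy_terms <- colnames(contrasts(factor(ord, ordered = FALSE)))

print(poly_terms)   # linear, quadratic, cubic and 4th-degree terms
print(dummy_terms)  # one column per non-baseline level
```

The suffixes in poly_terms are what get pasted onto clean_test in the regression table.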

suppressPackageStartupMessages(library(tidyverse))
library(fivethirtyeight)
library(moderndive)

# clean_test is an ordered factor
bechdel$clean_test[1:5]
#> [1] notalk ok     notalk notalk men   
#> Levels: nowomen < notalk < men < dubious < ok

# weird output for dummy variables in regression table
lm(domgross~clean_test, data = bechdel) %>% 
  get_regression_table()
#> Warning: package 'bindrcpp' was built under R version 3.4.4
#> # A tibble: 5 x 7
#>   term           estimate std_error statistic p_value   conf_low conf_high
#>   <chr>             <dbl>     <dbl>     <dbl>   <dbl>      <dbl>     <dbl>
#> 1 intercept     70491451.  2412903.    29.2    0.      65759017. 75223886.
#> 2 clean_test.L  -3804412.  5244171.    -0.725  0.468  -14089823.  6480999.
#> 3 clean_test.Q  -9916737.  5398245.    -1.84   0.0660 -20504334.   670861.
#> 4 clean_test.C    750590.  5353012.     0.140  0.889   -9748292. 11249471.
#> 5 clean_test^4  -9507099.  5580759.    -1.70   0.0890 -20452662.  1438464.
lm(domgross~clean_test, data = bechdel) %>% 
  summary()
#> 
#> Call:
#> lm(formula = domgross ~ clean_test, data = bechdel)
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -79345264 -52417225 -25031559  22438057 691533349 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  70491451    2412903  29.214   <2e-16 ***
#> clean_test.L -3804412    5244171  -0.725   0.4683    
#> clean_test.Q -9916736    5398245  -1.837   0.0664 .  
#> clean_test.C   750590    5353012   0.140   0.8885    
#> clean_test^4 -9507099    5580759  -1.704   0.0886 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 80100000 on 1772 degrees of freedom
#>   (17 observations deleted due to missingness)
#> Multiple R-squared:  0.008974,   Adjusted R-squared:  0.006737 
#> F-statistic: 4.012 on 4 and 1772 DF,  p-value: 0.003042

# should look like this (clean_test converted to an unordered factor)
bechdel %>% 
  mutate(clean_test = factor(clean_test, ordered = FALSE)) %>% 
  lm(domgross~clean_test, data = .) %>% 
  get_regression_table()
#> # A tibble: 5 x 7
#>   term            estimate std_error statistic p_value  conf_low conf_high
#>   <chr>              <dbl>     <dbl>     <dbl>   <dbl>     <dbl>     <dbl>
#> 1 intercept         6.62e7  6793664.     9.75   0.        5.29e7 79547620.
#> 2 clean_testnot…    1.31e7  7663750.     1.72   0.0870   -1.89e6 28172609.
#> 3 clean_testmen     2.75e6  8910344.     0.309  0.758    -1.47e7 20226986.
#> 4 clean_testdub…    9.79e6  9573562.     1.02   0.307    -8.99e6 28562779.
#> 5 clean_testok     -4.34e6  7364354.    -0.589  0.556    -1.88e7 10106207.
bechdel %>% 
  mutate(clean_test = factor(clean_test, ordered = FALSE)) %>% 
  lm(domgross~clean_test, data = .) %>% 
  summary()
#> 
#> Call:
#> lm(formula = domgross ~ clean_test, data = .)
#> 
#> Residuals:
#>       Min        1Q    Median        3Q       Max 
#> -79345264 -52417225 -25031559  22438057 691533349 
#> 
#> Coefficients:
#>                   Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)       66223181    6793664   9.748   <2e-16 ***
#> clean_testnotalk  13141668    7663750   1.715   0.0866 .  
#> clean_testmen      2751095    8910344   0.309   0.7575    
#> clean_testdubious  9786117    9573562   1.022   0.3068    
#> clean_testok      -4337528    7364354  -0.589   0.5559    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 80100000 on 1772 degrees of freedom
#>   (17 observations deleted due to missingness)
#> Multiple R-squared:  0.008974,   Adjusted R-squared:  0.006737 
#> F-statistic: 4.012 on 4 and 1772 DF,  p-value: 0.003042

Created on 2018-04-04 by the reprex package (v0.2.0).
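One possible way to apply the conversion to every ordered column at once is sketched below. unorder_factors() is a hypothetical helper, not an existing moderndive function, and the toy data frame stands in for bechdel so the example runs with base R only. Note that factor() also drops unused levels, which may or may not be wanted.

```r
# Hypothetical helper (not part of moderndive): drop the "ordered" class from
# every ordered factor column while keeping the level order.
unorder_factors <- function(df) {
  is_ord <- vapply(df, is.ordered, logical(1))
  for (nm in names(df)[is_ord]) {
    df[[nm]] <- factor(df[[nm]], ordered = FALSE)
  }
  df
}

# Toy data standing in for bechdel (made-up values)
toy <- data.frame(
  y = c(1, 3, 2, 5, 4, 6),
  g = factor(c("lo", "mid", "hi", "lo", "mid", "hi"),
             levels = c("lo", "mid", "hi"), ordered = TRUE)
)

# Ordered factor: polynomial contrast terms
terms_ordered <- names(coef(lm(y ~ g, data = toy)))

# After the helper: one dummy variable per non-baseline level
terms_unordered <- names(coef(lm(y ~ g, data = unorder_factors(toy))))

print(terms_ordered)
print(terms_unordered)
```

Whether this conversion belongs inside get_regression_table() or should be left to the user is a design question; the fitted values are identical either way, only the parameterization of the coefficients changes.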

@rudeboybert rudeboybert self-assigned this Apr 4, 2018