---
title: "HW6_Markdown"
output: pdf_document
date: "2024-03-07"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Manay Divatia
## md46245
## 1
# a
```{r a}
library(dplyr)
library(ggplot2)

justices <- read.csv("~/Downloads/justices.csv")

# Median ideal point across all justices for each term
median_ideal_by_year <- justices %>%
  group_by(term) %>%
  summarize(median_ideal_point = median(idealpt, na.rm = TRUE))
median_ideal_by_year

ggplot(median_ideal_by_year, aes(x = term, y = median_ideal_point)) +
  geom_line() +
  labs(title = "Median Ideal Point Over Time",
       x = "Year",
       y = "Median Ideal Point")
```
# b
```{r b}
# Attach each term's median ideal point to every justice-term row
merged_data <- merge(justices, median_ideal_by_year, by = "term")

# Keep the justice whose ideal point equals the term median (one row per term)
justice_with_median <- merged_data %>%
  filter(idealpt == median_ideal_point) %>%
  group_by(term) %>%
  slice(which.min(abs(idealpt - median_ideal_point))) %>%
  ungroup()

# Count how many terms each justice was the median
justice_counts <- table(justice_with_median$justice)
most_common_justice <- names(justice_counts)[which.max(justice_counts)]

print(justice_with_median)
print(justice_counts)
print(paste("Justice with the most appearances:", most_common_justice))
```
White was the justice who held the median ideal point most often. This justice
served on the Court for 13 terms. His average ideal point over his entire
tenure was 0.772.
# c
```{r c}
democratic_presidents <- unique(justices$pres[justices$pparty == "D"])
republican_presidents <- unique(justices$pres[justices$pparty == "R"])
democratic_ideology_shifts <- numeric(0)
republican_ideology_shifts <- numeric(0)
```
# d
```{r d}
# Shift under each president: median ideal point in the president's last term
# minus the median ideal point in the president's first term
for (president in democratic_presidents) {
  president_data <- justice_with_median %>% filter(pres == president)
  ideology_shift <- last(president_data$median_ideal_point) -
    first(president_data$median_ideal_point)
  democratic_ideology_shifts <- c(democratic_ideology_shifts, ideology_shift)
}
for (president in republican_presidents) {
  president_data <- justice_with_median %>% filter(pres == president)
  ideology_shift <- last(president_data$median_ideal_point) -
    first(president_data$median_ideal_point)
  republican_ideology_shifts <- c(republican_ideology_shifts, ideology_shift)
}
print("Democratic Ideology Shifts:")
print(democratic_ideology_shifts)
print("Republican Ideology Shifts:")
print(republican_ideology_shifts)
```
# e
```{r e}
mean(democratic_ideology_shifts)
sd(democratic_ideology_shifts)
mean(republican_ideology_shifts)
sd(republican_ideology_shifts)
```
Reagan had the largest conservative shift on the Court and Kennedy had the
largest liberal shift on the Court.
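
The shift vectors from part d carry no names, so the printed values alone do not show which president each shift belongs to. Below is a minimal sketch of one way to check a claim like the one above. It assumes a data frame with the same `term`, `pres`, and `median_ideal_point` columns as `justice_with_median`, and that larger ideal points mean more conservative. The data frame `toy_median` and its values are hypothetical and are not taken from `justices.csv`.

```r
# Toy stand-in for justice_with_median (hypothetical values, not real data)
toy_median <- data.frame(
  term = c(2001, 2002, 2003, 2004, 2005, 2006),
  pres = c("A", "A", "A", "B", "B", "B"),
  median_ideal_point = c(0.10, 0.30, 0.50, 0.40, 0.20, 0.10)
)

# Make sure rows are in term order before taking first and last values
toy_median <- toy_median[order(toy_median$term), ]

# Shift per president: last-term median minus first-term median
shift_by_pres <- tapply(
  toy_median$median_ideal_point,
  toy_median$pres,
  function(x) x[length(x)] - x[1]
)

# Assumption: a positive shift is conservative, a negative shift is liberal
largest_conservative <- names(shift_by_pres)[which.max(shift_by_pres)]
largest_liberal <- names(shift_by_pres)[which.min(shift_by_pres)]

print(shift_by_pres)
print(largest_conservative)
print(largest_liberal)
```

Keeping the president's name attached to each shift makes the comparison easy to reproduce from the output.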
# f
```{r f}
plot <- ggplot(median_ideal_by_year, aes(x = term, y = median_ideal_point)) +
  geom_line(color = "black", linewidth = 2) +  # Black line for overall median ideal point
  labs(title = "Median Supreme Court Ideal Point Over Time",
       x = "Term",
       y = "Median Ideal Point")
plot <- plot +
  geom_line(data = justice_with_median,
            aes(x = term, y = idealpt, color = pparty), linewidth = 1) +
  geom_text(data = justice_with_median %>% distinct(justice, .keep_all = TRUE),
            aes(x = term, y = idealpt, label = justice, color = pparty),
            vjust = -0.5, hjust = 0.5, size = 3) +
  # Blue for Democratic, red for Republican; legend suppressed
  scale_color_manual(values = c(D = "blue", R = "red"), guide = "none")
# Show the plot
print(plot)
```
The plot contains a lot of data. What we can see, though, is how the median
ideal point follows one party or the other. We can also see how vast the
difference is between Democratic and Republican justices.
## 2
# a
```{r 2a}
mother_df = read.csv("~/Downloads/yu2017sample.csv")
length(unique(mother_df$PUBID))
summary(mother_df)
length(unique(mother_df$PUBID)) / 15
```
One advantage of using a contemporary cohort of women rather than an older
cohort is that we can make better claims about our current time rather than a
previous time. One disadvantage, though, is that we are limited in the data
that we have since the cohort is contemporary.
# b
```{r 2b}
children_counts <- table(mother_df$numChildren)
children_counts
```
We see from the table that the majority of women have no children. From there,
there are fewer and fewer observations as the number of children increases.
This could reflect some bias in our data regarding who is and who isn't a mother.
# c
```{r 2c}
mother_df$isMother <- ifelse(mother_df$numChildren > 0, 1, 0)
has_children_count <- table(mother_df$isMother)
has_children_count
```
Again, we can see that there are far more women who aren't mothers than women
who are, which is worth keeping in mind when comparing the two groups.
# d
```{r 2d}
mother_df$logwage <- log(mother_df$wage)

# Wage on its original scale
wage_plot <- ggplot(mother_df, aes(x = factor(educ), y = wage)) +
  geom_boxplot() +
  labs(title = "Boxplot of Wage by Educational Level",
       x = "Educational Level",
       y = "Wage")
wage_plot

# Wage on the log scale
logwage_plot <- ggplot(mother_df, aes(x = factor(educ), y = logwage)) +
  geom_boxplot() +
  labs(title = "Boxplot of Log Wage by Educational Level",
       x = "Educational Level",
       y = "Log Wage")
logwage_plot
```
The plots show that wages rise with education: as women gain more education,
the median wage increases roughly linearly across educational levels.
# e
```{r 2e}
logwage_plot <- ggplot(mother_df, aes(x = year, y = logwage, color = factor(isMother))) +
  geom_line(stat = "summary", fun = "mean", linewidth = 1, linetype = "solid") +
  labs(title = "Mean Logwage Over Time for Mothers and Non-Mothers",
       x = "Year",
       y = "Mean Logwage",
       color = "Mother Status") +
  scale_color_manual(values = c("blue", "red"), labels = c("Non-Mothers", "Mothers")) +
  theme_minimal()
# Show the plot
logwage_plot
```
Until 2005, the mean logwage seemed to be somewhat equal between non-mothers
and mothers. However, after that the data shows that non-mothers make more
money than mothers. In 2012, non-mothers had a mean logwage of 7.5 while
mothers had 7.25.
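
Because these values are on the log scale, the gap is easier to interpret as a ratio. Here is a quick worked check that uses only the two approximate 2012 values quoted above (7.5 and 7.25). Exponentiating a mean of logs gives a geometric mean, so the ratio compares geometric-mean wages rather than arithmetic means.

```r
# Approximate 2012 mean log wages quoted in the text
logwage_non_mothers <- 7.5
logwage_mothers <- 7.25

# A difference in logs corresponds to a ratio on the original wage scale
log_gap <- logwage_non_mothers - logwage_mothers
wage_ratio <- exp(log_gap)
percent_gap <- (wage_ratio - 1) * 100

print(round(wage_ratio, 3))   # 1.284
print(round(percent_gap, 1))  # 28.4
```

So a 0.25 gap in mean logwage corresponds to non-mothers earning roughly 28% more than mothers in geometric-mean terms.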
# f
```{r 2f}
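woman_fe <- as.factor(mother_df$PUBID)
year_fe <- as.factor(mother_df$year)
# Note: woman_fe and year_fe are created above but do not appear in this formula
fixed_effects_model <- lm(
  logwage ~ numChildren + fullTime + multipleLocations + unionized + industry,
  data = mother_df
)
#summary(fixed_effects_model)
coef(fixed_effects_model)["numChildren"]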
```
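The coefficient on numChildren is 0.0157, so each additional child is
associated with roughly a 1.6% higher wage, holding the other covariates in
the model fixed. I think this is small, but it makes sense considering how
many other variables could play a role in logwage.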
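
The chunk above creates `woman_fe` and `year_fe` without using them in the model formula. As a hedged illustration, one common way to include such fixed effects in `lm()` is to add factor terms for the woman and the year. The sketch below uses simulated data only: the panel size, the built-in effect of -0.05, and the noise levels are arbitrary assumptions, so the estimates say nothing about the real data or the 0.0157 reported above.

```r
set.seed(1)

# Simulated panel: 50 hypothetical women observed in 5 years (not the real data)
n_women <- 50
n_years <- 5
panel <- data.frame(
  PUBID = rep(seq_len(n_women), each = n_years),
  year  = rep(2001:2005, times = n_women)
)

# Unobserved woman-specific wage level, constant over time
woman_effect <- rnorm(n_women)
panel$numChildren <- rpois(nrow(panel), lambda = 1)

# Assumed data-generating process: each child lowers log wage by 0.05
panel$logwage <- 7 + woman_effect[panel$PUBID] -
  0.05 * panel$numChildren + rnorm(nrow(panel), sd = 0.1)

# Pooled OLS: no woman or year indicators
pooled_fit <- lm(logwage ~ numChildren, data = panel)

# Fixed effects via indicator variables for each woman and each year
fe_fit <- lm(logwage ~ numChildren + factor(PUBID) + factor(year), data = panel)

print(coef(pooled_fit)["numChildren"])
print(coef(fe_fit)["numChildren"])
```

With the woman indicators included, the numChildren coefficient is identified from changes within each woman over time, which is why it can differ from the pooled estimate.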