# Conditional Probability
`r newthought("The")` chances of crashing your car are pretty low, but they're considerably higher if you're drunk. Probabilities change depending on the conditions.
We symbolize this idea by writing $\p(A \given B)$, the probability that $A$ is true *given* that $B$ is true. For example, to say the probability of $A$ given $B$ is 30%, we write:
$$ \p(A \given B) = .3. $$
We call this kind of probability ***conditional probability***. But how do we calculate conditional probabilities?
## Calculating Conditional Probability
```{r echo=FALSE, fig.show='hold', fig.margin=TRUE, fig.cap="Conditional probability in a fair die roll"}
die1 <- readPNG("img/die/die1.png") %>% rasterGrob()
die2 <- readPNG("img/die/die2.png") %>% rasterGrob()
die3 <- readPNG("img/die/die3.png") %>% rasterGrob()
die4 <- readPNG("img/die/die4.png") %>% rasterGrob()
die5 <- readPNG("img/die/die5.png") %>% rasterGrob()
die6 <- readPNG("img/die/die6.png") %>% rasterGrob()
rect1 <- geom_rect(aes(xmin = 0, ymin = 0, xmax = 3, ymax = 1),
size = 1, fill = bookred)
rect2 <- geom_rect(aes(xmin = 1.05, ymin = 0.05, xmax = 2.95, ymax = .95),
size = 1, fill = bookpurple)
p <- ggplot() +
theme_void() + coord_fixed() +
xlim(0, 3) + ylim(0, 2) +
rect1 +
annotation_custom(die1, xmin = 0, xmax = 1, ymin = 1, ymax = 2) +
annotation_custom(die3, xmin = 1, xmax = 2, ymin = 1, ymax = 2) +
annotation_custom(die5, xmin = 2, xmax = 3, ymin = 1, ymax = 2) +
annotation_custom(die2, xmin = 0, xmax = 1, ymin = 0, ymax = 1) +
annotation_custom(die4, xmin = 1, xmax = 2, ymin = 0, ymax = 1) +
annotation_custom(die6, xmin = 2, xmax = 3, ymin = 0, ymax = 1)
p
p$layers <- append(p$layers, rect2, 3)
p
```
Suppose I roll a fair, six-sided die behind a screen. You can't see the result, but I tell you it's an even number. What's the probability it's also a "high" number: either a $4$, $5$, or $6$?
Maybe you figured the correct answer: $2/3$. But why is that correct? Because, out of the three even numbers ($2$, $4$, and $6$), two of them are high ($4$ and $6$). And since the die is fair, we expect it to land on a high number $2/3$ of the times it lands on an even number.
This hints at a formula for $\p(A \given B)$.
Conditional Probability
: $$ \p(A \given B) = \frac{\p(A \wedge B)}{\p(B)}. $$
In the die-roll example, we considered how many of the $B$ possibilities were also $A$ possibilities, which means we divided $\p(A \wedge B)$ by $\p(B)$.
In fact, this formula is our official definition for the concept of conditional probability. When we write the sequence of symbols $\p(A \given B)$, it's really just shorthand for the fraction $\p(A \wedge B) / \p(B)$.
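As a quick sanity check, we can enumerate the six equally likely outcomes of the die roll and apply the definition directly (an illustrative snippet, not part of the text's figure code):

```{r}
# The six equally likely outcomes of a fair die.
omega <- 1:6
B <- omega %% 2 == 0          # even: {2, 4, 6}
A <- omega >= 4               # high: {4, 5, 6}

p_B       <- mean(B)          # Pr(B) = 3/6
p_A_and_B <- mean(A & B)      # Pr(A & B) = 2/6
p_A_and_B / p_B               # Pr(A | B) = 2/3
```

Dividing the two probabilities recovers the $2/3$ we reasoned our way to above.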
```{r condprob, echo=FALSE, fig.margin=TRUE, fig.cap="Conditional probability is the size of the $A \\wedge B$ region compared to the entire $B$ region."}
x <- seq(-.75, .75, 0.01)
upper <- function(x) {
a <- sqrt(1.5^2 - (x[x < 0] - .75)^2)
b <- sqrt(1.5^2 - (x[x >= 0] + .75)^2)
c(a,b)
}
ggplot() +
coord_fixed() + theme_void() +
xlim(-3,3) + ylim(-2,2) +
geom_circle(aes(x0 = -.75, y0 = 0, r = 1.5), fill = bookred) +
geom_circle(aes(x0 = .75, y0 = 0, r = 1.5), fill = bookblue) +
geom_ribbon(aes(x = x, ymin = upper(x), ymax = -upper(x)),
fill = bookpurple, colour = "black") +
geom_text(aes(x = c(-2.25, 2.25), y = c(1, 1), label = c("A", "B")),
fontface = "italic", size = 7) +
theme(panel.border = element_rect(colour = "black", fill = NA, size = 1))
```
In terms of an Euler diagram (Figure \@ref(fig:condprob)), the definition of conditional probability compares the size of the purple $A \wedge B$ region to the size of the whole $B$ region, purple and blue together. If you don't mind getting a little colourful with your algebra:
$$
\p(A \given B) = \frac{\color{bookpurple}{\blacksquare}}{\color{bookpurple}{\blacksquare} + \color{bookblue}{\blacksquare}}.
$$
So the definition works because, informally speaking, $\p(A \wedge B)/\p(B)$ is the proportion of the $B$ outcomes that are also $A$ outcomes.
`r newthought("Dividing")` by zero is a common pitfall with conditional probability. Notice how the definition of $\p(A \given B)$ depends on $\p(B)$ being larger than zero. If $\p(B) = 0$, then the formula
```{marginfigure}
The comedian Steven Wright once quipped that "black holes are where God divided by zero."
```
$$ \p(A \given B) = \frac{\p(A \wedge B)}{\p(B)} $$
doesn't even make any sense. There is no number that results from the division on the right-hand side.[^alternatesystems]
[^alternatesystems]: There are alternative mathematical systems of probability, where conditional probability is defined differently to avoid this problem. But in this book we'll stick to the standard system. In this system, there's just no such thing as "the probability of $A$ given $B$" when $B$ has zero probability.
In such cases we say that $\p(A \given B)$ is *undefined*. It's not zero, or some special number. It just isn't a number.
## Conditional Probability & Trees
We already encountered conditional probabilities informally, when we used a tree diagram to solve [the Monty Hall problem][The Monty Hall Problem].
In a tree diagram, each branch represents a possible outcome. The number placed on that branch represents the chance of that outcome occurring. But that number is based on the assumption that all branches leading up to it occur. So the probability on that branch is conditional on all previous branches.
For example, suppose there are two urns of coloured marbles.
- Urn X contains 3 black marbles, 1 white.
- Urn Y contains 1 black marble, 3 white.
I flip a fair coin to decide which urn to draw from, heads for Urn X and tails for Urn Y. Then I draw one marble at random.
```{r urntree, echo=FALSE, message=FALSE, fig.cap="Tree diagram for an urn problem", fig.margin=TRUE}
g <- data.frame(from = c(1, 1, 2, 2, 3, 3),
to = c(2, 3, 4, 5, 6, 7)) %>%
graph_from_data_frame()
E(g)$weight <- c("1/2", "1/2", "3/4", "1/4", "1/4", "3/4")
vertex_attr(g, "name") <- c(NA, "italic(H)", "italic(T)",
"italic(B)~~~~~~bold('3/8')", "italic(W)~~~~~~bold('1/8')",
"italic(B)~~~~~~bold('1/8')", "italic(W)~~~~~~bold('3/8')")
ggraph(g, layout = "tree") +
geom_edge_link(aes(label = weight),
label_size = 7,
angle_calc = "along",
label_dodge = unit(.2, "inches")) +
geom_node_label(aes(label = name, filter = !is.na(name)),
size = 7,
parse = TRUE,
label.padding = unit(.5, "lines"),
label.size = 0,
hjust = c(rep(.5, 2), rep(0, 4))) +
scale_y_reverse(expand = expansion(add = c(.05, .5))) +
scale_x_reverse() +
theme_void() +
coord_flip()
```
In Figure \@ref(fig:urntree), the probability of drawing a black marble on the top path is $3/4$ because we are assuming the coin landed heads, and thus I'm drawing from Urn X. If the coin lands tails instead, and I draw from Urn Y, then the chance of a black marble is instead $1/4$. So these quantities are conditional probabilities:
$$
\begin{aligned}
\p(B \given H) &= 3/4,\\
\p(B \given T) &= 1/4.
\end{aligned}
$$
Notice, though, that the first branch in a tree diagram is different. In the $H$-vs.-$T$ branch, the probabilities are *un*conditional, since there are no previous branches for them to be conditional on.
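We can check the bold numbers at the tips of Figure \@ref(fig:urntree) by multiplying along each path, and then summing the paths that end in a black marble (a small illustrative computation, not from the text's figure code):

```{r}
p_H <- 1/2                    # fair coin: heads -> Urn X
p_T <- 1/2                    # tails -> Urn Y
p_B_given_H <- 3/4            # Urn X: 3 black marbles out of 4
p_B_given_T <- 1/4            # Urn Y: 1 black marble out of 4

# Multiply along each path, then add the paths ending in black:
p_black <- p_H * p_B_given_H + p_T * p_B_given_T
p_black                       # 3/8 + 1/8 = 1/2
```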
## More Examples
Imagine an urn contains marbles of three different colours: 20 are red, 30 are blue, and 40 are green. I draw a marble at random. What is $\p(R \given \neg B)$, the probability it's red given that it's not blue?
$$
\begin{aligned}
\p(R \given \neg B) &= \frac{\p(R \wedge \neg B)}{\p(\neg B)}\\
&= \frac{\p(R)}{\p(\neg B)}\\
&= \frac{20/90}{60/90}\\
&= 1/3.
\end{aligned}
$$
This calculation relies on the fact that $R \wedge \neg B$ is logically equivalent to $R$. A red marble is automatically not blue, so $R$ is true under exactly the same circumstances as $R \wedge \neg B$. The [Equivalence Rule][Tautologies, Contradictions, and Equivalent Propositions] thus tells us $\p(R \wedge \neg B) = \p(R)$.
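The same answer falls out of simply counting marbles (an illustrative check):

```{r}
red <- 20; blue <- 30; green <- 40
total <- red + blue + green              # 90 marbles

p_not_blue    <- (red + green) / total   # 60/90
p_R_and_not_B <- red / total             # R & not-B is equivalent to R
p_R_and_not_B / p_not_blue               # 1/3
```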
`r newthought("Suppose")` a university has 10,000 students. Each is studying under one of four broad headings: Humanities, Social Sciences, STEM, or Professional. Under each of these categories, the number of students with an average grade of A, B, C, or D is listed in the following table. What is the probability a randomly selected student will have an A average, given that they are studying either Humanities or Social Sciences?
```{r studentgrades, echo=FALSE}
df <- data.frame(
`Humanities` = c(200, 500, 250, 50),
`Social Sciences` = c(600, 800, 400, 200),
`STEM` = c(400, 1600, 1500, 500),
`Professional` = c(900, 900, 750, 450)
)
colnames(df) <- c("Humanities", "Social Sciences", "STEM", "Professional")
rownames(df) <- c("A", "B", "C", "D")
knitr::kable(df, align = "c")
```
$$
\begin{aligned}
\p(A \given H \vee S) &= \frac{\p(A \wedge (H \vee S))}{\p(H \vee S)}\\
&= \frac{800/10,000}{3,000/10,000}\\
&= 4/15.
\end{aligned}
$$
What about the reverse probability, that a student is studying either Humanities or Social Sciences given that they have an A average?
$$
\begin{aligned}
\p(H \vee S \given A) &= \frac{\p((H \vee S) \wedge A)}{\p(A)}\\
&= \frac{800/10,000}{2,100/10,000}\\
&= 8/21.
\end{aligned}
$$
Notice how we get a different number now.
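Both calculations can be redone straight from the table's counts (the same numbers as above, restated here for the check):

```{r}
counts <- matrix(c(200,  500,  250,  50,    # Humanities: A, B, C, D
                   600,  800,  400, 200,    # Social Sciences
                   400, 1600, 1500, 500,    # STEM
                   900,  900,  750, 450),   # Professional
                 nrow = 4,
                 dimnames = list(c("A", "B", "C", "D"),
                                 c("Hum", "SocSci", "STEM", "Prof")))

a_and_hs <- sum(counts["A", c("Hum", "SocSci")])   # 800
hs       <- sum(counts[, c("Hum", "SocSci")])      # 3,000
a        <- sum(counts["A", ])                     # 2,100

a_and_hs / hs   # Pr(A | H or S) = 4/15
a_and_hs / a    # Pr(H or S | A) = 8/21
```

The numerator is the same in both fractions; only the denominator changes, which is exactly why the two conditional probabilities come out different.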
## Order Matters
In general, the probability of $A$ given $B$ will be different from the probability of $B$ given $A$. These are different concepts.
For example, university students are usually young, but young people aren't usually university students. Most aren't even old enough to be in university. So the probability someone is young given they are in university is high. But the probability someone is in university given that they are young is low. So $\p(Y \given U) \neq \p(U \given Y)$.
Once in a while we do find cases where $\p(A \given B) = \p(B \given A)$. For example, suppose we throw a dart at random at a circular board, divided into four quadrants. The chance the dart will land on the left half given that it lands on the top half is the same as the chance it lands on the top half given it lands on the left. Both probabilities are $1/2$.
But this kind of thing is the exception rather than the rule. Usually, $\p(A \given B)$ will be a different number from $\p(B \given A)$. So it's important to remember how order matters.
```{block, type='warning'}
When we write $\p(A \given B)$, we are discussing the probability of $A$. But we are discussing it under the assumption that $B$ is true.
```
## Declaring Independence
We explained independence informally back in [Chapter 4][Independence]: $A$ and $B$ are independent if the truth of one doesn't change the probability of the other. Now that we've formally defined conditional probability, we can formally define independence too.
Independence
: $A$ is independent of $B$ if $\p(A \given B) = \p(A)$ and $\p(A) > 0$.
In other words, they're independent if $A$'s probability is the same after $B$ is given as it was before (and not just for the silly reason that there was no chance of $A$ being true to begin with).
Now we can establish three useful facts about independence.
`r newthought("The first")` is summed up in the mantra "independence means multiply." This actually has two parts.
We already learned the first part with the Multiplication Rule: if $A$ is independent of $B$, then $\p(A \wedge B) = \p(A)\p(B)$. Except now we can see why this rule holds, using the definition of conditional probability and some algebra:
$$
\begin{aligned}
\p(A \given B) &= \frac{\p(A \wedge B)}{\p(B)} & \mbox{by definition}\\
\p(A \given B)\p(B) &= \p(A \wedge B) & \mbox{by algebra}\\
\p(A)\p(B) &= \p(A \wedge B) & \mbox{by independence}.
\end{aligned}
$$
The second part of the "independence means multiply" mantra is new though. It basically says that the reverse also holds. As long as $\p(A) > 0$ and $\p(B) > 0$, if $\p(A \wedge B) = \p(A)\p(B)$, then $A$ is independent of $B$.
Bottom line: as long as there are no zeros to worry about, independence is the same thing as $\p(A \wedge B) = \p(A)\p(B)$.
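Here's a small numeric instance with a fair die (my own example, not from the text): let $A$ be "the roll is even" and $B$ be "the roll is four or less".

```{r}
omega <- 1:6
A <- omega %% 2 == 0          # {2, 4, 6}
B <- omega <= 4               # {1, 2, 3, 4}

mean(A & B)                   # {2, 4}: 1/3
mean(A) * mean(B)             # (1/2)(2/3) = 1/3
```

Since $\p(A \wedge B) = \p(A)\p(B)$ and neither probability is zero, $A$ and $B$ are independent.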
`r newthought("Second,")` independence is symmetric. If $A$ is independent of $B$, then $B$ is independent of $A$. Informally speaking, if $B$ makes no difference to $A$'s probability, then $A$ makes no difference to $B$'s probability.
This is why we often say "$A$ and $B$ are independent," without specifying which is independent of which. Since independence goes both ways, they're automatically independent of each other.
`r newthought("Third,")` independence extends to negations. If $A$ is independent of $B$, then it's also independent of $\neg B$ (as long as $\p(\neg B) > 0$, so that $\p(A \given \neg B)$ is well-defined).
Notice, this also means that if $A$ is independent of $B$, then $\neg A$ is independent of $\neg B$ (as long as $\p(\neg A) > 0$).
`r newthought("So far")` our definition of independence only applies to two propositions. We can extend it to three as follows.
Three-way Independence
: $A$, $B$, and $C$ are independent if
i. $A$ is independent of $B$, $A$ is independent of $C$, and $B$ is independent of $C$, and
ii. $\p(A \wedge B \wedge C) = \p(A)\p(B)\p(C)$.
In other words, a trio of propositions is independent if each pair of them is independent, and the multiplication rule applies to their conjunction. The same idea can be extended to define independence for four propositions, five, etc.
You may be wondering: is part (ii) of this definition really necessary? If each pair of propositions is independent, then the multiplication rule applies to any two of them. Doesn't this guarantee that it applies to all three together?
Curiously, the answer is no. Suppose someone has equal chances of having been born in the spring, summer, fall, or winter. Let $A$ say that they were born in either spring or summer; let $B$ say they were born in either summer or fall; and let $C$ say they were born in either fall or spring. Pairwise these propositions are all independent. But $\p(A \wedge B \wedge C) = 0$ by the Contradiction Rule, since there is no season shared by all three propositions; they cannot all be true. And yet $\p(A)\p(B)\p(C) = 1/8$.
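We can verify this example by enumerating the four equally likely seasons (an illustrative check):

```{r}
season <- c("spring", "summer", "fall", "winter")
A <- season %in% c("spring", "summer")
B <- season %in% c("summer", "fall")
C <- season %in% c("fall", "spring")

mean(A & B) == mean(A) * mean(B)   # TRUE: each pair multiplies...
mean(A & C) == mean(A) * mean(C)   # TRUE
mean(B & C) == mean(B) * mean(C)   # TRUE
mean(A & B & C)                    # ...but the conjunction has probability 0,
mean(A) * mean(B) * mean(C)        # while the triple product is 1/8
```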
## Exercises {-#ch6ex}
#. Answer each of the following:
a. On a fair die with six sides, what is the probability of rolling a low number (1, 2, or 3) given that you roll an even number?
#. On a fair die with eight sides, what is the probability of rolling an even number given that you roll a high number (5, 6, 7, or 8)?
#. Suppose $\p(B) = 4/10$, $\p(A) = 7/10$, and $\p(B \wedge A) = 2/10$.
What are each of the following probabilities?
a. $\p(A \given B)$
#. $\p(B \given A)$
#. Five percent of tablets made by the company Ixian have factory defects. Ten percent of the tablets made by their competitor company Guild do. A computer store buys $40\%$ of its tablets from Ixian, and $60\%$ from Guild.
```{marginfigure, echo=TRUE}
This exercise and the next one are based on very similar exercises from Ian Hacking's wonderful book, *An Introduction to Probability and Inductive Logic*.
```
Draw a probability tree to answer the following questions.
a. What is the probability a randomly selected tablet in the store is made by Ixian and has a factory defect?
#. What is the probability a randomly selected tablet in the store has a factory defect?
#. What is the probability a tablet from this store is made by Ixian, given that it has a factory defect?
#. In the city of Elizabeth, the neighbourhood of Southside has lots of chemical plants. $2\%$ of Elizabeth's children live in Southside, and $14\%$ of those children have been exposed to toxic levels of lead. Elsewhere in the city, only $1\%$ of the children have toxic levels of exposure.
Draw a probability tree to answer the following questions.
a. What is the probability that a randomly chosen child from Elizabeth lives in Southside and has toxic levels of lead exposure?
b. What is the probability that a randomly chosen child from Elizabeth has toxic levels of lead exposure?
c. What is the probability that a randomly chosen child from Elizabeth who has toxic levels of lead exposure lives in Southside?
#. Imagine 100 prisoners are sentenced to death. 70 of them are housed in cell block A, the other 30 are in cell block B. Of the prisoners in cell block A, 9 are innocent. Only 1 prisoner in cell block B is innocent.
The law requires that one prisoner be pardoned. The lucky prisoner will be selected by flipping a fair coin to choose either cell block A or B. Then a fair lottery will be used to select a random prisoner from the chosen cell block.
What is the probability the pardoned prisoner comes from cell block A if she is innocent? Answer each of the following to find out.
$I$ = The pardoned prisoner is innocent.\
$A$ = The pardoned prisoner comes from cell block A.\
$B$ = The pardoned prisoner comes from cell block B.
a. What is $\p(I \given A)$?
b. What is $\p(A \wedge I)$?
c. What is $\p(I \given B)$?
d. What is $\p(B \wedge I)$?
e. What is $\p(I)$?
f. What is $\p(A \given I)$?
g. Draw a probability tree to visualize and verify your calculations.
#. Suppose $A$, $B$, and $C$ are independent, and they each have the
same probability: $1/3$. What is $\p(A \wedge B \given C)$?
#. If $A$ and $B$ are mutually exclusive, what is $\p(A \given B)$? Justify your answer using the definition of conditional probability.
#. Which of the following situations is impossible? Justify your answer.
a. $\p(A) = 1/2, \p(A \given B) = 1/2, \p(B \given A) = 1/2$.
b. $\p(A) = 1/2, \p(A \given B) = 1, \p(A \given \neg B) = 1$.
#. Is the following statement true or false: if $A$ and $B$ are mutually exclusive, then $\p(A \vee B \given C) = \p(A \given C) + \p(B \given C)$. Justify your answer.
#. Justify the second part of the "independence means multiply" mantra: if $\p(A) > 0$, $\p(B) > 0$, and $\p(A \wedge B) = \p(A) \p(B)$, then $A$ is independent of $B$.
Hint: start by supposing $\p(A) > 0$, $\p(B) > 0$, and $\p(A \wedge B) = \p(A)\p(B)$. Then apply some algebra and the definition of conditional probability.
#. Justify the claim that independence is symmetric: if $A$ is independent of $B$, then $B$ is independent of $A$.
Hint: start by supposing that $A$ is independent of $B$. Then write out $\p(A \given B)$ and apply the definition of conditional probability.
#. Suppose $A$, $B$, and $C$ are independent. Is it possible that $\p(A \wedge B \wedge C) = 0$? If yes, give an example where this happens. If no, prove that it cannot happen.
#. Suppose we have $4$ apples and $10$ buckets. We place each apple in a random bucket; the placement of each apple is independent of the others. Let $B_{ij}$ be the proposition that apples $i$ and $j$ were placed in the same bucket.
a. Is $B_{12}$ independent of $B_{34}$?
#. Is $B_{12}$ independent of $B_{23}$?
#. Is every pair of $B_{ij}$ propositions independent?
#. Is every trio of $B_{ij}$ propositions independent?
#. Suppose we have a coin whose bias we want to learn, so we're going to flip it $3$ times. We start out by assigning the same probability to each possible sequence of heads and tails. For example, the sequences $HTH$ and $TTT$ are equally likely, as are all other sequences.
a. Before we do our $3$ flips, what is the probability of $HTH$?
#. What is the probability of heads on the third flip, given that the first two flips land heads?
#. Prove that if $A$ logically entails $B$, then $\p(B \given A) = 1$.
#. Suppose the following three conditions hold:
i. $\p(A) = \p(\neg A)$,
ii. $\p(B \given A) = \p(B \given \neg A)$,
iii. $\p(B) > 0$.
Must the following be true then?
iv. $\p(A \given B) = \p(A \given \neg B) = 1/2$?
If yes, prove that (iv) must hold. If no, give a counterexample: draw an Euler diagram where conditions (i)--(iii) hold, but not (iv).
#. Prove that the equation $\p(A \given B) \p(B) = \p(B \given A) \p(A)$ always holds. (Assume both conditional probabilities are well-defined.)
#. Prove that the following equation always holds, assuming the conditional probabilities are well-defined:
$$ \frac{\p(A \given B)}{\p(B \given A)} = \frac{\p(A)}{\p(B)}. $$
#. Does the equation $\p(A \given B) = \p(\neg B \given \neg A)$ always hold, assuming both conditional probabilities are well-defined? If yes, prove that it does. If no, draw an eikosogram where it fails to hold.
#. Suppose an urn contains 30 black marbles and 70 white. We randomly draw 5 marbles with replacement. Let $A$ be the proposition that $3$ of the first $4$ draws are black, and let $B$ be the proposition that the $5$^th^ draw is black. Calculate the following quantity:
$$ \frac{\p(A \given B)}{\p(B \given A)} \frac{\p(B)}{\p(A)}. $$