In this section we will talk about priors: what we believe before seeing the data. Sometimes priors are very rigorous, grounded in previous research, and sometimes they are more diffuse.
As in the previous post, $\theta$ denotes having COVID19 and $D$ denotes a positive test. The symbol "$\neg$" means *not*; thus $\neg\theta$ means not having COVID19 and $\neg{}D$ means a negative test.
In March, COVID19 tests were reserved for health care providers. My wife's coworker came down sick after coming home from a COVID19 hot zone, and subsequently other colleagues got sick, along with my entire family. While we were recovering, the partner of another coworker got sick and tested positive. So we were part of a chain where many people got sick, but only one person, a few steps removed from us, was ever tested (and tested positive).
What is the probability that my wife had COVID19? She had all the symptoms of a moderate case. Because of the positive test, the symptoms, and the contagiousness of the disease, I estimated that there was an 85% chance that we had COVID19.
While 85% is somewhat arbitrary, it was the best inference I personally could make based on the then available "data", and in any case it is better than picking 3% (the share of people in Denmark with antibodies).
In the previous post we learned that

$$P(\theta, D) = P(\theta \mid D)\,P(D)$$

Thus, by reorganizing:

$$P(\theta \mid D) = \frac{P(\theta, D)}{P(D)}$$

And the same holds if we flip the variables:

$$P(D \mid \theta) = \frac{P(\theta, D)}{P(\theta)}$$

Obviously this also means:

$$P(\theta, D) = P(D \mid \theta)\,P(\theta)$$

And by dividing by $P(D)$ we arrive at Bayes' theorem:

$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$$
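As a quick sanity check, the identity can be verified numerically. The joint probabilities in this sketch are arbitrary, made-up numbers for illustration; the point is only that conditioning the joint table directly and applying Bayes' theorem give the same answer:

```python
# Arbitrary, made-up joint probabilities P(theta, D) -- illustration only.
joint = {
    ("theta", "D"): 0.30,       # has COVID19, positive test
    ("theta", "negD"): 0.10,    # has COVID19, negative test
    ("negtheta", "D"): 0.05,    # no COVID19, positive test
    ("negtheta", "negD"): 0.55, # no COVID19, negative test
}

# Marginals, obtained by summing over the joint table
p_theta = joint[("theta", "D")] + joint[("theta", "negD")]  # P(theta)
p_d = joint[("theta", "D")] + joint[("negtheta", "D")]      # P(D)

# Conditionals derived from the joint
p_theta_given_d = joint[("theta", "D")] / p_d        # P(theta | D)
p_d_given_theta = joint[("theta", "D")] / p_theta    # P(D | theta)

# Bayes' theorem: P(theta | D) = P(D | theta) P(theta) / P(D)
bayes = p_d_given_theta * p_theta / p_d
assert abs(p_theta_given_d - bayes) < 1e-12
```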
In common parlance, the four parts of Bayes' theorem are called:

- $P(\theta \mid D)$: the posterior
- $P(D \mid \theta)$: the likelihood
- $P(\theta)$: the prior
- $P(D)$: the evidence (or marginal likelihood)

The sensitivity ($P(D\mid{}\theta)$) and specificity ($P(\neg{}D\mid{}\neg\theta)$) are likelihoods, which tell us how likely it is to observe each of the two possible data points (a positive or a negative test) given disease status.
How was the data generated? Here the example is one positive test. Because our variable is dichotomous, there are two ways to generate that data: either you have a positive test and have COVID19, or you have a positive test and do not have COVID19. We do not have these joint probabilities at hand, but we know how to convert between joint and conditional probabilities:

$$P(D) = P(D, \theta) + P(D, \neg\theta) = P(D \mid \theta)\,P(\theta) + P(D \mid \neg\theta)\,P(\neg\theta)$$

The above is very close to what we have, but we do not have $P(D \mid \neg\theta)$. Since a test is either positive or negative, however, $P(D \mid \neg\theta) = 1 - P(\neg{}D \mid \neg\theta)$. Knowing this we can reformulate the data generation process in terms of the three variables we have:

$$P(D) = P(D \mid \theta)\,P(\theta) + \big(1 - P(\neg{}D \mid \neg\theta)\big)\,P(\neg\theta)$$

What we have is a prior ($P(\theta)$) and two likelihoods: the sensitivity ($P(D \mid \theta)$) and the specificity ($P(\neg{}D \mid \neg\theta)$).
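The data generation step above can be sketched in a few lines of Python. The sensitivity and specificity values below are assumed placeholders, not the characteristics of any real test; only the prior comes from the text:

```python
# Evidence P(D) for one positive test, built from the three quantities we have.
prior = 0.85         # P(theta), the prior from the text
sensitivity = 0.90   # P(D | theta) -- assumed placeholder
specificity = 0.99   # P(negD | negtheta) -- assumed placeholder

# P(D) = P(D|theta) P(theta) + (1 - P(negD|negtheta)) P(negtheta)
p_d = sensitivity * prior + (1 - specificity) * (1 - prior)
print(round(p_d, 4))  # → 0.7665 with these placeholder numbers
```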
So my wife took an antibody test, and it was negative. Thus we have to reformulate Bayes' theorem and the generative process in terms of $\neg{}D$.

First we describe how the data could be generated, the data here being the negative test ($\neg{}D$):

$$P(\neg{}D) = \big(1 - P(D \mid \theta)\big)\,P(\theta) + P(\neg{}D \mid \neg\theta)\,P(\neg\theta)$$

Then the posterior (the probability of having COVID19 given a negative test) is:

$$P(\theta \mid \neg{}D) = \frac{P(\neg{}D \mid \theta)\,P(\theta)}{P(\neg{}D)}$$
Let's plug in the numbers for $P(\neg{}D)$:
And then for the posterior:
So there is a 17% chance that my wife has had COVID19.
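The whole negative-test calculation can be sketched as below. The sensitivity and specificity here are assumed placeholder values, not the actual antibody test's numbers, so the resulting posterior differs from the 17% derived in the text:

```python
prior = 0.85         # P(theta), the prior from the text
sensitivity = 0.90   # P(D | theta) -- assumed placeholder
specificity = 0.99   # P(negD | negtheta) -- assumed placeholder

# Evidence: P(negD) = (1 - sensitivity) P(theta) + specificity P(negtheta)
p_neg_d = (1 - sensitivity) * prior + specificity * (1 - prior)

# Posterior: P(theta | negD) = P(negD | theta) P(theta) / P(negD)
posterior = (1 - sensitivity) * prior / p_neg_d
print(round(posterior, 3))  # → 0.364 with these placeholder numbers
```

Note how strongly the posterior depends on the test characteristics: with a more sensitive test, a negative result pulls the 85% prior down much further.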
Is this the best inference we could do with the available data? Can we now proclaim ourselves enlightened Bayesians? No! What we have done is calculate a point estimate... True Bayesians think in distributions. There is some uncertainty associated with the data used to estimate the specificity and sensitivity, and there is also some uncertainty around my point estimate of the prior, $P(\theta)$.
The uncertainty associated with the test data is relatively easy to take into account; it is at the heart of Bayesian modeling, and it will be the topic of the next post.