Thomas Bayes was an eighteenth-century Presbyterian minister and gifted statistician. Two of his publications, Divine Benevolence, or an Attempt to Prove That the Principal End of the Divine Providence and Government is the Happiness of His Creatures (1731), and An Essay towards solving a Problem in the Doctrine of Chances (1763), are examples of how he wanted to use his intellect to prove the existence of God. His attempts were unsuccessful; otherwise Thomas Bayes would be known to all and not just to mathematicians. However, his devotion is honored, as his approach was so rigorous that it established a new field of mathematics, now named Bayesian Statistics.
Our current world-wide Covid-19 crisis provides an example of how Bayesian Statistics is applied.
First, here is the nomenclature for revised probabilities:
P(A) Probability of Event A
P(B) Probability of Event B
P(A|B) Probability of A, given B is true
Fictitious Covid-19 problem
Suppose a test for Covid-19 is 95% effective in detecting the disease when it is present in a patient. The test gives a false positive result 10% of the time when the disease is not present. Approximately 4% of the population has the disease. Given a positive test result, what is the probability that a patient actually has the disease?
The three facts stated above can be expressed as:
A. P(D) = .04 4% of the population has the disease, and therefore P(N)=.96 or 96% of population doesn’t have the disease.
B. P(Pos|D) = .95 The test gives a positive result 95% of the time when the disease is present
C. P(Pos|N) = .10 Given no disease, 10% of those tested will falsely test positive
Now, a typical question is: given a positive test result, what is the probability a patient actually has the disease? Using our nomenclature, we are looking for P(D|Pos), the probability of D, given Pos. We do have P(Pos|D) = .95, but that is not the same as P(D|Pos). In Bayesian terms, this question is called an inverse probability problem.
Bayes' Theorem provides a formula for determining this revised probability, but I think it is easier to express our known and unknown probabilities in a table. Our given facts are pre-filled in the table.
| | Disease Present (D) | No Disease (N) | Total |
|---|---|---|---|
| Positive Test result (Pos) | .95 x .04 = .038 (Fact B) | .10 x .96 = .096 (Fact C) | |
| Negative Test result | | | |
| Total | .04 (Fact A) | .96 | |
All the remaining entries in the table can be determined by simple addition and subtraction. Adding across the top row, the total proportion of the population with Positive Test results is .038 + .096 = .134; the proportion with Negative Test results when the disease is present is .04 - .038 = .002; the proportion having a Negative Test result with no disease is .96 - .096 = .864; the total proportion having Negative test results is 1.00 - .134 = .866 and with that last item we can complete all the entries in the table.
| | Disease Present (D) | No Disease (N) | Total |
|---|---|---|---|
| Positive Test result (Pos) | .95 x .04 = .038 | .10 x .96 = .096 | .134 |
| Negative Test result | .04 - .038 = .002 | .96 - .096 = .864 | .866 |
| Total | .04 | .96 | 1.00 |
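The arithmetic behind the completed table can be checked with a short Python sketch. The function name `joint_table` and the variable names are illustrative choices, not part of the original problem; the three inputs are Facts A, B, and C from the text.

```python
# Inputs taken from the three facts stated in the text.
p_d = 0.04            # Fact A: P(D), proportion of the population with the disease
p_pos_given_d = 0.95  # Fact B: P(Pos|D)
p_pos_given_n = 0.10  # Fact C: P(Pos|N)


def joint_table(p_d, p_pos_given_d, p_pos_given_n):
    """Fill in the 2x2 table of population proportions plus row/column totals.

    Illustrative helper (hypothetical name): each cell is a proportion of
    the whole population, exactly as in the table above.
    """
    p_n = 1.0 - p_d
    pos_d = p_pos_given_d * p_d   # positive test, disease present
    pos_n = p_pos_given_n * p_n   # positive test, no disease (false positive)
    neg_d = p_d - pos_d           # negative test, disease present
    neg_n = p_n - pos_n           # negative test, no disease
    return {
        "Pos": {"D": pos_d, "N": pos_n, "Total": pos_d + pos_n},
        "Neg": {"D": neg_d, "N": neg_n, "Total": neg_d + neg_n},
        "Total": {"D": p_d, "N": p_n, "Total": 1.0},
    }


table = joint_table(p_d, p_pos_given_d, p_pos_given_n)

# Print the rows in the same layout as the table in the text.
for row_name, row in table.items():
    cells = "".join(f"{row[col]:>8.3f}" for col in ("D", "N", "Total"))
    print(f"{row_name:<6}{cells}")
```

Each row and column total is computed from the cells rather than typed in, so a slip such as a misplaced decimal point shows up as totals that fail to add to 1.00.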
For our question of the probability of the Disease being present, given a Positive test result, we just need to focus on the top row of the table representing different ways to have a Positive test result.
.038 will test Positive when they actually have the disease
.096 will test Positive when they don’t have the disease (from false positive results)
In total, the proportion of the population that will have Positive results is .134, and of those, .038 actually have the disease. Therefore, P(D|Pos) = .038/.134 = .284, so only 28.4% of those with a Positive test result will actually have the disease. This often strikes people as being too low. The situation is common with imperfect tests and is known as the Paradox of the False Positive. From the table, we see that many more people have false positives than actually have the disease.
Even though many medical tests have this False Positive error, the tests are still helpful. In our example, a person testing positive has an increased risk level compared to the general population (28.4% vs 4%), so the positive result would justify further testing and enhanced precautionary measures.
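One way to make the paradox feel less surprising is to count people instead of proportions. The sketch below assumes a hypothetical population of 10,000 (a figure chosen only for round numbers, not from the original problem) and applies the same three facts; the function name `positive_counts` is likewise illustrative.

```python
def positive_counts(population, prevalence, sensitivity, false_positive_rate):
    """Return (true positives, false positives) as whole-person counts.

    Illustrative helper: rounds to whole people, which is exact for the
    numbers used in this example.
    """
    diseased = round(population * prevalence)
    healthy = population - diseased
    true_pos = round(diseased * sensitivity)
    false_pos = round(healthy * false_positive_rate)
    return true_pos, false_pos


# Hypothetical population of 10,000 with the rates from Facts A, B, and C.
true_pos, false_pos = positive_counts(10_000, 0.04, 0.95, 0.10)
share_with_disease = true_pos / (true_pos + false_pos)

print(f"True positives:  {true_pos}")
print(f"False positives: {false_pos}")
print(f"Share of positives with the disease: {share_with_disease:.1%}")
```

Under these assumptions, 380 of the 400 people with the disease test positive, but so do 960 of the 9,600 people without it, so the false positives outnumber the true positives simply because the healthy group is so much larger.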
Returning to Bayes' Theorem, the inverse probability is given by:
P(D|Pos) = P(Pos|D) x P(D) / P(Pos)
Substituting the values from our example gives P(D|Pos) = .95 x .04 / .134 = .284, so the table given above provides the same answer as the Bayes' Theorem equation.
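The formula can also be written as a small Python function. This is a minimal sketch: the function and parameter names are illustrative, and it expands P(Pos) using the two ways a positive result can occur, P(Pos) = P(Pos|D) x P(D) + P(Pos|N) x P(N), which is the same sum as the top row of the table.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(D|Pos) from P(D), P(Pos|D), and P(Pos|N) via Bayes' Theorem."""
    # P(Pos): positives from people with the disease plus false positives.
    p_pos = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_pos


# Values from the fictitious Covid-19 problem (Facts A, B, and C).
p_d_given_pos = posterior(prior=0.04, sensitivity=0.95, false_positive_rate=0.10)
print(f"P(D|Pos) = {p_d_given_pos:.3f}")  # prints "P(D|Pos) = 0.284"
```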
Additional Sources on Bayes:
Cornell blog: https://blogs.cornell.edu/info2040/2018/11/28/bayes-theorem-and-the-existence-of-god/
Dan Kopf article on Bayes and Price: https://qz.com/1315731/the-most-important-formula-in-data-science-was-first-used-to-prove-the-existence-of-god/
Image details: Public Domain
- File:Michelangelo - Creation of Adam (cropped).jpg
- Created: circa 1511