The reference class problem has plagued thinkers for centuries, but as probability and statistics become part of everyday parlance, the issue becomes even more pressing. If we want to draw valuable conclusions, we must take the self out of statistics, writes Aubrey Clayton.
The COVID-19 pandemic has, sadly, made armchair epidemiologists and statisticians of us all. Just to understand the daily news, people with little or no previous expertise have had to quickly become conversant in technical argot including terms like “basic reproduction number (R0),” “positive predictive value,” and “case fatality rate,” among others. Predictably, the introduction of this vocabulary of ideas to the general public has also invited a number of rookie mistakes and elementary fallacies. For example, supposing a particular test for the novel coronavirus had a 99% specificity rate, meaning 99 out of every 100 truly virus-free people will test negative, what is the chance that someone who tests positive actually has the virus? If, before attempting an answer, you don't immediately think to ask what the overall incidence rate of the virus is in the population, then you have committed a well-worn fallacy called Base Rate Neglect.
The correct assembly guide for the relevant pieces of information, as always when reasoning about uncertainty, is Bayes’ Theorem, the simple equation that relates the prior probability of a statement (that is, the probability we assign before considering some new piece of evidence) to its posterior probability (the new probability we give it in light of the evidence). The rule dictates that the posterior probability is proportional to the product of the prior probability and the conditional probability, the latter representing the chance of making the given observation assuming the statement were true. In our above testing scenario, the conditional probability might reflect the accuracy rates of the test (how often it turns up positive if the person does or does not have the virus), and the posterior probability is the one we care about: how likely is a person to have the virus after having tested positive. Bayesian Reasoning 101 tells us that we also need to know the chance the person has the virus before considering the test, that is, the base incidence rate of the virus for that person’s population.
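To make the arithmetic concrete, here is a minimal sketch in Python. Only the 99 percent specificity comes from the testing example; the 95 percent sensitivity and the 1 percent incidence rate are assumed values chosen purely for illustration, and the function name is hypothetical.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(infected | positive test), computed with Bayes' Theorem."""
    # Prior times conditional probability, for each of the two hypotheses
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    # Normalize so the two posterior probabilities sum to one
    return true_positive / (true_positive + false_positive)

# Assumed inputs: 1% incidence and 95% sensitivity are illustrative, not real data
posterior = posterior_given_positive(prevalence=0.01, sensitivity=0.95, specificity=0.99)
print(f"P(infected | positive) = {posterior:.1%}")
```

Under these assumed numbers, a positive result implies roughly a 49 percent chance of infection, far from the 99 percent that a reader neglecting the base rate might guess.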
However, concealed in that last statement is a subtlety of probabilistic thinking not many people have adequately grappled with yet: How do we determine the relevant population to compare an individual to? Say I’m a 40-year-old man in Boston who’s been practicing moderately good social distancing and is not currently experiencing any symptoms of COVID-19. To establish my base rate of having the coronavirus for the purpose of interpreting any test result, should I take the rate from among other Bostonians (perhaps only in my neighborhood?), restricted to other 40-year-old men, only asymptomatic people, only those whose social behaviors exactly match my own, etc.? Or should I perhaps cast a conservatively wide net and compare myself to all Americans or all people in the world? The same considerations apply when interpreting the fatality rate of the virus. Given daunting statistics like 1 percent of infected people dying, before taking that figure to represent our own personal risk, we might first want to know the death rate broken down by factors like age, race, socioeconomic status, severity of symptoms, presence of other health conditions, access to medical care, and so on.
Call this Bayesian Reasoning 201.
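A short Python sketch shows how much the choice of comparison group can matter. Every number here is invented for illustration: the base rates attached to each reference class are hypothetical, as are the test's 95 percent sensitivity and 99 percent specificity.

```python
def posterior_given_positive(prior, sensitivity=0.95, specificity=0.99):
    """P(infected | positive test) for a given base rate (prior)."""
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)

# Hypothetical base rates, made up for this example; not real epidemiological data
reference_classes = {
    "all Americans": 0.010,
    "all Bostonians": 0.030,
    "asymptomatic 40-year-old men in Boston": 0.005,
}

# Same person, same test result, different reference class
posteriors = {
    name: posterior_given_positive(rate)
    for name, rate in reference_classes.items()
}
for name, p in posteriors.items():
    print(f"{name}: P(infected | positive) = {p:.1%}")
```

With these made-up figures, the same positive test translates into anything from about a one-in-three to about a three-in-four chance of infection, depending only on which group the person is compared to.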
Who are the people like us? What are the circumstances like our own? Who are any of us, really? Establishing who exactly we count as “like us” for the purposes of probability estimation is an old philosophical quandary known as the Reference Class Problem.
The name comes from Hans Reichenbach, who wrote in his Theory of Probability in 1949, “If we are asked to find the probability holding for an individual future event, we must first incorporate the event into a suitable reference class. An individual thing or event may be incorporated in many reference classes, from which different probabilities will result.” Most discussions of the problem attribute the idea originally to John Venn, thanks to his 1866 work The Logic of Chance, which established the “frequentist” school identifying probability with frequency of occurrence over the hypothetically infinite long run. For example, in Venn’s view, the chance of a coin-flip coming up heads is 50 percent because this is the frequency with which it would happen over a long series of flips, and for no other reason. But in the course of developing that concept of probability, Venn noted that “every individual thing or event has an indefinite number of properties or attributes observable in it, and might therefore be considered as belonging to an indefinite number of different classes of things.” This posed a special difficulty for Venn’s frequentist viewpoint because, if probability is defined by ratios obtained by repeatedly sampling from a population, it’s necessary first to decide exactly what that population is.
In fact the problem is much older than that, though. In the late 1600s, the mathematician Jacob Bernoulli first opened the door for probability between games of chance and the affairs of people. At the time of his writing, probability was understood in the “classical” sense as the number of ways an event could happen divided by the number of possible things that could happen. The probability of rolling an eight with two dice is 5/36 because there are five ways to do it out of 36 possible rolls, and so on. In his foundational work Ars Conjectandi (“The art of conjecturing”) published posthumously in 1713, Bernoulli articulated many of the purely mathematical ideas of probability including his greatest accomplishment, the Law of Large Numbers, which states that the true probability of an event will be approximately borne out as the frequency with which it occurs over a large number of trials. For Bernoulli, this was a way of measuring probability, not defining it, but the Reference Class Problem was present nonetheless.
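The two ideas in this paragraph, the classical count and the Law of Large Numbers, can be sketched in a few lines of Python. The simulation is only an illustration: the seed and the trial counts are arbitrary choices, and the helper name is hypothetical.

```python
import random
from fractions import Fraction
from itertools import product

# Classical probability: favorable cases divided by equally likely cases
rolls = list(product(range(1, 7), repeat=2))
favorable = sum(1 for a, b in rolls if a + b == 8)
classical = Fraction(favorable, len(rolls))

def observed_frequency(trials, rng):
    """Fraction of simulated two-dice rolls that sum to eight."""
    hits = sum(
        1 for _ in range(trials)
        if rng.randint(1, 6) + rng.randint(1, 6) == 8
    )
    return hits / trials

rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
for trials in (100, 10_000, 200_000):
    print(f"{trials:>7} rolls: frequency = {observed_frequency(trials, rng):.4f}")
print(f"classical value: {classical} = {float(classical):.4f}")
```

As the number of rolls grows, the observed frequency should settle near 5/36, which is Bernoulli's point: the frequency measures the probability without defining it.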
Bernoulli’s ambition was for probability to apply to all situations of reasoning under uncertainty, including what he referred to as civilibus, moralibus, and oeconomicis, the political, moral, and economic spheres where decisions are typically made with incomplete information. In such applications, there was often no feasible way to measure probability other than through frequency or proportion, leading directly to the question: Proportion of what? For example, Bernoulli considered the problem of determining a person’s chance of living ten more years by tallying up the results of men of the “same age and complexion” and “under similar circumstances:”
“It should be assumed that each phenomenon can occur and not occur in the same number of cases in which, under similar circumstances, it was previously observed to happen and not to happen. Actually, if, for example, it was formerly noted that, from among the observed three hundred men of the same age and complexion as Titius now is and has, two hundred died after ten years with the others still remaining alive, we may conclude with sufficient confidence that Titius also has twice as many cases for paying his debt to nature during the next ten years than for crossing this border.”
Around 1827, social science pioneer Adolphe Quetelet was beginning his project of applying the methods of probability and statistics he had learned as an astronomer to questions of social relevance. His goal was to establish a “social physics” that could rival Kepler’s laws and make as precise predictions about the trends in people’s lives and deaths as Kepler could about the motion of planets. But he was very nearly discouraged from the whole endeavor by the criticism of the Baron de Keverberg, who wrote,
“The law regulating mortality is composed of a large number of elements: it is different for towns and for the flatlands, for large opulent cities and for smaller and less rich villages, and depending on whether the locality is dense or sparsely populated. This law depends on… a multitude of local circumstances that would elude any a priori enumeration. It is nearly the same as regards the laws which regulate births. It must therefore be extremely difficult, not to say impossible, to determine in advance with any precision, based on incomplete and speculative knowledge, the combination of all of these elements that in fact exists.”
In short, Keverberg considered the Reference Class Problem to be a dealbreaker for social science. Fortunately for us today, Quetelet did not totally agree.
Not even people in the more recent “subjectivist” schools who completely reject the idea of defining or measuring probability with frequencies are free from the Reference Class Problem. For example, in this way of thinking, a probability of 50 percent for a coin-flip might reflect only the speaker’s confidence in (or willingness to bet on) the outcome. As Alan Hájek has argued, such assessments are always made conditionally on some assumed information, though, which requires that information to be specified. In other words, one would need to know exactly what one knows about a person before assessing their probability of getting sick, dying, etc. — enumerating these characteristics is perfectly equivalent to slotting that person into a reference class.
So, the Reference Class Problem has cropped up in numerous ways over centuries and caused headaches essentially anytime probability has been applied to people’s lives, no matter what that probability was understood to mean. The naive view that we can establish the needed probabilities conclusively by comparing ourselves to a population “like us” will always fall apart under scrutiny because human lives are just too complex. Human beings are not coin-flips or dice rolls. We are each, in fullest detail, a population of one. Adjusting numbers from any group to an individual therefore requires a model, which requires assumptions.
Today, data scientists employ strategies for assigning probabilities using machine learning algorithms that go well beyond simply looking for a suitable reference class, such as online calculators for the risk of contracting COVID-19 and for predicting COVID-19 mortality. In fact, these algorithms are, from one point of view, just a sophisticated response to the Reference Class Problem. A simpler example along these lines is the process known as “logistic regression,” which specifies the probability of an individual having a particular outcome as a function of the characteristics of that individual. If a person had two known characteristics, say age (X) and income (Y), a logistic model for their chance of contracting the coronavirus (p) might take the form
log(p / (1 − p)) = β_0 + β_1 X + β_2 Y
Training the model then consists of fine-tuning the coefficients of that function until it best matches a set of observations. In this way, individual differences are preserved and allowed to contribute to the probability function but only in a particularly structured way. Models like this attempt to address the question: You may not be exactly like someone else, but how much do the differences matter?
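As a rough sketch of what that fine-tuning can look like, the following Python fits the three coefficients by plain gradient ascent on the log-likelihood. Everything here is an assumption made for illustration: the data are synthetic, age and income are treated as standardized scores, the "true" coefficients are invented, and the function names are hypothetical. Real risk models are fit with dedicated statistical software on real observations.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(beta, x, y):
    # Inverts log(p / (1 - p)) = b0 + b1*X + b2*Y to recover p
    return sigmoid(beta[0] + beta[1] * x + beta[2] * y)

def fit(data, steps=1000, learning_rate=0.5):
    """Adjust the coefficients by gradient ascent on the average log-likelihood."""
    beta = [0.0, 0.0, 0.0]
    n = len(data)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y, outcome in data:
            error = outcome - predict(beta, x, y)
            grad[0] += error
            grad[1] += error * x
            grad[2] += error * y
        beta = [b + learning_rate * g / n for b, g in zip(beta, grad)]
    return beta

# Synthetic observations generated from invented coefficients (not real data)
rng = random.Random(42)
true_beta = [-1.0, 1.5, -0.8]
data = []
for _ in range(500):
    x = rng.gauss(0, 1)  # standardized age (assumed)
    y = rng.gauss(0, 1)  # standardized income (assumed)
    outcome = 1 if rng.random() < predict(true_beta, x, y) else 0
    data.append((x, y, outcome))

beta = fit(data)
print("fitted coefficients:", [round(b, 2) for b in beta])
print("older-than-average person: ", round(predict(beta, 1.0, 0.0), 3))
print("younger-than-average person:", round(predict(beta, -1.0, 0.0), 3))
```

The fitted coefficients answer the question in a structured way: each one says how much a unit difference in that characteristic shifts the log-odds, holding the other fixed.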
Something like this approach is essential because otherwise we risk being blinded by individual exceptionalism. As Daniel Kahneman and Amos Tversky established in their famous work on prediction and decision-making, people have a tendency to understate their risks because they have an insider’s perspective on their situation that they feel makes them exempt from larger trends. Kahneman and Tversky called this the “planning fallacy,” which they identified as a “consequence of the tendency to neglect distributional data, and to adopt what may be termed an 'internal approach' to prediction, where one focuses on the constituents of the specific problem rather than on the distribution of outcomes in similar cases.” It’s okay for me to drive after a few drinks because I’m an especially good driver; my project will be completed on time because I can’t imagine what could delay it; and so on. Donald Gillies, in Philosophical Theories of Probability, called this the “Francesca argument,” after his teenage niece Francesca who wanted a motor scooter and argued that the alarming overall rate of scooter accidents didn’t apply to her because she would be a more careful rider than the average teenager.
The same misplaced optimism is the reason why weight-loss programs that track what food people actually consume are always more successful than those that rely on people’s memories. When asked what you ate some particular day weeks ago, it’s natural to remember the healthy salad with lean protein and forget about the afternoon brownie or bag of chips. So it goes with our assessment of our risk profile for COVID-19. When you think of your behaviors that might have affected your exposure to the virus, it’s easier to think of the positive steps you took to protect yourself, the times you wore a mask and stayed away from other people, instead of the occasional slip-ups — the delivery driver you handed a credit card to, the inadequately ventilated office, or the crowd of strangers who got too close.
To counteract these biases, we need, essentially, to forget some particulars about ourselves and think in broad strokes. In “To a Louse,” Rabbie Burns said that from the perspective of a louse, we are all the same; rich and poor, we’re all just potential hosts. As Burns wrote, this humbling realization could in fact be a great gift: “O wad some Pow'r the giftie gie us / To see oursels as ithers see us!” The same could be said of the coronavirus. To it, we’re all just collections of cells with the potential to help it reproduce, and our optimism about our special and unique lives is irrelevant. If we’re willing to accept it, then, perhaps this is the gift statistical models can give us in the time of COVID-19: to see ourselves as the virus sees us.