Changing How the World Thinks

An online magazine of big ideas


Phantom patterns

The big data delusion

20 08 24 | Gary Smith

Far from revealing the world to us, the modern deluge of data leaves us endlessly searching for meaningful patterns. Raw data leads us easily astray, making human wisdom and experience more important than ever, writes Gary Smith.

Compared to other animals, humans are not particularly strong, fast, or powerful. We don’t have sharp teeth, claws, or beaks. We don’t have sledgehammer horns, tusks, or tails. We don’t have body armor. We are not great swimmers or sprinters. How did our distant ancestors not only survive, but become masters of the planet?

One fuel that powered our ascent was our prowess at recognizing patterns that had survival and reproductive value. Our distant ancestors noticed that elephants can lead us to water. They remembered that zebra stampedes can signal predators. They identified thousands of useful patterns, and those who were the best at it came to dominate the gene pool.

You can take the human out of the Stone Age but you can’t take the Stone Age out of the human. Hard-wired to look for patterns, we are now inundated by a deluge of data that are far more abstract, complicated, and difficult to interpret than were the sights, sounds, and smells noticed by our distant ancestors. Instead of observing elephants and zebras, we now analyze abstract concepts like GDP, crime rates, and stock price indexes. Yet, we continue to be driven to seek patterns. Often overwhelmed by vast amounts of digital data, we turn our pattern searches over to computer algorithms in the mistaken belief that computers are smarter than us.


Artificial intelligence (AI) is still far more hype than reality: computer algorithms are not intelligent in any meaningful sense, because they have none of the common sense and wisdom that humans accumulate by living in the real world. Computers are very good at mindless tasks, like mathematical calculations and word searches, but do not understand what a number or word means. Computers can calculate the correlation between the daily high temperature in Curtin, Australia, and the closing price of the S&P 500 without knowing what a temperature or stock price is, or whether there is a compelling reason for using one to predict the other. Computers can defeat the best humans have to offer at backgammon, checkers, chess, and Go, but would lose to an amateur if there were a last-minute rule change.

Put the two together—using mindless algorithms to ransack enormous databases for patterns—and, Houston, we have a problem.

 

The scientific method tests theories with data. Data-mining computer algorithms dispense with theory and search through data for patterns, often aided and abetted by slicing, dicing, and otherwise mangling data to create patterns:

  • Stock prices can be predicted from Google searches for the word “debt.”
  • Stock prices can be predicted from the number of Twitter tweets that use “calm” words.
  • An unborn baby’s sex can be predicted by the amount of breakfast cereal the mother eats.
  • Bitcoin prices can be predicted from stock returns in the paperboard-containers-and-boxes industry.
  • Interest rates can be predicted from Trump tweets containing the words “billion” and “great.”

 

No, I am not making any of this nonsense up. These are all serious claims from reputable researchers—and they are just the tip of an iceberg of meaningless patterns that we embrace because of our inherited infatuation with patterns and our mistaken belief that computers are smarter than us.

Here are a few more. A computer algorithm for evaluating job applicants noticed that many good programmers visited a particular Japanese manga site and concluded that people who visit this site are likely to be good programmers. Another algorithm concluded that job applicants who went to all-women’s colleges are unlikely to be good software engineers because very few of the company’s current engineers went to all-women’s colleges. A car insurance company created an algorithm for evaluating applicants based on Facebook posts, including whether one likes Leonard Cohen. An algorithm for evaluating loan applications analyzed smartphone usage and concluded that keeping one’s smartphone charged made one creditworthy. An advocate of algorithmic criminology said that an algorithm “finds things that you wouldn’t anticipate,” like the size of wristbands people wear. An algorithm for scanning facial photos was reported to predict with 89.5 percent accuracy whether a person is a criminal.
Lacking wisdom or common sense, computer algorithms have no way of assessing the plausibility of such fleeting patterns. Enamored of patterns and awed by computers, we are tempted to accept this rubbish as something useful.


Computer-assisted pattern seeking is also undermining the credibility of science, creating what has come to be known as the replication crisis (or reproducibility crisis) in which attempts to replicate published studies often fail. For example, reserpine was a popular treatment for hypertension until researchers reported in 1974 that it substantially increased the chances of breast cancer. Later, after doctors had stopped prescribing reserpine, several attempts to replicate the 1974 report concluded that the association between reserpine and breast cancer was spurious. One of the original researchers later described the initial study as the “reserpine/breast cancer disaster.”

In retrospect, he recognized that they had been fooled by a coincidental pattern, which is inevitable if one looks hard enough:

We had carried out, quite literally, thousands of comparisons involving hundreds of outcomes and hundreds (if not thousands) of exposures. As a matter of probability theory, ‘statistically significant’ associations were bound to pop up and what we had described as a possibly causal association was really a chance finding.
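
The mechanism the researcher describes, in which enough comparisons on pure noise guarantee that some clear the significance bar by chance, is easy to demonstrate. Here is a minimal sketch in Python; the sample size, number of exposures, and critical value are illustrative choices, not figures from the reserpine study:

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

random.seed(0)
n_samples = 30      # observations per variable
n_exposures = 1000  # candidate "exposures" tested against one outcome

# For n = 30, a correlation of about |r| > 0.361 is "statistically
# significant" at the conventional 5 percent (two-tailed) level.
CRITICAL_R = 0.361

outcome = [random.gauss(0, 1) for _ in range(n_samples)]  # pure noise
false_positives = 0
for _ in range(n_exposures):
    exposure = [random.gauss(0, 1) for _ in range(n_samples)]  # also noise
    if abs(pearson_r(exposure, outcome)) > CRITICAL_R:
        false_positives += 1

print(f"{false_positives} of {n_exposures} pure-noise exposures "
      f"look 'statistically significant'")
```

Every series here is random noise, yet roughly one in twenty comparisons comes out “significant” anyway, which is exactly how thousands of comparisons were bound to produce a chance finding.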

John Ioannidis, author of an insightful paper with the provocative title “Why Most Published Research Findings Are False,” looked at 45 of the most widely respected medical studies that claimed to have demonstrated effective treatments for various ailments. He found that replication attempts had been done for 34 of these treatments, with the original conclusions confirmed in only 20 cases. The numbers are surely worse for ordinary research in ordinary journals.


The Reproducibility Project launched by Brian Nosek enlisted 270 researchers to attempt to replicate 97 studies published in three leading psychology journals. Only 35 were confirmed and, even then, the effects were invariably smaller than originally reported. The Experimental Economics Replication Project attempted to replicate 18 experimental economics studies reported in two leading economics journals. Only 11 of the follow-up studies found significant effects in the same direction as originally reported.

It is tempting to believe that more data means more knowledge about the world we live in. However, the explosion in the number of things that are measured and recorded has magnified beyond belief the number of coincidental patterns and bogus statistical relationships waiting to deceive us. The number of possible patterns that can be identified, relative to the number that are genuinely useful, has grown exponentially, which means that the chance that a randomly discovered pattern is useful is rapidly approaching zero.
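
The combinatorics behind this claim are easy to make concrete. With m measured variables there are m(m−1)/2 pairwise correlations available to ransack, and at the conventional 5 percent significance level roughly 1 in 20 pure-noise correlations will look significant anyway. A back-of-the-envelope sketch, with illustrative variable counts:

```python
# With m measured variables there are m*(m-1)/2 pairwise correlations
# available to ransack. At the conventional 5 percent significance level,
# about 1 in 20 correlations among pure-noise variables will nonetheless
# look "statistically significant".
for m in (10, 100, 1_000, 10_000):
    pairs = m * (m - 1) // 2         # pairwise correlations to test
    expected_spurious = pairs // 20  # ~5% false-positive rate
    print(f"{m:>6} variables: {pairs:>12,} pairs, "
          f"~{expected_spurious:>10,} spurious 'significant' patterns")
```

Ten thousand recorded variables yield nearly fifty million candidate correlations and millions of expected false alarms, while the number of genuinely useful relationships among them grows far more slowly.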

Three takeaways:

1. The paradox of big data is that the more data we ransack for patterns, the more likely it is that what we find will be worthless or worse.
2. The real problem today is not that computers are smarter than us, but that we think they are, and so trust them to make decisions they should not be trusted to make.
3. In the age of Big Data and powerful computers, human wisdom, common sense, and expertise are needed more than ever.


 

Find out more about Gary Smith's work on his website or follow him on Twitter @StandardDevs

 
