Fortunately, if you can design an experiment, you don’t run into this problem. But many of the studies we would want to run would be unethical. We often hear about drugs, for instance, that have not passed all of the FDA’s gates for safety and efficacy but are still administered to patients. Those, unfortunately, are generally cases where the patient’s alternative is death. But imagine a condition like depression, where there are lots of alternatives. And imagine that we wanted to study the effectiveness of a treatment for mild to moderate depression, so that the consequence of the intervention failing is not catastrophic. Suppose, however, that the new pill has a 1 in 1000 chance of causing slow, painful, but certain death. It would be unethical to study this drug in a randomized controlled trial (RCT), which is considered the gold standard of research. Because we have alternatives, including doing nothing, that are better than dying a slow, painful death, we have to use existing data to figure out how to treat mild to moderate depression.
So, in the real world, where we have strong ethical controls on experiments, we will keep running into Simpson’s Paradox, and the data by themselves offer no way to tell us how to slice them up so that we can be sure procedure A truly is better than procedure B.
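To make the reversal concrete, here is a minimal Python sketch. The counts are illustrative and do not come from this post; they follow the pattern of a widely cited kidney-stone comparison, and the names `counts`, `pooled_rate`, and `stratum_rate` are hypothetical helpers invented for this example.

```python
# Illustrative counts only (successes, patients); not data from this post.
# "small" and "large" stand for two severity groups (a hypothetical split).
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def stratum_rate(procedure, stratum):
    """Success rate of one procedure within one severity group."""
    successes, patients = counts[procedure][stratum]
    return successes / patients


def pooled_rate(procedure):
    """Success rate of one procedure with the severity groups lumped together."""
    successes = sum(s for s, _ in counts[procedure].values())
    patients = sum(n for _, n in counts[procedure].values())
    return successes / patients


# Within each group, compare A and B; then compare them on the pooled data.
for stratum in ("small", "large"):
    print(stratum,
          "A:", round(stratum_rate("A", stratum), 3),
          "B:", round(stratum_rate("B", stratum), 3))
print("pooled",
      "A:", round(pooled_rate("A"), 3),
      "B:", round(pooled_rate("B"), 3))
```

With these counts, A does better than B inside each group, yet B does better once the groups are pooled. Nothing in the table itself says which comparison to trust.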
In business, we’re in an even worse situation. We really have only data about the past. Other than what’s called A/B testing (where a company sends out two mailers with different fonts or text to entice potential customers to sign up for its credit card, or a website shows people many different looks and feels to see which keeps them on the site longest), we are stuck with using existing data to try to draw conclusions about the right way to proceed in the future. But Simpson’s Paradox tells us that, quite literally, we can never be sure in advance whether we have analyzed our marketing, financial, and other data in the right way.
But there is good news. First, Judea Pearl of UCLA has worked out a calculus of causality, with tools to model it. That is, there is a way to use information from the past, combined with explicit assumptions about how the variables influence one another, to tell us what causes what. We don’t have to sweat Simpson’s Paradox: we can learn from past data whether procedure A or procedure B will (or is more likely to) produce our desired outcome. Second, if we do not need to make something happen in the real world, then data analysis, which will give us nice correlations, is good enough. Put another way, if we don’t need to cause people to buy a product, no problem. However, if we want to figure out how to get people to buy our product or use our service, we either need special intuition or we need to understand what causes someone to buy our product rather than any of the other options.
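One tool from Pearl's framework is the adjustment (backdoor) formula, which estimates what would happen if everyone received a given procedure: P(success | do(T = t)) = Σ_z P(success | t, z) · P(z). The sketch below applies it to illustrative counts that are not from this post. It rests on a loud assumption: that severity (the hypothetical "small" versus "large" split) influences both which procedure is chosen and the outcome, and that it is the only such confounder. The data cannot confirm that assumption; it has to come from a causal model of the situation.

```python
# Illustrative counts only (successes, patients); not data from this post.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}
strata = ("small", "large")

# P(z): share of all patients in each severity group, regardless of procedure.
total_patients = sum(n for proc in counts.values() for _, n in proc.values())
p_stratum = {
    z: sum(counts[t][z][1] for t in counts) / total_patients
    for z in strata
}


def adjusted_rate(procedure):
    """Adjustment formula: sum over z of P(success | procedure, z) * P(z).

    Only valid under the ASSUMPTION that severity is the sole confounder.
    """
    total = 0.0
    for z in strata:
        successes, patients = counts[procedure][z]
        total += (successes / patients) * p_stratum[z]
    return total


for t in ("A", "B"):
    print(t, "adjusted success rate:", round(adjusted_rate(t), 3))
```

Under that assumption, the adjusted comparison favors A, in line with the within-group results rather than the pooled one. With a different causal story (say, if severity were a consequence of the procedure rather than a cause of choosing it), adjusting for it would be the wrong move, which is why the causal model matters and the data alone do not settle the question.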
In future posts, I will begin to test ways of using both experimental methodologies and causal modeling in a business context. I want and need your feedback on this project, because I believe that businesses have been stuck with too much guesswork for too little return. What is more, Simpson’s Paradox shows that while Big Data and AI can provide lots of correlational information, and are incredibly useful as a way to test causality, they cannot on their own resolve problems when we want to make new things, like a successful business, happen in the real world.