Stop Fearing Incidental Findings
(Disclosure [1])
We have two options for the next hundred years of healthcare:
1. Gather as much data as possible, and make healthcare better (and false positives less common).
2. Don’t – because we’re afraid that people will get scared and act unwisely when they see an incidental finding.
Doctors in popular media are lining up to say that (1) is a Bad Idea.
From the New York Times (paywall, emphasis mine):
In April, the American College of Radiology released a statement . . . expressing concern that scans could lead to “nonspecific findings” that require extensive, expensive follow-up.
In fact, that isn’t quite what the American College of Radiology said. Their statement didn’t use the word require, but instead expressed concern that scans could lead to nonspecific findings that result in extensive, expensive follow-up.
The “require” vs. “result in” mix-up seems like a semantic point, but it’s an important one. The ACR is not saying that follow-ups are necessary, only that they are common. These often spurious, dangerous, and expensive follow-ups are a symptom of broken processes in healthcare, not a predetermined consequence of an untargeted MRI.
More data is better
“If you scan more, you see more” is certainly true. But “if you see more, you do dangerous invasive follow-ups” doesn’t have to be. Unless we change the culture from “if you see something incidental, you must act” to “if you see something incidental, you must act if the disease is likelier to harm you [2] than the follow-up,” we’ll be stuck fearing incidental findings forever.
As a bonus, by gathering more data about you, we can regularly update our priors about whether each disease is likely to cause harm.
There’s no Platonic ideal of a human body. Instead of comparing your body to some ideal from which it will always deviate, we should be comparing it against your baseline. If you see something suspicious – but it hasn’t changed from your last scan – it’s likely not suspicious after all. If we do it right, more scanning should decrease the number of false positives, not increase them – but only if we’re good about not panicking when we see something outside of the “normal” range the first time we measure it [3].
Philosophical point taken, but if I test positive for something, I’m still going to get the follow-up!
I’m certainly not arguing that everyone should get all of the tests all of the time (yet). They cost money and resources, and because medicine is so intervention-focused today, if you get a test, your doctors may very well feel compelled to act on it – if only to protect themselves from a lawsuit [4].
But we should be working to change that. We should be working to make each piece of data as cheap and safe [5] as possible to gather, and we should measure baselines to rule out items of concern that haven’t changed from exam to exam. We should be working to make healthcare more data-driven and less likely to skip straight to dangerous interventions. We should also be working to stop treating every piece of data as a binary “positive” or “negative” result and avoid punishing doctors for making the right statistical decisions.
Nikhil Krishnan has a great piece on why we don’t screen healthy people to catch diseases early. Cribbing directly from that piece (which is worth a read), let’s consider the extreme example of Nikhilitis, a disease that affects 1/1000 people. We have a cheap screen that is safe and 99% sensitive (if you have it, the test will be positive 99% of the time) and 90% specific (if you don’t have it, the test will be negative 90% of the time).
This means that if a random member of the population tests positive, they have a 1% chance of having Nikhilitis (math in footnote [6]). Subsequent health decisions should be based on that number, not on the scary word “positive.”
Let’s also imagine that the confirmatory diagnosis is a lobotomy with a 0.1% mortality rate. If the expected mortality rate of Nikhilitis is less than 10%, don’t get the freaking lobotomy! (The same footnote [6] goes on to compare your expected mortality from the confirmatory procedure with your expected mortality from Nikhilitis itself.) But if you’re in a special population where Nikhilitis is especially dangerous, the lobotomy might be the right choice.
Even Nikhil, whose post is otherwise excellent, implies that everyone who tests positive needs a biopsy. They don’t. The screen can’t ever tell you directly whether or not to get a biopsy; it just gives you more information about your risk of Nikhilitis. You only need the follow-up if that new information (plus other information about you) indicates that getting it (the lobotomy, in our example) is safer than letting the possibility of Nikhilitis ride.
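To make the arithmetic concrete, here is a minimal Python sketch of that decision rule: compute the posterior from the screen, then follow up only if the expected harm from the disease exceeds the harm from the follow-up itself. The function names and the 20%-fatal “special population” figure are my own illustration, not taken from Nikhil’s post.

```python
def posterior_probability(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    p_pos_given_disease = sensitivity            # true positive rate
    p_pos_given_healthy = 1 - specificity        # false positive rate
    p_positive = (p_pos_given_disease * prevalence
                  + p_pos_given_healthy * (1 - prevalence))
    return p_pos_given_disease * prevalence / p_positive


def follow_up_worthwhile(p_disease, disease_mortality, follow_up_mortality):
    """Follow up only if expected mortality from the disease exceeds
    the mortality risk of the follow-up itself."""
    return p_disease * disease_mortality > follow_up_mortality


p = posterior_probability(prevalence=0.001, sensitivity=0.99, specificity=0.90)
print(f"P(Nikhilitis | positive) ~ {p:.3f}")  # ~0.010

# General population, 1% disease mortality: expected risk ~0.0001 < 0.001 lobotomy risk
print(follow_up_worthwhile(p, disease_mortality=0.01, follow_up_mortality=0.001))  # False

# Hypothetical special population where Nikhilitis is 20% fatal: the calculus flips
print(follow_up_worthwhile(p, disease_mortality=0.20, follow_up_mortality=0.001))  # True
```

With these numbers, the break-even disease mortality is roughly 10%, which is exactly where the special-population caveat in footnote [6] kicks in.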
So if no one should get the follow-up, what was the point of the screen anyway?
Unlike the Nikhilitis screen, baseline measurements like blood panels or full-body MRIs have utility outside of a binary decision on a specific day. If you get a full-body MRI, you have information that is affirmatively useful for your healthcare down the line. How useful will depend on how many people get these MRIs and the quality of science we can do with the data. It’s a tricky cold-start problem but, in my view, it’s the most important one that we have to solve to stop people dying of preventable diseases.
But we live in the real world, where people don’t understand conditional probability and will get the follow-up anyway
A few (untested) suggestions that I think might marginally improve how we discuss preventative healthcare:
Report screening results as percentages rather than “positive” or “negative”
If your screen comes back positive for Nikhilitis with no other indicators, the report (and your doctor) should explain that “our data tells us that you have about a 1% chance of having Nikhilitis given everything else we know about you.” The words “positive,” “reactive,” “abnormal,” and similar should be reserved for extremely significant results. Doctors have trouble understanding this too, so the labs themselves should be careful about the language in their reports.
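As a sketch of what that reporting could look like, here is a hypothetical helper that turns a posterior probability into report language. The thresholds and the wording are invented for illustration; they are not clinical guidance.

```python
def report_line(condition, posterior):
    """Phrase a screening result as a probability, reserving strong
    language for strong evidence. Thresholds are illustrative only."""
    pct = posterior * 100
    if posterior >= 0.5:
        note = "likely; follow-up recommended"
    elif posterior >= 0.05:
        note = "possible; discuss follow-up with your doctor"
    else:
        note = "unlikely; continue routine monitoring"
    return f"{condition}: about {pct:.0f}% chance, given everything else we know ({note})"


print(report_line("Nikhilitis", 0.01))
# Nikhilitis: about 1% chance, given everything else we know (unlikely; continue routine monitoring)
```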
Stop calling full-body MRI a “screen”
We shouldn’t be thinking of full-body MRI as a “screen.” Instead, it’s just a series of measurements. In the same way that most of the tests in a routine blood panel are not a “screen” so much as individual measurements of your cholesterol, your liver function, and so on, a full-body MRI is just a series of measurements of your body’s structure and function. While many companies in the space are marketing themselves as screeners, the responsible ones are quietly figuring out how to gather repeatable data as cheaply and easily as a blood test.
Unless you’re using an MRI to screen for specific diseases (in which case it deserves the careful Bayesian treatment above), we should be using these technologies to gather baselines and understand change, rather than treating them solely as point-in-time mechanisms to catch specific diseases early.
Longitudinal data is useful
We should track changes over time. If something is stable, it’s probably not a problem. If something is changing, it might be. For example, I have a cyst in my pelvis. It hasn’t changed in years. In 20 years, when I’m getting an MRI for one reason or another, the doctor will likely want to do something about the cyst. But I will know that the cyst has been there, perfectly safe and unchanging, for decades. That information about one of many little ways I’m a bit abnormal may prevent me from going under the knife.
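Here is a toy sketch of that longitudinal comparison, assuming a made-up data model and an arbitrary 20% growth threshold; a real system would use modality- and finding-specific criteria.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    label: str        # e.g. "pelvic cyst"
    size_mm: float    # largest dimension on this scan
    scan_year: int


def needs_attention(baseline: Finding, current: Finding,
                    growth_threshold: float = 0.20) -> bool:
    """Flag a finding only if it is new or has grown meaningfully since baseline."""
    if baseline.size_mm == 0:
        return current.size_mm > 0   # new finding: worth a closer look
    growth = (current.size_mm - baseline.size_mm) / baseline.size_mm
    return growth > growth_threshold


baseline = Finding("pelvic cyst", size_mm=12.0, scan_year=2005)
today = Finding("pelvic cyst", size_mm=12.3, scan_year=2025)
print(needs_attention(baseline, today))  # False: stable for decades, probably fine
```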
Incentive alignment is hard, so we have to keep driving costs down
What’s the incentive for you to get tests if they’re not immediately actionable? It’s hard to justify spending a big chunk of money (or a big chunk of shared healthcare bandwidth) on something with no immediate payoff. If the Nikhilitis test doesn’t give you any actionable insight, what was the point? As costs decrease and science improves, the value per datapoint gathered will increase while the cost to gather each datapoint will decrease until eventually the lines cross. To get to that point, we need to subsidize the cost of gathering data and make it as easy as possible to gather across healthy and unhealthy populations.
Finally
Don’t be afraid to get the full-body MRI – but before you look at the results, be prepared to (maybe) ignore (only maybe) scary findings. Remember, without the test you wouldn’t have known about those findings at all! Armed with the new information, you can track whether anything changes over time; and if there’s something obvious and immediately actionable, you’ve caught it early.
We should be encouraging data gathering at scale. The alternative – keeping our heads in the sand – will keep us dying in the dark as people continue to suffer from diseases we would have caught early if we’d been brave enough to look.
Thanks to Ben Weems and Elisabeth Ostrow for their feedback on this post!
[1] I used to run the software engineering team at Q Bio, a company that works on full-body MRI. We didn’t think of our scans as screening for specific diseases so much as a way to gather a ton of data about a person’s health over time, and to use that data to make that person’s healthcare better. That said, it’s always hard to gather data about a person and tell them “don’t do anything about it!”
[2] Including financially, emotionally, or however else you define harm.
[3] It isn’t currently permissible within our ethical framework, but imagine a situation where everything about you got measured every year from, say, 18-35 and kept hidden from you and all of your doctors for the first 15 years. Think of all the useful data you would have when you got knee pain at 35! “Oh, this cyst in my knee has been here the whole time; it’s probably not that.”
[4] Malpractice law is not set up for this kind of thinking. I don’t have an easy solution for this.
[5] Note also that, unlike a CT scan, MRI is near-100% safe (modulo getting smashed by an oxygen canister that somebody accidentally brings into the room), so \(P(\text{test harms you})\) is near 0. We should therefore be giving people MRIs regularly (if they’re cheap enough) and updating our priors about our health with the information they provide. Just don’t do a biopsy or an X-ray unless those priors are high enough!
[6] First, let’s calculate the probability that you have Nikhilitis given a positive test. We know that:

\[\begin{align*}
P(\text{positive} | \text{nikhilitis}) &= 0.99 \\
P(\text{nikhilitis}) &= 0.001 \\
P(\text{positive}) &= P(\text{positive} | \text{nikhilitis}) P(\text{nikhilitis}) + P(\text{positive} | \neg \text{nikhilitis}) P(\neg \text{nikhilitis}) \\
&= 0.99 \times 0.001 + 0.1 \times 0.999 \approx 0.101
\end{align*}\]

Thus, applying Bayes’ theorem,
\[\begin{align*}
P(\text{nikhilitis} | \text{positive}) &= \frac{P(\text{positive} | \text{nikhilitis}) P(\text{nikhilitis})}{P(\text{positive})} \\
&\approx \frac{0.99 \times 0.001}{0.101} \\
&\approx 0.01
\end{align*}\]

Now, if the baseline mortality rate of Nikhilitis is 1%, and assuming that test sensitivity is independent of mortality, a positive test puts your posterior probability of dying from Nikhilitis at 1% (chance of having it) * 1% (its mortality rate) = 0.01%. If the mortality rate of the confirmatory diagnosis or other follow-ups is itself greater than 0.01% (or they carry financial/psychological/quality-of-life harm equivalent to that risk), you shouldn’t follow up.
Concretely, in our example, the confirmatory test has a 0.1% mortality rate. You shouldn’t get the confirmatory test, since 0.1% > 0.01%! If you’re in a population where Nikhilitis is >10% fatal, though, you probably should – because all of a sudden, your chance of dying from Nikhilitis is at least 1% * 10% = 0.1%, so the lobotomy is at least as safe as chancing Nikhilitis.
Disclaimers abound here, of course. Mortality is not the only danger, money is not the only cost, and these numbers are extreme. But the reasoning holds even if the real-world calculations need more terms.