
We recently published a paper (Unreliability of putative fMRI biomarkers during emotional face processing) that asks the question: can we use fMRI as a treatment biomarker to predict treatment response in psychiatry?

By Camilla Nord, Institute for Cognitive Neuroscience, UCL.


Why search for biomarkers?

Psychiatry has long been searching for usable biomarkers. In other branches of medicine, biomarkers in the blood, urine, or cerebrospinal fluid can give some indication of the best treatment approach for patients. In cancer therapeutics, the genetic makeup of patients’ tumours can predict how well they will respond to specific drugs, or even whether they will respond at all. In the case of psychiatric disorders, nothing like this exists. Your GP or psychiatrist makes a best guess, but there aren’t any medical tests to tell your doctor which treatment will work for you.


Using fMRI to uncover biomarkers

Our and others’ previous work suggests neuroimaging, especially fMRI, can predict treatment response in depression. Several studies have shown that when patients with depression are exposed to negative stimuli in the scanner, those with greater activation in the subgenual anterior cingulate cortex (sgACC) are most likely to respond to psychotherapy (the opposite is true for the most common class of antidepressant, selective serotonin reuptake inhibitors – see the figure below). There’s also some evidence that activation in another brain region, the amygdala, can separate out patients who would respond better to antidepressants from those who would respond better to psychotherapy.


Fig.1. Hemodynamic responses to negative stimuli in the perigenual anterior cingulate cortex (ACC) predict subsequent response to treatment in depression, but in different directions for pharmacological and psychological treatments. Individuals with greater perigenual ACC responses to negative stimuli have greater mood improvement after treatment with fluoxetine or venlafaxine, whereas the converse is true for responders to cognitive behavioral therapy (CBT) and behavioral activation therapy (BA).


The importance of reliability

But there’s a problem with all these claims. These sorts of studies, like most of neuroscience and psychology, rely on group averages. Even if an effect is real and replicable at the group level, it might not hold true at the level of the individual because of variability in the measurement technique (fMRI) or the measure (the BOLD response).

The target figure here is commonly used to depict two aspects of any measurement: validity and reliability. Imagine you throw a dart at a target four times. Darts that fall close to the target (and, because we often use group statistics, darts whose mean falls over the target, as in Figure B) are described as valid. In Figure C, the darts are not hitting the target (i.e., low validity), but your aim is pretty reliable for that incorrect spot. In a sense, this type of error is easier to fix, because all of your darts – or measurements – are wrong in exactly the same way, and once you find the source of this error, you can correct them until they look like Figure D. But in Figure B, it’s a little more complicated. You may be able to draw valid conclusions from the average of your measurements (as with fMRI group-level analyses), but it is unclear whether these data will ever be useful at the level of the individual. The fMRI measurements researchers hope to use in psychiatry are at least somewhat valid – many studies have found these effects at the level of the group – but we wanted to know whether they were reliable at the level of the individual.


Fig. 2. Reliability & Validity.
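
To make the dart analogy concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than anything from the paper: the darts land on a one-dimensional line with the bullseye at 0, the bias and spread values are invented, and each case uses 1,000 throws instead of four so that the summaries are stable.

```python
import random
from statistics import mean, pstdev


def throw_darts(bias, spread, n_throws=1000, seed=0):
    """Simulate 1-D dart positions; the bullseye sits at 0.

    bias   -- systematic offset from the bullseye (low bias = high validity)
    spread -- standard deviation of the throws (low spread = high reliability)
    """
    rng = random.Random(seed)
    return [bias + rng.gauss(0.0, spread) for _ in range(n_throws)]


def summarize(throws):
    """Return (mean position, spread) for a set of throws."""
    return mean(throws), pstdev(throws)


# Hypothetical settings, chosen only to mimic the panels described above.
scattered_on_target = throw_darts(bias=0.0, spread=5.0)  # like Figure B
tight_off_target = throw_darts(bias=5.0, spread=0.5)     # like Figure C
tight_on_target = throw_darts(bias=0.0, spread=0.5)      # like Figure D

for label, throws in [
    ("scattered, on target (B)", scattered_on_target),
    ("tight, off target (C)", tight_off_target),
    ("tight, on target (D)", tight_on_target),
]:
    centre, spread = summarize(throws)
    print(f"{label}: mean={centre:.2f}, spread={spread:.2f}")
```

In the scattered case the mean sits near the bullseye even though any single throw can land far from it, which is the situation a group-level fMRI average can hide.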


The study

We scanned a group of 29 volunteers twice, two weeks apart. On each day in the scanner, participants did the same three tasks (two runs of each). All three tasks involved visually presenting emotional faces. These tasks are some of the most common in psychiatric neuroscience research, and they evoke strong amygdala activation and robust sgACC deactivation. We were interested in the reliability of each participant’s BOLD response in the amygdala and sgACC, compared to a control region, the fusiform face area (FFA).

Here, reliability does not test whether the BOLD response is identical between days, but rather whether, relative to every other participant, the strength or weakness of an individual’s BOLD response stayed the same. That is, if you had the strongest amygdala activation on day 1 compared to the rest of the group, you still had one of the strongest amygdala activations on day 2. This reliability is essential if you want to use a subject’s amygdala activation to classify them as a treatment responder or non-responder.
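
One common way to put a number on this kind of relative stability is an intraclass correlation coefficient (ICC). The sketch below computes the consistency form, often written ICC(3,1), for two sessions. This post does not say which statistic the study used, so treat the choice of ICC(3,1), the function name `icc_consistency`, and the example values as assumptions made purely for illustration.

```python
from statistics import mean


def icc_consistency(session1, session2):
    """ICC(3,1): consistency of single measurements across two sessions.

    Assumption: this is one common test-retest statistic, not necessarily
    the one used in the study described here.
    """
    s1, s2 = list(session1), list(session2)
    if len(s1) != len(s2) or len(s1) < 2:
        raise ValueError("need two equally long sessions with >= 2 participants")

    n, k = len(s1), 2  # n participants, k sessions
    grand = mean(s1 + s2)
    subject_means = [(a + b) / 2 for a, b in zip(s1, s2)]
    session_means = [mean(s1), mean(s2)]

    # Two-way ANOVA decomposition of the total sum of squares.
    ss_total = sum((x - grand) ** 2 for x in s1 + s2)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_sessions = n * sum((m - grand) ** 2 for m in session_means)
    ss_error = ss_total - ss_subjects - ss_sessions

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_subjects + (k - 1) * ms_error
    if denominator == 0:
        raise ValueError("no variance in the data; ICC is undefined")
    return (ms_subjects - ms_error) / denominator


# Hypothetical amygdala parameter estimates for five participants.
day1 = [0.2, 0.5, 0.9, 1.4, 2.0]

# Everyone shifts by the same amount: ordering and spacing are preserved.
day2_shifted = [x + 0.3 for x in day1]

# The ordering flips completely between days.
day2_reversed = list(reversed(day1))

print(round(icc_consistency(day1, day2_shifted), 3))
print(round(icc_consistency(day1, day2_reversed), 3))
```

The shifted case gives a value of essentially 1 even though no participant's response is identical across days, because the consistency form ignores a shift shared by the whole group. The reversed case gives a strongly negative value, since the participants who ranked highest on day 1 rank lowest on day 2.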

In all three tasks, we found surprisingly low reliability in the amygdala and sgACC responses to emotional faces. This was true not only after the two-week period, but even between the two runs, which were about ten minutes apart. By contrast, the control region, the FFA, depicted in yellow on the figure below, showed high reliability, both between runs and between the two days.

This implies that our test biomarkers, amygdala and sgACC responses to emotional faces, might not be able to predict treatment response at the level of the individual.



Fig. 3. Whole-brain activation maps and parameter estimates for the three functionally-defined regions of interest (left and right amygdala, and subgenual anterior cingulate cortex, sgACC), and the comparison region, the right fusiform face area (FFA), for all runs (both days). Coloured arrows and stars indicate coordinates used in the analysis: cyan arrows correspond to peak activation in the left amygdala; green arrows to peak activation in the right; yellow arrows indicate the coordinate from a previous study (McKeeff and Tong, 2007) used for the FFA analysis; magenta arrows indicate peak activation in the sgACC.



We have shown that the field’s current way of defining potential biomarkers – in this case, looking at individual responses to emotional faces in the amygdala and sgACC to predict treatment response – might not be viable. Instead, fMRI studies might have to look toward different regions or measures (e.g., structural MRI) to find a useful treatment biomarker. In future, studies that claim to have found treatment biomarkers with fMRI should also test that biomarker’s within-subject reliability. Otherwise, even robust group-level findings will fall flat when tested for clinical utility.


Camilla Nord

Institute for Cognitive Neuroscience, UCL