How the brain constructs the speech we hear
Abstract
What do we hear when someone speaks? What does auditory cortex (AC) do with that information? I present neuroimaging data suggesting that the intuitive answers to these questions, that we simply hear "sounds" and that AC sits at the bottom of a feedforward processing hierarchy, are wrong. Rather, when engaged by naturalistic language stimuli, AC is the afferent recipient of multimodal information extracted from preceding discourse content, observable mouth movements, speech-associated gestures, emotional facial displays, written text, and more. Such contextual information seems to be the starting point for forming hypotheses that are used to derive predictions about the nature of the information that might arrive in AC. Strong predictions result in substantial conservation of metabolic resources in AC, presumably because no further evidence from the auditory world is required to confirm those hypotheses. Thus, the results suggest that a great deal of what we hear is not sound but, rather, an echo of internal knowledge that shapes and constrains interpretation of the impoverished information reaching AC. That is, hearing speech is a constructive process in which AC functioning relies on multimodal information available during real-world communication.