Speech Science Forum -- Yue Chen
An exploration of machine learning based speech perception modelling: A case study of tone and focus perception

Speech perception is a critical component of speech communication, enabling human listeners to hear, interpret, and understand speech sounds. It involves transforming highly variable acoustic speech signals into linguistic representations such as phonemes, syllables, or words. This variability, however, poses a significant challenge for speech perception research, and how listeners process speech signals continuously over time remains a topic of debate. This talk begins with an introduction to classic speech perception theories and models and their proposals for handling speech variability. I will then present a case study of perception modelling of lexical tone and prosodic focus based on computational simulation. Finally, I will discuss the validity of using machine learning approaches to simulate human speech perception and propose a hypothetical framework for speech perception. This project offers critical insights into the mechanisms underlying speech perception and into how to bridge the gap between human speech perception and speech engineering.
Yue Chen
PhD Student
University College London