UCL Psychology and Language Sciences


Speech Modelling

Current Research Themes

  • Applied research in speech signal processing and speech signal classification
  • Computational modelling of speech articulation, prosody, and acquisition

Key Researchers:

Chris Carignan

Perhaps the most defining characteristic of our species is the complexity of the speech we use to communicate meaning. Through muscular control of a relatively small portion of the body (the vocal tract), a speaker is able to modify the vibration of air molecules as a vessel for transmitting a mental concept to a listener. My research involves using a wide variety of state-of-the-art technologies (real-time MRI, ultrasound tongue imaging, electromagnetic articulometry, nasalance, laryngography) to investigate how speakers coordinate vocal tract articulators to produce speech sounds, how this shaping of the vocal tract affects the acoustic output, and how these acoustic changes are perceived by listeners. Knowledge of these aspects of speech production and perception can help explain sound patterns that we observe as languages evolve over time, predict future language evolution, and teach us about the physical and cognitive characteristics of our shared capacity for human language.

Mark Huckvale

My research involves many aspects of speech science and technology: speech production, hearing, speech perception, speech acquisition, speech synthesis, speech recognition and speaker recognition.

Recent research activities include:

Centre for Law-Enforcement Audio Research (CLEAR). A joint research centre with Imperial College London that investigates methods for the enhancement of degraded speech signals.

Avatar Therapy. A project that investigates the use of computer avatars in providing therapy for people who suffer from auditory hallucinations (hearing voices).

KLAIR Virtual Infant. A machine learning toolkit for the study of the computational modelling of early speech acquisition by infants through real-time interaction with caregivers.

VOQAL Voice Quality Toolbox. A toolkit for the analysis of changes in voice quality caused by aging, disease or stress.

Yi Xu

My research is primarily concerned with the basic mechanisms of speech production and perception in connected discourse in general, and speech prosody in particular. My work also concerns computational modelling and automatic synthesis of speech, computational modelling of the neural processes of speech acquisition, and emotions in speech.

Emma Holmes

I’m interested in how we perceive sounds in challenging listening environments—such as understanding what a friend is saying when there are other conversations going on around us. In particular, I'm interested in how auditory cognition (e.g., attention and prior knowledge) affects our perception of speech and other sounds, and how these processes are affected by hearing loss. My research combines behavioural techniques (e.g., auditory psychophysics), cognitive neuroscience (e.g., EEG, MEG, and fMRI), and computational modelling.

Current and Past Projects:

  • High quality simulation of early vocal learning
    Researchers: Yi Xu in collaboration with Peter Birkholz (Germany), Santitham Prom-on (Thailand) and Lorna Halliday (Cambridge)
    Funded by the Leverhulme Trust. Dates: 2019-2022


    Developing a coherent understanding of the basic mechanisms of human vocal learning is long overdue. Speech is acquired during childhood, and this is what allows the unique human ability to communicate complex ideas to be passed on across generations. Yet it is still unclear exactly how this acquisition is accomplished. The earliest stage of acquisition is the most baffling, as at that age infants can neither understand instructions nor ask questions. Computational simulation offers a means to identify the specific steps and conditions needed for vocal learning to succeed.

    Our modelling aims to simulate vocal learning to the point that a trained articulatory synthesizer can generate syllables that are both intelligible and natural-sounding. No other research group has yet achieved this, but our preliminary results have partially demonstrated it. Success in this project will therefore critically enhance our knowledge of vocal learning. In addition, by removing or weakening various aspects of a successful simulated learning process, we can identify likely sources of specific deficits in various speech and developmental disorders. Moreover, once natural-sounding speech can be generated in simulation, significant insights could be gained into long-standing theoretical issues such as coarticulation, syllable formation and motor equivalence. An effective simulation of speech as a skilled motor movement may also have implications for motor control and motor learning in general. Finally, a full simulation that can generate natural-sounding speech may have implications for speech technology, robotics and artificial intelligence.
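    The basic learning loop can be illustrated in miniature. The sketch below is a deliberately toy version of the idea, not the project's actual method: the two-parameter "synthesizer" and its formant mapping are invented for illustration (a real simulation would use a full articulatory synthesizer), but it shows the core cycle of random articulatory exploration (babbling), acoustic feedback, and selection of the articulation whose output best matches a caregiver's target sound.

```python
import math
import random

# Hypothetical stand-in for an articulatory synthesizer: maps two
# articulatory parameters in [0, 1] (say, jaw opening and tongue
# frontness) to a pair of formant-like acoustic values in Hz.
# The linear mapping is invented purely for illustration.
def synthesize(jaw, tongue):
    f1 = 300 + 500 * jaw
    f2 = 900 + 1400 * tongue
    return (f1, f2)

# Babbling loop: try random articulations and keep the one whose
# acoustic output lies closest to the target sound.
def babble(target, trials=2000, seed=0):
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(trials):
        jaw, tongue = rng.random(), rng.random()
        err = math.dist(synthesize(jaw, tongue), target)
        if err < best_err:
            best, best_err = (jaw, tongue), err
    return best, best_err

# Formants of a hypothetical /a/-like caregiver target.
target = (700, 1200)
(jaw, tongue), err = babble(target)
```

    Random search stands in here for whatever learning algorithm the simulation actually uses; the point is only that the learner never sees the target articulation, just the acoustic mismatch.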
     
  • A Common Prosody Platform for testing theories and models of speech prosody
    Researchers: Yi Xu in collaboration with D. H. Whalen, Hosung Nam and Christian DiCanio (USA), Fang Liu (Reading), Santitham Prom-on (Thailand), Amalia Arvaniti (Kent), and Wentao Gu (China).
    Funded by National Science Foundation (NSF), USA. Dates: 2014-2016


    Prosody research has seen significant development in recent decades, and numerous theories and computational models have been proposed. However, many fundamental issues remain unresolved and some are still under heated debate. This lack of consensus has slowed the development of speech applications capable of processing prosody. This project is a collaborative effort to accelerate progress in prosody research by developing a Common Prosody Platform (CPP). CPP will consist of an open-access website hosting a collection of trainable models in the form of Praat scripts, each implementing a major theory of prosody. CPP will therefore facilitate theory evaluation by enabling each theory to make numerical predictions that can be compared directly with natural prosody in fine detail. CPP will be tested on autosegmental-metrical (AM) theory, parallel encoding and target approximation (PENTA), the articulatory phonology/task dynamics model (TADA), and the command-response (Fujisaki) model, applied to English, Greek, Mandarin and Itunyoso Trique (an endangered tone language).
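    To give a flavour of the kind of trainable model the platform hosts: PENTA's quantitative implementation (qTA) treats the F0 contour of each syllable as the response of a critically damped third-order linear system approaching a linear pitch target x(t) = m·t + b. The sketch below implements that closed-form response; the parameter values in the example (target, rate λ, initial state) are invented for demonstration, not fitted to data.

```python
import math

def qta_contour(target_m, target_b, lam, f0_init, times):
    """Quantitative target approximation (qTA): F0 is the response of a
    critically damped third-order system approaching the linear pitch
    target x(t) = target_m * t + target_b, with approach rate lam.
    f0_init = (F0, velocity, acceleration) at syllable onset, carried
    over from the previous syllable."""
    y0, dy0, ddy0 = f0_init
    # Transient coefficients fixed by the initial conditions.
    c1 = y0 - target_b
    c2 = dy0 + c1 * lam - target_m
    c3 = (ddy0 + 2 * c2 * lam - c1 * lam ** 2) / 2
    return [
        target_m * t + target_b
        + (c1 + c2 * t + c3 * t ** 2) * math.exp(-lam * t)
        for t in times
    ]

# A static 120 Hz target approached from 150 Hz over 200 ms,
# starting at rest (zero velocity and acceleration).
times = [i / 1000 for i in range(0, 201, 20)]
contour = qta_contour(0.0, 120.0, 40.0, (150.0, 0.0, 0.0), times)
```

    Fitting a model like this to recorded F0 (by optimizing m, b and λ per syllable) is what makes a theory's predictions numerically comparable with natural prosody.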