Speech, Hearing and Phonetic Sciences
Completed Research Projects | Current Projects
Modelling speech prosody based on communicative function and articulatory dynamics
Prosody is an important aspect of speech that contributes to its expressiveness and intelligibility. Quantitative modeling of speech prosody is key to the advancement of speech science and technology. Based on a previous successful research collaboration, the proposed research will be a major systematic effort to develop an “articulatory-functional” quantitative model of speech prosody and to integrate meaningful communicative functions into it.
Auditory specialization for speech perception
Individuals are born with an ability to discern speech sounds (phonemes) in all of the world's languages, but during childhood they become specialized to perceive native-language phonemes. The aim of this study is to test our hypothesis that this specialization for native-language phonemes begins to occur in central auditory processing, at a functional level prior to linguistic categorization. The work uses behavioural measures and MEG to examine the perception of English phonemes by adult native speakers of Sinhala and Japanese.
Researchers: Mark Huckvale, Gaston Hilkhuysen. Funded by the UK Home Office. Duration: 5 years 2007-2012
The CLEAR project aims to create a centre of excellence in tools and techniques for the cleaning of poor-quality audio recordings of speech. The centre is initially funded by the U.K. Home Office for a period of five years and is run in collaboration with the Department of Electrical and Electronic Engineering at Imperial College.
Speech perception and language acquisition in children with hearing impairments
How do children with hearing aids and cochlear implants learn their native language? Can they use the same learning mechanisms and acoustic cues as their normal-hearing peers? In this study, we examine what children with hearing impairments know about the sound structure of their native language. We are also interested in finding out how they acquire this knowledge and whether it is correlated with their vocabulary and grammar skills.
Accent and language effects on speech perception with noise or hearing loss
MRC-ESRC competitive studentship awarded to Melanie Pinet. Supervisor: Paul Iverson. Dates: October 2008 - September 2012.
One of the key factors that determine speech intelligibility under challenging conditions is the difference between the accents of the talker and listener. For example, normal-hearing listeners can be accurate at recognizing a wide range of accents in quiet, but in noise they are much poorer (e.g., 20 percentage points less accurate) if they try to understand native (L1) or non-native (L2) accented speech that does not closely match their own accent. The aim of this PhD research is to provide a more detailed account of this talker-listener interaction in order to establish the underlying factors involved in L1 and L2 speech communication in noise for normal-hearing and hearing-impaired populations.
HearCom is an integrated project under the FP6 ICT programme. It involves 30 partners from 12 countries and is coordinated by Tammo Houtgast and Marcel Vlaming from the VU University Medical Center in Amsterdam. Our society is strongly and increasingly communication-oriented. As much of this focuses on sound and speech, many people experience severe limitations in their activities, caused either by a hearing loss or by poor environmental conditions. The HearCom project aims at reducing these limitations in auditory communication.
Dates: 2004-2009. Funded by: CEC (EU). Duration: 5 years. Researchers: Andrew Faulkner
The KLAIR project aims to build and develop a computational platform to assist research into the acquisition of spoken language. The main part of KLAIR is a sensori-motor server that supplies a client with a virtual infant on screen that can see, hear and speak. The client can monitor the audio visual input to the server and can send articulatory gestures to the head for it to speak through an articulatory synthesizer. The client can also control the position of the head and the eyes as well as setting facial expressions. By encapsulating the real-time complexities of audio and video processing within a server that will run on a modern PC, we hope that KLAIR will encourage and facilitate more experimental research into spoken language acquisition through interaction.
Dates: 2009. Researchers: Mark Huckvale
Quantitative modeling of tone and intonation
The aim is to develop a quantitative Target Approximation (qTA) model for simulating F0 contours of speech. Following the articulatory-functional framework of the PENTA model (Xu, 2005), the qTA model simulates the production of tone and intonation as a process of syllable-synchronized sequential target approximation. In the model, tone and intonation are treated as communicative functions that directly specify the parameters of the qTA model. The numerical values of the qTA parameters will be extracted from natural speech via supervised learning. The quality of the modeling output will be assessed numerically and evaluated perceptually.
Dates: 2005. Funded by: Collaborative. Researchers: Yi Xu with Santitham Prom-on and Bundit Thipakorn, King Mongkut's University of Technology Thonburi, Thailand
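The idea of syllable-synchronized sequential target approximation in the qTA project above can be illustrated with a small sketch. The project description does not give the model's equations, so everything below is an assumption made for illustration: each syllable has a linear pitch target `m*t + b`, F0 approaches it as a third-order critically damped response with rate `lam`, and the F0 value, velocity and acceleration at the end of one syllable are carried over as the starting state of the next. The function names, parameter names and numerical values are hypothetical and are not taken from the project.

```python
import math


def approach_target(t, m, b, lam, state0):
    """Return (f0, velocity, acceleration) at time t (seconds) after syllable onset.

    Assumed form: f0(t) = m*t + b + (c1 + c2*t + c3*t**2) * exp(-lam*t),
    where m*t + b is the pitch target and state0 = (f0, velocity, acceleration)
    at syllable onset fixes the coefficients c1, c2, c3.
    """
    f0_0, v0, a0 = state0
    c1 = f0_0 - b
    c2 = v0 + c1 * lam - m
    c3 = (a0 + 2.0 * c2 * lam - c1 * lam ** 2) / 2.0
    decay = math.exp(-lam * t)
    poly = c1 + c2 * t + c3 * t * t
    dpoly = c2 + 2.0 * c3 * t
    f0 = m * t + b + poly * decay
    vel = m + (dpoly - lam * poly) * decay
    acc = (2.0 * c3 - 2.0 * lam * dpoly + lam ** 2 * poly) * decay
    return f0, vel, acc


def simulate(syllables, start_f0, step=0.005):
    """Simulate a contour over a list of (duration_s, m, b, lam) syllables.

    Returns a list of (time_s, f0) samples. The final state of each syllable
    becomes the initial state of the next, which keeps the contour smooth
    across syllable boundaries.
    """
    state = (start_f0, 0.0, 0.0)
    contour = []
    onset = 0.0
    for duration, m, b, lam in syllables:
        n_samples = int(round(duration / step))
        for i in range(n_samples):
            t = i * step
            contour.append((onset + t, approach_target(t, m, b, lam, state)[0]))
        state = approach_target(duration, m, b, lam, state)
        onset += duration
    return contour


# Hypothetical two-syllable example: a high static target followed by a low one
# (values in semitones, chosen only for illustration).
contour = simulate([(0.2, 0.0, 95.0, 40.0), (0.2, 0.0, 85.0, 40.0)], start_f0=90.0)
```

In a sketch like this, fitting the model to natural speech would mean searching for the `m`, `b` and `lam` of each syllable that minimise the difference between the simulated and the measured F0 contour.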
Role of sensory feedback in speech production as revealed by the effects of pitch- and amplitude-shifted auditory feedback
The overall goal of this research project is to understand the function of sensory feedback in the control of voice fundamental frequency (F0) and intensity through the technique of reflex testing. The specific aims of the project are: to determine if the pitch-shift and loudness-shift reflex magnitudes depend on vocal task; to determine if the direction of pitch-shift and loudness-shift reflexes depends on the reference used for error correction; and to investigate mechanisms of interaction between kinesthetic and auditory feedback on voice control. The overall hypothesis is that sensory feedback is modulated according to the specific vocal tasks in which subjects are engaged. By testing reflexes in different tasks, we will learn how sensory feedback is modulated in those tasks. We also hypothesize that auditory reflexes, like reflexes in other parts of the body, may reverse their direction depending on the vocal task. The mechanisms controlling such reflex reversals will be investigated, and this information will be important for understanding some voice disorders. It is also hypothesized that kinesthetic and auditory feedback interact in their control of the voice. Applying temporary anesthetic to the vocal folds and simultaneously testing auditory reflexes will provide important information on brain mechanisms that govern interaction between these two sources of feedback.
Dates: 2004-2009. Funded by: Internal/NIH. Duration: 4 years. Researchers: Yi Xu with Charles Larson and colleagues, Northwestern University, USA (funded by NIH: 2004-2009).
Speaker-controlled variability in connected discourse: acoustic-phonetic characteristics and impact on speech perception
This project investigates why certain speakers are easier to understand than others. Speech production is highly variable both across and within speakers. This is partly due to differences in the vocal tract anatomy and partly under the control of the speaker. This project examines whether clearer speakers are more extreme in their articulations (as measured from the acoustic properties of their speech) or whether they are more consistent in their production of speech sounds. In order to better model natural communication, the speech to be analysed is recorded using a new task aimed at eliciting spontaneous dialogue with specific keywords. The first study investigates whether 'inherent' speaker clarity is consistent across different types of discourse and whether speaker clarity is more closely correlated with cross-category differences or within-category consistency in production. The second study investigates whether clearer speakers show a greater degree of adaptation to the needs of listeners. This study has implications for models of speech perception. Understanding what makes a 'clear speaker' will also be informative for applications requiring clear communication, such as teaching, speech and language therapy, and the selection of voices for clinical testing and for speech technology applications.
Dates: 2008-2011. Funded by: ESRC. Duration: 3 years. Researchers: Valerie Hazan, Rachel Baker
Speech processors for combined electrical and acoustic hearing
A substantial number of cochlear implant users have considerable residual hearing in the unimplanted ear and recent studies have demonstrated that the use of a contralateral hearing aid often provides significantly improved speech perception, particularly in noise. The factors responsible for bimodal benefits are not well understood, though it appears likely that they result mostly from the provision of complementary information across modalities, rather than true binaural interactions. The proposed work will examine factors likely to be important in optimising the bimodal transmission of speech spectral information, focusing on three aspects of place-coding. This research will both clarify our understanding of factors underlying bimodal benefits and help to develop clinically applicable methods for optimally combining an implant and a contralateral hearing aid, thus providing a highly cost-effective way to improve everyday perceptual performance in many users of cochlear implants.
Spoken Language Conversion with Accent Morphing
Spoken language conversion is the challenge of using synthesis systems to generate utterances in the voice of a speaker but in a language unknown to the speaker. Previous approaches have been based on voice conversion and voice adaptation technologies applied to the output of a foreign language TTS system. This inevitably reduces the quality and intelligibility of the output, since the source speaker will not be a good source of phonetic material in the new language. Our work contrasts previous work with a new approach that uses two synthesis systems: one in the source speaker's voice, one in the voice of a native speaker of the target language. Audio morphing technology is then exploited to correct the foreign accent of the source speaker, while at the same time trying to maintain his or her identity. In this project we aim to construct a spoken language conversion system using accent morphing and evaluate its performance in terms of intelligibility and speaker identity.
Dates: 2006-. Researchers: Mark Huckvale, Kayoko Yanagisawa
The size code in the expression of anger and joy in speech
The aim is to test the "size code" hypothesis for encoding anger and joy in speech. According to the hypothesis, these two emotions are conveyed in speech by exaggerating or understating the body size of the speaker, just as nonhuman animals exaggerate or understate their body size to communicate threat or appeasement. We will conduct acoustic analysis of publicly available emotional speech databases, synthesize Thai vowels with a 3D articulatory synthesizer using parameter manipulations suggested by the size code hypothesis, and ask Thai listeners to judge the body size and emotion of the speaker. Initial results support the size code hypothesis.
Dates: 2005. Funded by: Collaborative. Researchers: Yi Xu with Suthathip Chuenwattanapranithi, King Mongkut's University of Technology Thonburi, Thailand.
The effects of phoneme discrimination and semantic therapies for speech perception deficits in aphasia
This project will investigate how children with specific reading difficulties (SRD, or dyslexia) and those who are reading normally perceive the sounds of speech. To decode speech, listeners need to be able to ignore ‘irrelevant’ variation in the speech signal that is linked to differences in speaker, speaking style, accent, etc. It is claimed that children with SRD are more sensitive to these variations than other children. We will check this claim using tests in which we can manipulate specific acoustic patterns within the word. We will then test children’s perception of many different consonants to better understand what makes some more difficult to identify than others. Finally, we will test children’s ability to adapt to different speakers and speaking styles.
This project examines vowel perception and plasticity during second-language (L2) learning by adults. The study evaluates whether individuals learn to 'perceptually switch' between their first-language (L1) and L2 vowel systems, and assesses the role of fine-grained phonetic detail in the learning process. Study 1 will use a new method to generate phonetically detailed L1 and L2 perceptual vowel maps for native speakers of Norwegian, German, Spanish, and French. Study 2 will train matched groups of German and Spanish learners to identify English vowels and examine how their L1 and L2 vowel spaces change over time. Study 3 will train French speakers with varying English-language experience. The research will contribute to our scientific understanding of phonetic perception and plasticity, introduce methodological innovations, and help guide the development of new computer-based phonetic training methods.
Dates: 2005-2008. Funded by: ESRC. Duration: 3 years. Researchers: Paul Iverson
The Effects of Pulse Rate in Cochlear Implants
Optimisation of voice pitch information in cochlear implant speech processing
Main aim: To improve the transmission of pitch-related temporal information through a cochlear implant. Importance and timeliness: Current cochlear implant speech processing methods have been optimised for speech intelligibility in deafened adults. They provide very limited information to signal variations in the pitch of speech, especially over the range of pitch that is significant for the deaf child both in communication and in the development of spoken language. Cochlear implants are now being provided to deaf children in increasing numbers, yet there has been minimal attention to processing methods adapted to their needs.
The main purpose of the SYNFACE project is to increase the possibilities for hard of hearing people to communicate by telephone. Many people use lip-reading during conversations, and this is especially important for hard of hearing people. However, this clearly doesn't work over the telephone. This project aims to develop a talking face controlled by the incoming telephone speech signal. The talking face will facilitate speech understanding by providing lip-reading support. This method works with any telephone and is cost-effective compared to video telephony and text telephony, which need compatible equipment at both ends.
Acoustic and visual enhancement of speech for computer-based auditory training
Does seeing the speaker help in learning tricky aspects of a new language? A synthetic face is used to support modelling of troublesome phonetic gestures by second language learners.