
UCL Psychology and Language Sciences

Speech Science Forum -- Mark Huckvale

23 January 2025, 3:00 pm–4:00 pm

Understanding dimensions of speaker variation found in large corpora

Event Information

Open to

All

Availability

Yes

Organiser

Victor Rosi

Location

G15
2 Wakefield Street
London
WC1N 1PF

This talk describes part of a project to develop a universal voice conversion system: that is, a system that can change the voice characteristics of any speaker to those of any other speaker. While current voice conversion systems use a recording of the target speaker to identify the characteristics of the output voice, here we would like to specify the output voice using a set of controls representing the basic dimensions of speaker variation. This then raises the question of what these dimensions should be! The talk presents an investigation of a large corpus of audio recordings from thousands of speakers. Speaker embeddings (high-dimensional vectors) are used to represent each speaker in the corpus, and the dominant dimensions of the embedding space are uncovered using principal components analysis (PCA) and linear discriminant analysis (LDA). A greedy linear model explores how dimensions in the speaker embedding space relate to known voice characteristics such as average pitch height, vocal tract length and voice quality. We also explore the importance of speaker gender, age and accent, as well as audio signal quality, in the embedding space. The availability of large corpora and foundation models provides new opportunities for original phonetic research.
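For readers unfamiliar with the approach, the idea of finding a dominant dimension in an embedding space and relating it to a known voice characteristic can be illustrated with a minimal sketch. This is not the speaker's code or data: the embeddings are synthetic, the hidden `pitch` factor and all function names are hypothetical, and the leading principal component is found by simple power iteration using only the Python standard library. A real analysis would more likely use a numerical library and embeddings from a trained speaker model.

```python
import random

random.seed(0)

# Synthetic stand-in for speaker embeddings: 500 "speakers", 4 dimensions.
# Assumption for illustration: a hidden factor (here called pitch) drives
# dimensions 0 and 1, while dimensions 2 and 3 are pure noise.
pitch = [random.gauss(0.0, 1.0) for _ in range(500)]
embeddings = [
    [2.0 * p + random.gauss(0.0, 0.1),
     1.0 * p + random.gauss(0.0, 0.1),
     random.gauss(0.0, 0.1),
     random.gauss(0.0, 0.1)]
    for p in pitch
]


def center(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    """Multiply a square matrix by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def top_component(cov, iters=200):
    """Leading eigenvector and eigenvalue of cov via power iteration."""
    d = len(cov)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, eigenvalue


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


centered = center(embeddings)
cov = covariance(centered)
component, eigenvalue = top_component(cov)

# Fraction of total variance captured by the first principal component.
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))

# Project each speaker onto the component and compare with the hidden factor.
scores = [sum(x * c for x, c in zip(row, component)) for row in centered]
corr = pearson(scores, pitch)

print("explained variance fraction:", round(explained, 3))
print("|correlation with hidden factor|:", round(abs(corr), 3))
```

The sign of a principal component is arbitrary, so only the magnitude of the correlation is meaningful here. In this toy setting the first component lines up with the hidden factor by construction; with real speaker embeddings, relating components to measurable characteristics is the open question the talk addresses.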

Zoom link: https://ucl.zoom.us/j/92052680901

About the Speaker

Mark Huckvale

Emeritus Professor at University College London

Mark Huckvale is Emeritus Professor of Speech Sciences in the Department of Speech, Hearing and Phonetic Sciences at University College London. In a long career in Speech and Hearing Sciences, he has published over 100 research articles in areas involving speech recognition, speech synthesis, voice conversion, speech intelligibility and computational paralinguistics. He is best known for work in accent recognition, use of avatars to provide mental health therapy, hearing for speech, and voice analysis for the measurement of speaker state. He is currently CEO of Avatar Therapy Ltd, which seeks to make a novel therapy for relief from auditory hallucinations in schizophrenia available in clinical practice.