Speech Science Forum -- Mark Huckvale
23 January 2025, 3:00 pm–4:00 pm

Understanding dimensions of speaker variation found in large corpora
Event Information
Open to
- All
Availability
- Yes
Organiser
- Victor Rosi
Location
- G152, Wakefield Street, London WC1N 1PF
This talk describes part of a project to develop a universal voice conversion system, that is, a system that can change the voice characteristics of any speaker into those of any other speaker. While current voice conversion systems use a recording of the target speaker to identify the characteristics of the output voice, here we would like to specify the output voice using a set of controls representing the basic dimensions of speaker variation. This raises the question of what these dimensions should be! This talk presents an investigation of a large corpus of audio recordings from thousands of speakers. Speaker embeddings (high-dimensional vectors) are used to represent each speaker in the corpus, and the dominant dimensions of the embedding space are uncovered using principal components analysis (PCA) and linear discriminant analysis (LDA). A greedy linear model explores how dimensions in the speaker embedding space relate to known voice characteristics such as average pitch height, vocal tract length and voice quality. We also explore the importance of speaker gender, age and accent, as well as of audio signal quality, in the embedding space. The availability of large corpora and foundation models provides new opportunities for original phonetic research.
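The abstract does not spell out how the greedy linear model works. The following minimal sketch assumes it behaves like forward stepwise least-squares selection: start from an empty model, then repeatedly add the embedding dimension whose inclusion most reduces the residual error when predicting a measured voice characteristic. The data are synthetic random numbers standing in for speaker embeddings (or their PCA/LDA projections), and all function and variable names are hypothetical. This illustrates the general idea and is not the method used in the project.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residual_sse(X, y, cols):
    """Fit y on an intercept plus the chosen columns by least squares; return the residual sum of squares."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X^T X) beta = X^T y
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def greedy_forward_selection(X, y, k):
    """Hypothetical helper: add, one at a time, the dimension that most reduces the residual error."""
    selected = []
    remaining = list(range(len(X[0])))
    for _ in range(k):
        best = min(remaining, key=lambda c: residual_sse(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


rng = random.Random(0)
# Synthetic stand-in for embeddings: 200 "speakers" x 8 dimensions.
X = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
# Synthetic "average pitch" that depends only on dimensions 2 and 0, plus a little noise.
y = [2.0 * x[0] - 3.0 * x[2] + rng.gauss(0.0, 0.1) for x in X]

print(greedy_forward_selection(X, y, k=2))  # [2, 0]
```

On real data, each row of X would hold one speaker's embedding (or its projection onto PCA or LDA dimensions), and y would hold a measured characteristic such as average pitch. The order in which dimensions are selected then gives a rough ranking of how strongly each relates to that characteristic. Solving the normal equations directly keeps the sketch dependency-free, but with many correlated dimensions a numerically more robust least-squares routine would be preferable.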
Zoom link: https://ucl.zoom.us/j/92052680901
About the Speaker
Mark Huckvale
Emeritus Professor at University College London
Mark Huckvale is Emeritus Professor of Speech Sciences in the Department of Speech, Hearing and Phonetic Sciences at University College London. In a long career in Speech and Hearing Sciences, he has published over 100 research articles in areas including speech recognition, speech synthesis, voice conversion, speech intelligibility and computational paralinguistics. He is best known for work in accent recognition, use of avatars to provide mental health therapy, hearing for speech, and voice analysis for the measurement of speaker state. He is currently CEO of Avatar Therapy Ltd, which seeks to make a novel therapy for relief from auditory hallucinations in schizophrenia available in clinical practice.