Speech Science Forum 6th May - Dr. Shinji Watanabe
06 May 2021, 4:00 pm–5:00 pm
Please join us on May 6th for Dr. Shinji Watanabe's talk, entitled "Tackling Multispeaker Conversation Processing based on Speaker Diarization and Multispeaker Speech Recognition".
Event Information
Open to
- All
Availability
- Yes
Organiser
- Dr. Antony Scott Trotter – Speech, Hearing and Phonetic Science
Title: Tackling Multispeaker Conversation Processing based on Speaker Diarization and Multispeaker Speech Recognition
Abstract:
Recently, speech recognition and understanding studies have shifted their focus from single-speaker automatic speech recognition (ASR) in controlled scenarios to more challenging and realistic multispeaker conversation analysis based on ASR and speaker diarization. The CHiME speech separation and recognition challenge is one of the attempts to tackle these new paradigms. This talk first introduces the latest CHiME-6 challenge, which focuses on recognizing multispeaker conversations in a dinner party scenario, and presents its results. The second part of the talk tackles this problem using an emergent technique based on an end-to-end neural architecture. We first introduce an end-to-end single-microphone multispeaker ASR technique based on a recurrent neural network and transformer to show the effectiveness of the proposed method. Second, we extend this approach to leverage the benefit of multi-microphone input and realize simultaneous speech separation and recognition within a single neural network trained only with the ASR objective. Finally, we also introduce our recent attempts at speaker diarization based on an end-to-end neural architecture, including basic concepts, online extensions, and handling unknown numbers of speakers.
About the Speaker
Dr. Shinji Watanabe
Associate Professor at Carnegie Mellon University
Shinji Watanabe is an Associate Professor at Carnegie Mellon University, Pittsburgh, PA. He received his B.S., M.S., and Ph.D. (Dr. Eng.) degrees from Waseda University, Tokyo, Japan. He was a research scientist at NTT Communication Science Laboratories, Kyoto, Japan, from 2001 to 2011, a visiting scholar at the Georgia Institute of Technology, Atlanta, GA, in 2009, and a senior principal research scientist at Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA, from 2012 to 2017. Prior to the move to Carnegie Mellon University, he was an associate research professor at Johns Hopkins University, Baltimore, MD, USA, from 2017 to 2020. His research interests include automatic speech recognition, speech enhancement, spoken language understanding, and machine learning for speech and language processing. He has published more than 200 papers in peer-reviewed journals and conferences and received several awards, including the best paper award from the IEEE ASRU in 2019. He served as an Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing. He was/has been a member of several technical committees, including the APSIPA Speech, Language, and Audio Technical Committee (SLA), IEEE Signal Processing Society Speech and Language Technical Committee (SLTC), and Machine Learning for Signal Processing Technical Committee (MLSP).