Acoustic modeling for distant multi-talker speech recognition with single- and multichannel branches. Abstract: This paper presents a novel heterogeneous-input multichannel acoustic model (AM) that has both single-channel and multichannel input branches. Books and tutorials, links and slides: these publications describe recent work on the general topic of distant speech recognition. This distant-speech French corpus was recorded with 21 speakers who acted out scenarios of activities of daily living. A complete overview of distant automatic speech recognition: the performance of conventional automatic speech recognition (ASR) systems degrades dramatically as soon as the microphone is moved away from the mouth of the speaker. An experiment for distant speech recognition on the AMI SDM corpus shows that 10-layer plain and highway LSTM networks presented. This thesis addresses the latter scenario and proposes some novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. A network of deep neural networks for distant speech. It provides insights and detailed descriptions of some of the new concepts and key technologies in the field, including novel architectures for speech enhancement, microphone arrays, robust features, acoustic model adaptation, training data augmentation, and. Volume 106, pages 1150, January 2019.
New Era for Robust Speech Recognition (SpringerLink). A complete overview of distant automatic speech recognition. This is a unique book that covers the entire distant automatic speech recognition (ASR) problem in one single volume. Note that the performance may not be optimal in this case. Distant Speech Recognition by Matthias Woelfel (OverDrive). This book covers the state of the art in deep-neural-network-based methods for noise robustness in distant speech recognition applications. Presents several applications of the methods and technologies described in this book. Accompanying website with open source software and tools to construct state-of-the-art distant speech recognition systems. This reference will be an invaluable resource for researchers, developers, engineers and other professionals, as well as advanced students. In this work, we outline five pressing problems in the DSR research field, and we make initial proposals for their solutions. Despite the significant progress made in the last years, state-of-the-art speech recognition technologies provide a satisfactory performance only. D on far-field speech recognition in the middle of 2007.
PDF: Deep learning for distant speech recognition, Mirco. Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Abstract: Distant speech recognition (DSR) holds out the promise of providing a natural human-computer interface in that it enables verbal interactions with computers without the necessity of donning intrusive body- or head-mounted devices. Techniques for noise robustness in automatic speech. Covers the entire topic of distant ASR and offers practical solutions to overcome the problems related to it. Provides documentation and sample scripts to enable readers to construct state-of-the-art distant speech recognition systems. Gives relevant background information in acoustics and filter techniques, explains the extraction. These toolkits are meant for facilitating research and development of automatic distant speech recognition. He has been involved in two European research projects on distant speech recognition. It provides insights and detailed descriptions of some of the new concepts and key technologies in the field, including novel architectures for speech enhancement, microphone arrays, robust features, acoustic model adaptation, training data augmentation. Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Automatic speech recognition (ASR) is the process and the related technology for converting the speech signal into its corresponding sequence of words or other linguistic entities by means of algorithms implemented in a device, a computer, or computer clusters (Deng and O'Shaughnessy, 2003).
Two original solutions are presented, based on information fusion approaches at different levels of the recognition system, one at front-end stage and one at post-decoding stage, namely for the problems of channel selection (CS) and.
However, the number of channels is a tunable parameter and it can work for a single microphone as well. Strategies for distant speech recognition in reverberant. Speech recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. A prominent limitation of current systems lies in the lack of matching and communication between the various technologies involved in. Deep learning for distant speech recognition (NASA/ADS). Automatic speech recognition: an overview (ScienceDirect). These problems have to be overcome in order for speech interaction between human and computer to. On the contrary, 10-layer residual LSTM networks provided the lowest WER 41.
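The WER quoted above is the word error rate, the usual metric for comparing recognizers: the number of word substitutions, deletions and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. A minimal sketch of that computation follows; the function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not taken from any of the works listed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deletion ("on") and one substitution ("light" -> "lights")
    # against a five-word reference.
    print(word_error_rate("turn on the kitchen light",
                          "turn the kitchen lights"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.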
New era for robust speech recognition exploiting deep. He has given seminars in speech and robust speech recognition and has published more than 25 papers in this field. For instance, the array processing community ignores speaker adaptation techniques, which can. Among the other achievements, building computers that understand speech represents a crucial leap. The latter disturbances severely hamper the intelligibility of a speech signal, making distant speech recognition (DSR) one of the major open challenges in the field. A network of deep neural networks for distant speech recognition. John McDonough. A complete overview of distant automatic speech recognition. They include a book I coauthored with Matthias Woelfel as well as a book chapter that I am currently preparing with Kenichi Kumatani, both published by Wiley. It incorporates knowledge and research in the computer. Distant Speech Recognition (Wiley Telecom books, IEEE Xplore). Techniques acting at the decoding stage, such as our novel approach called driven decoding algorithm (DDA), gave better speech recognition results than the baseline and other approaches. If groundbreaking progress is to be made in the emerging field of distant speech recognition (DSR), this abysmal state of affairs must change.
Covers the entire topic of distant ASR and offers practical solutions to overcome the problems related to it. This is due to a broad variety of effects such as background noise, overlapping speech from other speakers, and reverberation. A complete system for distant speech recognition (DSR) typically consists of several distinct components. Distant Speech Recognition presents a contemporary and comprehensive description of both theoretic abstraction and practical issues inherent in the distant ASR problem. Time spectral analysis; perceptually motivated representation; spectral estimation and analysis; cepstral processing; comparison between mel. CNNs for large vocabulary distant speech recognition, trained using speech recorded from a single distant microphone (SDM) and multiple distant microphones (MDM). The book reflects the state of the art in important areas of speech and audio signal processing.
Speech enhancement, dereverberation, echo cancellation and. In the MDM case we explore a beamformed signal input representation compared with the direct use of multiple acoustic channels as a parallel input to the CNN. Microphone array processing for distant speech recognition. Despite the remarkable progress recently made in distant speech recognition, state-of-the-art technology still suffers from a lack of robustness, especially when adverse acoustic conditions characterized by non-stationary noises and reverberation are met. It is also known as automatic speech recognition (ASR), computer speech recognition or speech-to-text (STT). In addition to her work on core speech recognition technology, she has also developed several algorithms for noise compensation, and was the prime architect of CMU's award-winning submission to the 2001 Naval Research Labs challenge on automatic recognition of speech in noisy environments (SPINE). While conventional ASR systems perform miserably for speech captured with far-field sensors, there are a number of techniques developed in other areas of signal processing that can mitigate the deleterious effects of noise and reverberation, as well as separating speech from overlapping speakers. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. The system has been designed mainly for distant speech recognition, which most usually involves more than one microphone (a microphone array). Speech feature extraction: distant speech recognition.
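To make the idea of a beamformed input from a microphone array concrete, here is a minimal delay-and-sum sketch. It is one common textbook beamformer and is not claimed to be the method used in any of the works quoted above. The function name `delay_and_sum`, the use of plain Python lists, and the restriction to known integer sample delays are all simplifying assumptions; real systems estimate the delays and usually handle fractional values.

```python
def delay_and_sum(channels, delays):
    """Align each microphone channel by its arrival delay and average them.

    channels: list of equal-length lists of samples, one per microphone.
    delays:   per-channel arrival delay in whole samples (assumed known);
              a delay of d means the source reaches that microphone
              d samples later than the reference microphone.
    """
    if not channels or len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    n = len(channels[0])
    if any(len(ch) != n for ch in channels):
        raise ValueError("all channels must have the same length")

    summed = [0.0] * n
    for signal, delay in zip(channels, delays):
        for t in range(n):
            src = t + delay  # read ahead to undo the arrival delay
            if 0 <= src < n:  # samples shifted past the edge are dropped
                summed[t] += signal[src]
    return [value / len(channels) for value in summed]


if __name__ == "__main__":
    # The same pulse reaches microphone 1 two samples after microphone 0.
    mic0 = [0, 0, 1, 0, 0, 0]
    mic1 = [0, 0, 0, 0, 1, 0]
    print(delay_and_sum([mic0, mic1], [0, 2]))  # aligned: pulse adds coherently
    print(delay_and_sum([mic0, mic1], [0, 0]))  # unaligned: pulse is smeared
```

With the correct delays the target signal adds coherently while uncorrelated noise averages down, which is why a beamformed signal is a natural single-channel input to compare against feeding all channels to the network in parallel.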