FMSP Lectures

2016/01/18

14:00-15:00   Room #126 (Graduate School of Math. Sci. Bldg.)
Samuli Siltanen (University of Helsinki)
Blind deconvolution for human speech signals (ENGLISH)
[ Abstract ]
The structure of vowel sounds in human speech can be divided into two independent components. One is the “excitation signal,” a buzzing sound created by the vocal folds flapping against each other. The other is the “filtering effect” caused by resonances in the vocal tract, the confined space formed by the mouth and throat. The Glottal Inverse Filtering (GIF) problem is to divide, algorithmically, a microphone recording of a vowel sound into these two components. This “blind deconvolution” task is an ill-posed inverse problem. Good-quality GIF is essential for the computer-generated speech needed, for example, by disabled people (think Stephen Hawking). GIF also affects the quality of synthetic speech in automatic information announcements and car navigation systems.

Accurate estimation of the voice source from recorded speech is known to be difficult with current GIF techniques, especially for the high-pitched speech of female or child subjects. To tackle this problem, the present study uses two different solution methods for GIF: Bayesian inversion and alternating minimization. The first method uses Markov chain Monte Carlo (MCMC) modeling to define the parameters of the vocal tract inverse filter. Its filtering results are found to be superior to those achieved by standard iterative adaptive inverse filtering (IAIF), but the computation is much slower than IAIF. Alternating minimization cuts down the computation time while retaining most of the quality improvement.
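
The source-filter picture in the abstract can be illustrated with a toy sketch. It is not the method presented in the talk: the filter here is assumed to be known, whereas GIF must estimate both the excitation and the filter from the recording alone. The impulse-train excitation, the single 500 Hz resonance, the sampling rate, and the names `all_pole_filter` and `inverse_filter` are all illustrative assumptions.

```python
import numpy as np

def all_pole_filter(x, a):
    """Apply the all-pole filter 1/A(z): y[n] = x[n] - sum_{k>=1} a[k] * y[n-k], with a[0] == 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def inverse_filter(y, a):
    """Apply the FIR inverse filter A(z): x[n] = sum_k a[k] * y[n-k]."""
    return np.convolve(y, a)[:len(y)]

# Toy excitation (assumption): an impulse train standing in for the glottal buzz.
fs = 8000          # sampling rate in Hz
f0 = 100           # pitch in Hz
excitation = np.zeros(800)
excitation[::fs // f0] = 1.0

# Toy vocal tract (assumption): one resonance at 500 Hz with pole radius 0.95.
r, fc = 0.95, 500.0
theta = 2 * np.pi * fc / fs
a = np.array([1.0, -2 * r * np.cos(theta), r * r])

# Forward model: the recorded vowel is the excitation shaped by the resonance.
speech = all_pole_filter(excitation, a)

# Non-blind inversion: with the filter known, the excitation is recovered
# up to floating-point rounding.
recovered = inverse_filter(speech, a)
max_error = np.max(np.abs(recovered - excitation))
```

With `a` known, the inversion is a single convolution. The blind problem is hard because many excitation and filter pairs reproduce the same recording, which is why the methods named in the abstract add prior information or alternate between estimating the two components.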
[ Reference URL ]
http://fmsp.ms.u-tokyo.ac.jp/FMSPLectures_Siltanen.pdf