Maximum-likelihood Linear Regression Coefficients as Features for Speaker Recognition
This dissertation addresses text-independent Automatic Speaker Verification (ASV) using features derived from Maximum Likelihood Linear Regression (MLLR) adaptation of Markov models with Gaussian mixture observation densities. MLLR transform coefficients, obtained by adapting a speaker-independent model to a speaker's speech data, capture cues that characterize the speaker. We focus on the MLLR-SVM paradigm, which classifies these features using Support Vector Machines (SVM). We propose a purely acoustic approach that avoids the transcripts and structural language constraints required by previous systems, using Constrained MLLR (CMLLR) transforms together with Speaker Adaptive Training (SAT) of a Universal Background Model (UBM). We assess the impact of SAT and of feature-space and model-space CMLLR transforms, and we propose several alternative representations of CMLLR transforms based on the Singular Value Decomposition (SVD). We also assess inter-session variability compensation in CMLLR-SVM via Nuisance Attribute Projection (NAP), and we use this framework to further develop a feature-level session compensation technique.

We also focus on multi-class (C)MLLR-SVM systems using Large Vocabulary Continuous Speech Recognition (LVCSR) acoustic models. We perform a comprehensive experimental study of adaptation schemes along multiple axes: front-end type, transform type, number of transforms, model type, and training method. We draw several conclusions from it, notably that CMLLR and MLLR adaptation behave differently, a difference we analyze. We explore lattice MLLR adaptation as a means of dealing with erroneous transcripts, as well as several fusion strategies at the feature and score levels.
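To make the feature pipeline described in the abstract more concrete, here is a minimal sketch of two of its steps: turning an affine adaptation transform (matrix A and bias b) into a fixed-length feature vector, and removing nuisance directions from that vector in the spirit of NAP. It is an illustration only, not the dissertation's implementation. The function names `transform_to_feature` and `nap_project` are hypothetical, the toy transform is invented for the example, and the nuisance directions in `U` are assumed to be orthonormal and already estimated elsewhere.

```python
def transform_to_feature(A, b):
    """Flatten an affine transform x -> A x + b into one vector.

    Rows of the augmented matrix [A | b] are stacked one after another.
    This stacking order is an assumption made for illustration.
    """
    return [v for row, bias in zip(A, b) for v in (*row, bias)]


def nap_project(x, U):
    """Remove from x its components along the nuisance directions in U.

    Computes x - U U^T x, assuming the vectors in U are orthonormal.
    """
    out = list(x)
    for u in U:
        # Coefficient of x along this nuisance direction.
        coeff = sum(ui * xi for ui, xi in zip(u, x))
        out = [oi - coeff * ui for oi, ui in zip(out, u)]
    return out


# Toy 2x2 transform with a bias term (values are made up).
A = [[1.0, 0.0],
     [0.0, 2.0]]
b = [0.5, -0.5]

feature = transform_to_feature(A, b)

# One hypothetical nuisance direction: the first coordinate axis.
U = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
compensated = nap_project(feature, U)
```

In a system of the kind the abstract describes, vectors like `compensated` would then be passed to an SVM classifier. With several transforms per speaker, the per-transform vectors would typically be concatenated before classification.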