Bilinear model for speaker adaptation using tensor analysis (Speech and audio processing and translation)

Y. Jeong,S P Yi

2010

A novel speaker adaptation method based on two-way analysis of training speakers is described. A set of training models is expressed as a tensor and is decomposed into two factors using nonlinear iterative partial least squares, producing a bilinear model. The resulting model has bases of lower dimension and more free parameters than those of eigenvoice, enabling more elaborate modelling for a moderate amount of adaptation data. Results from the isolated-word recognition test show that the proposed model outperforms both eigenvoice and maximum likelihood linear regression (MLLR) for adaptation data longer than 15 s. Moreover, the proposed method can straightforwardly be extended to n-way analysis, e.g. for simultaneous adaptation of speaker, environment, etc.
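To make the decomposition step concrete, here is a minimal sketch of nonlinear iterative partial least squares (NIPALS) used as a two-factor matrix factorisation. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the training tensor is arranged or unfolded, so the sketch simply assumes the training models have been stacked into a matrix `X` (one row per training speaker). The function name `nipals`, its parameters, and the toy data are hypothetical.

```python
import numpy as np

def nipals(X, n_components, tol=1e-10, max_iter=500):
    """Factorise X ~ T @ P.T one component at a time (NIPALS with deflation).

    Assumed layout (not taken from the paper): rows of X are training
    speakers, columns are model parameters.
    """
    R = np.array(X, dtype=float)          # residual; a copy, X is untouched
    n_rows, n_cols = R.shape
    T = np.zeros((n_rows, n_components))  # scores (one row per speaker)
    P = np.zeros((n_cols, n_components))  # loadings (unit-norm basis vectors)
    for k in range(n_components):
        # Start from the residual column with the largest energy.
        t = R[:, np.argmax(np.sum(R ** 2, axis=0))].copy()
        for _ in range(max_iter):
            p = R.T @ t / (t @ t)         # regress residual columns on scores
            p /= np.linalg.norm(p)        # normalise the loading vector
            t_new = R @ p                 # regress residual rows on loading
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, k] = t
        P[:, k] = p
        R -= np.outer(t, p)               # deflate: remove the fitted component
    return T, P

# Toy usage: 6 hypothetical "speakers", 8 parameters, exact rank 2.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 8))
T, P = nipals(X, n_components=2)
X_hat = T @ P.T                           # bilinear reconstruction of X
```

In this sketch the product of the two factors, `T @ P.T`, is what makes the model bilinear: holding either factor fixed leaves a linear model in the other. How the paper estimates the factors for a new speaker from adaptation data is not described in the abstract and is not covered here.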

Keywords:

Bilinear interpolation
Partial least squares regression
Dimension (vector space)
Speech recognition
Free parameter
Translation (geometry)
Computer science
Set (abstract data type)
Tensor
Audio signal processing