This paper takes phonetic information into account for data alignment in text-independent voice conversion. Hidden Markov models are used to represent the phonetic structure of the training speech. States belonging to the same phoneme are grouped together to form a phoneme cluster. A state-mapped, codebook-based transformation is established from the corresponding phoneme clusters of the source and target speech together with a weighted linear transform. For each source vector, several of the nearest clusters are considered simultaneously during mapping so that the resulting transform is continuous and stable. Experimental results indicate that the proposed use of phonetic information increases the similarity between the converted speech and the target speech. The proposed technique is applicable to both intra-lingual and cross-lingual voice conversion.
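For concreteness, the sketch below shows one plausible form of the cluster-weighted conversion function described above; the per-cluster linear transforms, the Gaussian-style distance weighting, and the function and parameter names are assumptions for illustration, not the exact formulation of the paper.

```python
import numpy as np

def convert_frame(x, src_centroids, A, b, n_nearest=3, sigma=1.0):
    """Map one source feature vector x to the target space.

    Assumed form: each phoneme cluster k has a source centroid and a
    linear transform (A[k], b[k]) estimated from the mapped source and
    target clusters.  The n_nearest clusters to x are weighted by their
    distance to x, and the per-cluster transforms are blended, which
    keeps the overall mapping continuous across cluster boundaries.
    """
    d2 = np.sum((src_centroids - x) ** 2, axis=1)      # squared distance to each cluster centroid
    nearest = np.argsort(d2)[:n_nearest]               # indices of the closest clusters
    w = np.exp(-d2[nearest] / (2.0 * sigma ** 2))      # distance-based weights
    w /= w.sum()                                       # normalise the weights
    # weighted combination of the per-cluster linear transforms
    return sum(w_k * (A[k] @ x + b[k]) for w_k, k in zip(w, nearest))
```

A call such as `convert_frame(frame, centroids, A_list, b_list)` would then yield the converted feature vector for one analysis frame.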
This paper describes a novel method for text-independent voice conversion using improved state mapping. Hidden Markov models are used to represent the phonetic structure of the training speech. Centroids of the phonemes common to the source and target speech are used as phonetic anchors when establishing a mapping between the acoustic spaces of the source and target speakers. These phonetic anchors, together with a weighted linear transform, are used to create a continuous parametric mapping from source to target speech parameters. The proposed technique is applicable to both intra-lingual and cross-lingual voice conversion. Experimental results show that state mapping is improved by the proposed technique.
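One plausible way to write the anchor-based mapping, given for illustration only (the exact weighting scheme and transform structure are assumptions):

$$
\hat{y} = \sum_{k=1}^{K} w_k(x)\,\bigl(\mu_k^{(y)} + A_k\,(x - \mu_k^{(x)})\bigr),
\qquad
w_k(x) = \frac{\exp\!\bigl(-\lVert x - \mu_k^{(x)}\rVert^2/\sigma^2\bigr)}
               {\sum_{j=1}^{K}\exp\!\bigl(-\lVert x - \mu_j^{(x)}\rVert^2/\sigma^2\bigr)},
$$

where $\mu_k^{(x)}$ and $\mu_k^{(y)}$ are the source and target centroids (phonetic anchors) of the $k$-th common phoneme and $A_k$ is a per-anchor linear transform; the soft weights $w_k(x)$ make the mapping continuous in the source acoustic space.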