
2008 | OriginalPaper | Book chapter

42. Vector-Based Spoken Language Classification

Written by: Haizhou Li, Dr., Bin Ma, Dr., Chin-Hui Lee, Dr.

Published in: Springer Handbook of Speech Processing

Publisher: Springer Berlin Heidelberg


Abstract

This chapter presents a vector space characterization (VSC) approach to automatic spoken language classification. It is assumed that the space of all spoken utterances can be represented by a universal set of fundamental acoustic units common to all languages. We address research issues related to defining the set of fundamental acoustic units, modeling these units, transcribing speech utterances with these unit models and designing vector-based decision rules for spoken language classification. The proposed VSC approach is evaluated on the 1996 and 2003 National Institute of Standards and Technology (NIST) language recognition evaluation tasks. It is shown that the VSC framework is capable of incorporating any combination of existing vector-based feature representations and classifier designs. We will demonstrate that the VSC-based classification systems achieve competitively low error rates for both spoken language identification and verification.
The chapter is organized as follows. In Sect. 42.1, we introduce the concept of vector space characterization of spoken utterances and establish the notions of acoustic letter, acoustic word, and spoken document. In Sect. 42.2, we discuss acoustic segment modeling in relation to an augmented phoneme inventory. In Sect. 42.3, we discuss voice tokenization and spoken document vectorization. In Sect. 42.4, we discuss vector-based classifier design strategies. In Sect. 42.5, we report several experiments that serve as a case study of classifier design and as an analytic study of the front end and back end. Finally, in Sect. 42.6, we summarize the discussion.
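
To make the pipeline in the abstract concrete, the following is a minimal sketch, not the chapter's implementation. It assumes that each utterance has already been transcribed into a sequence of acoustic-unit labels (the names "a1" to "a4" and the languages "A" and "B" are hypothetical), represents each utterance as a normalized bigram count vector, and classifies with a nearest-centroid cosine rule. The cosine rule is a simple stand-in for the vector-based classifiers the chapter discusses.

```python
from collections import Counter
from math import sqrt


def ngram_vector(tokens, n=2):
    """Map one tokenized utterance to a sparse vector of relative n-gram counts."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(gram, 0.0) for gram, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train_centroids(labelled_utterances):
    """Average the utterance vectors of each language into one centroid."""
    sums, n_docs = {}, Counter()
    for language, tokens in labelled_utterances:
        acc = sums.setdefault(language, Counter())
        for gram, w in ngram_vector(tokens).items():
            acc[gram] += w
        n_docs[language] += 1
    return {
        language: {gram: w / n_docs[language] for gram, w in acc.items()}
        for language, acc in sums.items()
    }


def classify(tokens, centroids):
    """Return the language whose centroid is most similar to the utterance."""
    vec = ngram_vector(tokens)
    return max(centroids, key=lambda language: cosine(vec, centroids[language]))


# Toy data: hypothetical acoustic-unit transcriptions for two languages.
training = [
    ("A", ["a1", "a2", "a1", "a2", "a3"]),
    ("A", ["a2", "a1", "a2", "a1"]),
    ("B", ["a3", "a4", "a3", "a4", "a1"]),
    ("B", ["a4", "a3", "a4", "a3"]),
]
centroids = train_centroids(training)
prediction_a = classify(["a1", "a2", "a1", "a2"], centroids)
prediction_b = classify(["a3", "a4", "a3"], centroids)
print(prediction_a, prediction_b)
```

In this toy setup the first test utterance shares bigrams only with language "A" and the second only with language "B", so the cosine rule separates them. A real system would replace the hand-written token sequences with the output of the acoustic unit recognizer and would likely use a stronger classifier.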


Metadata
Title
Vector-Based Spoken Language Classification
Written by
Haizhou Li, Dr.
Bin Ma, Dr.
Chin-Hui Lee, Dr.
Copyright year
2008
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-540-49127-9_42