Published in: Neural Computing and Applications 7-8/2013

1 June 2013 | ICONIP 2011

Robust speech recognition based on independent vector analysis using harmonic frequency dependency

Authors: Soram Jun, Minook Kim, Myungwoo Oh, Hyung-Min Park

Abstract

This paper describes an algorithm that enhances speech by independent vector analysis (IVA) using harmonic frequency dependency for robust speech recognition. While conventional IVA exploits full-band uniform dependencies within each source signal, a harmonic clique model is introduced to improve enhancement performance by modeling the strong dependencies among multiples of the fundamental frequency. The IVA-based learning algorithm is derived with the non-holonomic constraint and the minimal distortion principle to reduce the unavoidable distortion of IVA, and a minimum power distortionless response beamformer is used as a pre-processing step. In addition, the algorithm compares the log-spectral features of the enhanced speech and the observed noisy speech to identify time–frequency segments corrupted by noise, and restores those segments with the cluster-based missing feature reconstruction technique. Experimental results demonstrate that the proposed method improves recognition performance significantly in noisy environments, especially with competing interference.
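
The noise-detection step mentioned in the abstract compares log-spectral features of the enhanced speech and the observed noisy speech. The following is a minimal sketch of how such a comparison could produce a time–frequency mask. The function name `corrupted_mask`, the simple difference-threshold rule, and the `threshold` value are illustrative assumptions, not the criterion used in the paper.

```python
from typing import List


def corrupted_mask(log_noisy: List[List[float]],
                   log_enhanced: List[List[float]],
                   threshold: float = 1.0) -> List[List[bool]]:
    """Flag time-frequency cells that look noise-corrupted.

    Inputs are log-spectral features indexed as [frame][frequency band].
    A cell is flagged when the noisy log-spectrum exceeds the enhanced
    log-spectrum by more than `threshold` (an assumed, hypothetical rule).
    """
    if len(log_noisy) != len(log_enhanced):
        raise ValueError("inputs must have the same number of frames")
    mask = []
    for noisy_frame, enhanced_frame in zip(log_noisy, log_enhanced):
        if len(noisy_frame) != len(enhanced_frame):
            raise ValueError("frames must have the same number of bands")
        # Large positive difference: enhancement removed much energy,
        # so the observed cell is assumed to be dominated by noise.
        mask.append([(n - e) > threshold
                     for n, e in zip(noisy_frame, enhanced_frame)])
    return mask


# Toy example with made-up values: 2 frames, 3 frequency bands.
noisy = [[2.0, 5.0, 1.0], [3.0, 3.2, 6.0]]
enhanced = [[1.9, 2.5, 0.8], [2.9, 3.0, 2.0]]
mask = corrupted_mask(noisy, enhanced, threshold=1.0)
```

In a pipeline of the kind the abstract outlines, the flagged cells would then be passed to a missing feature reconstruction stage, which is not sketched here.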

Metadata
Title: Robust speech recognition based on independent vector analysis using harmonic frequency dependency
Authors: Soram Jun, Minook Kim, Myungwoo Oh, Hyung-Min Park
Publication date: 1 June 2013
Publisher: Springer-Verlag
Published in: Neural Computing and Applications, Issue 7-8/2013
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-012-1002-6