Published in: Cluster Computing 3/2014

01-09-2014

Target speech feature extraction using non-parametric correlation coefficient

Authors: Sang Yeob Oh, Kyung-Yong Chung

Abstract

In-car speech recognition systems often fail when the target speech is mixed with environmental noise from inside and outside the car and with other voices. This paper therefore presents a technique for extracting only the selected target voice from an input that mixes voices and noise. The selective extraction builds a correlation map of auditory elements from the similarity between channels and the continuity over time, and extracts speech features using a non-parametric correlation coefficient. The method was validated by showing that the average separation distortion decreased by 0.8630 dB. Selective feature extraction based on cross-correlation performed well, but extraction based on the non-parametric correlation performed better overall.
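
To make the idea of a channel-similarity map concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which non-parametric coefficient is used, so Spearman's rank correlation is assumed here purely for illustration. The function names (`average_ranks`, `pearson`, `spearman`, `channel_correlation_map`) and the toy envelope values are hypothetical. The sketch scores each pair of adjacent auditory channels by the rank correlation of their envelopes, and contrasts that with an ordinary zero-lag (Pearson) correlation.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Zero-lag normalized correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation (assumed stand-in for the paper's coefficient)."""
    return pearson(average_ranks(x), average_ranks(y))


def channel_correlation_map(envelopes, corr=spearman):
    """Score each adjacent channel pair (c, c+1); envelopes[c] is a time series."""
    return [corr(envelopes[c], envelopes[c + 1])
            for c in range(len(envelopes) - 1)]


# Hypothetical envelopes: channel 1 is a monotonic (cubic) distortion of
# channel 0, channel 2 is unrelated.
ch0 = [0.1, 0.4, 0.9, 1.6, 2.5]
ch1 = [v ** 3 for v in ch0]
ch2 = [2.0, 0.3, 1.1, 0.2, 0.9]

rank_map = channel_correlation_map([ch0, ch1, ch2])
linear_map = channel_correlation_map([ch0, ch1, ch2], corr=pearson)
print(rank_map, linear_map)
```

In this toy case the rank-based score treats channels 0 and 1 as fully consistent because it depends only on ordering, while the linear score is lowered by the cubic distortion. That is one plausible reason a rank-based coefficient could group channels more robustly, though the abstract itself does not give this explanation.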

Metadata
Title: Target speech feature extraction using non-parametric correlation coefficient
Authors: Sang Yeob Oh, Kyung-Yong Chung
Publication date: 01-09-2014
Publisher: Springer US
Published in: Cluster Computing, Issue 3/2014
Print ISSN: 1386-7857
Electronic ISSN: 1573-7543
DOI: https://doi.org/10.1007/s10586-013-0284-5