
2017 | OriginalPaper | Book chapter

Spoken Keyword Retrieval Using Source and System Features

Authors: Maulik C. Madhavi, Hemant A. Patil, Nikhil Bhendawade

Published in: Pattern Recognition and Machine Intelligence

Publisher: Springer International Publishing


Abstract

This paper proposes a novel excitation source-related feature set, Teager Energy-based Mel Frequency Cepstral Coefficients (T-MFCC), for spoken keyword detection. Experiments are carried out on the TIMIT database. The state-of-the-art MFCC feature set serves as the baseline spectral feature set and implicitly represents vocal tract (i.e., system) information. The idea is to exploit both the vocal-source information (including its nonlinear coupling with the formants) and the system-related information embedded in the spoken query. Used alone, MFCC yields an equal error rate (EER) of 17.23 % and the proposed T-MFCC yields 22.58 %. When evidence from T-MFCC and MFCC is combined using score-level fusion, however, the EER is reduced by 1.8 % compared with MFCC alone, indicating that the proposed feature set captures linguistic information in the spoken keyword that is complementary to what MFCC captures.
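
To make the two ingredients named in the abstract concrete, the following is a minimal sketch rather than the authors' implementation. It shows the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], and one possible form of score-level fusion. The abstract does not state how the Teager energy enters the mel-cepstral pipeline or which fusion rule is used, so the function names, the linear weighted sum, and the weight `alpha` are assumptions made purely for illustration.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The first and last samples have no two-sided neighbourhood and are
    set to zero here (an illustrative boundary choice).
    """
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi


def fuse_scores(scores_mfcc, scores_tmfcc, alpha=0.5):
    """Hypothetical linear score-level fusion of two detection systems.

    alpha weights the MFCC scores and (1 - alpha) the T-MFCC scores.
    Both the linear form and the default weight are assumptions, not
    taken from the paper.
    """
    scores_mfcc = np.asarray(scores_mfcc, dtype=float)
    scores_tmfcc = np.asarray(scores_tmfcc, dtype=float)
    return alpha * scores_mfcc + (1.0 - alpha) * scores_tmfcc


# For a pure tone A*cos(w*n + phi) the operator returns the constant
# A^2 * sin(w)^2, i.e. it tracks amplitude and frequency jointly.
n = np.arange(200)
tone = 2.0 * np.cos(0.3 * n + 0.1)
psi = teager_energy(tone)
```

In a sketch like this, `alpha` would typically be tuned on held-out data. Setting `alpha=1.0` recovers the MFCC-only system, which gives a natural baseline for judging whether the second feature stream adds complementary evidence.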


Metadata
Title
Spoken Keyword Retrieval Using Source and System Features
Authors
Maulik C. Madhavi
Hemant A. Patil
Nikhil Bhendawade
Copyright year
2017
DOI
https://doi.org/10.1007/978-3-319-69900-4_42