
2016 | Original Paper | Book Chapter

Better Phoneme Recognisers Lead to Better Phoneme Posteriorgrams for Search on Speech? An Experimental Analysis

Authors: Paula Lopez-Otero, Laura Docio-Fernandez, Carmen Garcia-Mateo

Published in: Advances in Speech and Language Technologies for Iberian Languages

Publisher: Springer International Publishing


Abstract

Phoneme posteriorgrams are widely used to represent speech in query-by-example search on speech. A posteriorgram is computed by obtaining the per-frame a posteriori probability of each unit of a phoneme recogniser, regardless of the recogniser's architecture. It is natural to assume that the higher the quality of the phone transcriptions a phoneme recogniser generates, the higher the quality of its phoneme posteriorgrams; however, to the best of our knowledge, no analysis exists that supports this assumption. This paper investigates whether the phone error rate of a recogniser correlates with the maximum term weighted value obtained when performing query-by-example search on speech. Experiments on the Spanish-language Albayzin corpus showed a slight correlation between these two metrics, which suggests that the quality of a phoneme posteriorgram representation is related to phone error rate, but that other factors also affect its performance in search on speech tasks.
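The kind of analysis the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that phone error rate is the Levenshtein distance between the reference and recognised phone sequences divided by the reference length, and that the correlation is measured with the Pearson coefficient; the abstract does not state which coefficient was used. The function names `phone_error_rate` and `pearson` are hypothetical, and the numbers in the usage lines are placeholders, not results from the paper.

```python
from math import sqrt


def phone_error_rate(reference, hypothesis):
    """Edit distance between two phone sequences, divided by the reference length."""
    if not reference:
        raise ValueError("reference sequence must not be empty")
    m = len(hypothesis)
    # prev[j] is the edit distance between the reference prefix processed
    # so far and hypothesis[:j].
    prev = list(range(m + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if ref_phone == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[m] / len(reference)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Placeholder values for four hypothetical recognisers (NOT from the paper).
per_values = [0.30, 0.25, 0.20, 0.15]
mtwv_values = [0.10, 0.14, 0.13, 0.20]
print(pearson(per_values, mtwv_values))
```

A strongly negative coefficient would mean that lower phone error rates go with higher maximum term weighted values; the abstract reports only a slight correlation on the Albayzin corpus.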

Footnotes
1
The Spoken Term Detection (STD) 2006 Evaluation Plan, National Institute of Standards and Technology (NIST): http://www.itl.nist.gov/iad/mig/tests/std/2006/docs/std06-evalplan-v10.pdf.
 
Metadata
Title
Better Phoneme Recognisers Lead to Better Phoneme Posteriorgrams for Search on Speech? An Experimental Analysis
Authors
Paula Lopez-Otero
Laura Docio-Fernandez
Carmen Garcia-Mateo
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-49169-1_13