International Journal of Speech Technology

Publication date: 25-01-2019

Improved dynamic match phone lattice search for Persian spoken term detection system in online and offline applications

Authors: Shima Tabibian, Ahmad Akbari, Babak Nasersharif

Published in: International Journal of Speech Technology | Issue 1/2019

Abstract

Spoken term detection (STD) refers to discovering all occurrences of a given term in a set of speech utterances. One well-known approach to STD is phone lattice search (PLS), which produces a phone-based lattice of the speech utterances. Since the accuracy of the phone recognizer affects the accuracy of the STD system, the PLS approach uses the minimum edit distance (MED) measure to compensate for phone recognizer errors. While this measure increases the detection rate, it also raises the false alarm rate. In this paper, we take the PLS approach as the baseline and add Viterbi scoring and the Jaro-Winkler similarity measure to decrease the false alarm rate. Since the proposed approach applies more techniques than the baseline, the search speed may decrease. To overcome this problem, we increase the search speed with lattice pruning in online applications and with indexing techniques, such as the depth-first search algorithm, in offline applications. We report experimental results for monophone-based and triphone-based STD systems. The results indicate that the triphone-based STD system improves performance by about 2% compared with the monophone-based STD system. Moreover, with triphone-based models, the proposed approach, which combines the MED measure, Viterbi scores, and the Jaro-Winkler similarity measure, improves accuracy by about 17% over the method that uses the MED measure alone.
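The two string measures named in the abstract can be illustrated with a minimal sketch. It assumes phone sequences are plain lists of phone labels, uses unit edit costs, and uses the standard Jaro-Winkler definition (prefix scaling 0.1, prefix length capped at 4). The `accept` function and its thresholds `max_med` and `min_jw` are hypothetical, introduced only to show how a second similarity check could reject some candidates that pass the MED test; the abstract does not specify how the paper combines the measures.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum edit distance between two phone sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def jaro(a: Sequence[str], b: Sequence[str]) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, pa in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == pa:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched symbols that appear in a different order.
    a_seq = [p for p, m in zip(a, a_matched) if m]
    b_seq = [p for p, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: Sequence[str], b: Sequence[str],
                 scaling: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix (Winkler's variant)."""
    sim = jaro(a, b)
    prefix = 0
    for x, y in zip(a, b):
        if x != y or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * scaling * (1 - sim)


def accept(query: Sequence[str], hypothesis: Sequence[str],
           max_med: int = 1, min_jw: float = 0.85) -> bool:
    """Hypothetical decision rule: pass the MED test AND the similarity test."""
    return (edit_distance(query, hypothesis) <= max_med
            and jaro_winkler(query, hypothesis) >= min_jw)


if __name__ == "__main__":
    query = ["s", "a", "l", "a", "m"]
    hypothesis = ["s", "a", "l", "o", "m"]  # one substituted phone
    print(edit_distance(query, hypothesis))
    print(round(jaro_winkler(query, hypothesis), 4))
    print(accept(query, hypothesis))
```

In this sketch, loosening `max_med` admits more candidate lattice paths (higher detection rate), while `min_jw` acts as a second filter on those candidates, which mirrors the trade-off between detection rate and false alarm rate described above.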

Title: Improved dynamic match phone lattice search for Persian spoken term detection system in online and offline applications
Authors: Shima Tabibian, Ahmad Akbari, Babak Nasersharif
Publication date: 25-01-2019
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 1/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-019-09594-w