Published in: International Journal of Speech Technology 1/2019

05-02-2019

Segment-level probabilistic sequence kernel and segment-level pyramid match kernel based extreme learning machine for classification of varying length patterns of speech

Authors: Shikha Gupta, Ahmed Karanath, Kansul Mahrifa, A. D. Dileep, Veena Thenkanidiyoor


Abstract

In this work, we address some issues in the classification of varying length patterns of speech, represented as sets of continuous-valued feature vectors, using kernel methods. Kernels designed for varying length patterns are called dynamic kernels. We propose two dynamic kernels, namely the segment-level pyramid match kernel (SLPMK) and the segment-level probabilistic sequence kernel (SLPSK), for classification of long duration speech, represented as varying length sets of feature vectors, using an extreme learning machine (ELM). SLPMK and SLPSK are designed by partitioning the speech signal into increasingly finer segments and matching the corresponding segments. SLPSK is built upon a set of Gaussian basis functions, where half of the basis functions contain class-specific information while the other half capture the characteristics common to the speech utterances of all classes. The computational complexity of SVM training algorithms is usually high, being at least quadratic in the number of training examples, which makes it difficult to handle very large amounts of data with traditional SVMs. To reduce the training time of the classifier, we propose to use a simple algorithm, namely ELM. ELM refers to a wider class of generalized single hidden layer feedforward networks (SLFNs) whose hidden layer need not be tuned. In our work, we explore kernel-based ELM to exploit dynamic kernels. We study the performance of ELM-based classifiers using the proposed SLPSK and SLPMK for speech emotion recognition and speaker identification tasks, and compare them with other kernels for varying length patterns. Experimental studies show that the proposed ELM-based approach offers a 10–12% relative improvement over the baseline approach, and a 3–9% relative improvement over ELMs/SVMs using other state-of-the-art dynamic kernels.
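The idea of matching corresponding segments at increasingly finer levels, and then feeding the resulting kernel to a kernel-based ELM, can be illustrated with a small sketch. This is not the authors' SLPMK or SLPSK: it assumes hard assignment of frames to a hypothetical two-entry codebook, histogram intersection as the per-segment match, 2^l equal-length segments at level l, and equal level weights. The ELM part uses the commonly stated closed form for kernel ELM, beta = (I/C + K)^{-1} T, with +1/-1 class targets; the function names, the value of C, and the toy data are all illustrative assumptions.

```python
def nearest(vec, codebook):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda q: sum((a - b) ** 2 for a, b in zip(vec, codebook[q])))


def segment_histograms(frames, codebook, level):
    """Split frames into 2**level contiguous segments and return one
    normalized codeword histogram per segment."""
    n_seg = 2 ** level
    hists = []
    for s in range(n_seg):
        start = s * len(frames) // n_seg
        end = (s + 1) * len(frames) // n_seg
        h = [0.0] * len(codebook)
        for f in frames[start:end]:
            h[nearest(f, codebook)] += 1.0
        total = sum(h)
        hists.append([v / total for v in h] if total else h)
    return hists


def segment_level_kernel(x, y, codebook, levels=3):
    """Sum over levels of the mean histogram intersection between
    corresponding segments (equal level weights are an assumption)."""
    score = 0.0
    for level in range(levels):
        hx = segment_histograms(x, codebook, level)
        hy = segment_histograms(y, codebook, level)
        match = sum(sum(min(a, b) for a, b in zip(p, q)) for p, q in zip(hx, hy))
        score += match / 2 ** level
    return score


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def train_kernel_elm(gram, labels, n_classes, C=10.0):
    """Output weights beta = (I/C + K)^{-1} T with +1/-1 class targets T."""
    n = len(gram)
    A = [[gram[i][j] + (1.0 / C if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    T = [[1.0 if labels[i] == c else -1.0 for c in range(n_classes)]
         for i in range(n)]
    return solve(A, T)


def predict(k_row, beta):
    """k_row[i] = K(test, train_i); return the class with the largest score."""
    scores = [sum(k * b[c] for k, b in zip(k_row, beta))
              for c in range(len(beta[0]))]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data (hypothetical): 1-D frames, two codewords, and two classes that
# differ only in temporal order, so their whole-utterance histograms coincide.
codebook = [[0.0], [1.0]]
train = [
    [[0.0]] * 4 + [[1.0]] * 4,   # class 0: low then high
    [[1.0]] * 4 + [[0.0]] * 4,   # class 1: high then low
    [[0.1]] * 3 + [[0.9]] * 3,   # class 0, shorter sequence
    [[0.9]] * 5 + [[0.2]] * 5,   # class 1, longer sequence
]
labels = [0, 1, 0, 1]
gram = [[segment_level_kernel(a, b, codebook) for b in train] for a in train]
beta = train_kernel_elm(gram, labels, n_classes=2)

test_seq = [[0.0]] * 5 + [[1.0]] * 5   # low then high, length not seen in training
k_row = [segment_level_kernel(test_seq, t, codebook) for t in train]
print(predict(k_row, beta))
```

In this toy setting the level-0 (whole-utterance) match is identical for every pair of sequences, so any separation between the two classes comes from the finer segment levels. Training reduces to a single linear solve, with no iterative tuning of hidden-layer parameters.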


Metadata
Title
Segment-level probabilistic sequence kernel and segment-level pyramid match kernel based extreme learning machine for classification of varying length patterns of speech
Authors
Shikha Gupta
Ahmed Karanath
Kansul Mahrifa
A. D. Dileep
Veena Thenkanidiyoor
Publication date
05-02-2019
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2019
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-018-09587-1
