Published in: International Journal of Speech Technology 2/2017

4 May 2017

Supervector-based approaches in a discriminative framework for speaker verification in noisy environments

Written by: Sourjya Sarkar, K. Sreenivasa Rao

Abstract

This paper explores the robustness of supervector-based speaker modeling approaches for speaker verification (SV) in noisy environments. Speaker modeling is carried out in two frameworks: (i) a combined Gaussian mixture model-support vector machine (GMM-SVM) method and (ii) a total variability modeling method. In the GMM-SVM method, supervectors obtained by concatenating the means of adapted speaker GMMs are used to train speaker-specific SVMs during the training/enrollment phase of SV. During the evaluation/testing phase, noisy test utterances are transformed into supervectors and subjected to SVM-based pattern matching and classification. In the total variability modeling method, high-dimensional supervectors are reduced to a low-dimensional, channel-robust vector (i-vector) prior to SVM training and subsequent evaluation. Special emphasis is laid on the significance of an utterance partitioning technique for mitigating data imbalance and utterance-duration mismatches. An adaptive boosting algorithm is proposed in the total variability modeling framework to enhance the accuracy of the SVM classifiers. Experiments on the NIST-SRE-2003 database, with training and test utterances corrupted by additive noise, indicate that both modeling methods outperform the standard GMM-universal background model (GMM-UBM) framework for SV. The use of utterance partitioning and adaptive boosting in the speaker modeling frameworks results in substantial performance improvements under degraded conditions.
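
To make the supervector idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a diagonal-covariance UBM, mean-only MAP adaptation with a relevance factor, and a simple contiguous split for utterance partitioning; the abstract specifies none of these details. The function names, the `relevance` parameter, and the `ubm` tuple layout are hypothetical.

```python
import numpy as np


def map_adapted_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the UBM means to one utterance and stack them into a supervector.

    frames:    (T, D) feature vectors of the utterance
    weights:   (C,)   UBM mixture weights
    means:     (C, D) UBM component means
    variances: (C, D) UBM diagonal covariances
    relevance: assumed relevance factor controlling how far the means move
    """
    # Log-likelihood of every frame under every diagonal Gaussian component.
    diff = frames[:, None, :] - means[None, :, :]                  # (T, C, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                         # (T, C)

    # Normalise to component posteriors in a numerically stable way.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    soft_counts = post.sum(axis=0)                                 # (C,)
    first_order = post.T @ frames                                  # (C, D)
    data_means = first_order / np.maximum(soft_counts, 1e-10)[:, None]

    # Interpolate between the data means and the UBM means.
    alpha = (soft_counts / (soft_counts + relevance))[:, None]
    adapted = alpha * data_means + (1.0 - alpha) * means
    return adapted.reshape(-1)                                     # (C * D,)


def partitioned_supervectors(frames, n_parts, ubm):
    """Return one supervector for the full utterance plus one per sub-utterance.

    Contiguous splitting is an assumption made here for simplicity.
    """
    weights, means, variances = ubm
    segments = [frames] + np.array_split(frames, n_parts, axis=0)
    return np.stack([map_adapted_supervector(s, weights, means, variances)
                     for s in segments])


# Toy usage with random data standing in for acoustic features.
rng = np.random.default_rng(0)
toy_ubm = (np.full(4, 0.25), rng.normal(size=(4, 3)), np.ones((4, 3)))
toy_frames = rng.normal(size=(200, 3))
print(partitioned_supervectors(toy_frames, 4, toy_ubm).shape)
```

In a GMM-SVM setup, the rows returned by `partitioned_supervectors` would serve as additional target-speaker training examples for the SVM, which is how partitioning can ease the imbalance between the few target-speaker utterances and the many background speakers.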

Metadata
Title
Supervector-based approaches in a discriminative framework for speaker verification in noisy environments
Written by
Sourjya Sarkar
K. Sreenivasa Rao
Publication date
4 May 2017
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2017
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-017-9410-8
