
2014 | OriginalPaper | Chapter

6. Speaker Identification and Time Scale Modification Using VOPs

Authors: K. Sreenivasa Rao, Anil Kumar Vuppala

Published in: Speech Processing in Mobile Environments

Publisher: Springer International Publishing


Abstract

In this chapter, the proposed two-stage vowel onset point (VOP) detection method is used to improve Speaker Identification (SI) performance in the presence of speech coding. The VOPs are used to locate the crucial regions of speech that mainly carry speaker-specific information. Features extracted from these regions are then used for the speaker identification task to improve recognition accuracy. The accurate VOPs obtained with the proposed method are also explored for nonuniform time scale modification. The proposed nonuniform time scale modification method yields high-quality speech while varying the speech rate. In this method, vowel regions are modified nonuniformly based on the type of vowel, while consonant and transition regions are left unaltered irrespective of the speaking rate. Here, VOPs are used to determine the consonant, vowel, and transition regions.
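The region-dependent scaling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `Region` type, the helper `regions_from_vop`, the fixed 30 ms transition width, and the vowel-type labels and factors are all hypothetical and chosen only for illustration. The sketch computes target durations only; it does not resynthesize speech.

```python
from dataclasses import dataclass


@dataclass
class Region:
    label: str            # "consonant", "transition", or "vowel"
    start: float          # region start time in seconds
    end: float            # region end time in seconds
    vowel_type: str = ""  # hypothetical vowel category, e.g. "short" or "long"


def regions_from_vop(seg_start, vop, seg_end, vowel_type, transition=0.03):
    """Split one consonant-vowel segment around its VOP.

    Assumption (not from the chapter): the transition region is a fixed
    window of `transition` seconds that starts at the VOP.
    """
    trans_end = min(vop + transition, seg_end)
    return [
        Region("consonant", seg_start, vop),
        Region("transition", vop, trans_end),
        Region("vowel", trans_end, seg_end, vowel_type),
    ]


def scaled_durations(regions, vowel_factors):
    """Return the target duration of each region.

    Only vowel regions are scaled, each by the factor for its vowel type;
    consonant and transition regions keep their original duration.
    """
    durations = []
    for region in regions:
        duration = region.end - region.start
        if region.label == "vowel":
            duration *= vowel_factors.get(region.vowel_type, 1.0)
        durations.append(duration)
    return durations


# Usage: slow down one segment, stretching only the vowel.
segment = regions_from_vop(0.0, 0.05, 0.20, vowel_type="long")
targets = scaled_durations(segment, {"short": 1.2, "long": 1.5})
```

Keeping the scaling factors in a per-type table makes the nonuniform behaviour explicit: changing the speaking rate only changes the entries for vowel types, and the consonant and transition durations pass through untouched.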


85.
go back to reference L. Mary, B. Yegnanarayana, Extraction and representation of prosodic features for language and speaker recognition. Speech Comm. 50, 782–796 (2008)CrossRef L. Mary, B. Yegnanarayana, Extraction and representation of prosodic features for language and speaker recognition. Speech Comm. 50, 782–796 (2008)CrossRef
86.
go back to reference K.S. Rao, Acquisition and incorporation of prosody knowledge for speech systems in indian languages, PhD thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, May 2005 K.S. Rao, Acquisition and incorporation of prosody knowledge for speech systems in indian languages, PhD thesis, Department of Computer Science and Engineering, Indian Institute of Technology Madras, May 2005
87.
go back to reference A.K. Vuppala, J. Yadav, K.S. Rao, S. Chakrabarti, Vowel onset point detection for low bit rate coded speech. IEEE Trans. Audio Speech Lang. Process. 20(6), 1894–1903 (2012)CrossRef A.K. Vuppala, J. Yadav, K.S. Rao, S. Chakrabarti, Vowel onset point detection for low bit rate coded speech. IEEE Trans. Audio Speech Lang. Process. 20(6), 1894–1903 (2012)CrossRef
88.
go back to reference S.R.M. Kodukula, Significance of excitation source information for speech analysis. PhD thesis, IIT Madras, March 2009 S.R.M. Kodukula, Significance of excitation source information for speech analysis. PhD thesis, IIT Madras, March 2009
89.
go back to reference S. Guruprasad, Exploring features and scoring methods for speaker recognition, Master’s thesis, MS Thesis, IIT Madras, 2004 S. Guruprasad, Exploring features and scoring methods for speaker recognition, Master’s thesis, MS Thesis, IIT Madras, 2004
90.
go back to reference P.S. Murthy, B. Yegnanarayana, Robustness of group-delay-based method for extraction of significant instants of excitation from speech signals. IEEE Trans. Speech Audio Process. 7, 609–619 (1999)CrossRef P.S. Murthy, B. Yegnanarayana, Robustness of group-delay-based method for extraction of significant instants of excitation from speech signals. IEEE Trans. Speech Audio Process. 7, 609–619 (1999)CrossRef
91.
go back to reference K.S. Rao, S.R.M. Prasanna, B. Yegnanarayana, Determination of instants of significant excitation in speech using hilbert envelope and group delay function. IEEE Signal Process. Lett. 14, 762–765 (2007)CrossRef K.S. Rao, S.R.M. Prasanna, B. Yegnanarayana, Determination of instants of significant excitation in speech using hilbert envelope and group delay function. IEEE Signal Process. Lett. 14, 762–765 (2007)CrossRef
92.
go back to reference K.S.R. Murty, B. Yegnanarayana, Epoch extraction from speech signals. IEEE Trans. Audio Speech Lang. Process. 16(8), 1602–1613 (2008)CrossRef K.S.R. Murty, B. Yegnanarayana, Epoch extraction from speech signals. IEEE Trans. Audio Speech Lang. Process. 16(8), 1602–1613 (2008)CrossRef
93.
go back to reference A.K. Vuppala, J. Yadav, K.S. Rao, S. Chakrabarti, Effect of speech coding on epoch extraction, in Proc. of IEEE Int. Conf. on Devices and Communications, (Mesra, India, 2011) A.K. Vuppala, J. Yadav, K.S. Rao, S. Chakrabarti, Effect of speech coding on epoch extraction, in Proc. of IEEE Int. Conf. on Devices and Communications, (Mesra, India, 2011)
94.
go back to reference A.K. Vuppala, K.S. Rao, S. Chakrabarti, Vowel onset point detection for noisy speech using spectral energy at formant frequencies. Int. J. Speech Tech. (Springer) 16(2), 229–235 (2013) A.K. Vuppala, K.S. Rao, S. Chakrabarti, Vowel onset point detection for noisy speech using spectral energy at formant frequencies. Int. J. Speech Tech. (Springer) 16(2), 229–235 (2013)
95.
go back to reference M.A. Joseph, S. Guruprasad, B. Yegnanarayana, Extracting formants from short segments of speech using group delay functions, in Proc. of Interspeech (Pittsburgh, PA, USA, 2006), pp. 1009–1012 M.A. Joseph, S. Guruprasad, B. Yegnanarayana, Extracting formants from short segments of speech using group delay functions, in Proc. of Interspeech (Pittsburgh, PA, USA, 2006), pp. 1009–1012
96.
go back to reference M.A. Joseph, Extracting formant frequencies from short segments of speech, Master’s thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, Apr. 2008 M.A. Joseph, Extracting formant frequencies from short segments of speech, Master’s thesis, Dept. of Computer Science and Engineering, Indian Institute of Technology Madras, Apr. 2008
98.
go back to reference A.K. Vuppala, J. Yadav, K.S. Rao, S. Chakrabarti, Effect of noise on vowel onset point detection, in Proc. of Int. Conf. Contemporary Computing (Noida, India, 2011), pp. 201–211. Communications in Computer and Information Science (Springer) A.K. Vuppala, J. Yadav, K.S. Rao, S. Chakrabarti, Effect of noise on vowel onset point detection, in Proc. of Int. Conf. Contemporary Computing (Noida, India, 2011), pp. 201–211. Communications in Computer and Information Science (Springer)
99.
go back to reference A.K. Vuppala, S. Chakrabarti, K.S. Rao, Effect of speech coding on recognition of consonant-vowel (CV) units, in Proc. of Int. Conf. contemporary computing (Springer Communications in Computer and Information Science ISSN: 1865–0929), (Noida, India, 2010), pp. 284–294 A.K. Vuppala, S. Chakrabarti, K.S. Rao, Effect of speech coding on recognition of consonant-vowel (CV) units, in Proc. of Int. Conf. contemporary computing (Springer Communications in Computer and Information Science ISSN: 1865–0929), (Noida, India, 2010), pp. 284–294
100.
go back to reference A.K. Vuppala, K.S. Rao, S. Chakrabarti, Improved consonant-vowel recognition for low bit-rate coded speech. Wiley Int. J. Adapt. Contr. Signal Process. 26, 333–349 (2012)CrossRef A.K. Vuppala, K.S. Rao, S. Chakrabarti, Improved consonant-vowel recognition for low bit-rate coded speech. Wiley Int. J. Adapt. Contr. Signal Process. 26, 333–349 (2012)CrossRef
101.
go back to reference J.W. Picone, Signal modeling techniques in speech recognition. Proc. IEEE 81, 1215–1247 (1993)CrossRef J.W. Picone, Signal modeling techniques in speech recognition. Proc. IEEE 81, 1215–1247 (1993)CrossRef
102.
go back to reference S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, P. Woodland, The HTK Book Version 3.0 (Cambridge University Press, Cambridge, 2000) S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, P. Woodland, The HTK Book Version 3.0 (Cambridge University Press, Cambridge, 2000)
103.
go back to reference R. Collobert, S. Bengio, SVMTorch: support vector machines for large-scale regression problems. Proc. J. Mach. Learn. Res. 143–160 (2001) R. Collobert, S. Bengio, SVMTorch: support vector machines for large-scale regression problems. Proc. J. Mach. Learn. Res. 143–160 (2001)
104.
go back to reference A.K. Vuppala, K.S. Rao, S. Chakrabarti, Improved vowel onset point detection using epoch intervals. AEUE (Elsevier) 66, 697–700 (2012) A.K. Vuppala, K.S. Rao, S. Chakrabarti, Improved vowel onset point detection using epoch intervals. AEUE (Elsevier) 66, 697–700 (2012)
105.
go back to reference P. Krishnamoorthy, S.R.M. Prasanna, Enhancement of noisy speech by temporal and spectral processing. Speech Comm. 53, 154–174 (2011)CrossRef P. Krishnamoorthy, S.R.M. Prasanna, Enhancement of noisy speech by temporal and spectral processing. Speech Comm. 53, 154–174 (2011)CrossRef
106.
go back to reference S. Bell, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process. 27, 113–120 (1979)CrossRef S. Bell, Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process. 27, 113–120 (1979)CrossRef
107.
go back to reference S. Kamath, P. Loizou, A multi-band spectral subtraction method for enhancing speech corrupted by colored noise, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (Orlando, USA, 2002) S. Kamath, P. Loizou, A multi-band spectral subtraction method for enhancing speech corrupted by colored noise, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (Orlando, USA, 2002)
108.
go back to reference Y. Ephrain, D. Malah, Speech enhancement using minimum mean square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 32, 1109–1121 (1984)CrossRef Y. Ephrain, D. Malah, Speech enhancement using minimum mean square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 32, 1109–1121 (1984)CrossRef
109.
go back to reference B. Yegnanarayana, C. Avendano, H. Hermansky, P.S. Murthy, Speech enhancement using linear prediction residual. Speech Comm. 28, 25–42 (1999)CrossRef B. Yegnanarayana, C. Avendano, H. Hermansky, P.S. Murthy, Speech enhancement using linear prediction residual. Speech Comm. 28, 25–42 (1999)CrossRef
110.
go back to reference B. Yegnanarayana, P.S. Murthy, Enhancement of reverberant speech using lp residual signal. IEEE Trans. Speech Audio Process. 8, 267–281 (2000)CrossRef B. Yegnanarayana, P.S. Murthy, Enhancement of reverberant speech using lp residual signal. IEEE Trans. Speech Audio Process. 8, 267–281 (2000)CrossRef
111.
go back to reference B. Yegnanarayana, S.R.M. Prasanna, R. Duraiswami, D. Zotkin, Processing of reverberant speech for time-delay estimation. IEEE Trans. Speech Audio Process. 13, 1110–1118 (2005)CrossRef B. Yegnanarayana, S.R.M. Prasanna, R. Duraiswami, D. Zotkin, Processing of reverberant speech for time-delay estimation. IEEE Trans. Speech Audio Process. 13, 1110–1118 (2005)CrossRef
112.
go back to reference A.K. Vuppala, K.S. Rao, S. Chakrabarti, P. Krishnamoorthy, S.R.M. Prasanna, Recognition of consonant-vowel (CV) units under background noise using combined temporal and spectral preprocessing. Int. J. Speech Tech. (Springer) 14(3), 259–272 (2011) A.K. Vuppala, K.S. Rao, S. Chakrabarti, P. Krishnamoorthy, S.R.M. Prasanna, Recognition of consonant-vowel (CV) units under background noise using combined temporal and spectral preprocessing. Int. J. Speech Tech. (Springer) 14(3), 259–272 (2011)
113.
go back to reference A.K. Vuppala, K.S. Rao, S. Chakrabarti, Spotting and recognition of consonant-vowel units from continuous speech using accurate vowel onset points. Circ. Syst. Signal Process. (Springer) 31(4), 1459–1474 (2012) A.K. Vuppala, K.S. Rao, S. Chakrabarti, Spotting and recognition of consonant-vowel units from continuous speech using accurate vowel onset points. Circ. Syst. Signal Process. (Springer) 31(4), 1459–1474 (2012)
114.
go back to reference A.K. Vuppala, K.S. Rao, S. Chakrabarti, Improved speaker identification in wireless environment. Int. J. Signal Imag. Syst. Eng. 6(3), 130–137 (2013)CrossRef A.K. Vuppala, K.S. Rao, S. Chakrabarti, Improved speaker identification in wireless environment. Int. J. Signal Imag. Syst. Eng. 6(3), 130–137 (2013)CrossRef
115.
go back to reference A.K. Vuppala, K.S. Rao, Speaker identification under background noise using features extracted from steady vowel regions. Wiley Int. J. Adapt. Contr. Signal Process. 29, 781–792 (2013)CrossRef A.K. Vuppala, K.S. Rao, Speaker identification under background noise using features extracted from steady vowel regions. Wiley Int. J. Adapt. Contr. Signal Process. 29, 781–792 (2013)CrossRef
116.
go back to reference A.K. Vuppala, S. Chakrabarti, K.S. Rao, L. Dutta, “Robust speaker recognition on mobile devices,” in Proc. of IEEE Int. Conf. on Signal Processing and Communications (Bangalore, India, 2010) A.K. Vuppala, S. Chakrabarti, K.S. Rao, L. Dutta, “Robust speaker recognition on mobile devices,” in Proc. of IEEE Int. Conf. on Signal Processing and Communications (Bangalore, India, 2010)
117.
go back to reference K.S. Prahallad, B. Yegnanarayana, S.V. Gangashetty, Online text-independent speaker verification system using autoassociative neural network models, in Proc. of INNS-IEEE Int. Joint Conf. Neural Networks (Washington DC, USA, 2001), pp. 1548–1553 K.S. Prahallad, B. Yegnanarayana, S.V. Gangashetty, Online text-independent speaker verification system using autoassociative neural network models, in Proc. of INNS-IEEE Int. Joint Conf. Neural Networks (Washington DC, USA, 2001), pp. 1548–1553
118.
go back to reference B. Yegnanarayana, S.P. Kishore, AANN an alternative to GMM for pattern recognition. Neural Network 15, 459–469 (2002)CrossRef B. Yegnanarayana, S.P. Kishore, AANN an alternative to GMM for pattern recognition. Neural Network 15, 459–469 (2002)CrossRef
119.
go back to reference A.K. Vuppala, S. Chakrabarti, K.S. Rao, Effect of speech coding on speaker identification, in Proc. of IEEE INDICON (Kolkata, India, 2010) A.K. Vuppala, S. Chakrabarti, K.S. Rao, Effect of speech coding on speaker identification, in Proc. of IEEE INDICON (Kolkata, India, 2010)
120.
go back to reference S. Sigurdsson, K.B. Petersen, T. Lehn-Schioler, Mel frequency cepstral coefficients: An evaluation of robustness of MP3 encoded music, in Proc. of Seventh Int. Conf. on Music Information Retrieval, 2006 S. Sigurdsson, K.B. Petersen, T. Lehn-Schioler, Mel frequency cepstral coefficients: An evaluation of robustness of MP3 encoded music, in Proc. of Seventh Int. Conf. on Music Information Retrieval, 2006
121.
go back to reference A.L. Edwards, An Introduction to Linear Regression and Correlation (W.H. Freeman and Company Ltd, Cranbury, NJ, 08512, USA, 1976) A.L. Edwards, An Introduction to Linear Regression and Correlation (W.H. Freeman and Company Ltd, Cranbury, NJ, 08512, USA, 1976)
122.
go back to reference J.R. Deller, J.G. Proakis, J.H.L. Hansen, Discrete-Time Processing of Speech Signals (Macmilan Publishing, New York, 1993) J.R. Deller, J.G. Proakis, J.H.L. Hansen, Discrete-Time Processing of Speech Signals (Macmilan Publishing, New York, 1993)
123.
go back to reference R.V. Hogg, J. Ledolter, Engineering Statistics (Macmillan Publishing, New York, 1987) R.V. Hogg, J. Ledolter, Engineering Statistics (Macmillan Publishing, New York, 1987)
124.
go back to reference S.V. Gangashetty, C.C. Sekhar, B. Yegnanarayana, Detection of vowel onset points in continuous speech using autoassociative neural network models, in Proc. Int. Conf. Spoken Language Processing, pp. 401–410, 2004 S.V. Gangashetty, C.C. Sekhar, B. Yegnanarayana, Detection of vowel onset points in continuous speech using autoassociative neural network models, in Proc. Int. Conf. Spoken Language Processing, pp. 401–410, 2004
125.
go back to reference J.R. Deller, J.H. Hansen, J.G. Proakis, Discrete Time Processing of Speech Signals, 1st edn. (Prentice Hall PTR, Upper Saddle River, NJ, 1993) J.R. Deller, J.H. Hansen, J.G. Proakis, Discrete Time Processing of Speech Signals, 1st edn. (Prentice Hall PTR, Upper Saddle River, NJ, 1993)
126.
go back to reference J. Benesty, M.M. Sondhi, Y.A. Huang, Springer Handbook of Speech Processing (Springer, New York, 2008)CrossRef J. Benesty, M.M. Sondhi, Y.A. Huang, Springer Handbook of Speech Processing (Springer, New York, 2008)CrossRef
127.
go back to reference J. Volkmann, S. Stevens, E. Newman, A scale for the measurement of the psychological magnitude pitch. J. Acoust. Soc. Am. 8, 185–190 (1937)CrossRef J. Volkmann, S. Stevens, E. Newman, A scale for the measurement of the psychological magnitude pitch. J. Acoust. Soc. Am. 8, 185–190 (1937)CrossRef
128.
go back to reference Z. Fang, Z. Guoliang, S. Zhanjiang, Comparison of different implementations of MFCC. J. Comput. Sci. Tech. 16(6), 582–589 (2001)CrossRefMATH Z. Fang, Z. Guoliang, S. Zhanjiang, Comparison of different implementations of MFCC. J. Comput. Sci. Tech. 16(6), 582–589 (2001)CrossRefMATH
129.
go back to reference G.K.T. Ganchev, N. Fakotakis, Comparative evaluation of various MFCC implementations on the speaker verification task, in Proc. of Int. Conf. on Speech and Computer (Patras, Greece, 2005), pp. 191–194 G.K.T. Ganchev, N. Fakotakis, Comparative evaluation of various MFCC implementations on the speaker verification task, in Proc. of Int. Conf. on Speech and Computer (Patras, Greece, 2005), pp. 191–194
130.
go back to reference L.R. Rabiner, B.H. Juang, Fundamentals of speech Recognition (Prentice Hall PTR, Englewood cliffs, NJ, 1993) L.R. Rabiner, B.H. Juang, Fundamentals of speech Recognition (Prentice Hall PTR, Englewood cliffs, NJ, 1993)
131.
go back to reference S. Furui, Comparison of speaker recognition methods using statistical features and dynamic features. IEEE Trans. Acoust. Speech Signal Process. 29(3), 342–350 (1981)CrossRef S. Furui, Comparison of speaker recognition methods using statistical features and dynamic features. IEEE Trans. Acoust. Speech Signal Process. 29(3), 342–350 (1981)CrossRef
132.
go back to reference J.S. Mason, X. Zhang, Velocity and acceleration features in speaker recognition, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, (Toronto, Canada, 1991), pp. 3673–3676 J.S. Mason, X. Zhang, Velocity and acceleration features in speaker recognition, in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, (Toronto, Canada, 1991), pp. 3673–3676
133.
go back to reference W.C. Chu, Speech Coding Algorithms: Foundation and Evolution of Standardized Coders (Wiley, New York, 2003)CrossRef W.C. Chu, Speech Coding Algorithms: Foundation and Evolution of Standardized Coders (Wiley, New York, 2003)CrossRef
134.
go back to reference A.M. Kondoz, Digital Speech: Coding for Low Bit Rate Communication Systems, 2nd edn. (Wiley, New York, 2004) A.M. Kondoz, Digital Speech: Coding for Low Bit Rate Communication Systems, 2nd edn. (Wiley, New York, 2004)
135.
go back to reference H.L.J. Hansen, B.L. Pellom, An effective quality evaluation protocol for speech enhancement algorithm, in Proc. Int. Conf. Spoken Language Processing, pp. 2819–2822, 1998 H.L.J. Hansen, B.L. Pellom, An effective quality evaluation protocol for speech enhancement algorithm, in Proc. Int. Conf. Spoken Language Processing, pp. 2819–2822, 1998
136.
go back to reference L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, in Proc. of IEEE, pp. 257–286, 1989 L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, in Proc. of IEEE, pp. 257–286, 1989
137.
go back to reference S. Theodoridis, K. Koutroumbas, Pattern Recognition, 3rd edn. (Elsevier, Academic press, Waltham, Massachusetts, USA, 2006) S. Theodoridis, K. Koutroumbas, Pattern Recognition, 3rd edn. (Elsevier, Academic press, Waltham, Massachusetts, USA, 2006)
Metadata
Title
Speaker Identification and Time Scale Modification Using VOPs
Authors
K. Sreenivasa Rao
Anil Kumar Vuppala
Copyright Year
2014
DOI
https://doi.org/10.1007/978-3-319-03116-3_6