2014 | OriginalPaper | Chapter

6. Speaker Identification and Time Scale Modification Using VOPs

Authors: K. Sreenivasa Rao, Anil Kumar Vuppala

Published in: Speech Processing in Mobile Environments

Publisher: Springer International Publishing

Abstract

In this chapter, the proposed two-stage VOP detection method is used to improve speaker identification (SI) performance in the presence of speech coding. Vowel onset points (VOPs) are used to locate the crucial regions of speech segments that mainly carry speaker-specific information. Features extracted from these regions are used for the speaker identification task to improve recognition accuracy. The accurate VOPs obtained with the proposed method are also explored for nonuniform time scale modification. The proposed nonuniform time scale modification method provides high-quality speech while varying the speech rate. In this method, vowel regions are modified nonuniformly based on the type of vowel, while consonant and transition regions are left unaltered irrespective of the speaking rate. Here, VOPs are used to determine the consonant, vowel, and transition regions.
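
The region-based idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the fixed consonant and transition durations (`cons`, `trans`), the "other" label for spans outside any consonant-vowel unit, and the per-vowel scaling factors are all hypothetical values chosen for illustration. The sketch only plans durations (which spans would be stretched and by how much); it does not modify a waveform.

```python
def regions_from_vops(vops, total, cons=0.04, trans=0.03):
    """Label time spans (in seconds) around each VOP.

    Assumes `vops` is sorted and lies inside [0, total]. The fixed
    `cons` and `trans` durations are illustrative assumptions, not
    values taken from the chapter.
    """
    regions = []
    cursor = 0.0
    for i, vop in enumerate(vops):
        # Consonant region: a short span just before the VOP.
        c_start = max(cursor, vop - cons)
        if c_start > cursor:
            regions.append(("other", cursor, c_start))
        regions.append(("consonant", c_start, vop))
        # The vowel is assumed to run until the next consonant region
        # (or the end of the utterance for the last VOP).
        nxt = max(vop, vops[i + 1] - cons) if i + 1 < len(vops) else total
        # Transition region: a short span just after the VOP.
        t_end = min(vop + trans, nxt)
        regions.append(("transition", vop, t_end))
        if nxt > t_end:
            regions.append(("vowel", t_end, nxt))
        cursor = nxt
    return regions


def modified_duration(regions, vowel_factors):
    """Total duration after scaling only the vowel regions.

    `vowel_factors` holds one hypothetical scaling factor per vowel
    region, standing in for a vowel-type-dependent choice. Consonant,
    transition, and "other" spans keep their original duration.
    """
    factors = iter(vowel_factors)
    total = 0.0
    for label, start, end in regions:
        dur = end - start
        total += dur * next(factors) if label == "vowel" else dur
    return total


regions = regions_from_vops([0.10, 0.40], total=0.60)
for label, start, end in regions:
    print(f"{label:<10} {start:.2f}-{end:.2f} s")
print(f"planned duration: {modified_duration(regions, [2.0, 1.5]):.3f} s")
```

Passing a factor of 1.0 for every vowel leaves the total duration unchanged, which is a quick sanity check that only the vowel spans are affected.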

Title: Speaker Identification and Time Scale Modification Using VOPs
Authors: K. Sreenivasa Rao, Anil Kumar Vuppala
Publisher: Springer International Publishing
Book: Speech Processing in Mobile Environments
Print ISBN: 978-3-319-03115-6
Electronic ISBN: 978-3-319-03116-3

Copyright Year: 2014
DOI: https://doi.org/10.1007/978-3-319-03116-3_6
