Cognitive Computation

Published: 1 September 2010

Autocorrelation of the Speech Multi-Scale Product for Voicing Decision and Pitch Estimation

Authors: Mohamed Anouar Ben Messaoud, Aïcha Bouzid, Noureddine Ellouze

Published in: Cognitive Computation | Issue 3/2010

Abstract

In this work, we present an algorithm for voiced/unvoiced decision and pitch estimation from speech signals. Our approach classifies the peaks of the autocorrelation of the speech multi-scale product. The multi-scale product is the product of the speech wavelet transform coefficients at three successive dyadic scales. Its autocorrelation function is calculated over frames of a specific length. The experimental results show the robustness and effectiveness of our approach. Moreover, the proposed method outperforms some existing algorithms in both clean and noisy environments.
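The processing chain described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the wavelet, the scales, the frame length, or the peak-classification rule, so the derivative-of-Gaussian wavelet, the scales 1, 2 and 4 samples, the 60 to 400 Hz search range, and the single 0.5 threshold on the normalised autocorrelation peak are all assumptions chosen for illustration. The function names are hypothetical as well.

```python
import numpy as np

def dog_wavelet(scale, width=4.0):
    """First derivative of a Gaussian sampled at `scale` (assumed wavelet)."""
    half = int(np.ceil(width * scale))
    t = np.arange(-half, half + 1, dtype=float)
    psi = -t * np.exp(-t**2 / (2.0 * scale**2))
    return psi / np.sum(np.abs(psi))

def multiscale_product(x, scales=(1, 2, 4)):
    """Pointwise product of wavelet coefficients at three dyadic scales.

    The scales (in samples) are an assumption; the signal must be longer
    than the widest kernel so that mode="same" keeps the signal length.
    """
    prod = np.ones(len(x))
    for s in scales:
        prod *= np.convolve(x, dog_wavelet(s), mode="same")
    return prod

def frame_pitch(frame, fs, fmin=60.0, fmax=400.0, threshold=0.5):
    """Return (voiced, f0_hz) from the autocorrelation peak of one frame.

    A single threshold stands in for the peak classification mentioned in
    the abstract; it is a simplification, not the published decision rule.
    """
    frame = frame - np.mean(frame)
    energy = np.dot(frame, frame)
    if energy == 0.0:
        return False, 0.0
    # One-sided autocorrelation, normalised so that lag 0 equals 1.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:] / energy
    lo = int(fs / fmax)
    hi = min(int(fs / fmin), len(acf) - 1)
    lag = lo + int(np.argmax(acf[lo:hi + 1]))
    voiced = bool(acf[lag] >= threshold)
    return voiced, (fs / lag if voiced else 0.0)
```

As a usage example, an impulse train with a period of 80 samples at an 8 kHz sampling rate can be passed through `multiscale_product`, and a 320-sample frame of the result given to `frame_pitch`; the strongest autocorrelation peak in the search range then falls at lag 80, corresponding to 100 Hz. Real speech would additionally need framing with overlap and a more careful voicing rule.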

Title: Autocorrelation of the Speech Multi-Scale Product for Voicing Decision and Pitch Estimation
Authors: Mohamed Anouar Ben Messaoud, Aïcha Bouzid, Noureddine Ellouze
Publication date: 1 September 2010
Publisher: Springer-Verlag
Published in: Cognitive Computation / Issue 3/2010
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI: https://doi.org/10.1007/s12559-010-9048-1
