
International Journal of Speech Technology

Publication date: 28-11-2015

Pitch estimation of speech and music sound based on multi-scale product with auditory feature extraction

Authors: Mohamed Anouar Ben Messaoud, Aïcha Bouzid

Published in: International Journal of Speech Technology | Issue 1/2016


Abstract

Pitch is a crucial parameter of speech and music signals. However, under severely noisy conditions, missing harmonics, or unsuitable physical vibration, determining the pitch accurately is a great challenge. In this paper, we propose a method for pitch estimation of speech and music sounds. Our method is based on the fast Fourier transform (FFT) of the multi-scale product (MP) computed from the output of an auditory feature model of the sound signal. The auditory model simulates the spectral behaviour of the cochlea with a gammachirp filter bank, and the outer/middle ear filtering with a low-pass filter. For the two output channels, the FFT of the MP is computed over frames. The MP is formed by multiplying the wavelet transform coefficients of the speech or music signal at three scales. The experimental results show that our method estimates the pitch with high accuracy. Moreover, the proposed method outperforms several other pitch detection algorithms in clean and noisy environments.
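The core idea of the abstract, taking the FFT of a three-scale wavelet product over a frame, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the auditory front end (gammachirp filter bank and outer/middle ear low-pass filter) is omitted, and the first-derivative-of-Gaussian wavelet, the scales (2, 4, 8 samples), the 40 ms frame, the 60–500 Hz search range, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def gaussian_derivative_wavelet(scale, width=4.0):
    """Sampled first derivative of a Gaussian (an assumed wavelet choice)."""
    half = int(np.ceil(width * scale))
    t = np.arange(-half, half + 1, dtype=float)
    psi = -t * np.exp(-t**2 / (2.0 * scale**2))
    return psi / np.sum(np.abs(psi))  # L1 normalisation keeps scales comparable

def multiscale_product(frame, scales=(2.0, 4.0, 8.0)):
    """Multiply the wavelet coefficients of one frame across three scales."""
    product = np.ones_like(frame, dtype=float)
    for s in scales:
        coeffs = np.convolve(frame, gaussian_derivative_wavelet(s), mode="same")
        product *= coeffs
    return product

def estimate_pitch(frame, fs, fmin=60.0, fmax=500.0, n_fft=8192):
    """Return the frequency (Hz) of the largest MP spectral peak in [fmin, fmax]."""
    mp = multiscale_product(frame)
    mp = mp - mp.mean()                       # remove DC before the FFT
    windowed = mp * np.hanning(len(mp))       # taper frame edges
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)  # assumed pitch search range
    return freqs[band][np.argmax(spectrum[band])]

if __name__ == "__main__":
    fs = 16000                                # assumed sampling rate
    t = np.arange(int(0.04 * fs)) / fs        # one 40 ms frame
    frame = np.sin(2 * np.pi * 200.0 * t)     # synthetic 200 Hz tone
    print(f"estimated pitch: {estimate_pitch(frame, fs):.1f} Hz")
```

Because the product uses an odd number of scales, a sinusoidal input keeps its strongest component at the fundamental, so on the synthetic tone the peak should fall near 200 Hz. Real speech or music would also need the voicing decision and auditory filtering that this sketch leaves out.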



Title: Pitch estimation of speech and music sound based on multi-scale product with auditory feature extraction
Authors: Mohamed Anouar Ben Messaoud, Aïcha Bouzid
Publication date: 28-11-2015
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-015-9325-1
