Published in: International Journal of Speech Technology 1/2016

28-11-2015

Pitch estimation of speech and music sound based on multi-scale product with auditory feature extraction

Authors: Mohamed Anouar Ben Messaoud, Aïcha Bouzid


Abstract

Pitch is a crucial parameter of speech and music signals. However, severe noise, missing harmonics, and unsuitable physical vibration make accurate pitch determination a great challenge. In this paper, we propose a method for pitch estimation of speech and music sounds. Our method is based on the fast Fourier transform (FFT) of the multi-scale product (MP) computed from the output of an auditory feature model of the sound signal. The auditory model simulates the spectral behaviour of the cochlea with a gammachirp filter-bank, and the outer/middle ear filtering with a low-pass filter. For the two output channels, the FFT of the MP is computed over frames. The MP is formed by multiplying the wavelet transform coefficients of the speech or music signal at three scales. The experimental results show that our method estimates the pitch with high accuracy. Moreover, the proposed method outperforms several other pitch detection algorithms in clean and noisy environments.
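
The core of the pipeline described in the abstract (wavelet coefficients at three scales, their product, then a frame-wise FFT) can be illustrated with a minimal Python sketch. Several choices here are assumptions and are not taken from the paper: the gammachirp filter-bank and the low-pass ear filter are omitted, a first-derivative-of-Gaussian wavelet stands in for the analysis wavelet, the three scales and the 60 to 500 Hz search band are arbitrary, and the decision rule (the lowest spectral peak reaching half of the strongest one) is a simple heuristic picked for the illustration. The names `dog_wavelet`, `multiscale_product` and `estimate_pitch` are hypothetical.

```python
import numpy as np


def dog_wavelet(scale_s, fs):
    """First derivative of a Gaussian of width scale_s seconds (assumed wavelet)."""
    half = int(np.ceil(4 * scale_s * fs))          # truncate at about 4 sigma
    t = np.arange(-half, half + 1) / fs
    w = -t * np.exp(-t ** 2 / (2 * scale_s ** 2))
    return w / np.sqrt(np.sum(w ** 2))             # unit-energy normalisation


def multiscale_product(frame, fs, scales_s):
    """Pointwise product of the wavelet coefficients across the given scales."""
    mp = np.ones(len(frame))
    for s in scales_s:
        mp *= np.convolve(frame, dog_wavelet(s, fs), mode="same")
    return mp


def estimate_pitch(frame, fs, scales_s=(0.125e-3, 0.25e-3, 0.5e-3),
                   fmin=60.0, fmax=500.0, n_fft=8192, rel_threshold=0.5):
    """Return a pitch estimate in Hz from the FFT of the multi-scale product."""
    mp = multiscale_product(np.asarray(frame, dtype=float), fs, scales_s)
    mp = (mp - mp.mean()) * np.hanning(len(mp))    # remove DC, taper the edges
    n_fft = max(n_fft, len(mp))                    # zero-pad, never truncate
    mag = np.abs(np.fft.rfft(mp, n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

    band = np.flatnonzero((freqs >= fmin) & (freqs <= fmax))
    seg = mag[band]
    # Interior local maxima of the magnitude spectrum inside the search band.
    is_peak = (seg[1:-1] > seg[:-2]) & (seg[1:-1] >= seg[2:])
    peaks = np.flatnonzero(is_peak) + 1
    if peaks.size == 0:
        return None
    # Heuristic: lowest peak that reaches rel_threshold of the strongest peak.
    strong = peaks[seg[peaks] >= rel_threshold * seg[peaks].max()]
    return float(freqs[band[strong[0]]])


# Synthetic test frame: a 200 Hz sawtooth, 1024 samples at 16 kHz.
fs = 16000
n = np.arange(1024)
sawtooth = 2.0 * ((200.0 * n / fs) % 1.0) - 1.0
print("estimated pitch (Hz):", estimate_pitch(sawtooth, fs))
```

On this synthetic frame the estimate should fall close to 200 Hz, because the product of the three scales sharpens the periodic discontinuities into a narrow pulse train whose spectrum has lines at multiples of the fundamental. Real speech or music would additionally need the auditory front end and a voicing decision, which this sketch does not attempt.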


Metadata
Title
Pitch estimation of speech and music sound based on multi-scale product with auditory feature extraction
Authors
Mohamed Anouar Ben Messaoud
Aïcha Bouzid
Publication date
28-11-2015
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-015-9325-1
