International Journal of Speech Technology

Publication date: 19-09-2017

Pitch segmentation of speech signals based on short-time energy waveform

Authors: Sopon Wiriyarattanakul, Nawapak Eua-anant

Published in: International Journal of Speech Technology | Issue 4/2017

Abstract

Speech generally consists of quasi-repetitive patterns, called pitches, that represent the fundamental period of the speech and the tonal information of the voice. Extraction of pitch information, which is crucial for many speech processing techniques, is usually hampered by noise and by interference from high-order harmonic components. This paper introduces a novel, noise-robust method for determining the speech fundamental frequency and performing pitch segmentation, based on a short-time energy waveform (SEW), defined as the moving average of the squared signal. When a moving average filter with a window size close to the fundamental period is applied, nearly repetitive patterns with fewer ripples, synchronized with the actual pitches, can be clearly observed in the SEW. The DC component of the SEW is removed using morphological top-hat and bottom-hat transforms. The fundamental frequency is then determined as the frequency corresponding to the largest peak in the power spectrum of the DC-removed SEW. Finally, a time-domain window search is performed to locate the local extrema associated with pitches. Compared with traditional pitch detection techniques, the proposed technique yields pitch segmentation results with higher accuracy and greater noise robustness.
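The processing chain described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract gives no window sizes, structuring elements, or search rule, so every function name, parameter, and default below is an assumption. In particular, combining the two morphological transforms as top-hat minus bottom-hat, scanning a fixed 60 to 400 Hz band, and searching non-overlapping one-period windows are illustrative choices only.

```python
import math

def short_time_energy(x, win):
    """SEW: centered moving average of the squared signal (edges use a shorter window)."""
    sq = [v * v for v in x]
    half = win // 2
    sew = []
    for i in range(len(sq)):
        lo, hi = max(0, i - half), min(len(sq), i + half + 1)
        sew.append(sum(sq[lo:hi]) / (hi - lo))
    return sew

def _slide(x, size, fn):
    """Apply fn (min or max) over a centered flat window: grey erosion or dilation."""
    half = size // 2
    return [fn(x[max(0, i - half):i + half + 1]) for i in range(len(x))]

def remove_baseline(x, size):
    """Assumed DC removal: top-hat minus bottom-hat with a flat structuring element."""
    opening = _slide(_slide(x, size, min), size, max)   # erosion, then dilation
    closing = _slide(_slide(x, size, max), size, min)   # dilation, then erosion
    # top-hat = x - opening, bottom-hat = closing - x
    return [(v - o) - (c - v) for v, o, c in zip(x, opening, closing)]

def estimate_f0(x, fs, f_min=60.0, f_max=400.0, step=1.0):
    """Return the frequency (Hz) of the largest power-spectrum peak in [f_min, f_max]."""
    best_f, best_p = f_min, -1.0
    f = f_min
    while f <= f_max:
        w = 2.0 * math.pi * f / fs
        re = sum(v * math.cos(w * n) for n, v in enumerate(x))
        im = sum(v * math.sin(w * n) for n, v in enumerate(x))
        p = re * re + im * im
        if p > best_p:
            best_f, best_p = f, p
        f += step
    return best_f

def pitch_marks(x, fs, f0):
    """Assumed window search: index of the maximum in each one-period window."""
    period = int(round(fs / f0))
    marks = []
    for start in range(0, len(x) - period + 1, period):
        seg = x[start:start + period]
        marks.append(start + seg.index(max(seg)))
    return marks
```

A typical call sequence under these assumptions would be `sew = short_time_energy(x, win)`, `y = remove_baseline(sew, size)`, `f0 = estimate_f0(y, fs)`, and `marks = pitch_marks(y, fs, f0)`. One caveat when testing on synthetic data: for a perfectly periodic signal, a moving-average window exactly one period long averages the energy to a constant, so the window should be close to the period rather than identical to it.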

Title: Pitch segmentation of speech signals based on short-time energy waveform
Authors: Sopon Wiriyarattanakul, Nawapak Eua-anant
Publication date: 19-09-2017
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 4/2017
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-017-9459-4