Published in: International Journal of Speech Technology 1/2016

24-11-2015

Speech coding using Best Tree Encoding (BTE) technique based on LPC and trigonometric features

Authors: Mohamed Y. Abbass, Amr M. Gody, Safy A. Shehata, Tamer M. Baraket, Said S. Haggag

Abstract

Over the past several years, considerable attention has been focused on the coding and enhancement of speech signals. This interest has driven the development of new techniques capable of producing good-quality speech at the output. Speech coding is the process of converting human speech into an efficient encoded representation that can be decoded to produce a close approximation of the original signal. This paper addresses the problem of speech coding. It proposes a novel approach, called Best Tree Encoding (BTE), that encodes the wavelet packet best tree structure into a vector of four elements. Here BTE is applied to a further problem, speech compression and synthesis. The data coefficients of the tree nodes are encoded using LPC filters and trigonometric features. The encoded vector consists of the four elements from the BTE analysis together with an LPC and trigonometric vector for each leaf node. The reproduced speech is evaluated for both intelligibility and quality. Quality is measured on the basis of signal-to-noise ratio, log-likelihood ratio, and spectral distortion.
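As a rough illustration of two building blocks named in the abstract, the sketch below computes LPC coefficients for one frame and the signal-to-noise ratio between an original and a decoded signal. It is not the authors' implementation: the abstract does not give the LPC order, the frame length, or the exact SNR definition, so the autocorrelation method with the Levinson-Durbin recursion and a plain global SNR in dB are assumptions, and the function names `autocorrelation`, `lpc`, and `snr_db` are hypothetical.

```python
import math


def autocorrelation(frame, max_lag):
    """Return the (unnormalized) autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, err) with a[0] == 1.0, where the predictor is
    x[n] ~ -sum(a[k] * x[n - k] for k in 1..order) and err is the
    final prediction-error energy.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def snr_db(original, decoded):
    """Global signal-to-noise ratio in dB between two equal-length signals."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise == 0.0:
        return math.inf  # decoded signal is identical to the original
    if signal == 0.0:
        return -math.inf  # nothing but noise
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    # Hypothetical test frame: a decaying exponential, x[n] = 0.9 ** n,
    # which a first-order predictor models almost exactly.
    frame = [0.9 ** n for n in range(200)]
    coeffs, residual = lpc(frame, order=2)
    print("LPC coefficients:", coeffs)
    print("SNR (dB):", snr_db([1.0, 0.0], [0.9, 0.0]))
```

In a coder of the kind the abstract describes, a routine like `lpc` would presumably be applied to the coefficients of each leaf node, and `snr_db` to the full original and reconstructed waveforms. The log-likelihood ratio and spectral distortion measures are omitted here because their exact definitions are not stated in the abstract.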

Metadata
Title
Speech coding using Best Tree Encoding (BTE) technique based on LPC and trigonometric features
Authors
Mohamed Y. Abbass
Amr M. Gody
Safy A. Shehata
Tamer M. Baraket
Said S. Haggag
Publication date
24-11-2015
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-015-9323-3
