International Journal of Speech Technology

Published: 24-11-2015

Speech coding using Best Tree Encoding (BTE) technique based on LPC and trigonometric features

Authors: Mohamed Y. Abbass, Amr M. Gody, Safy A. Shehata, Tamer M. Baraket, Said S. Haggag

Published in: International Journal of Speech Technology | Issue 1/2016

Abstract

Over the past several years, considerable attention has been focused on the coding and enhancement of speech signals. This interest has driven the development of new techniques capable of producing good-quality speech at the output. Speech coding is the process of converting human speech into efficient encoded representations that can be decoded to produce a close approximation of the original signal. This paper addresses the problem of speech coding. It proposes a novel approach called Best Tree Encoding (BTE), which encodes the wavelet packet best tree structure into a vector of four elements. Here BTE is applied to a further problem: speech compression and synthesis. The data coefficients of the tree nodes are encoded using LPC filters and trigonometric features. The encoded vector consists of the four elements from the BTE analysis together with an LPC and trigonometric vector for each leaf node. The reproduced speech is evaluated for both intelligibility and quality. Quality is measured by signal-to-noise ratio, log-likelihood ratio, and spectral distortion.
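As a rough illustration of two of the quality measures named in the abstract, the sketch below computes LPC coefficients with the autocorrelation method (Levinson-Durbin recursion) and uses them for a log-likelihood ratio, alongside a plain signal-to-noise ratio. This is not the authors' implementation: the function names `lpc`, `snr_db`, and `llr`, the default order of 10, and the choice to treat a whole signal as one frame are assumptions made for the example, and the formulas are the common textbook definitions rather than ones stated in the abstract.

```python
import numpy as np


def lpc(frame, order):
    """Return ([1, a1, ..., ap], autocorrelation lags 0..p) for one frame.

    Assumed convention: A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
    The frame must not be all zeros.
    """
    x = np.asarray(frame, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error energy, updated at every step
    for i in range(1, order + 1):
        # Reflection coefficient for step i of the Levinson-Durbin recursion.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, r


def snr_db(clean, decoded):
    """Signal-to-noise ratio in dB between a clean signal and its reconstruction."""
    clean = np.asarray(clean, dtype=float)
    noise = clean - np.asarray(decoded, dtype=float)
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2)))


def llr(clean, decoded, order=10):
    """Log-likelihood ratio: log(a_d' R_c a_d / a_c' R_c a_c).

    R_c is the Toeplitz autocorrelation matrix of the clean signal;
    a_c and a_d are the LPC vectors of the clean and decoded signals.
    """
    a_c, r_c = lpc(clean, order)
    a_d, _ = lpc(decoded, order)
    idx = np.arange(order + 1)
    lags = np.abs(np.subtract.outer(idx, idx))
    R_c = r_c[lags]  # R_c[i, j] = r_c[|i - j|]
    return float(np.log((a_d @ R_c @ a_d) / (a_c @ R_c @ a_c)))
```

In practice these measures are usually computed per short frame (for example 20 to 30 ms) and averaged; the sketch keeps a single frame to stay short. A decoded signal identical to the clean one gives a log-likelihood ratio of zero, and the value grows as the LPC envelopes diverge.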

Title: Speech coding using Best Tree Encoding (BTE) technique based on LPC and trigonometric features
Authors: Mohamed Y. Abbass, Amr M. Gody, Safy A. Shehata, Tamer M. Baraket, Said S. Haggag
Publication date: 24-11-2015
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 1/2016
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-015-9323-3