2021 | OriginalPaper | Book chapter

Transformation of Voice Signals to Spatial Domain for Code Optimization in Digital Image Processing

Authors: Akram Alsubari, Ghanshyam D. Ramteke, Rakesh J. Ramteke

Published in: Recent Trends in Image Processing and Pattern Recognition

Publisher: Springer Singapore

Abstract

This paper transforms a voice signal from the frequency domain into the spatial domain, in the form of a grayscale image, so that image processing techniques can be applied to it. To test this hypothesis, two signal processing tasks were carried out: speaker recognition and signal segmentation. Two methods were proposed for converting the signal into a grayscale image: a signal-range-based method and a fuzzy-based method. The signal-range-based method maps the signal range from (−1 ↔ 1) to (0 ↔ 256). The fuzzy-based method applies a Fuzzy Gaussian Membership Function to map the signal into the range (0 ↔ 1) and then multiplies the result by 255 to reach the grayscale range. For speaker recognition, the local binary pattern (LBP) is used as a pre-processing step to filter the intensity of the signal image, and the histogram of oriented gradients (HOG) is used to extract its features, giving a feature vector of length 324. The Classification Learner tool in MATLAB was used to classify the feature vectors, and the results were found to be satisfactory. Automatic word segmentation was proposed based on thresholding and morphological operators. The segmentation accuracy is 93.67% for the Marathi language. The highest recognition rate of the speaker identification system is 96.9%.
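The two signal-to-grayscale conversions described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation (the chapter reports using MATLAB). The function names, the choice of the sample mean and standard deviation as the Gaussian parameters, the clipping to 8-bit levels 0 to 255, and the row-by-row folding of the 1-D signal into a 2-D image are all assumptions made for this example.

```python
import numpy as np


def range_to_gray(signal):
    """Signal-range-based conversion (assumed form): linearly map
    samples in [-1, 1] to 8-bit gray levels in [0, 255]."""
    s = np.clip(np.asarray(signal, dtype=np.float64), -1.0, 1.0)
    return np.round((s + 1.0) / 2.0 * 255.0).astype(np.uint8)


def fuzzy_gaussian_to_gray(signal, mean=None, sigma=None):
    """Fuzzy-based conversion (assumed form): a Gaussian membership
    function maps samples to [0, 1]; the result is multiplied by 255."""
    s = np.asarray(signal, dtype=np.float64)
    # Assumption: default parameters are taken from the signal itself.
    mean = s.mean() if mean is None else mean
    sigma = s.std() if sigma is None else sigma
    if sigma == 0:
        sigma = 1.0  # avoid division by zero for a constant signal
    membership = np.exp(-((s - mean) ** 2) / (2.0 * sigma ** 2))
    return np.round(membership * 255.0).astype(np.uint8)


def to_image(gray, width):
    """Hypothetical helper: zero-pad the gray-level vector and fold it
    row by row into a 2-D image with the given width."""
    gray = np.asarray(gray)
    pad = (-len(gray)) % width
    return np.pad(gray, (0, pad)).reshape(-1, width)


if __name__ == "__main__":
    # Synthetic test tone standing in for a recorded voice signal.
    t = np.linspace(0.0, 1.0, 1000)
    voice = 0.8 * np.sin(2.0 * np.pi * 5.0 * t)
    img_range = to_image(range_to_gray(voice), width=32)
    img_fuzzy = to_image(fuzzy_gaussian_to_gray(voice), width=32)
    print(img_range.shape, img_range.dtype)
    print(img_fuzzy.shape, img_fuzzy.dtype)
```

Either resulting array could then be passed to standard image operators such as LBP, HOG, thresholding, or morphology. How the authors arrange the samples into two dimensions is not stated in the abstract, so the `to_image` layout should be read as a placeholder.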

Title: Transformation of Voice Signals to Spatial Domain for Code Optimization in Digital Image Processing
Authors: Akram Alsubari, Ghanshyam D. Ramteke, Rakesh J. Ramteke
Publisher: Springer Singapore
Book: Recent Trends in Image Processing and Pattern Recognition
Print ISBN: 978-981-16-0492-8
Electronic ISBN: 978-981-16-0493-5
Copyright year: 2021
DOI: https://doi.org/10.1007/978-981-16-0493-5_18
