
2021 | OriginalPaper | Book chapter

Transformation of Voice Signals to Spatial Domain for Code Optimization in Digital Image Processing

Authors: Akram Alsubari, Ghanshyam D. Ramteke, Rakesh J. Ramteke

Published in: Recent Trends in Image Processing and Pattern Recognition

Publisher: Springer Singapore

Abstract

This paper transforms voice signals from the frequency domain into the spatial domain, in the form of grayscale images, so that image processing techniques can be applied to them. To test this hypothesis, two signal-processing tasks were carried out: speaker recognition and signal segmentation. Two methods are proposed for converting a voice signal into a grayscale image: a signal-range-based method and a fuzzy-based method. The signal-range-based method maps the signal range from (−1 ↔ 1) to (0 ↔ 256). The fuzzy-based method applies a fuzzy Gaussian membership function to map the signal into the range (0 ↔ 1) and then multiplies the result by 255 to bring it into the grayscale range. For speaker recognition, the Local Binary Pattern (LBP) operator is applied as a pre-processing step to filter the intensities of the signal image, and the Histogram of Oriented Gradients (HOG) is used to extract its features, giving a feature vector of length 324. The feature vectors were classified with the Classification Learner app in MATLAB; the speaker identification system achieves a highest recognition rate of 96.9%. For segmentation, an automatic word-segmentation method based on thresholding and morphological operators is proposed; it achieves a segmentation accuracy of 93.67% on Marathi-language speech.
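The two conversions can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the range-based mapping targets 0 to 255 (the largest value an 8-bit grayscale pixel can hold), the Gaussian membership centre `center=0.0` and width `sigma=0.5` are hypothetical choices, and since the abstract does not say how the 1-D sample sequence is laid out as a 2-D image, the hypothetical helper `to_image` simply fills rows of a chosen `width` in order and zero-pads the last row.

```python
import numpy as np


def range_to_gray(signal: np.ndarray) -> np.ndarray:
    """Signal-range-based conversion: map samples in [-1, 1] linearly onto [0, 255].

    Assumption: 8-bit grayscale output; values outside [-1, 1] are clipped first.
    """
    clipped = np.clip(signal, -1.0, 1.0)
    return np.round((clipped + 1.0) * 127.5).astype(np.uint8)


def fuzzy_to_gray(signal: np.ndarray, center: float = 0.0, sigma: float = 0.5) -> np.ndarray:
    """Fuzzy-based conversion: Gaussian membership in (0, 1], scaled by 255.

    `center` and `sigma` are hypothetical parameters, not values from the paper.
    """
    membership = np.exp(-((signal - center) ** 2) / (2.0 * sigma**2))
    return np.round(membership * 255.0).astype(np.uint8)


def to_image(pixels: np.ndarray, width: int) -> np.ndarray:
    """Hypothetical layout: fill rows of `width` pixels in order, zero-padding the last row."""
    rows = -(-pixels.size // width)  # ceiling division
    padded = np.zeros(rows * width, dtype=pixels.dtype)
    padded[: pixels.size] = pixels
    return padded.reshape(rows, width)


# Usage: a short synthetic "voice" signal (a 440 Hz tone sampled at 8 kHz).
t = np.arange(1000) / 8000.0
voice = 0.8 * np.sin(2 * np.pi * 440 * t)
img_range = to_image(range_to_gray(voice), width=32)
img_fuzzy = to_image(fuzzy_to_gray(voice), width=32)
print(img_range.shape, img_range.dtype)  # (32, 32) uint8
```

Note that a Gaussian membership centred at zero gives the same intensity to +x and −x, so the fuzzy image encodes distance from the centre rather than the sign of each sample; a different centre changes that behaviour.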

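The feature-extraction stage can be sketched in the same way. The `lbp` function below is the basic 3 × 3 Local Binary Pattern operator (each pixel is replaced by an 8-bit code recording which neighbours are at least as bright as it); the abstract does not specify the LBP variant or how its output is passed to HOG, so treating the LBP code map as the input image for HOG is an assumption. The hypothetical helper `hog_feature_length` only does the bookkeeping for the descriptor size. With the default parameters of MATLAB's `extractHOGFeatures` (8 × 8-pixel cells, 2 × 2-cell blocks overlapping by one cell, 9 orientation bins), a 32 × 32 image gives 3 × 3 blocks × 4 cells × 9 bins = 324 values, which matches the reported vector length; the actual image size and HOG settings used in the paper are not stated in the abstract.

```python
import numpy as np


def lbp(image: np.ndarray) -> np.ndarray:
    """Basic 3x3 LBP: each interior pixel becomes an 8-bit code, one bit per neighbour
    that is >= the centre pixel (neighbours taken clockwise from the top-left)."""
    img = image.astype(np.int32)
    h, w = img.shape
    center = img[1 : h - 1, 1 : w - 1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int32) << bit
    return codes.astype(np.uint8)  # output is (h - 2) x (w - 2); border pixels are dropped


def hog_feature_length(height, width, cell=8, block=2, overlap=1, bins=9):
    """Number of HOG values: blocks of block x block cells slide by (block - overlap) cells,
    and each block contributes block * block * bins histogram entries."""
    step = block - overlap
    blocks_y = (height // cell - block) // step + 1
    blocks_x = (width // cell - block) // step + 1
    return blocks_y * blocks_x * block * block * bins


patch = np.array([[1, 2, 3], [8, 5, 4], [7, 6, 9]])
print(lbp(patch))                  # [[240]]
print(hog_feature_length(32, 32))  # 324
```

For comparison, the same HOG settings applied to the 64 × 128 detection window of Dalal and Triggs give 3,780 values, so a 324-element vector points to a fairly small signal image (or a downscaled one).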

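The segmentation idea (threshold, clean up the resulting mask with morphological operators, then read off connected runs as words) is sketched below on a 1-D frame-energy envelope rather than on the signal image, which keeps the example short; this swap is ours, not the paper's. The frame length `frame=160` (10 ms at an assumed 16 kHz sampling rate), the threshold `threshold=0.02` and the closing radius `gap_radius=2` are hypothetical values. Morphological closing (a dilation followed by an erosion) fills pauses shorter than the structuring element, so short gaps inside a word do not split it.

```python
import numpy as np


def close_1d(mask: np.ndarray, radius: int) -> np.ndarray:
    """Binary closing (dilation, then erosion) with a flat element of 2*radius + 1 samples.

    The mask is padded with False so that segments touching either end are handled correctly.
    """
    width = 2 * radius + 1
    kernel = np.ones(width, dtype=int)
    pad = np.zeros(2 * radius, dtype=int)
    x = np.concatenate([pad, mask.astype(int), pad])
    dilated = (np.convolve(x, kernel, mode="same") > 0).astype(int)  # any True in window
    eroded = np.convolve(dilated, kernel, mode="same") == width  # all True in window
    return eroded[2 * radius : 2 * radius + mask.size]


def segment_words(signal, frame=160, threshold=0.02, gap_radius=2):
    """Return (start, end) sample indices of word-like segments (end is exclusive)."""
    n_frames = signal.size // frame
    frames = np.abs(signal[: n_frames * frame]).reshape(n_frames, frame)
    energy = frames.mean(axis=1)  # mean absolute amplitude per frame
    mask = close_1d(energy > threshold, gap_radius)  # threshold, then close short gaps
    edges = np.diff(np.concatenate(([0], mask.astype(int), [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return [(int(s) * frame, int(e) * frame) for s, e in zip(starts, ends)]


# Usage: two bursts of "speech" separated by five silent frames.
sig = np.zeros(1600)
sig[160:480] = 0.5
sig[1280:1600] = 0.5
print(segment_words(sig))  # [(160, 480), (1280, 1600)]
```

If the silent gap between the two bursts is shortened to two frames, the closing step merges them into a single segment; choosing `gap_radius` therefore trades off splitting words at internal pauses against merging neighbouring words.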
DOI: https://doi.org/10.1007/978-981-16-0493-5_18
