Published in: Microsystem Technologies 4/2019

25-10-2018 | Technical Paper

Detection of vowel-like speech: an efficient hardware architecture and it's FPGA prototype

Authors: Nagapuri Srinivas, Gayadhar Pradhan, Puli Kishore Kumar



Abstract

This paper proposes a robust vowel-like speech (VLS) detection method based on a modified non-local means normalization factor (MNLM-NF), together with its FPGA prototype. In the original NLM algorithm, the NLM-NF at each time instant is estimated by accumulating the weight values (WVs) computed over the search neighborhood. To compute the WVs, one frame is held fixed while the other frame is slid over the search neighborhood. Each WV is obtained by first accumulating the squared differences between the signal amplitudes of the two analysis frames and then mapping the result non-linearly with a negative exponential function. Evaluating the exponential for every WV requires significantly more hardware and delays the overall process. To address this issue, the WVs are first computed without the negative exponential operation. The MNLM-NF is then obtained by mapping the accumulated WVs once with the negative exponential function. The MNLM-NF has the same nature as the original NLM-NF and is used as the front-end feature for detecting VLS. Experimental results on the TIMIT database show that the proposed approach significantly improves the identification rate and the spurious rate compared with state-of-the-art VLS detection methods. The hardware architecture of the proposed method is designed and verified by implementing it on a Virtex-7 (xc7vx690tffg1761-2) FPGA using Xilinx System Generator.
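
The difference between the two normalization factors described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frame half-width `half`, the search half-width `search`, and the bandwidth `h` are hypothetical, and the exact scaling of the exponent used in the paper is not given in the abstract, so a plain division by `h` is assumed.

```python
import math

def frame_distance(x, i, j, half):
    """Sum of squared amplitude differences between the frames centred at i and j."""
    return sum((x[i + k] - x[j + k]) ** 2 for k in range(-half, half + 1))

def nlm_nf(x, t, half, search, h):
    """NLM-NF as described for the original algorithm: one exponential per
    weight value, then accumulation over the search neighborhood.
    The bandwidth h and its scaling are assumptions."""
    total = 0.0
    for s in range(t - search, t + search + 1):
        total += math.exp(-frame_distance(x, t, s, half) / h)
    return total

def mnlm_nf(x, t, half, search, h):
    """Modified NF as described in the abstract: accumulate the raw distances
    first, then apply the negative exponential a single time."""
    acc = 0.0
    for s in range(t - search, t + search + 1):
        acc += frame_distance(x, t, s, half)
    return math.exp(-acc / h)

# Usage: t must leave room for the frame and the search window on both sides,
# i.e. search + half <= t < len(x) - search - half.
flat = [0.0] * 20
alternating = [1.0 if n % 2 == 0 else -1.0 for n in range(20)]
print(nlm_nf(flat, 10, 2, 3, 100.0), mnlm_nf(flat, 10, 2, 3, 100.0))
print(nlm_nf(alternating, 10, 2, 3, 100.0), mnlm_nf(alternating, 10, 2, 3, 100.0))
```

In this sketch the original form evaluates `2 * search + 1` exponentials per time instant, while the modified form evaluates one, which is the saving the abstract attributes to the MNLM-NF. Both quantities decrease as the frames in the search neighborhood become less similar to the fixed frame.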


Metadata
Title
Detection of vowel-like speech: an efficient hardware architecture and it's FPGA prototype
Authors
Nagapuri Srinivas
Gayadhar Pradhan
Puli Kishore Kumar
Publication date
25-10-2018
Publisher
Springer Berlin Heidelberg
Published in
Microsystem Technologies / Issue 4/2019
Print ISSN: 0946-7076
Electronic ISSN: 1432-1858
DOI
https://doi.org/10.1007/s00542-018-4192-8
