Published in: Microsystem Technologies 4/2019

25.10.2018 | Technical Paper

Detection of vowel-like speech: an efficient hardware architecture and it's FPGA prototype

Authors: Nagapuri Srinivas, Gayadhar Pradhan, Puli Kishore Kumar

Abstract

In this paper, a robust vowel-like speech (VLS) detection method using a modified non-local means normalization factor (MNLM-NF) and its FPGA prototype are proposed. In the original NLM algorithm, at each time instant, the NLM-NF is estimated by accumulating the weight values (WVs) computed over the search neighborhood. During the computation of the WVs, one frame is kept fixed while the other frame is slid over the search neighborhood. Each WV is computed by first accumulating the squared differences between the signal amplitudes of the two analysis frames and then non-linearly mapping the result with a negative exponential function. The exponential operation required for each WV consumes significantly more hardware and delays the overall process. To address this issue, the WVs are first computed without the negative exponential operation. The MNLM-NF is then computed by mapping the accumulated WVs once using the negative exponential function. The MNLM-NF has the same nature as the original NLM-NF. The MNLM-NF is used as the front-end feature for detecting VLS. The experimental results on the TIMIT database show that the proposed approach provides significantly improved performance in terms of identification rate and spurious rate when compared to state-of-the-art VLS detection methods. The hardware architecture of the proposed method is designed and verified by implementing it on a Virtex-7 (xc7vx690tffg1761-2) FPGA using Xilinx System Generator.
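
The difference between the two normalization factors described in the abstract can be illustrated with a short software sketch. This is not the authors' implementation or their hardware architecture: the function names, the `bandwidth` scaling parameter, the frame and search sizes, and the edge clamping are all assumptions introduced here, since the abstract does not specify them.

```python
import math


def frame_distance(x, t, s, half_frame):
    """Sum of squared amplitude differences between frames centred at t and s."""
    last = len(x) - 1
    total = 0.0
    for k in range(-half_frame, half_frame + 1):
        a = x[min(max(t + k, 0), last)]  # clamp indices at the edges (assumption)
        b = x[min(max(s + k, 0), last)]
        total += (a - b) ** 2
    return total


def nlm_nf(x, t, half_frame, half_search, bandwidth):
    """Original form: one exponential per weight value, then accumulate."""
    return sum(
        math.exp(-frame_distance(x, t, s, half_frame) / bandwidth)
        for s in range(t - half_search, t + half_search + 1)
    )


def mnlm_nf(x, t, half_frame, half_search, bandwidth):
    """Modified form: accumulate raw distances, then apply a single exponential."""
    accumulated = sum(
        frame_distance(x, t, s, half_frame)
        for s in range(t - half_search, t + half_search + 1)
    )
    return math.exp(-accumulated / bandwidth)
```

With a search neighborhood of `2 * half_search + 1` positions, the original form evaluates the exponential once per position, while the modified form evaluates it once per time instant. Both values shrink as the frames in the neighborhood become less similar to the fixed frame, which is the sense in which the two factors share the same nature.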

Metadata
Title
Detection of vowel-like speech: an efficient hardware architecture and it's FPGA prototype
Authors
Nagapuri Srinivas
Gayadhar Pradhan
Puli Kishore Kumar
Publication date
25.10.2018
Publisher
Springer Berlin Heidelberg
Published in
Microsystem Technologies / Issue 4/2019
Print ISSN: 0946-7076
Electronic ISSN: 1432-1858
DOI
https://doi.org/10.1007/s00542-018-4192-8
