25-10-2018 | Technical Paper

Detection of vowel-like speech: an efficient hardware architecture and it's FPGA prototype

Authors: Nagapuri Srinivas, Gayadhar Pradhan, Puli Kishore Kumar

Published in: Microsystem Technologies | Issue 4/2019

Abstract

This paper proposes a robust vowel-like speech (VLS) detection method based on a modified non-local means normalization factor (MNLM-NF), together with its FPGA prototype. In the original NLM algorithm, the NLM-NF at each time instant is estimated by accumulating the weight values (WVs) computed over the search neighborhood. While the WVs are computed, one frame is held fixed and the other frame is slid over the search neighborhood. Each WV is obtained by first accumulating the squared differences between the signal amplitudes of the two analysis frames and then mapping the result non-linearly with a negative exponential function. Evaluating an exponential for every WV requires significantly more hardware and delays the overall process. To address this issue, the WVs are first computed without the negative exponential operation. The MNLM-NF is then obtained by mapping the accumulated WVs once with the negative exponential function. The MNLM-NF has the same nature as the original NLM-NF and is used as the front-end feature for detecting VLS. Experimental results on the TIMIT database show that the proposed approach provides significantly improved performance in terms of identification rate and spurious rate compared with state-of-the-art VLS detection methods. The hardware architecture of the proposed method is designed and verified by implementing it on a Virtex-7 (xc7vx690tffg1761-2) FPGA using Xilinx System Generator.
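
The difference between the two normalization factors can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the authors' implementation: the function names, the frame and search half-widths, and the bandwidth parameter `h` are assumptions introduced here, and the paper's exact scaling may differ.

```python
import numpy as np


def frame_distances(x, t, half_frame, half_search):
    """Accumulated squared differences between the fixed frame centred at t
    and each frame slid across the search neighbourhood.

    Assumes t is far enough from the signal edges that every frame fits.
    """
    ref = x[t - half_frame : t + half_frame + 1]
    dists = []
    for s in range(t - half_search, t + half_search + 1):
        cand = x[s - half_frame : s + half_frame + 1]
        dists.append(np.sum((ref - cand) ** 2))
    return np.array(dists)


def nlm_nf(x, t, half_frame, half_search, h):
    """Original-style factor: one negative exponential per weight value,
    then accumulation over the search neighbourhood."""
    d = frame_distances(x, t, half_frame, half_search)
    return np.sum(np.exp(-d / h))


def mnlm_nf(x, t, half_frame, half_search, h):
    """Modified-style factor (assumed form): accumulate the un-mapped
    distances first, then apply a single negative exponential."""
    d = frame_distances(x, t, half_frame, half_search)
    return np.exp(-np.sum(d) / h)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = 0.1 * rng.standard_normal(200)  # synthetic stand-in for speech
    print(nlm_nf(signal, 100, 5, 10, 1.0))
    print(mnlm_nf(signal, 100, 5, 10, 1.0))
```

Under this reading, the modified factor equals the product of the original weight values rather than their sum, so it needs one exponential evaluation per time instant instead of one per candidate frame.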

Title: Detection of vowel-like speech: an efficient hardware architecture and it's FPGA prototype
Authors: Nagapuri Srinivas, Gayadhar Pradhan, Puli Kishore Kumar
Publication date: 25-10-2018
Publisher: Springer Berlin Heidelberg
Published in: Microsystem Technologies / Issue 4/2019
Print ISSN: 0946-7076
Electronic ISSN: 1432-1858
DOI: https://doi.org/10.1007/s00542-018-4192-8
