Microsystem Technologies

25.10.2018 | Technical Paper

Detection of vowel-like speech: an efficient hardware architecture and it's FPGA prototype

Authors: Nagapuri Srinivas, Gayadhar Pradhan, Puli Kishore Kumar

Published in: Microsystem Technologies | Issue 4/2019

Abstract

This paper proposes a robust vowel-like speech (VLS) detection method based on a modified non-local means normalization factor (MNLM-NF), together with its FPGA prototype. In the original NLM algorithm, the NLM-NF at each time instant is estimated by accumulating the weight values (WVs) computed over the search neighborhood. To compute the WVs, one frame is held fixed while the other frame is slid over the search neighborhood. Each WV is obtained by first accumulating the squared differences between the signal amplitudes of the two analysis frames and then mapping the result non-linearly through a negative exponential function. The exponential operation required for each WV consumes significantly more hardware and delays the overall process. To address this issue, the WVs are first computed without the negative exponential operation. The MNLM-NF is then obtained by mapping the accumulated WVs once through the negative exponential function. The MNLM-NF has the same nature as the original NLM-NF and is used as the front-end feature for detecting VLS. Experimental results on the TIMIT database show that the proposed approach performs significantly better in terms of identification rate and spurious rate than state-of-the-art VLS detection methods. The hardware architecture of the proposed method is designed and verified by implementing it on a Virtex-7 (xc7vx690tffg1761-2) FPGA using Xilinx System Generator.
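The difference between the two normalization factors can be illustrated with a short software sketch. This is not the authors' implementation or hardware architecture: the function names, the symmetric frame and search windows, and the single `bandwidth` scaling parameter are all assumptions made for illustration, since the abstract does not give the exact formulas or parameter values.

```python
import math


def frame_distance(x, n, m, half_frame):
    """Sum of squared amplitude differences between the frames centred at n and m."""
    return sum((x[n + k] - x[m + k]) ** 2
               for k in range(-half_frame, half_frame + 1))


def _check_bounds(x, n, half_frame, half_search):
    # Every frame touched by the search must lie inside the signal.
    reach = half_frame + half_search
    if n - reach < 0 or n + reach >= len(x):
        raise ValueError("search neighborhood exceeds signal bounds")


def nlm_nf(x, n, half_frame, half_search, bandwidth):
    """Original-style NLM-NF: one exponential per weight value, then a sum.

    `bandwidth` is a hypothetical scaling parameter (assumption).
    """
    _check_bounds(x, n, half_frame, half_search)
    total = 0.0
    for m in range(n - half_search, n + half_search + 1):
        d = frame_distance(x, n, m, half_frame)
        total += math.exp(-d / bandwidth)  # exponential inside the loop
    return total


def mnlm_nf(x, n, half_frame, half_search, bandwidth):
    """Modified-style NF: accumulate raw distances, apply one exponential at the end."""
    _check_bounds(x, n, half_frame, half_search)
    acc = 0.0
    for m in range(n - half_search, n + half_search + 1):
        acc += frame_distance(x, n, m, half_frame)  # no exponential here
    return math.exp(-acc / bandwidth)  # single exponential per time instant


if __name__ == "__main__":
    signal = [math.sin(0.3 * i) for i in range(50)]
    print(nlm_nf(signal, 25, 2, 3, 1.0))
    print(mnlm_nf(signal, 25, 2, 3, 1.0))
```

Under these assumptions the modified version evaluates the exponential once per time instant instead of once per position in the search neighborhood, which is the saving the abstract describes. Because the exponential of a sum equals the product of the exponentials, this sketch of the MNLM-NF equals the product of the original weight values rather than their sum, so both quantities shrink as the frames in the neighborhood become less similar. For a constant signal, where every frame distance is zero, `nlm_nf` returns the number of search positions and `mnlm_nf` returns 1.0.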

Title: Detection of vowel-like speech: an efficient hardware architecture and it's FPGA prototype
Authors: Nagapuri Srinivas
Gayadhar Pradhan
Puli Kishore Kumar
Publication date: 25.10.2018
Publisher: Springer Berlin Heidelberg
Published in: Microsystem Technologies / Issue 4/2019
Print ISSN: 0946-7076
Electronic ISSN: 1432-1858
DOI: https://doi.org/10.1007/s00542-018-4192-8
