Published in: Computing 3/2020

26.08.2019

English speech recognition based on deep learning with multiple features

Author: Zhaojuan Song


Abstract

English is one of the most widely used languages. As the global village shrinks, smart homes, in-vehicle voice systems, and speech recognition software that use English as the recognition language have gradually entered people's field of vision and won over users through their practical accuracy. Meanwhile, deep learning, with its hierarchical feature learning and data modeling capabilities, has outperformed shallow learning techniques on many tasks. This paper therefore takes English speech as its research object and proposes a deep learning speech recognition algorithm that combines speech features with speech attributes. First, a deep neural network is trained by supervised learning to extract high-level features of the speech; the output of a fixed hidden layer of the resulting network is taken as a new speech feature, and a GMM–HMM acoustic model is trained on these new features. Second, deep-neural-network-based speech attribute extractors are trained for multiple speech attributes, and the extracted attributes are classified into phonemes by a further deep neural network. Finally, the speech features and the speech attribute features are merged into the same CNN framework by a neural network based on a linear feature fusion algorithm. The experimental results show that the proposed multi-feature English speech recognition algorithm, by combining the speech features and the speech attributes of the speaker at the input layer of the deep neural network, can directly and effectively combine the two methods and significantly improves the performance of the English speech recognition system.
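The two core ideas of the abstract — taking the output of a fixed hidden layer of a trained DNN as a new speech feature, and fusing the speech features with attribute features by concatenation before a shared network input layer — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the network sizes, the 39-dimensional MFCC-like input, and the 10-dimensional attribute vector are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward_collect(x, weights, biases):
    """Run a feed-forward DNN and return every hidden-layer activation."""
    activations = []
    h = x
    for W, b in zip(weights, biases):
        h = relu(h @ W + b)
        activations.append(h)
    return activations

def bottleneck_features(x, weights, biases, layer_index):
    """Take the output of one fixed hidden layer as the new speech feature."""
    return forward_collect(x, weights, biases)[layer_index]

def fuse_features(speech_feat, attribute_feat):
    """Linear feature fusion: concatenate the two streams so they enter
    the same network input layer together."""
    return np.concatenate([speech_feat, attribute_feat], axis=-1)

rng = np.random.default_rng(0)
# Toy network: 39-dim input -> 64 -> 32 (fixed hidden layer) -> 64
dims = [39, 64, 32, 64]
weights = [rng.standard_normal((dims[i], dims[i + 1])) * 0.1 for i in range(3)]
biases = [np.zeros(d) for d in dims[1:]]

frame = rng.standard_normal(39)       # one speech frame (hypothetical MFCCs)
bn = bottleneck_features(frame, weights, biases, layer_index=1)
attr = rng.standard_normal(10)        # hypothetical attribute posteriors
fused = fuse_features(bn, attr)
print(bn.shape, fused.shape)          # (32,) (42,)
```

In the paper's pipeline the 32-dimensional feature would feed a GMM–HMM acoustic model, and the fused vector would feed the CNN; here both steps are reduced to shape bookkeeping.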


Metadata
Title
English speech recognition based on deep learning with multiple features
Author
Zhaojuan Song
Publication date
26.08.2019
Publisher
Springer Vienna
Published in
Computing / Issue 3/2020
Print ISSN: 0010-485X
Electronic ISSN: 1436-5057
DOI
https://doi.org/10.1007/s00607-019-00753-0
