2018 | Original Paper | Book Chapter

Architectural Approaches for Phonemes Recognition Systems

Authors: Luis Wanumen, Hector Florez

Published in: Applied Informatics

Publisher: Springer International Publishing

Abstract

According to the literature, voice recognition systems can be built using voice synthesizers and voice command controls. In addition, phoneme recognition can be performed by implementing algorithms already created for this kind of task. Nevertheless, phoneme recognition may produce errors when such algorithms are implemented unsuitably. This raises the possibility of performing phoneme recognition with open source APIs. In the work presented in this paper, we used open source APIs for voice command recognition. On that basis, we propose an architecture that allows the construction of a system for phoneme recognition and voice synthesis. The results have been implemented and validated to illustrate the reliability of the proposed architecture.

Metadata
Title
Architectural Approaches for Phonemes Recognition Systems
Authors
Luis Wanumen
Hector Florez
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-01535-0_20