Published in: International Journal of Speech Technology 2/2020

24-02-2020

RETRACTED ARTICLE: Preserving learnability and intelligibility at the point of care with assimilation of different speech recognition techniques

Authors: Sukumar Rajendran, Prabhu Jayagopal

Abstract

The recent introduction of 5G technology in mobile internet is transforming the Internet of Mobile Things (IoMT) by offering low latency, support for a large number of IoMT devices, and lower power consumption, thereby delivering cost-effective solutions for low-end devices. This technology enables a ubiquitous, connected, critical communication network between the healthcare system and the IoT, both of which depend largely on low-end devices to gather data at the point of care. The gathering and interpolation of data from things are moved to the cloud, which makes knowledge extraction and decision-making more robust. Vocal signals form the basis of communication between human beings, conveying complex information through variations in thrust, pitch, and tone. Representing and recognizing these analog signals in digital systems is both exciting and challenging. Spoken language is converted to digital signals so that it can be identified from cues such as phonetic, prosodic, phonotactic, and lexical features. Voice patterns tend to be specific to each individual, with a slight orientation towards the language spoken in the individual's region. While speech patterns can alter the meaning of words through tone and through high and low pitch in their utterance, NLP tends to learn specific associations between words through vectors. A focus on learned networks is needed to solve the problem of converting speech to text with minimal loss and high predictability of the syllables of words, sentences, and paraphrases. Creating a knowledge-base corpus of learned, variable prosodic features helps in learning interestingness directly, without any perturbations. Learning algorithms are used to gauge the degree of understandability of speech, with words and sentences identified and transcribed under substantial noise interference.
Transferring the acoustic features learned by algorithms proves to be quite challenging, as they are distorted by sudden environmental changes. A syllable extracted from the speech translation may or may not represent the sentiment of the word, given different phonetic modulation. MobileNets and DistilBERT are used to move language extraction to the edge, reducing the processing time, the size of the corpus, the adversarial learning of voice features and patterns, and the transfer of the learned corpus and patterns.
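
As a rough illustration of why a MobileNets-style model suits low-end edge devices, the sketch below compares the weight count of a standard convolution with that of a depthwise separable convolution, the building block of MobileNets. The layer sizes are hypothetical and are not taken from the article; biases are ignored.

```python
def standard_conv_params(kernel: int, in_ch: int, out_ch: int) -> int:
    """Weights in a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_ch * out_ch


def depthwise_separable_params(kernel: int, in_ch: int, out_ch: int) -> int:
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * in_ch   # one kernel x kernel filter per input channel
    pointwise = in_ch * out_ch            # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
kernel, in_ch, out_ch = 3, 64, 128
std = standard_conv_params(kernel, in_ch, out_ch)
sep = depthwise_separable_params(kernel, in_ch, out_ch)
ratio = sep / std  # equals 1/out_ch + 1/kernel**2

print(f"standard: {std}, separable: {sep}, ratio: {ratio:.3f}")
```

For this assumed layer the separable form needs roughly one eighth to one ninth of the weights, which is the kind of saving that makes on-device processing plausible.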

Metadata
Title
RETRACTED ARTICLE: Preserving learnability and intelligibility at the point of care with assimilation of different speech recognition techniques
Authors
Sukumar Rajendran
Prabhu Jayagopal
Publication date
24-02-2020
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 2/2020
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-020-09687-x