Published in: International Journal of Speech Technology 4/2022

18.10.2021

Construction of complex environment speech signal communication system based on 5G and AI driven feature extraction techniques

Written by: Yi Jiang, ErLi Cheng, YongHao Li, Yali Zhang



Abstract

In daily life, voice is the most direct and important means of human communication. With the rise of the Internet and the development of communication technology, non-voice signals such as images and data account for a growing share of communication traffic. Nevertheless, most communication systems still require a voice transmission function, so the effective transmission of voice information remains essential. With the rapid development of the Internet, and in particular the recent commercial deployment of 5G and its future civilian applications, human–computer interaction will become increasingly intelligent, which in turn poses greater challenges for speech recognition as a human–computer interface. Noise interference is one of the biggest obstacles to the practical application of speech systems. Although training deep learning models on large amounts of noisy data solves part of the noise-robustness problem, non-stationary noise remains a major challenge for speech recognition in complex scenes with very low signal-to-noise ratio (SNR). In addition, multi-information fusion communication systems must transmit many kinds of data and therefore place high demands on bandwidth and storage space. This paper therefore studies the construction of a voice signal communication system based on artificial intelligence and 5G technology. The model is designed and implemented with complex scenarios in mind, and its performance is validated in simulations against state-of-the-art methods.
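The very-low-SNR conditions the abstract describes are conventionally simulated by mixing clean speech with noise scaled to a target signal-to-noise ratio. A minimal NumPy sketch of that standard preprocessing step follows; the function name `mix_at_snr` and the toy sine/white-noise signals are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture speech + noise has the requested SNR in dB."""
    # Tile/truncate the noise to match the speech length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[:len(speech)]
    # Power of each signal (mean squared amplitude).
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(p_speech / p_noise_scaled); solve for the noise scale.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Toy example: a 440 Hz "speech" tone plus white noise at -5 dB SNR
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)
mixture = mix_at_snr(speech, noise, snr_db=-5.0)

# Verify the achieved SNR matches the target.
achieved = 10 * np.log10(np.mean(speech ** 2) / np.mean((mixture - speech) ** 2))
print(round(achieved, 1))  # -5.0
```

At -5 dB the noise power is roughly three times the speech power, which is the regime the abstract calls a "very low SNR complex scene".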


Metadata
Title
Construction of complex environment speech signal communication system based on 5G and AI driven feature extraction techniques
Written by
Yi Jiang
ErLi Cheng
YongHao Li
Yali Zhang
Publication date
18.10.2021
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 4/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-021-09900-5
