School of Information and Engineering, Minzu University of China, Beijing, 100081, China.
Rensselaer Polytechnic Institute, 110 Eighth Street, Troy NY 12180-3590, USA.
We propose a method that uses a latent regression Bayesian network (LRBN) to extract shared speech features as the input to an end-to-end speech recognition model. The structure of the LRBN is compact and its parameter learning is fast. Compared with a Convolutional Neural Network, it has a simpler, more easily understood structure and fewer parameters to learn. Experimental results show the advantage of the hybrid LRBN/Bidirectional Long Short-Term Memory-Connectionist Temporal Classification architecture for Tibetan multi-dialect speech recognition, and demonstrate that the LRBN helps to differentiate among multiple language speech sets.
Zhao, Y., Yue, J., Song, W., Xu, X., Li, X. et al. (2019). Tibetan multi-dialect speech recognition using latent regression Bayesian network and end-to-end mode. Journal on Internet of Things, 1(1), 17-23. https://doi.org/10.32604/jiot.2019.05866
Vancouver Style
Zhao Y, Yue J, Song W, Xu X, Li X, Wu L, et al. Tibetan multi-dialect speech recognition using latent regression Bayesian network and end-to-end mode. J Internet Things. 2019;1(1):17-23. https://doi.org/10.32604/jiot.2019.05866
IEEE Style
Y. Zhao et al., "Tibetan Multi-Dialect Speech Recognition Using Latent Regression Bayesian Network and End-To-End Mode," J. Internet Things, vol. 1, no. 1, pp. 17-23, 2019. https://doi.org/10.32604/jiot.2019.05866
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.