Abstract
The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition, and the transformer model based on self-attention achieves promising results. The hybrid model based on Connectionist Temporal Classification (CTC)/Attention has prominent advantages in decoding: it combines the excellent sequence-to-sequence modeling ability of attention with the temporal alignment provided by CTC. We propose the SA-Conv-CTC/Attention model, which applies a hybrid encoder based on self-attention and shallow convolution to the hybrid CTC/Attention architecture, and we also explore decoding with self-attention language models. The CTC network sits on top of the encoder and is jointly trained with the attention-based decoder; it also participates in decoding. We achieve a 0.8-4.75% error reduction compared to other hybrid CTC/Attention systems on the WSJ and HKUST datasets.
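The joint training and decoding described above is commonly formulated as an interpolation of the CTC and attention objectives, with an optional language-model term at decode time. A minimal sketch of this interpolation follows; the weight values (`lam`, `beta`) and the numeric inputs are illustrative assumptions, not figures from the paper.

```python
def joint_loss(ctc_loss: float, att_loss: float, lam: float = 0.3) -> float:
    """Multi-task training objective: interpolate CTC and attention losses.

    lam is the CTC weight (an illustrative default, not the paper's value).
    """
    assert 0.0 <= lam <= 1.0
    return lam * ctc_loss + (1.0 - lam) * att_loss


def joint_score(ctc_logp: float, att_logp: float, lm_logp: float,
                lam: float = 0.3, beta: float = 0.5) -> float:
    """Joint decoding score for a hypothesis: CTC/attention interpolation
    plus a weighted language-model term (e.g. a self-attention LM).
    """
    return lam * ctc_logp + (1.0 - lam) * att_logp + beta * lm_logp


# Example: combine hypothetical loss values during training.
print(joint_loss(2.0, 1.0))  # 0.3*2.0 + 0.7*1.0 = 1.3
```

In practice the same CTC weight is often reused at decode time, so that alignment information from CTC prunes hypotheses that the attention decoder alone might over-score.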
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.