
Robust Speech Endpoint Detection in Noisy Environments for HRI (Human-Robot Interface)

  • 박진수 (Interdisciplinary Program in Bio-Microsystem Technology, Korea University);
  • 고한석 (School of Electrical Engineering, Korea University)
  • Received : 2012.02.03
  • Accepted : 2013.01.10
  • Published : 2013.03.31

Abstract

In this paper, a new speech endpoint detection method for moving robot platforms in noisy environments is proposed. In the conventional method, the endpoints of speech are obtained by applying an edge detection filter that finds abrupt changes in the feature domain. However, since the frame-energy feature is unstable in such noisy environments, it is difficult to locate the endpoints of speech accurately. Therefore, a novel feature extraction method based on the twice-iterated fast Fourier transform (TIFFT) and statistical models of speech is proposed. The proposed feature is applied to an edge detection filter for effective detection of speech endpoints. Representative experiments show a substantial improvement over the conventional method.

This paper proposes a new speech endpoint detection method that is robust to the ambient noise encountered by a conversational speech recognizer mounted on a moving robot. The conventional method locates endpoints by applying an edge detection filter that looks for abrupt changes in the feature values. However, because the frame-energy feature is unstable in noisy environments, the endpoints of speech are difficult to locate accurately. Therefore, a feature extraction method based on a twice-iterated fast Fourier transform and statistical models of speech is proposed and applied to the edge detection filter. Experiments confirm that the proposed feature is more robust than the conventional one.
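
The pipeline sketched in the abstract has two stages: a per-frame feature obtained by applying the FFT twice (TIFFT), and an edge detection filter run over the resulting feature track so that speech onsets and offsets appear as peaks. The Python sketch below is only an illustration of that idea, not the authors' implementation; the 25 ms/10 ms framing, the low-quefrency band that is summed, and the derivative-of-Gaussian edge kernel are all assumptions.

```python
# Illustrative sketch of a TIFFT-style frame feature followed by an edge
# detection filter over the feature track. NOT the paper's implementation:
# frame size, hop, summed band, and kernel shape are assumptions.
import numpy as np


def tifft_feature(signal, sr=16000, frame_len=0.025, hop=0.010):
    """Per-frame feature from a twice-iterated FFT: FFT of each windowed
    frame, then FFT of its log-magnitude spectrum, keeping the energy in
    the low-"quefrency" bins (assumed band)."""
    n, h = int(frame_len * sr), int(hop * sr)
    win = np.hanning(n)
    feats = []
    for start in range(0, len(signal) - n, h):
        frame = signal[start:start + n] * win
        mag = np.abs(np.fft.rfft(frame))                    # first FFT
        second = np.abs(np.fft.rfft(np.log(mag + 1e-10)))   # second FFT
        feats.append(np.sum(second[1:len(second) // 4]))    # low-quefrency energy
    return np.asarray(feats)


def edge_response(feature, width=5):
    """Convolve the feature track with an odd-symmetric (derivative-of-
    Gaussian) kernel so onsets appear as positive peaks, offsets negative."""
    t = np.arange(-3 * width, 3 * width + 1)
    kernel = -t * np.exp(-t ** 2 / (2.0 * width ** 2))
    return np.convolve(feature, kernel, mode="same")


if __name__ == "__main__":
    # Toy usage: background noise with a 1 s tonal burst standing in for speech.
    rng = np.random.default_rng(0)
    x = rng.normal(scale=0.01, size=3 * 16000)
    x[16000:32000] += np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
    e = edge_response(tifft_feature(x))
    print("onset frame:", int(np.argmax(e)), "offset frame:", int(np.argmin(e)))
```

In practice the edge response would be thresholded to mark candidate onsets and offsets, with the statistical speech model described in the paper used to validate them, rather than taking the global argmax/argmin as in the toy usage above.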

References

  1. J. Beh, R. H. Baran, and H. Ko, "Dual channel based speech enhancement using novelty filter for robust speech recognition in automobile environment," IEEE Trans. Consumer Electronics 52, 583-589 (2006). https://doi.org/10.1109/TCE.2006.1649683
  2. J. Beh and H. Ko, "Spectral subtraction using spectral harmonics for robust speech recognition in car environments," LNCS 2660, 1109-1116 (2003).
  3. L. R. Rabiner and M. R. Sambur, "An algorithm for determining the endpoints of isolated utterances," Bell Syst. Tech. J. 54, 297-315 (1975). https://doi.org/10.1002/j.1538-7305.1975.tb02840.x
  4. L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, (Prentice Hall, NJ, 1993).
  5. ITU-T, A Silence Compression Scheme for G.729 Optimized for Terminals Conforming to ITU-T V.70, (ITU-T Rec. G. 729, Annex B, 1996).
  6. J. G. Wilpon and L. R. Rabiner, "Application of hidden Markov models to automatic speech endpoint detection," Comput. Speech Lang. 2, 321-341 (1987). https://doi.org/10.1016/0885-2308(87)90015-5
  7. E. Nemer, R. Goubran, and S. Mahmoud, "Robust voice activity detection using higher-order statistics in the LPC residual domain," IEEE Trans. Speech Audio Process. 9, 217-231 (2001). https://doi.org/10.1109/89.905996
  8. K. Li, M. N. S. Swamy, and M. O. Ahmad, "An improved voice activity detection using higher order statistics," IEEE Trans. Speech Audio Process. 13, 965-974 (2005). https://doi.org/10.1109/TSA.2005.851955
  9. B. F. Wu and K. C. Wang, "Robust endpoint detection algorithm based on the adaptive band-partitioning spectral entropy in adverse environments," IEEE Trans. Speech Audio Process. 13, 762-775 (2005). https://doi.org/10.1109/TSA.2005.851909
  10. Q. Li and A. Tsai, "A matched filter approach to endpoint detection for robust speaker verification," in Proc. IEEE Work. AIAT (1999).
  11. Q. Li, J. Zheng, A. Tsai, and Q. Zhou, "Robust endpoint detection and energy normalization for real-time speech and speaker recognition," IEEE Trans. Speech Audio Process. 10, 146-157 (2002). https://doi.org/10.1109/TSA.2002.1001979
  12. H. Ghaemmaghami, R. Vogt, S. Sridharan, and M. Mason, "Speech endpoint detection using gradient based edge detection techniques," in Proc. ICSPCS, 1-8 (2008).
  13. T. Fukuda, O. Ichikawa, and M. Nishimura, "Long-term spectro-temporal and static harmonic features for voice activity detection," IEEE J. Sel. Topics Signal Process. 4, 834-844 (2010).
  14. K. Ishizuka, T. Nakatani, and M. Fujimoto, "Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio," Speech Communication 52, 41-60 (2010). https://doi.org/10.1016/j.specom.2009.08.003
  15. T. Kristjansson, S. Deligne, and P. Olsen, "Voicing features for robust speech detection," in Proc. Interspeech, 369-372 (2005).
  16. Q. Jo, J. Chang, J. Kim, and N. Kim, "Statistical model-based voice activity detection using support vector machine," IET Signal Process. 3, 205-210 (2009). https://doi.org/10.1049/iet-spr.2008.0128
  17. Q. Jo, Y. Park, K. Lee, and J. Jang, "A support vector machine-based voice activity detection using effective feature vectors" (in Korean), J. Telecommunications Review 18, 362-370 (2008).
  18. N. C. Maddage, K. Wan, C. Xu, and Y. Wang, "Singing voice detection using twice-iterated composite Fourier transform," in Proc. IEEE ICME, 1347-1350 (2004).
  19. S. Gazor and W. Zhang, "A soft voice activity detector based on a Laplacian-Gaussian model," IEEE Trans. Speech Audio Process. 11, 498-505 (2003). https://doi.org/10.1109/TSA.2003.815518
  20. J. Sohn and W. Sung, "A voice activity detector employing soft decision based noise spectrum adaptation," in Proc. IEEE ICASSP, 365-368 (1998).

Cited by

  1. Performance Evaluation of Silence-Feature Normalization Model using Cepstrum Features of Noise Signals 2017, https://doi.org/10.1007/s11277-017-4645-x