
Robust Speech Endpoint Detection in Noisy Environments for HRI (Human-Robot Interface)

  • 박진수 (Interdisciplinary Program in Bio-Microsystem Technology, Korea University);
  • 고한석 (School of Electrical Engineering, Korea University)
  • Received : 2012.02.03
  • Accepted : 2013.01.10
  • Published : 2013.03.31

Abstract

In this paper, a new speech endpoint detection method for moving robot platforms in noisy environments is proposed. In the conventional method, the endpoints of speech are obtained by applying an edge detection filter that finds abrupt changes in the feature domain. However, since the frame-energy feature is unstable in such noisy environments, it is difficult to locate the endpoints of speech accurately. Therefore, a novel feature extraction method based on the twice-iterated fast Fourier transform (TIFFT) and statistical models of speech is proposed. The proposed feature is applied to an edge detection filter for effective detection of speech endpoints. Representative experiments show a substantial improvement over the conventional method.

This paper proposes a new speech endpoint detection method that is robust to the ambient noise encountered by a conversational speech recognizer mounted on a moving robot. The conventional method locates endpoints by applying an edge detection filter that looks for abrupt changes in the feature values. However, because the frame-energy feature is unstable in noisy environments, the endpoints of speech are difficult to locate accurately. Therefore, a feature extraction method based on a twice-iterated fast Fourier transform and statistical models of speech is proposed and applied to the edge detection filter. Experiments confirm that the proposed feature is more robust than the conventional one.
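
The pipeline sketched in the abstract has two stages: a per-frame feature obtained by applying the FFT twice (TIFFT), and an edge detection filter run over the resulting feature track so that speech onsets and offsets appear as peaks. The Python sketch below is only an illustration of that idea, not the authors' implementation; the 25 ms/10 ms framing, the low-quefrency band that is summed, and the derivative-of-Gaussian edge kernel are all assumptions.

```python
# Illustrative sketch of a TIFFT-style frame feature followed by an edge
# detection filter over the feature track. NOT the paper's implementation:
# frame size, hop, summed band, and kernel shape are assumptions.
import numpy as np


def tifft_feature(signal, sr=16000, frame_len=0.025, hop=0.010):
    """Per-frame feature from a twice-iterated FFT: FFT of each windowed
    frame, then FFT of its log-magnitude spectrum, keeping the energy in
    the low-"quefrency" bins (assumed band)."""
    n, h = int(frame_len * sr), int(hop * sr)
    win = np.hanning(n)
    feats = []
    for start in range(0, len(signal) - n, h):
        frame = signal[start:start + n] * win
        mag = np.abs(np.fft.rfft(frame))                    # first FFT
        second = np.abs(np.fft.rfft(np.log(mag + 1e-10)))   # second FFT
        feats.append(np.sum(second[1:len(second) // 4]))    # low-quefrency energy
    return np.asarray(feats)


def edge_response(feature, width=5):
    """Convolve the feature track with an odd-symmetric (derivative-of-
    Gaussian) kernel so onsets appear as positive peaks, offsets negative."""
    t = np.arange(-3 * width, 3 * width + 1)
    kernel = -t * np.exp(-t ** 2 / (2.0 * width ** 2))
    return np.convolve(feature, kernel, mode="same")


if __name__ == "__main__":
    # Toy usage: background noise with a 1 s tonal burst standing in for speech.
    rng = np.random.default_rng(0)
    x = rng.normal(scale=0.01, size=3 * 16000)
    x[16000:32000] += np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
    e = edge_response(tifft_feature(x))
    print("onset frame:", int(np.argmax(e)), "offset frame:", int(np.argmin(e)))
```

In practice the edge response would be thresholded to mark candidate onsets and offsets, with the statistical speech model described in the paper used to validate them, rather than taking the global argmax/argmin as in the toy usage above.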

References

  1. J. Beh, R. H. Baran, and H. Ko, "Dual channel based speech enhancement using novelty filter for robust speech recognition in automobile environment," IEEE Trans. Consumer Electronics 52, 583-589 (2006). https://doi.org/10.1109/TCE.2006.1649683
  2. J. Beh and H. Ko, "Spectral subtraction using spectral harmonics for robust speech recognition in car environments," LNCS 2660, 1109-1116 (2003).
  3. L. R. Rabiner and M. R. Sambur, "An algorithm for determining the endpoints of isolated utterances," Bell Syst. Tech. J. 54, 297-315 (1975). https://doi.org/10.1002/j.1538-7305.1975.tb02840.x
  4. L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, (Prentice Hall, NJ, 1993).
  5. ITU-T, A Silence Compression Scheme for G.729 Optimized for Terminals Conforming to ITU-T V.70, (ITU-T Rec. G. 729, Annex B, 1996).
  6. J. G. Wilpon and L. R. Rabiner, "Application of hidden Markov models to automatic speech endpoint detection," Comput. Speech Lang. 2, 321-341 (1987). https://doi.org/10.1016/0885-2308(87)90015-5
  7. E. Nemer, R. Goubran, and S. Mahmoud, "Robust voice activity detection using higher-order statistics in the LPC residual domain," IEEE Trans. Speech Audio Process. 9, 217-231 (2001). https://doi.org/10.1109/89.905996
  8. K. Li, M. N. S. Swamy, and M. O. Ahmad, "An improved voice activity detection using higher order statistics," IEEE Trans. Speech Audio Process. 13, 965-974 (2005). https://doi.org/10.1109/TSA.2005.851955
  9. B. F. Wu and K. C. Wang, "Robust endpoint detection algorithm based on the adaptive band-partitioning spectral entropy in adverse environments," IEEE Trans. Speech Audio Process. 13, 762-775 (2005). https://doi.org/10.1109/TSA.2005.851909
  10. Q. Li and A. Tsai, "A matched filter approach to endpoint detection for robust speaker verification," in Proc. IEEE Work. AIAT (1999).
  11. Q. Li, J. Zheng, A. Tsai, and Q. Zhou, "Robust endpoint detection and energy normalization for real-time speech and speaker recognition," IEEE Trans. Speech Audio Process. 10, 146-157 (2002). https://doi.org/10.1109/TSA.2002.1001979
  12. H. Ghaemmaghami, R. Vogt, S. Sridharan, and M. Mason, "Speech endpoint detection using gradient based edge detection techniques," in Proc. ICSPCS, 1-8 (2008).
  13. T. Fukuda, O. Ichikawa, and M. Nishimura, "Long-term spectro-temporal and static harmonic features for voice activity detection," IEEE J. Sel. Topics Signal Process. 4, 834-844 (2010).
  14. K. Ishizuka, T. Nakatani, and M. Fujimoto, "Noise robust front-end processing with voice activity detection based on periodic to aperiodic component ratio," Speech Communication 52, 41-60 (2010). https://doi.org/10.1016/j.specom.2009.08.003
  15. T. Kristjansson, S. Deligne, and P. Olsen, "Voicing features for robust speech detection," in Proc. Interspeech, 369-372 (2005).
  16. Q. Jo, J. Chang, J. Kim, and N. Kim, "Statistical model-based voice activity detection using support vector machine," IET Signal Process. 3, 205-210 (2009). https://doi.org/10.1049/iet-spr.2008.0128
  17. Q. Jo, Y. Park, K. Lee, and J. Jang, "A support vector machine-based voice activity detection using effective feature vectors" (in Korean), J. Telecommunications Review 18, 362-370 (2008).
  18. N. C. Maddage, K. Wan, C. Xu, and Y. Wang, "Singing voice detection using twice-iterated composite Fourier transform," in Proc. IEEE ICME, 1347-1350 (2004).
  19. S. Gazor and W. Zhang, "A soft voice activity detector based on a Laplacian-Gaussian model," IEEE Trans. Speech Audio Process. 11, 498-505 (2003). https://doi.org/10.1109/TSA.2003.815518
  20. J. Sohn and W. Sung, "A voice activity detector employing soft decision based noise spectrum adaptation," in Proc. IEEE ICASSP, 365-368 (1998).

Cited by

  1. Performance Evaluation of Silence-Feature Normalization Model using Cepstrum Features of Noise Signals 2017, https://doi.org/10.1007/s11277-017-4645-x