Voice activity detection (VAD) is a prerequisite for speaker recognition in real-world conditions. Current methods treat VAD as a frame-level problem, using short-time window functions. A person labeling speech by hand, however, can easily pick out whole speech segments containing several words. Inspired by this, we first use an IIR filter to obtain the envelope of the waveform and divide the envelope into separate sound segments. We then extract shape features from these segments and use K-means to cluster them by wave-crest amplitude, which lets us discard the silent parts. Finally, we use other shape features to discard the noise parts. The proposed VAD method clearly outperforms energy-based VAD and VQVAD, with a relative 20% decrease in error rate, while its computation time is 30% less than that of VQVAD. Using our VAD method for speaker recognition also gives an encouraging result, with an average decrease in EER of about 3%.
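The first two stages described above (envelope extraction, then clustering segments by crest amplitude) can be illustrated with a minimal sketch. It is not the authors' implementation: the one-pole filter, the smoothing factor `alpha`, the rule of splitting the envelope at its local minima, and the choice of two clusters are all assumptions made here for illustration, and every function name is hypothetical. The final noise-removal stage is left out because the abstract does not name the shape features it uses.

```python
import math

def envelope(signal, alpha=0.01):
    """Assumed one-pole IIR low-pass on the rectified signal:
    y[n] = (1 - alpha) * y[n-1] + alpha * |x[n]|."""
    env, y = [], 0.0
    for x in signal:
        y = (1.0 - alpha) * y + alpha * abs(x)
        env.append(y)
    return env

def split_segments(env):
    """Assumed segmentation rule: cut the envelope at its local minima.
    Returns (start, end) index pairs, end exclusive."""
    cuts = [0]
    for i in range(1, len(env) - 1):
        if env[i] < env[i - 1] and env[i] <= env[i + 1]:
            cuts.append(i)
    cuts.append(len(env))
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]

def two_means(values, iters=20):
    """1-D K-means with K=2, initialised at the min and max values.
    Returns one label per value (1 = higher-amplitude cluster)."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        low = [v for v, l in zip(values, labels) if l == 0]
        high = [v for v, l in zip(values, labels) if l == 1]
        if low:
            lo = sum(low) / len(low)
        if high:
            hi = sum(high) / len(high)
    return labels

def non_silent_segments(signal, alpha=0.01):
    """Keep the segments whose crest amplitude falls in the higher cluster."""
    segments = split_segments(envelope(signal, alpha))
    if not segments:
        return []
    env = envelope(signal, alpha)
    crests = [max(env[a:b]) for a, b in segments]
    labels = two_means(crests)
    return [seg for seg, label in zip(segments, labels) if label == 1]

# Synthetic demo: a 200 Hz tone at 8 kHz, loud only for samples 2000..3999.
demo = [
    (1.0 if 2000 <= n < 4000 else 0.01) * math.sin(2 * math.pi * 200 * n / 8000)
    for n in range(6000)
]
kept = non_silent_segments(demo)
```

In this sketch the kept segments trail a little past the end of the loud burst, because the envelope takes time to decay. A smaller time constant (larger `alpha`) shortens that tail at the cost of a rougher envelope.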
- A Novel and Efficient Voice Activity Detector Using Shape Features of Speech Wave
- Springer International Publishing