Published in: International Journal of Speech Technology 3/2022

08 April 2021

RETRACTED ARTICLE: Audio fingerprint analysis for speech processing using deep learning method

Author: Ali Altalbe

Abstract

We generate enormous amounts of audio data every day simply by using the Internet. This volume increases the complexity of accessing and analyzing audio data across audio-based applications. Frameworks or supporting tools are therefore needed to retrieve audio data and make intelligent decisions in speech processing. However, the non-stationarity and irregularity of audio signals make segmentation and classification difficult. Audio classification methods are used in many applications, such as speaker identification, gender recognition, music type classification, and natural sound classification. This work proposes a deep learning method based on long short-term memory (LSTM) that can be used for preprocessing, segmentation, and retrieval of audio signals from the GTZAN dataset. The simulation results show that the proposed algorithm can effectively improve audio fingerprint-based retrieval accuracy and overcome the drawbacks of traditional methods. Compared with existing methods, the proposed LSTM method achieves a precision of 96.54%, a recall of 96.15%, an accuracy of 98.56%, and an F-measure of 0.96. In practice, the proposed audio fingerprint recognition system is suited to voice applications, especially on heterogeneous portable consumer devices and in online distributed audio systems.
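
As a quick consistency check on the reported metrics, the sketch below recomputes the F-measure from the stated precision and recall. It assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not taken from the article.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F1 score: the harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is F1 (beta = 1).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
precision = 0.9654
recall = 0.9615

f1 = f_measure(precision, recall)
print(f"F-measure: {f1:.4f}")
```

Under this assumption the result rounds to 0.96, which matches the reported figure when it is read as a fraction rather than a percentage.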

Metadata
Title
RETRACTED ARTICLE: Audio fingerprint analysis for speech processing using deep learning method
Author
Ali Altalbe
Publication date
08 April 2021
Publisher
Springer US
Published in
International Journal of Speech Technology / Issue 3/2022
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI
https://doi.org/10.1007/s10772-021-09827-x
