Published in: International Journal of Speech Technology 4/2018

04-09-2018

Large scale data based audio scene classification

Authors: E. Sophiya, S. Jothilakshmi

Abstract

Artificial intelligence and machine learning have been used by many research groups to process large-scale data, known as big data. Machine learning techniques that handle large, complex datasets are computationally expensive. Spark MLlib, the machine learning library of the Apache Spark framework, is becoming a popular platform for big data analysis and is used for many machine learning problems such as classification, regression, and clustering. In this work, Apache Spark combined with a deep multilayer perceptron (MLP) architecture is proposed for audio scene classification. Log mel band features represent the characteristics of the input audio scenes. The parameters of the deep neural network (DNN) are set according to the DNN baseline of the DCASE 2017 challenge. The system is evaluated on the TUT dataset (2017), and the result is compared with the provided baseline.
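To make the feature step in the abstract concrete, the following is a minimal sketch of how log mel band energies can be computed for a single audio frame. It is not the authors' implementation: the function names, the sample rate, the frame length, and the number of bands are illustrative assumptions, and a naive DFT stands in for the FFT a real pipeline would use. The mel mapping uses the common formula mel = 2595 · log10(1 + f / 700).

```python
import math

def hz_to_mel(f_hz):
    """Map frequency in Hz to the mel scale (common 2595*log10 formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale.

    Returns n_bands rows, each with n_fft // 2 + 1 weights.
    """
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_bands + 2 edge points: each filter spans three consecutive edges.
    edges_hz = [mel_to_hz(mel_max * i / (n_bands + 1)) for i in range(n_bands + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for b in range(n_bands):
        lo, mid, hi = edges_hz[b], edges_hz[b + 1], edges_hz[b + 2]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum of one real-valued frame (demo only)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2.0 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def log_mel_energies(frame, bank, eps=1e-10):
    """Log of the mel-weighted band energies; eps avoids log(0)."""
    spec = power_spectrum(frame)
    return [math.log(sum(w * p for w, p in zip(row, spec)) + eps) for row in bank]

# Illustrative settings (assumptions, not the paper's configuration).
SAMPLE_RATE = 8000
N_FFT = 256
N_BANDS = 10

bank = mel_filterbank(N_BANDS, N_FFT, SAMPLE_RATE)
# A synthetic 1 kHz tone stands in for a frame of a recorded audio scene.
tone = [math.sin(2.0 * math.pi * 1000.0 * t / SAMPLE_RATE) for t in range(N_FFT)]
features = log_mel_energies(tone, bank)
strongest_band = features.index(max(features))
```

One such feature vector per frame, possibly stacked with neighbouring frames, is the kind of input an MLP classifier would consume. In a Spark setting these vectors would typically be stored as rows of a distributed dataset before training.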


Metadata
Title: Large scale data based audio scene classification
Authors: E. Sophiya, S. Jothilakshmi
Publication date: 04-09-2018
Publisher: Springer US
Published in: International Journal of Speech Technology, Issue 4/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-018-9552-3
