International Journal of Speech Technology

04-09-2018

Large scale data based audio scene classification

Authors: E. Sophiya, S. Jothilakshmi

Published in: International Journal of Speech Technology | Issue 4/2018

Abstract

Artificial intelligence and machine learning have been used by many research groups to process large-scale data, known as big data. Machine learning techniques that handle large, complex datasets are computationally expensive. Apache Spark, through its machine learning library MLlib, is becoming a popular platform for big data analysis and is used for many machine learning problems such as classification, regression, and clustering. In this work, Apache Spark and a deep multilayer perceptron (MLP), a deep neural network (DNN) architecture, are proposed for audio scene classification. Log mel band features represent the characteristics of the input audio scenes. The parameters of the DNN are set according to the DNN baseline of the DCASE 2017 challenge. The system is evaluated on the TUT dataset (2017), and the result is compared with the provided baseline.
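The abstract names log mel band features as the input representation but gives no extraction settings. The following is a minimal sketch of how such features are commonly computed, not the authors' pipeline: the frame length, hop size, Hann window, 40 bands, and the mel-scale formula are all illustrative assumptions, and the naive DFT is used only to keep the example dependency-free (an FFT library would normally be used instead).

```python
import cmath
import math


def hz_to_mel(hz):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_bands, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per band."""
    n_bins = n_fft // 2 + 1
    max_mel = hz_to_mel(sample_rate / 2.0)
    # n_bands + 2 edge frequencies: left edge, band centres, right edge.
    edges_hz = [mel_to_hz(max_mel * i / (n_bands + 1)) for i in range(n_bands + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for b in range(n_bands):
        lo, mid, hi = edges_hz[b], edges_hz[b + 1], edges_hz[b + 2]
        row = []
        for f in bin_hz:
            rising = (f - lo) / (mid - lo)
            falling = (hi - f) / (hi - mid)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Power of the one-sided DFT of a Hann-windowed frame (naive O(n^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def log_mel_energies(signal, sample_rate, n_fft=256, hop=128, n_bands=40, eps=1e-10):
    """Return one list of n_bands log mel band energies per analysis frame."""
    bank = mel_filterbank(n_bands, n_fft, sample_rate)
    features = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        # eps avoids log(0) for bands that receive no energy.
        features.append([math.log(sum(w * p for w, p in zip(row, power)) + eps)
                         for row in bank])
    return features
```

Each frame's vector (optionally concatenated with neighbouring frames as context) would then serve as one input row for a classifier such as an MLP.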

Title: Large scale data based audio scene classification
Authors: E. Sophiya, S. Jothilakshmi
Publication date: 04-09-2018
Publisher: Springer US
Published in: International Journal of Speech Technology / Issue 4/2018
Print ISSN: 1381-2416
Electronic ISSN: 1572-8110
DOI: https://doi.org/10.1007/s10772-018-9552-3
