Published in: Artificial Intelligence Review 1/2024

01.01.2024

Acoustic-based LEGO recognition using attention-based convolutional neural networks

Authors: Van-Thuan Tran, Chia-Yang Wu, Wei-Ho Tsai


Abstract

This work investigates the classification of LEGO types using deep learning-based audio classification. The investigation rests on the following assumption: if objects of the same shape fall freely from a certain height and hit a fixed plane, their impact sounds will be very similar, so objects of one type can be distinguished from the others. Applying this idea to LEGO recognition, we collect the impact sounds of 200 LEGO objects dropped from a height of about 30 cm onto a designated plane, and we design a CNN-based recognition system that processes an impact sound to determine which LEGO type it belongs to. Recognizing that a falling LEGO piece produces a main impact sound (i.e., the sound at the moment of first impact) followed by several subsequent sounds, we examine whether considering only the first impact sound or the whole sound sequence yields better classification accuracy. We propose a compact two-dimensional CNN model, named LegoNet, which applies a frame-level attention module to the input spectrogram and uses time-distributed fully-connected layers. Our experiments show that free-fall impact sounds can be used for accurate object recognition and that the proposed LegoNet, despite its much smaller size, achieves better accuracy and robustness than baseline models. Using the whole sequence of impact sounds is also more informative for LEGO classification than considering only the first impact sound. Moreover, we find that utilizing data of specific object postures helps improve the classifier's performance when training data are small. The proposed approach can serve as an extra module in intelligent agents or object classification systems that require a rich understanding of the surrounding physical world.
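The abstract does not spell out LegoNet's exact layer configuration, so the following is a minimal PyTorch sketch of the two ingredients it names: a frame-level attention module applied to the input spectrogram, and time-distributed fully-connected layers. The names (FrameAttention, LegoNetSketch), the layer sizes, the 64-mel input, and the mean-over-time pooling are all illustrative assumptions, not the authors' published architecture; only the 200-class output follows the abstract.

```python
# Hypothetical sketch, NOT the published LegoNet: a compact 2D CNN that
# (a) re-weights spectrogram time frames with a learned attention score and
# (b) applies the same fully-connected layer to every frame (time-distributed)
# before pooling over time. All sizes below are illustrative assumptions.
import torch
import torch.nn as nn


class FrameAttention(nn.Module):
    """Scores each time frame of the spectrogram and re-weights it, letting
    the network emphasize the impact onset over quieter trailing frames."""

    def __init__(self, n_mels: int):
        super().__init__()
        self.score = nn.Linear(n_mels, 1)  # one scalar score per frame

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, n_mels, time) -> frames: (batch, time, n_mels)
        frames = x.squeeze(1).transpose(1, 2)
        weights = torch.softmax(self.score(frames), dim=1)  # (batch, time, 1)
        weighted = frames * weights                          # re-weight frames
        return weighted.transpose(1, 2).unsqueeze(1)         # back to (B,1,M,T)


class LegoNetSketch(nn.Module):
    def __init__(self, n_mels: int = 64, n_classes: int = 200):
        super().__init__()
        self.attention = FrameAttention(n_mels)
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # Time-distributed fully-connected layer: applied independently to
        # each remaining time step, then averaged over time.
        self.frame_fc = nn.Linear(32 * (n_mels // 4), 128)
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.attention(x)                             # (B, 1, M, T)
        x = self.conv(x)                                  # (B, C, M/4, T/4)
        b, c, m, t = x.shape
        x = x.permute(0, 3, 1, 2).reshape(b, t, c * m)    # (B, T', C*M')
        x = torch.relu(self.frame_fc(x))                  # per-frame FC
        x = x.mean(dim=1)                                 # pool over time
        return self.classifier(x)


# Example: one batch of 64-mel spectrograms with 128 time frames.
logits = LegoNetSketch()(torch.randn(8, 1, 64, 128))
print(logits.shape)  # torch.Size([8, 200])
```

The design intuition mirrors the abstract: attention over frames lets the model weight the informative first-impact frames against the subsequent bounce sounds, while the time-distributed layer extracts the same per-frame representation across the whole sequence before a single pooled decision.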


Metadata
Title
Acoustic-based LEGO recognition using attention-based convolutional neural networks
Authors
Van-Thuan Tran
Chia-Yang Wu
Wei-Ho Tsai
Publication date
01.01.2024
Publisher
Springer Netherlands
Published in
Artificial Intelligence Review / Issue 1/2024
Print ISSN: 0269-2821
Electronic ISSN: 1573-7462
DOI
https://doi.org/10.1007/s10462-023-10625-x
