
2018 | OriginalPaper | Chapter

ISNN: Impact Sound Neural Network for Audio-Visual Object Classification

Authors : Auston Sterling, Justin Wilson, Sam Lowe, Ming C. Lin

Published in: Computer Vision – ECCV 2018

Publisher: Springer International Publishing


Abstract

3D object geometry reconstruction remains a challenge when working with transparent, occluded, or highly reflective surfaces. While recent methods classify shape features using raw audio, we present a multimodal neural network optimized for estimating an object's geometry and material. Our networks use spectrograms of recorded and synthesized object impact sounds, together with voxelized shape estimates, to extend the capabilities of vision-based reconstruction. We evaluate our method on multiple datasets of both recorded and synthesized sounds. We further present an interactive application for real-time scene reconstruction in which a user can strike objects; the resulting sound lets the system instantly classify and segment the struck object, even if the object is transparent or visually occluded.
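The abstract describes feeding spectrograms of impact sounds into the network. As a rough illustration (not the authors' code), the sketch below synthesizes an impact sound with a simple modal model, a sum of exponentially damped sinusoids, and computes the kind of log-magnitude spectrogram such a network could take as input; all function names and parameter values here are illustrative assumptions.

```python
import numpy as np

def modal_impact(freqs_hz, dampings, sr=16000, dur=0.5):
    """Toy modal sound model: a sum of exponentially damped sinusoids."""
    t = np.arange(int(sr * dur)) / sr
    sound = sum(np.exp(-d * t) * np.sin(2 * np.pi * f * t)
                for f, d in zip(freqs_hz, dampings))
    return sound / np.max(np.abs(sound))

def log_spectrogram(x, n_fft=512, hop=128):
    """Log-magnitude STFT via a sliding Hann window."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft, hop)]
    mag = np.abs(np.fft.rfft(np.stack(frames), axis=1))
    return np.log1p(mag).T  # shape: (freq_bins, time_frames)

# Three hypothetical resonant modes with increasing damping.
sound = modal_impact([440.0, 1230.0, 2750.0], [8.0, 15.0, 30.0])
spec = log_spectrogram(sound)
print(spec.shape)  # (n_fft // 2 + 1, num_frames)
```

In a multimodal setup like the one described, an image of this spectrogram would be paired with a voxel grid of the estimated shape, each processed by its own branch before fusion.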


Metadata
Title
ISNN: Impact Sound Neural Network for Audio-Visual Object Classification
Authors
Auston Sterling
Justin Wilson
Sam Lowe
Ming C. Lin
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-01267-0_34
