Multimodal deep learning based on multiple correspondence analysis for disaster management

Authors: Samira Pouyanfar, Yudong Tao, Haiman Tian, Shu-Ching Chen, Mei-Ling Shyu

Published in: World Wide Web | Issue 5/2019
Publication date: 18-09-2018

Abstract

The fast and explosive growth of digital data in social media and on the World Wide Web has created numerous opportunities and research activities in multimedia big data. Among them, disaster management applications have attracted considerable attention in recent years because of their impact on society and government. This study targets content analysis and mining for disaster management. Specifically, a multimedia big data framework based on advanced deep learning techniques is proposed. First, a video dataset of natural disasters is collected from YouTube. Then, two separate deep networks, a temporal audio model and a spatio-temporal visual model, are presented to analyze the audio and visual modalities in video clips effectively. Thereafter, the results of both models are integrated using the proposed fusion model, which is based on the Multiple Correspondence Analysis (MCA) algorithm and considers the correlations between data modalities and final classes. The proposed multimodal framework is evaluated on the collected disaster dataset and compared with several state-of-the-art single-modality and fusion techniques. The results demonstrate the effectiveness of both the visual model and the fusion model compared to the baseline approaches. Specifically, the accuracy of the final multi-class classification using the proposed MCA-based fusion reaches 73% on this challenging dataset.
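
The abstract only outlines the pipeline at a high level. As a minimal, hypothetical sketch of how decision-level fusion of two modality classifiers can account for class- and modality-dependent reliability, the snippet below weights each modality's class scores by its per-class accuracy on a validation split. All names, the weighting rule, and the example data are assumptions for illustration; this is not the paper's MCA-based fusion algorithm.

# Hypothetical late-fusion sketch for two modality classifiers (audio, visual).
# NOT the paper's MCA-based fusion: the per-class validation-accuracy weights
# and every identifier below are illustrative assumptions.
import numpy as np

def modality_class_weights(val_scores, val_labels, num_classes):
    # val_scores: list with one (N, num_classes) score array per modality
    # val_labels: (N,) integer ground-truth labels from a validation split
    weights = np.zeros((len(val_scores), num_classes))
    for m, scores in enumerate(val_scores):
        preds = scores.argmax(axis=1)
        for c in range(num_classes):
            mask = val_labels == c
            # how often this modality gets class c right on validation data
            weights[m, c] = (preds[mask] == c).mean() if mask.any() else 0.0
    # normalize over modalities so the weights for each class sum to one
    totals = weights.sum(axis=0, keepdims=True)
    fallback = np.full_like(weights, 1.0 / len(val_scores))
    return np.where(totals > 0, weights / np.maximum(totals, 1e-12), fallback)

def fuse_predictions(test_scores, weights):
    # weighted sum of per-modality class scores, then pick the best class
    fused = sum(scores * weights[m] for m, scores in enumerate(test_scores))
    return fused.argmax(axis=1)

# Example with random stand-in scores for an audio and a visual classifier:
rng = np.random.default_rng(0)
audio_val, visual_val = rng.random((100, 4)), rng.random((100, 4))
labels_val = rng.integers(0, 4, size=100)
w = modality_class_weights([audio_val, visual_val], labels_val, num_classes=4)
audio_test, visual_test = rng.random((10, 4)), rng.random((10, 4))
print(fuse_predictions([audio_test, visual_test], w))

The sketch conveys only the general structure of score-level fusion with modality- and class-specific weights; per the abstract, the paper's MCA-based model derives these correlations between data modalities and final classes differently, as described in the full text.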

Metadata

Title: Multimodal deep learning based on multiple correspondence analysis for disaster management
Authors: Samira Pouyanfar, Yudong Tao, Haiman Tian, Shu-Ching Chen, Mei-Ling Shyu
Publication date: 18-09-2018
Publisher: Springer US
Published in: World Wide Web / Issue 5/2019
Print ISSN: 1386-145X
Electronic ISSN: 1573-1413
DOI: https://doi.org/10.1007/s11280-018-0636-4
