Published in: Information Systems Frontiers 5/2018

22.02.2018

Weakly Supervised and Online Learning of Word Models for Classification to Detect Disaster Reporting Tweets

Written by: Girish Keshav Palshikar, Manoj Apte, Deepak Pandita

Abstract

Social media has quickly established itself as an important means by which people, NGOs and governments spread information during natural or man-made disasters, mass emergencies and crisis situations. Given this role, real-time analysis of social media content to locate, organize and use valuable information for disaster management is crucial. In this paper, we propose self-learning algorithms that, with minimal supervision, construct a simple bag-of-words model of the information expressed in news reports about various natural disasters. Such a model is human-understandable, human-modifiable and usable in a real-time scenario. Since tweets are a different kind of document from news articles, we next propose a model transfer algorithm, which refines the model learned from news by analyzing a large unlabeled corpus of tweets. We show empirically that model transfer improves the predictive accuracy of the model, and that our model learning algorithm performs better than several state-of-the-art semi-supervised learning algorithms. Finally, we present an online algorithm that learns weights for the words in the model, and we demonstrate the efficacy of the model with word weights.
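
As a rough illustration of what a weighted bag-of-words classifier with online weight updates can look like, the following minimal Python sketch may help. It is not the algorithm from the paper: the class name `WordModel`, the seed words, the threshold, the learning rate and the perceptron-style update rule are all assumptions chosen for illustration only.

```python
import re


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class WordModel:
    """Hypothetical weighted bag-of-words model (illustrative, not the paper's)."""

    def __init__(self, words, threshold=1.0, lr=0.5):
        # Every model word starts with the same weight, so the model
        # stays readable and can be edited by hand.
        self.weights = {w: 1.0 for w in words}
        self.threshold = threshold  # assumed decision threshold
        self.lr = lr                # assumed learning rate

    def score(self, text):
        # Sum the weights of the distinct model words that occur in the text.
        return sum(self.weights.get(t, 0.0) for t in set(tokenize(text)))

    def predict(self, text):
        return self.score(text) >= self.threshold

    def update(self, text, label):
        """Perceptron-style online update, applied only on a misclassification.

        This rule is a stand-in; the abstract does not specify the update.
        """
        if self.predict(text) == label:
            return
        delta = self.lr if label else -self.lr
        for t in set(tokenize(text)):
            if t in self.weights:  # the vocabulary itself is left unchanged
                self.weights[t] += delta


model = WordModel({"earthquake", "flood", "killed"})
model.update("That concert was an earthquake of sound", False)
```

After the single corrective update, the word "earthquake" alone no longer crosses the threshold, while a text that also contains "killed" still does. Restricting updates to words already in the model keeps the vocabulary fixed, which is one simple way to preserve the human-modifiable property the abstract emphasizes.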

Metadata
Title
Weakly Supervised and Online Learning of Word Models for Classification to Detect Disaster Reporting Tweets
Written by
Girish Keshav Palshikar
Manoj Apte
Deepak Pandita
Publication date
22.02.2018
Publisher
Springer US
Published in
Information Systems Frontiers / Issue 5/2018
Print ISSN: 1387-3326
Electronic ISSN: 1572-9419
DOI
https://doi.org/10.1007/s10796-018-9830-2
