Published in: Social Network Analysis and Mining 1/2020

1 December 2020 | Original Article

Feature selection methods for event detection in Twitter: a text mining approach

Authors: Ahmad Hany Hossny, Lewis Mitchell, Nick Lothian, Grant Osborne

Abstract

Selecting keywords from Twitter as features to identify events is challenging due to language informality such as acronyms, misspelled words, synonyms, transliteration and ambiguous terms. In this paper, we compare and identify the best methods for selecting keywords as features for classification. Specifically, we study the aspects that affect keywords as features for identifying civil unrest and protests. These aspects include the word count, the word form (n-gram, skip-gram and bag-of-words) and the data association method, including correlation techniques and similarity techniques. To test the impact of these factors, we developed a framework that analyzed 641 days of tweets and extracted the words most highly associated with event days over the same time frame. We then used the extracted words as features to classify each day as either an event day or a non-event day in a specific location. The framework applies the same pipeline of data cleaning, preprocessing, feature selection, model learning and event classification to every combination of keyword selection criteria. We used a Naive Bayes classifier to learn the selected features and predict event days accordingly. The classification is evaluated using multiple metrics, such as accuracy, precision, recall, F-score and AUC. This study concluded that the best word form is bag-of-words (average AUC 0.72), the best word count is two (average AUC 0.74), the best feature selection method is Spearman's correlation (average AUC 0.89), and the best classifier for event detection is the Naive Bayes classifier.
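The keyword-selection step summarized above can be illustrated with a small sketch. This is not the authors' implementation: it assumes whitespace-tokenized tweets grouped by day, a 0/1 event-day label per day, and reads "word count" as the number of words n in each feature. The helpers `ngrams`, `skipgrams`, `bag_of_words` and `select_features` are hypothetical names, and Spearman's rho is computed in plain Python (ranks with averaged ties, then Pearson) rather than with a library routine such as `scipy.stats.spearmanr`. Features are ranked by positive correlation, on the assumption that words rising on event days are the ones of interest.

```python
import math
from collections import Counter
from itertools import combinations


def ngrams(tokens, n):
    """Contiguous, ordered n-word sequences."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skipgrams(tokens, n, max_skip):
    """Ordered n-word sequences with at most max_skip skipped tokens in total."""
    return [tuple(tokens[i] for i in idx)
            for idx in combinations(range(len(tokens)), n)
            if idx[-1] - idx[0] - (n - 1) <= max_skip]


def bag_of_words(tokens, n):
    """Unordered sets of n distinct words that co-occur in one tweet."""
    return [frozenset(c) for c in combinations(sorted(set(tokens)), n)]


def rank(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0  # constant series -> 0


def select_features(days, event_days, extract, top_k):
    """Score every candidate feature by Spearman's rho between its daily
    counts and the 0/1 event-day labels; return the top_k, highest first."""
    daily = [Counter(f for tweet in tweets for f in extract(tweet.split()))
             for tweets in days]
    vocab = set().union(*daily)
    # Counter returns 0 for features absent on a given day
    scores = {f: spearman([c[f] for c in daily], event_days) for f in vocab}
    return sorted(vocab, key=lambda f: -scores[f])[:top_k]


if __name__ == "__main__":
    days = [
        ["traffic jam on the bridge", "sunny weekend"],
        ["protest march downtown", "huge protest march today"],
        ["coffee and sunny weather"],
        ["protest march blocked", "police at protest march"],
    ]
    event_days = [0, 1, 0, 1]
    forms = {
        "n-gram": lambda t: ngrams(t, 2),
        "skip-gram": lambda t: skipgrams(t, 2, max_skip=1),
        "bag-of-words": lambda t: bag_of_words(t, 2),
    }
    for name, extract in forms.items():
        print(name, select_features(days, event_days, extract, top_k=3))
```

Here `max_skip` bounds the total number of tokens skipped inside a skip-gram, while the bag-of-words form ignores word order entirely, so "police ... march" and "march ... police" map to the same feature. The daily counts of the selected features would then serve as inputs to the Naive Bayes classifier described in the abstract; that step is not shown.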

Metadata
Title
Feature selection methods for event detection in Twitter: a text mining approach
Authors
Ahmad Hany Hossny
Lewis Mitchell
Nick Lothian
Grant Osborne
Publication date
1 December 2020
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2020
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-020-00658-3