Published in: Social Network Analysis and Mining 1/2018

01-12-2018 | Original Article

Enhancing keyword correlation for event detection in social networks using SVD and k-means: Twitter case study

Authors: Ahmad Hany Hossny, Terry Moschuo, Grant Osborne, Lewis Mitchell, Nick Lothian



Abstract

Extracting textual features from tweets is challenging because the content is noisy and most individual words carry only a weak signal. In this paper, we propose using singular value decomposition (SVD) with clustering to group related words into enhanced signals for textual features in tweets, in order to improve their correlation with events. The proposed method applies SVD to the matrix of feature/day counts, in which each feature is represented by its time series vector, to ensure the independence of the feature vectors. Then, k-means clustering is applied to build a look-up table that maps the members of each cluster to the cluster centroid. The look-up table is used to map each feature in the original data to the centroid of its cluster. Then, we add the term-frequency vectors of all features in each cluster to the term-frequency vector of the cluster centroid. To evaluate the method, we calculated the correlations of the cluster centroids with the gold-standard record vector before and after adding the vectors of the cluster members to the centroid vector. The proposed method is applied with multiple correlation techniques, including Pearson, Spearman, distance correlation, and Kendall's tau. The experiments also considered different word forms and feature lengths, including keywords, n-grams, skip-grams, and bags-of-words. The correlation results are enhanced significantly: the highest correlation scores increased from 0.22 to 0.70, and the average correlation scores increased from 0.22 to 0.60.
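The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the count matrix, the gold-standard vector, the values of `k` and `rank`, the hand-written k-means, and all function names are assumptions made for the example. Each cluster's signal is taken here as the plain sum of its members' count vectors, and only Pearson correlation is shown.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (illustrative); returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def cluster_and_sum(counts, k, rank, seed=0):
    """Factorize the feature/day matrix, cluster features, sum each cluster."""
    # SVD of the (features x days) count matrix.
    U, s, _ = np.linalg.svd(counts, full_matrices=False)
    # Low-rank coordinates of each feature (assumed clustering input).
    embedding = U[:, :rank] * s[:rank]
    labels = kmeans(embedding, k, seed=seed)
    # Look-up table: feature index -> cluster id.
    lookup = {feature: int(c) for feature, c in enumerate(labels)}
    # Add every member's daily counts to its cluster's signal.
    summed = np.zeros((k, counts.shape[1]))
    for feature, c in lookup.items():
        summed[c] += counts[feature]
    return lookup, summed


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic data (hypothetical): 90 days, 6 weak event-related features
# and 6 unrelated ones.
rng = np.random.default_rng(42)
n_days = 90
events = (rng.random(n_days) < 0.2).astype(float)  # stand-in gold standard
related = rng.poisson(1.0 + 2.0 * events, size=(6, n_days))
unrelated = rng.poisson(2.0, size=(6, n_days))
counts = np.vstack([related, unrelated]).astype(float)

k = 3
lookup, summed = cluster_and_sum(counts, k=k, rank=3)

# Correlation with the event record before (single features)
# and after (summed cluster signals).
before = [pearson(row, events) for row in counts]
after = [pearson(row, events) for row in summed]
print("best single-feature correlation:", max(before))
print("best cluster-sum correlation:   ", max(after))
```

Whether the summed signal correlates better than any single feature depends on the data and on how cleanly the clusters separate related from unrelated features. Swapping in rank-based or distance correlation only requires replacing `pearson`.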


Metadata
Title
Enhancing keyword correlation for event detection in social networks using SVD and k-means: Twitter case study
Authors
Ahmad Hany Hossny
Terry Moschuo
Grant Osborne
Lewis Mitchell
Nick Lothian
Publication date
01-12-2018
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2018
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-018-0519-9
