Published in: Social Network Analysis and Mining 1/2018

01.12.2018 | Original Article

Enhancing keyword correlation for event detection in social networks using SVD and k-means: Twitter case study

Authors: Ahmad Hany Hossny, Terry Moschuo, Grant Osborne, Lewis Mitchell, Nick Lothian

Abstract

Extracting textual features from tweets is challenging because the content is noisy and most individual words carry only a weak signal. In this paper, we propose using singular value decomposition (SVD) with clustering to group related words into enhanced signals for textual features in tweets, in order to improve their correlation with events. The proposed method applies SVD to factorize the matrix of feature/day counts, built from the time series vector of each feature, to ensure the independence of the feature vectors. Then, k-means clustering is applied to build a look-up table that maps the members of each cluster to the cluster centroid. The look-up table is used to map each feature in the original data to the centroid of its cluster. Next, the term-frequency vectors of all features in each cluster are added to the term-frequency vector of the cluster centroid. To evaluate the method, we calculated the correlations of the cluster centroids with the gold-standard record vector before and after adding the vectors of the cluster members to the centroid vector. The method is evaluated with multiple correlation measures, including Pearson, Spearman, distance correlation, and Kendall's tau. The experiments also considered different word forms and feature lengths, including keywords, n-grams, skip-grams, and bags-of-words. The correlation results improve significantly: the highest correlation scores increased from 0.22 to 0.70, and the average correlation scores increased from 0.22 to 0.60.
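The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the data are synthetic, the feature names are made up, the truncated-SVD embedding (rank 2) and the plain NumPy k-means are assumptions, each cluster is represented by its index rather than by a centroid word, and only Pearson correlation is shown.

```python
import numpy as np

def svd_embed(counts, rank):
    """Project each feature (row) onto the top `rank` singular directions."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :rank] * s[:rank]

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumption); returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every center, then nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels

def cluster_signals(counts, labels, k):
    """Sum the daily count vectors of all features in each cluster."""
    return np.vstack([counts[labels == j].sum(axis=0) for j in range(k)])

def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant vector."""
    if np.std(x) == 0 or np.std(y) == 0:
        return 0.0
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic toy data (hypothetical, not taken from the paper).
rng = np.random.default_rng(0)
days, k = 60, 3
gold = rng.poisson(2.0, days).astype(float)  # stand-in for the event record
names = ["protest", "march", "rally", "coffee", "weather", "music"]
# Three weak, partially event-related features and three unrelated ones.
related = [gold * rng.integers(0, 2, days) + rng.poisson(0.5, days) for _ in range(3)]
unrelated = [rng.poisson(3.0, days) for _ in range(3)]
counts = np.array(related + unrelated, dtype=float)  # features x days

labels = kmeans(svd_embed(counts, rank=2), k)
lookup = {name: int(label) for name, label in zip(names, labels)}  # feature -> cluster
signals = cluster_signals(counts, labels, k)

before = [pearson(row, gold) for row in counts]
after = [pearson(row, gold) for row in signals]
print(lookup)
print("per-feature correlation:", np.round(before, 2))
print("per-cluster correlation:", np.round(after, 2))
```

On real data, whether the summed cluster signal correlates better with the event record depends on how well the clusters group related words; the toy data here only demonstrate the mechanics.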

Metadata
Title
Enhancing keyword correlation for event detection in social networks using SVD and k-means: Twitter case study
Authors
Ahmad Hany Hossny
Terry Moschuo
Grant Osborne
Lewis Mitchell
Nick Lothian
Publication date
01.12.2018
Publisher
Springer Vienna
Published in
Social Network Analysis and Mining / Issue 1/2018
Print ISSN: 1869-5450
Electronic ISSN: 1869-5469
DOI
https://doi.org/10.1007/s13278-018-0519-9
