2015 | Original Paper | Book Chapter

Adaptive Identification of Hashtags for Real-Time Event Data Collection

Authors: Xinyue Wang, Laurissa Tokarchuk, Felix Cuadrado, Stefan Poslad

Published in: Recommendation and Search in Social Networks

Publisher: Springer International Publishing

Abstract

The widespread use of microblogging services such as Twitter makes them a valuable tool for correlating people's personal opinions about popular public events. Researchers have capitalized on such tools to detect and monitor real-world events from this public, social perspective. Most Twitter event analysis approaches rely on event tweets collected through a set of predefined keywords. In this paper, we show that existing data collection approaches risk losing a significant amount of event-relevant information. We propose a refined adaptive crawling model that uses hashtags to detect emerging popular topics and monitors them to retrieve a greater amount of data highly associated with the events of interest. The model expands its queries periodically by analyzing the traffic pattern of hashtags collected from a live Twitter stream. We evaluated the model on a real-world event: based on a theoretical analysis, we tuned the parameters and ran three crawlers (one baseline and two adaptive) during the 2013 Glastonbury music festival. Our analysis shows that adaptive crawling based on a Refined Keyword Adaptation algorithm outperforms the other two crawlers: it collects the most comprehensive set of keywords while introducing the least noise.
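
The periodic query expansion described in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's Refined Keyword Adaptation algorithm, which the abstract does not specify. As a stand-in, it assumes a simple rule: a hashtag joins the query set if it appears in at least a given share of the tweets collected in the current time window. The names `extract_hashtags`, `expand_keywords`, and `min_share`, as well as the sample tweets, are hypothetical.

```python
import re
from collections import Counter

# Simplified hashtag pattern (assumption): '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercased hashtags found in one tweet text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def expand_keywords(keywords, window_tweets, min_share=0.1):
    """Return the keywords plus every hashtag used in at least
    `min_share` of the tweets in the current window.

    The share threshold is an illustrative stand-in, not the
    traffic-pattern analysis used in the chapter.
    """
    counts = Counter()
    for text in window_tweets:
        # Count each hashtag at most once per tweet.
        counts.update(set(extract_hashtags(text)))
    threshold = min_share * len(window_tweets)
    new_tags = {tag for tag, n in counts.items() if n >= threshold}
    return set(keywords) | new_tags


# Hypothetical tweets collected during one time window.
window = [
    "Great set tonight #glastonbury #pyramidstage",
    "Mud everywhere #glastonbury",
    "Heading to the #pyramidstage now",
    "Lunch time #food",
]
keywords = expand_keywords({"glastonbury"}, window, min_share=0.5)
```

A crawler built along these lines would call `expand_keywords` once per window and reissue its stream query with the returned set. Raising `min_share` trades coverage for less noise.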

Footnotes
1
Twitter home page, https://twitter.com/.
 
3
The Search API (version 1.0) only returns tweets from within the last 7 days, and its rate limit is not specified in the official documentation.
 
4
The Streaming API (version 1.0) provides real-time access but returns only 1% of the total number of tweets.
 
5
At the time of publication, access to the full Firehose stream of tweets requires a substantial payment; e.g., PowerTrack costs $2,000 per month plus $0.10 per 1,000 tweets delivered. Retrieved from: http://gnip.com/pr_announcing_power_track.
 
8
Twitter API documentation: https://dev.twitter.com/.
 
Metadata
Title
Adaptive Identification of Hashtags for Real-Time Event Data Collection
Authors
Xinyue Wang
Laurissa Tokarchuk
Felix Cuadrado
Stefan Poslad
Copyright Year
2015
DOI
https://doi.org/10.1007/978-3-319-14379-8_1