2015 | OriginalPaper | Book chapter

Mining Newsworthy Topics from Social Media

Written by: Carlos Martin, David Corney, Ayse Goker

Published in: Advances in Social Media Analysis

Publisher: Springer International Publishing

Abstract

Newsworthy stories are increasingly being shared through social networking platforms such as Twitter and Reddit, and journalists now use these platforms to rapidly discover stories and eye-witness accounts. We present a technique, designed for a real-time topic-detection system, that detects “bursts” of phrases on Twitter. We describe a time-dependent variant of the classic tf-idf approach and group together bursty phrases that often appear in the same messages in order to identify emerging topics. We demonstrate our methods by analysing tweets corresponding to events drawn from the worlds of politics and sport, as well as more general mainstream news. To evaluate our methods, we created a user-centred “ground truth” based on mainstream media accounts of the events, which helps ensure that our methods remain practical. We compare several clustering and topic ranking methods to discover the characteristics of news-related collections, and show that different strategies are needed to detect emerging topics within them. We show that our methods successfully detect a range of different topics for each event and can retrieve messages (for example, tweets) that represent each topic for the user.
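
The two steps named in the abstract (time-dependent phrase scoring, then grouping phrases that share messages) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the abstract: the scoring formula `(df_now + 1) / (log(mean_past_df + 1) + 1)`, the Jaccard overlap threshold, the single-link grouping, and the function names `phrase_docs`, `burst_scores` and `group_phrases`. The chapter itself may define these steps differently.

```python
import math
from collections import defaultdict
from itertools import combinations


def phrase_docs(messages, n=2):
    """Map each word n-gram to the set of message indices that contain it."""
    docs = defaultdict(set)
    for idx, text in enumerate(messages):
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            docs[" ".join(tokens[i:i + n])].add(idx)
    return docs


def burst_scores(current, history, n=2):
    """Score phrases in the current time slot against earlier slots.

    Assumed, illustrative scoring: a phrase scores highly when it appears in
    many current messages but in few messages of the previous slots.
    Returns (scores, phrase -> message-index sets for the current slot).
    """
    now = phrase_docs(current, n)
    past = [phrase_docs(slot, n) for slot in history]
    scores = {}
    for phrase, ids in now.items():
        mean_past = sum(len(p.get(phrase, ())) for p in past) / max(len(past), 1)
        scores[phrase] = (len(ids) + 1) / (math.log(mean_past + 1) + 1)
    return scores, now


def group_phrases(phrases, docs, min_overlap=0.5):
    """Single-link grouping of phrases whose message sets overlap enough.

    Overlap is the Jaccard index of the two phrases' message sets; the
    threshold value is a hypothetical default.
    """
    parent = {p: p for p in phrases}

    def find(p):
        # Follow parent links to the group representative (with path halving).
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        jaccard = len(docs[a] & docs[b]) / len(docs[a] | docs[b])
        if jaccard >= min_overlap:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for p in phrases:
        groups[find(p)].append(p)
    return sorted(sorted(g) for g in groups.values())
```

A caller would pass the messages of the current time slot plus a list of earlier slots to `burst_scores`, keep the highest-scoring phrases, and hand those to `group_phrases`; each resulting group is a candidate topic, and the messages indexed by its phrases are candidates for representing that topic to the user.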

Metadata
Title
Mining Newsworthy Topics from Social Media
Authors
Carlos Martin
David Corney
Ayse Goker
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-18458-6_2