ABSTRACT
Twitter is a user-generated content system that allows its users to share short text messages, called tweets, for a variety of purposes, including daily conversations, URL sharing, and news. Given its worldwide network of users of every age and social condition, it acts as a low-level news flash portal whose principal advantage is its impressively short response time.
In this paper we recognize this primary role of Twitter and propose a novel topic detection technique that retrieves, in real time, the most emergent topics expressed by the community. First, we extract the contents (sets of terms) of the tweets and model the term life cycle according to a novel aging theory intended to mine the emerging terms. A term is defined as emerging if it occurs frequently in the specified time interval and was relatively rare in the past. Moreover, since the importance of a piece of content also depends on its source, we analyze the social relationships in the network with the well-known PageRank algorithm in order to determine the authority of the users. Finally, we leverage a navigable topic graph that connects the emerging terms with other semantically related keywords, allowing the detection of emerging topics under user-specified time constraints. We provide several case studies that show the validity of the proposed approach.
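The abstract gives no formulas, so the following is only a minimal sketch of the idea it describes, not the paper's actual model. It assumes a plain power-iteration PageRank over a "follows" graph, an authority-weighted term count per time interval, and a simple ratio test against the mean weight in past intervals. The function names (`pagerank`, `term_weights`, `emerging_terms`) and the parameters `damping`, `ratio`, and `min_weight` are hypothetical choices made for illustration.

```python
from collections import defaultdict


def pagerank(follows, damping=0.85, iterations=50):
    """Power-iteration PageRank (assumed variant, not the paper's exact setup).

    follows maps each user to the list of users they follow, so authority
    flows from a follower to the accounts they follow.
    """
    users = set(follows) | {v for targets in follows.values() for v in targets}
    n = len(users)
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            targets = follows.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling user: spread their rank evenly over everyone.
                share = damping * rank[u] / n
                for v in users:
                    new_rank[v] += share
        rank = new_rank
    return rank


def term_weights(tweets, authority):
    """Sum the author's authority for every distinct term in each tweet.

    tweets is a list of (user, text) pairs from one time interval.
    """
    weights = defaultdict(float)
    for user, text in tweets:
        for term in set(text.lower().split()):
            weights[term] += authority.get(user, 0.0)
    return dict(weights)


def emerging_terms(current, past_intervals, ratio=2.0, min_weight=0.0):
    """Return terms that are heavy now and were relatively rare before.

    A term qualifies when its current weight is at least min_weight and
    exceeds ratio times its mean weight over the past intervals
    (both thresholds are illustrative assumptions).
    """
    result = []
    for term, weight in current.items():
        past = [interval.get(term, 0.0) for interval in past_intervals]
        mean_past = sum(past) / len(past) if past else 0.0
        if weight >= min_weight and weight > ratio * mean_past:
            result.append(term)
    return sorted(result)


if __name__ == "__main__":
    follows = {"a": ["c"], "b": ["c"], "c": []}
    authority = pagerank(follows)
    past = [term_weights([("a", "coffee time"), ("b", "coffee break")], authority)]
    current = term_weights([("c", "earthquake now"), ("a", "earthquake coffee")], authority)
    print(emerging_terms(current, past))
```

In this toy example "earthquake" is flagged because it has no weight in the past interval, while "coffee" is not, since it was already common. A real system would also need tokenization, stop-word removal, and the topic graph step, none of which are attempted here.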
Index Terms
- Emerging topic detection on Twitter based on temporal and social terms evaluation