Towards social data platform: automatic topic-focused monitor for twitter stream

Authors:
Rui Li, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL
Shengjie Wang, Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL
Kevin Chen-Chuan Chang, Advanced Digital Sciences Center, Illinois at Singapore, Singapore

Proceedings of the VLDB Endowment, Volume 6, Issue 14, pp. 1966–1977. https://doi.org/10.14778/2556549.2556577

Published: 01 September 2013

Abstract

Many novel applications have been built on analyzing tweets about specific topics. While these applications provide different kinds of analysis, they share a common task: monitoring "target" tweets from the Twitter stream for a topic. The current solution for this task tracks a set of manually selected keywords with Twitter APIs. This manual approach has many limitations. In this paper, we propose a data platform to automatically monitor target tweets from the Twitter stream for any given topic. To monitor target tweets in an optimal and continuous way, we design Automatic Topic-focused Monitor (ATM), which iteratively 1) samples tweets from the stream and 2) selects keywords to track based on the samples. To realize ATM, we develop a tweet sampling algorithm that samples sufficient unbiased tweets with the available Twitter APIs, and a keyword selection algorithm that efficiently selects keywords with near-optimal coverage of target tweets under cost constraints. We conduct extensive experiments to show the effectiveness of ATM. For example, ATM covers 90% of target tweets for a topic and improves on the manual approach by 49%.
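
The keyword selection step can be pictured as a budgeted coverage problem. The sketch below is not the paper's algorithm, which this page does not reproduce; it is a generic cost-aware greedy heuristic that repeatedly picks the keyword with the most newly covered target tweets per unit cost. The function name `greedy_keywords`, the toy sample, and the cost model (a keyword costs as many units as the sampled tweets it matches) are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def greedy_keywords(
    matches: Dict[str, Set[int]],
    targets: Set[int],
    budget: int,
) -> List[str]:
    """Pick keywords greedily by newly covered target tweets per unit cost.

    `matches` maps a keyword to the ids of sampled tweets containing it.
    Assumption: the cost of a keyword is the number of sampled tweets it
    matches, a stand-in for the volume a tracking API would return.
    """
    chosen: List[str] = []
    covered: Set[int] = set()
    spent = 0
    remaining = dict(matches)
    while remaining:
        best, best_ratio = None, 0.0
        for kw, ids in remaining.items():
            cost = len(ids)
            gain = len((ids & targets) - covered)
            # Skip keywords that add nothing or would exceed the budget.
            if cost == 0 or gain == 0 or spent + cost > budget:
                continue
            ratio = gain / cost
            if ratio > best_ratio:
                best, best_ratio = kw, ratio
        if best is None:
            break
        chosen.append(best)
        covered |= remaining[best] & targets
        spent += len(remaining[best])
        del remaining[best]
    return chosen


# Hypothetical sample: tweet ids matched by each candidate keyword.
sample = {
    "election": {1, 2, 3, 4},
    "vote": {2, 3, 5},
    "the": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "ballot": {6},
}
target_ids = {1, 2, 3, 5, 6}  # sampled tweets judged on-topic
selected = greedy_keywords(sample, target_ids, budget=8)
print(selected)
```

In this toy run the stop word "the" matches every target tweet but is never selected, because its cost alone exceeds the budget. That is the trade-off between coverage and cost that the abstract refers to.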

Published in

Proceedings of the VLDB Endowment, Volume 6, Issue 14
September 2013
384 pages
ISSN: 2150-8097
Publisher: VLDB Endowment