
Towards social data platform: automatic topic-focused monitor for twitter stream

Published: 01 September 2013

Abstract

Many novel applications have been built on analyzing tweets about specific topics. While these applications provide different kinds of analysis, they share a common task: monitoring "target" tweets from the Twitter stream for a topic. The current solution for this task tracks a set of manually selected keywords with Twitter APIs. This manual approach has many limitations. In this paper, we propose a data platform to automatically monitor target tweets from the Twitter stream for any given topic. To monitor target tweets in an optimal and continuous way, we design the Automatic Topic-focused Monitor (ATM), which iteratively 1) samples tweets from the stream and 2) selects keywords to track based on the samples. To realize ATM, we develop a tweet sampling algorithm that collects sufficient unbiased tweets with the available Twitter APIs, and a keyword selection algorithm that efficiently selects keywords with near-optimal coverage of target tweets under cost constraints. We conduct extensive experiments to show the effectiveness of ATM. For example, ATM covers 90% of target tweets for a topic and improves on the manual approach by 49%.
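
The keyword selection step described in the abstract can be illustrated with a small sketch. The abstract does not specify the algorithm, so everything below is an assumption made for illustration: the function names `select_keywords` and `coverage`, the cost model (the number of distinct sampled tweets matched by the chosen keywords, capped by `budget`), and the greedy rule (pick the keyword with the best ratio of newly covered target tweets to newly matched tweets). It is a simple heuristic, not the paper's method, and it claims no approximation guarantee.

```python
from typing import Dict, List, Set


def select_keywords(sample: List[Set[str]], is_target: List[bool], budget: int) -> List[str]:
    """Greedy heuristic: choose keywords that cover target tweets cheaply.

    Assumed cost model (hypothetical): the number of distinct sampled tweets
    matched by the chosen keywords must not exceed `budget`.
    """
    # Inverted index: keyword -> ids of sampled tweets that contain it.
    postings: Dict[str, Set[int]] = {}
    for tid, words in enumerate(sample):
        for word in words:
            postings.setdefault(word, set()).add(tid)

    targets = {tid for tid, flag in enumerate(is_target) if flag}
    chosen: List[str] = []
    collected: Set[int] = set()  # tweets matched by the keywords chosen so far

    while True:
        best_word, best_key = None, (0.0, 0)
        for word in sorted(postings):  # sorted for deterministic tie-breaking
            if word in chosen:
                continue
            new_ids = postings[word] - collected
            gain = len(new_ids & targets)  # newly covered target tweets
            if gain == 0 or len(collected) + len(new_ids) > budget:
                continue
            key = (gain / len(new_ids), gain)  # precision first, then raw gain
            if key > best_key:
                best_word, best_key = word, key
        if best_word is None:
            return chosen
        chosen.append(best_word)
        collected |= postings[best_word]


def coverage(sample: List[Set[str]], is_target: List[bool], keywords: List[str]) -> float:
    """Fraction of target tweets in the sample that contain a chosen keyword."""
    chosen = set(keywords)
    targets = [words for words, flag in zip(sample, is_target) if flag]
    if not targets:
        return 0.0
    return sum(1 for words in targets if words & chosen) / len(targets)


# Toy sample: each tweet is a set of tokens; labels mark the target tweets.
sample = [
    {"obama", "election"},
    {"vote", "election"},
    {"romney", "vote"},
    {"election", "football"},
    {"football", "game"},
    {"game", "vote"},
]
is_target = [True, True, True, False, False, False]

keywords = select_keywords(sample, is_target, budget=4)
print(keywords, coverage(sample, is_target, keywords))
```

In an iterative monitor of the kind the abstract describes, a step like this would be rerun on each fresh sample so that the tracked keywords follow the topic as its vocabulary shifts. How samples are drawn and how costs are actually measured are left open here.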



  • Published in

    Proceedings of the VLDB Endowment  Volume 6, Issue 14
    September 2013
    384 pages

    Publisher

    VLDB Endowment


    Qualifiers

    • article
