ABSTRACT
Near real-time event streams are becoming a key feature of many popular web applications. Many web sites allow users to create a personalized feed by selecting one or more event streams they wish to follow. Examples include Twitter and Facebook, which a user to follow other users' activity, and iGoogle and My Yahoo, which allow users to follow selected RSS streams. How can we efficiently construct a web page showing the latest events from a user's feed? Constructing such a feed must be fast so the page loads quickly, yet reflects recent updates to the underlying event streams. The wide fanout of popular streams (those with many followers) and high skew (fanout and update rates vary widely) make it difficult to scale such applications.
We associate feeds with consumers and event streams with producers. We demonstrate that the best performance results from selectively materializing each consumer's feed: events from high-rate producers are retrieved at query time, while events from lower-rate producers are materialized in advance. A formal analysis of the problem shows the surprising result that we can minimize global cost by making local decisions about each producer/consumer pair, based on the ratio between a given producer's update rate (how often an event is added to the stream) and a given consumer's view rate (how often the feed is viewed). Our experimental results, using Yahoo!'s web-scale database PNUTS, shows that this hybrid strategy results in the lowest system load (and hence improves scalability) under a variety of workloads.
- Twitter API. http://apiwiki.twitter.com/.Google Scholar
- D. Agrawal, A. E. Abbadi, A. K. Singh, and T. Yurek. Efficient view maintenance at data warehouses. In SIGMOD, 1997. Google ScholarDigital Library
- P. Agrawal et al. Asynchronous view maintenance for VLSD databases. In SIGMOD, 2009. Google ScholarDigital Library
- S. Agrawal, S. Chaudhuri, and V. Narasayya. Automated selection of materialized views and indexes for SQL databases. In VLDB, 2000. Google ScholarDigital Library
- S. Babu, K. Munagala, and J. Widom. Adaptive caching for continuous queries. In ICDE, 2005. Google ScholarDigital Library
- M. Cafarella et al. Data management projects at Google. SIGMOD Record, 34--38(1), March 2008. Google ScholarDigital Library
- S. Ceri and J. Widom. Deriving production rules for incremental view maintenance. In VLDB, 1991. Google ScholarDigital Library
- B. Chandramouli and J. Yang. End-to-end support for joins in large-scale publish/subscribe systems. In VLDB, 2008. Google ScholarDigital Library
- B. F. Cooper et al. PNUTS: Yahoo!'s hosted data serving platform. In VLDB, 2008. Google ScholarDigital Library
- I. Eure. Looking to the future with Cassandra. http://blog.digg.com/?p=966.Google Scholar
- C. Garrod et al. Scalable query result caching for web applications. In VLDB, 2008. Google ScholarDigital Library
- G. Graefe. B-tree indexes for high update rates. SIGMOD Record, 35(1):39--44, March 2006. Google ScholarDigital Library
- H. Gupta. Selection of views to materialize in a data warehouse. In ICDT, 1997. Google ScholarDigital Library
- P. Haas and J. Hellerstein. Ripple joins for online aggregation. In SIGMOD, 1999. Google ScholarDigital Library
- A. Halevy. Answering queries using views: A survey. VLDB Journal, 10(4):270--294, 2001. Google ScholarDigital Library
- G. Luo. Partial Materialized Views. In ICDE, 2007.Google ScholarCross Ref
- G. Luo, J. F. Naughton, C. J. Ellmann, and M. Watzke. A comparison of three methods for join view maintenance in parallel RDBMS. In ICDE, 2003.Google ScholarCross Ref
- S. Madden, M. Franklin, J. Hellerstein, and W. Hong. TinyDB: An acquisitional query processing system for sensor networks. TODS, 30(1):122--173, March 2005. Google ScholarDigital Library
- S. Martello and P. Toth. Knapsack Problems: Algorithms and Computer Implementations. Wiley-Interscience, 1990. Google ScholarDigital Library
- C. Mohan and I. Narang. Algorithms for creating indexes for very large tables without quiescing updates. In SIGMOD, 1992. Google ScholarDigital Library
- C. Olston, J. Jiang, and J. Widom. Adaptive filters for continuous queries over distributed data streams. In SIGMOD, 2003. Google ScholarDigital Library
- C. Re and D. Suciu. Materialized views in probabilistic databases: for information exchange and query optimization. In VLDB, 2007. Google ScholarDigital Library
- K. Salem, K. Beyer, B. Lindsay, and R. Cochrane. How to roll a join: Asynchronous incremental view maintenance. In SIGMOD, 2000. Google ScholarDigital Library
- P. Seshadri and A. Swami. Generalized partial indexes. In ICDE, 1995. Google ScholarDigital Library
- M. Stonebraker. The case for partial indexes. SIGMOD Record, 18(4):4--11, 1989. Google ScholarDigital Library
- E. Weaver. Improving running components at Twitter. http://blog.evanweaver.com/articles/2009/03/13/-qcon-presentation/.Google Scholar
- S. Wu, J. Li, B. Ooi, and K.-L. Tan. Just-in-time query retrieval over partially indexed data on structured P2P overlays. In SIGMOD, 2008. Google ScholarDigital Library
- J. Yang and J. Widom. Temporal view self-maintenance. In EDBT, 2000. Google ScholarDigital Library
- K. Yi et al. Efficient maintenance of materialized top-k views. In ICDE, 2003.Google Scholar
- J. Zhou, P.-A. Larson, and H. G. Elmongui. Lazy maintenance of materialized views. In VLDB, 2007. Google ScholarDigital Library
- Y. Zhou, A. Salehi, and K. Aberer. Scalable delivery of stream query results. In VLDB, 2009. Google ScholarDigital Library
- Y. Zhuge, H. Garcia-Molina, J. Hammer, and J. Widom. View maintenance in a warehousing environment. In SIGMOD, 1995. Google ScholarDigital Library
Index Terms
- Feeding frenzy: selectively materializing users' event feeds
Recommendations
Identifying the influential bloggers in a community
WSDM '08: Proceedings of the 2008 International Conference on Web Search and Data MiningBlogging becomes a popular way for a Web user to publish information on the Web. Bloggers write blog posts, share their likes and dislikes, voice their opinions, provide suggestions, report news, and form groups in Blogosphere. Bloggers form their ...
Feeding the world: a comprehensive dataset and analysis of a real world snapshot of web feeds
iiWAS '11: Proceedings of the 13th International Conference on Information Integration and Web-based Applications and ServicesWeb feeds allow users to retrieve new content from pages on the World Wide Web. Feeds are offered by a multitude of web pages, ranging from conventional news sites to pages with user generated content such as wikis, forums or personal blogs. They notify ...
ViewDF: Declarative incremental view maintenance for streaming data
Highlights- This paper presents a flexible and declarative framework for incremental view maintenance over streaming data.
AbstractWe present ViewDF: a flexible and declarative framework for incremental maintenance of materialized views (i.e., results of continuous queries) over streaming data. The main component of the proposed framework is the View Delta ...
Comments