ABSTRACT
Web-based enterprises process events generated by millions of users interacting with their websites. Rich statistical data distilled from combining such interactions in near real-time generates enormous business value. In this paper, we describe the architecture of Photon, a geographically distributed system for joining multiple continuously flowing streams of data in real-time with high scalability and low latency, where the streams may be unordered or delayed. The system fully tolerates infrastructure degradation and datacenter-level outages without any manual intervention. Photon guarantees that there will be no duplicates in the joined output (at-most-once semantics) at any point in time, that most joinable events will be present in the output in real-time (near-exact semantics), and exactly-once semantics eventually.
Photon is deployed within Google Advertising System to join data streams such as web search queries and user clicks on advertisements. It produces joined logs that are used to derive key business metrics, including billing for advertisers. Our production deployment processes millions of events per minute at peak with an average end-to-end latency of less than 10 seconds. We also present challenges and solutions in maintaining large persistent state across geographically distant locations, and highlight the design principles that emerged from our experience.
- D. J. Abadi et al. "The Design of the Borealis Stream Processing Engine". Proc. of CIDR 2005, pp.277--289.Google Scholar
- J. Baker et al. "Megastore: Providing scalable, highly available storage for interactive devices". Proc. of CIDR 2011, pp.223--234.Google Scholar
- S. Blanas et al. "A comparison of join algorithms for log processing in Mapreduce". Proc. of SIGMOD 2010, pp.975--986. Google ScholarDigital Library
- T. D. Chandra, R. Griesemer, and J. Redstone. "Paxos made live: an engineering perspective". Proc. of ACM PODC 2007, pp.398--407. Google ScholarDigital Library
- S. Chandrasekaran and M. J. Franklin. "Streaming queries over streaming data". Proc. of VLDB 2002, pp.203--214. Google ScholarDigital Library
- F. Chang et al. "Bigtable: A Distributed Storage System for Structured Data". ACM TOCS 2008, 26.2, pp.4:1-4:26. Google ScholarDigital Library
- E. F. Codd. "A Relational Model of Data for Large Shared Data Banks", Communications of the ACM 13 (6): p377--387, 1970. Google ScholarDigital Library
- J. C. Corbett et al. "Spanner: Google's Globally-Distributed Database". Proc. of OSDI 2012. Google ScholarDigital Library
- A. Das, J. Gehrke, and M. Riedewald. "Approximate join processing over data streams". Proc. of SIGMOD 2003, pp.40--51. Google ScholarDigital Library
- J. Dean and S. Ghemawat. "MapReduce: Simplified data processing on large clusters". Proc. of OSDI 2004, pp.137--149. Google ScholarDigital Library
- G. DeCandia et al. "Dynamo: Amazon's Highly Available Key-value Store". Proc. of SOSP. 2007, pp. 205--220. Google ScholarDigital Library
- B. Fitzpatrick. "Distributed Caching with Memcached". Linux Journal, Issue 124, 2004, pp.5. Google ScholarDigital Library
- B. Gedik, P. S. Yu, and R. R. Bordawekar. "Executing stream joins on the cell processor", Proc. of VLDB 2007, pp.363--374. Google ScholarDigital Library
- S. Ghemawat, H. Gobioff, and S-T Leung. "The Google File System". 19th Symposium on Operating Systems Principles 2003, pp.20--43. Google ScholarDigital Library
- M. A. Hammad, W. G. Aref, and A. K. Elmagarmid. "Joining multiple data streams with window constraints". Computer Science Technical Reports, #02-115.Google Scholar
- J. Kang, J. F. Naughton, and S. D. Viglas. "Evaluating window joins over unbounded streams". Proc. of VLDB 2002, pp.341--352.Google Scholar
- D. Karger et al. "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web". Proc. of ACM SOTC 1997, pp.654--663. Google ScholarDigital Library
- L. Lamport. "The part-time parliament", ACM TOCS 16.2 1998, pp.133--169. Google ScholarDigital Library
- P. Mishra and M. H. Eich. "Join processing in relational databases". ACM Computing Surveys 1992, 24(1), pp.63--113. Google ScholarDigital Library
- L. Neumeyer et al. "S4: Distributed Stream Computing Platform". Proc. of KDCloud 2010. Google ScholarDigital Library
- J. Rao, E. J. Shekita, and S. Tata. "Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore". Proc. of VLDB 2011, pp.243--254. Google ScholarDigital Library
- F. B. Schneider. "Implementing fault-tolerant services using the state machine approach: A tutorial", ACM Computing Surveys 22 1990, pp.299--319 (1990). Google ScholarDigital Library
- D. Shasha and P. Bonnet. "Database Tuning: Principles, Experiments, and Troubleshooting Techniques". Proc. of SIGMOD 2004, pp.115--116. Google ScholarDigital Library
- J. Teubner and R. Mueller. "How soccer players would do stream joins". Proc. of SIGMOD 2011, pp.625--636. Google ScholarDigital Library
- J. Xie and J. Yang. "A survey of join processing in data streams". Data Streams - Models and Algorithms 2007, pp.209--236.Google Scholar
- M. Zaharia et al. "Discretized Streams: An Efficient and Fault-tolerant Model for Stream Processing on Large Clusters". Proc. of HotCloud 2012, pp.10--10. Google ScholarDigital Library
Index Terms
- Photon: fault-tolerant and scalable joining of continuous data streams
Recommendations
A load shedding framework for XML stream joins
DEXA'10: Proceedings of the 21st international conference on Database and expert systems applications: Part IJoining data streams using various types of windows is an established method of stream processing. The limitation of window size due to memory constraint takes a heavy toll on the accuracy of the query result. Through this paper, we propose a unique ...
Weighted Reservoir Sampling from Distributed Streams
PODS '19: Proceedings of the 38th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsWe consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights are equal, is ...
Dcell: a scalable and fault-tolerant network structure for data centers
A fundamental challenge in data center networking is how to efficiently interconnect an exponentially increasing number of servers. This paper presents DCell, a novel network structure that has many desirable features for data center networking. DCell ...
Comments