Abstract
Multidimensional data generated by members on websites has grown massively in recent years. OLAP is well suited to mining and analyzing this data, and providing insights derived from that analysis has become crucial for websites seeking to give members greater value. For example, LinkedIn, the largest professional social network, provides its members with rich analytics features such as "Who's Viewed My Profile?" and "Who's Viewed This Job?" The data behind these features forms cubes that must be served efficiently at scale and that can be neatly sharded to do so. To serve our growing base of 160 million members, we built Avatara, a fast and scalable OLAP serving system, to solve this "many small cubes" problem. At LinkedIn, Avatara has powered several analytics features on the site for the past two years.