research-article

Avatara: OLAP for web-scale analytics products

Published: 01 August 2012

Abstract

Multidimensional data generated by members on websites has seen massive growth in recent years. OLAP is a well-suited solution for mining and analyzing this data. Providing insights derived from this analysis has become crucial for these websites to give members greater value. For example, LinkedIn, the largest professional social network, provides its professional members rich analytics features like "Who's Viewed My Profile?" and "Who's Viewed This Job?" The data behind these features form cubes that must be efficiently served at scale, and can be neatly sharded to do so. To serve our growing 160 million member base, we built a scalable and fast OLAP serving system called Avatara to solve this many, small cubes problem. At LinkedIn, Avatara has been powering several analytics features on the site for the past two years.

  • Published in

    Proceedings of the VLDB Endowment, Volume 5, Issue 12
    August 2012
    340 pages

    Publisher

    VLDB Endowment
