skip to main content
10.1145/2168836.2168846acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

Kineograph: taking the pulse of a fast-changing and connected world

Authors Info & Claims
Published:10 April 2012Publication History

ABSTRACT

Kineograph is a distributed system that takes a stream of incoming data to construct a continuously changing graph, which captures the relationships that exist in the data feed. As a computing platform, Kineograph further supports graph-mining algorithms to extract timely insights from the fast-changing graph structure. To accommodate graph-mining algorithms that assume a static underlying graph, Kineograph creates a series of consistent snapshots, using a novel and efficient epoch commit protocol. To keep up with continuous updates on the graph, Kineograph includes an incremental graph-computation engine. We have developed three applications on top of Kineograph to analyze Twitter data: user ranking, approximate shortest paths, and controversial topic detection. For these applications, Kineograph takes a live Twitter data feed and maintains a graph of edges between all users and hashtags. Our evaluation shows that with 40 machines processing 100K tweets per second, Kineograph is able to continuously compute global properties, such as user ranks, with less than 2.5-minute timeliness guarantees. This rate of traffic is more than 10 times the reported peak rate of Twitter as of October 2011.

References

  1. R. Angles and C. Gutierrez. Survey of graph database models. ACM Computing Surveys (CSUR), 40 (1): 1--39, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. P. Bhatotia, A. Wieder, R. Rodrigues, U. Acar, and R. Pasquini. Incoop: MapReduce for incremental computations. In ACM SoCC, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Y. Bu, B. Howe, M. Balazinska, and M. Ernst. HaLoop: Efficient iterative data processing on large clusters. In VLDB, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Burrows. The Chubby lock service for loosely-coupled distributed systems. In OSDI, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. Zdonik. Monitoring streams -- a new class of data management applications. In VLDB, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S. Chandrasekaran, O. Cooper, A. Deshpande, M. Franklin, J. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. Shah. TelegraphCQ: Continuous dataflow processing for an uncertain world. In CIDR, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Chen, D. J. Dewitt, F. Tian, and Y. Wang. NiagaraCQ: A scalable continuous query system for internet databases. In SIGMOD, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction to Algorithms. MIT Press and McGraw-Hill, 2nd. edition, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51 (1): 107--113, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. P. Gunda, L. Ravindranath, C. Thekkath, Y. Yu, and L. Zhuang. Nectar: Automatic management of data and computation in datacenters. In OSDI, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. B. He, M. Yang, Z. Guo, R. Chen, B. Su, W. Lin, and L. Zhou. Comet: Batched stream processing in data intensive distributed computing. In ACM SoCC, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free coordination for internet-scale systems. In USENIX ATC, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. J. Abadi. H-Store: A high-performance, distributed main memory transaction processing system. In VLDB, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. U. Kang, C. E. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system. In IEEE International Conference on Data Mining, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. L. Lamport. The part-time parliament. ACM Trans. Comput. Syst., 16 (2): 133--169, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Logothetis, C. Olston, B. Reed, K. Webb, and K. Yocum. Stateful bulk processing for incremental analytics. In ACM SoCC, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. Hellerstein. GraphLab: A new parallel framework for machine learning. In Conference on Uncertainty in Artificial Intelligence(UAI), 2010.Google ScholarGoogle Scholar
  18. G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In SIGMOD, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. memcached. Memcached: A distributed memory object caching system, 2011. http://memcached.org.Google ScholarGoogle Scholar
  20. Neo4j. Neo4j: The graph database, 2011. http://neo4j.org. Accessed October, 2011.Google ScholarGoogle Scholar
  21. D. Ongaro, S. M. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum. Fast crash recovery in ramcloud. In SOSP, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Stanford Technical Report, 1999.Google ScholarGoogle Scholar
  23. R. Pearce, M. Gokhale, and N. Amato. Multithreaded asynchronous graph traversal for in-memory and semi-external memory. In ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. D. Peng and F. Dabek. Large-scale incremental processing using distributed transactions and notifications. In OSDI, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. L. Popa, M. Budiu, Y. Yu, and M. Isard. Dryadinc: Reusing work in large-scale computations. In HotCloud, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. R. Power and J. Li. Piccolo: Building fast, distributed programs with partitioned tables. In OSDI, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. D. Romero, B. Meeder, and J. Kleinberg. Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on twitter. In WWW, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. A. Sarma, S. Gollapudi, M. Najork, and R. Panigrahy. A sketch-based distance oracle for web-scale graphs. In WSDM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. D. Sullivan. Tweets about steve jobs spike but don't break twitter peak record, 2011. http://searchengineland.com/tweets-about-steve-jobs-spike-but-dont-break-twitter-record-96048.Google ScholarGoogle Scholar
  30. A. Tanenbaum. Distributed Operating Systems. Prentice Hall, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. D. Tunkelang. A twitter analog to pagerank. Retrieved from http://thenoisychannel. com/2009/01/13/a-twitter-analog-to-pagerank, 2009.Google ScholarGoogle Scholar

Index Terms

  1. Kineograph: taking the pulse of a fast-changing and connected world

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          EuroSys '12: Proceedings of the 7th ACM european conference on Computer Systems
          April 2012
          394 pages
          ISBN:9781450312233
          DOI:10.1145/2168836

          Copyright © 2012 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 10 April 2012

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate241of1,308submissions,18%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader