research-article

Kineograph: taking the pulse of a fast-changing and connected world

Authors:
Raymond Cheng

University of Washington, Seattle, USA

University of Washington, Seattle, USA
View Profile

,
Ji Hong

Fudan University, Shanghai, China

Fudan University, Shanghai, China
View Profile

,
Aapo Kyrola

Carnegie Mellon University, Pittsburgh, USA

Carnegie Mellon University, Pittsburgh, USA
View Profile

,
Youshan Miao

University of Science and Technology of China, Hefei, China

University of Science and Technology of China, Hefei, China
View Profile

,
Xuetian Weng

Peking University, Beijing, China

Peking University, Beijing, China
View Profile

,
Ming Wu

Microsoft Research Asia, Beijing, China

Microsoft Research Asia, Beijing, China
View Profile

,
Fan Yang

Microsoft Research Asia, Beijing, China

Microsoft Research Asia, Beijing, China
View Profile

,
Lidong Zhou

Microsoft Research Asia, Beijing, China

Microsoft Research Asia, Beijing, China
View Profile

,
Feng Zhao

Microsoft Research Asia, Beijing, China

Microsoft Research Asia, Beijing, China
View Profile

,
Enhong Chen

University of Science and Technology of China, Hefei, China

University of Science and Technology of China, Hefei, China
View Profile

EuroSys '12: Proceedings of the 7th ACM european conference on Computer SystemsApril 2012Pages 85–98https://doi.org/10.1145/2168836.2168846

Published:10 April 2012Publication History

EuroSys '12: Proceedings of the 7th ACM european conference on Computer Systems

Pages 85–98

ABSTRACT

Kineograph is a distributed system that takes a stream of incoming data to construct a continuously changing graph, which captures the relationships that exist in the data feed. As a computing platform, Kineograph further supports graph-mining algorithms to extract timely insights from the fast-changing graph structure. To accommodate graph-mining algorithms that assume a static underlying graph, Kineograph creates a series of consistent snapshots, using a novel and efficient epoch commit protocol. To keep up with continuous updates on the graph, Kineograph includes an incremental graph-computation engine. We have developed three applications on top of Kineograph to analyze Twitter data: user ranking, approximate shortest paths, and controversial topic detection. For these applications, Kineograph takes a live Twitter data feed and maintains a graph of edges between all users and hashtags. Our evaluation shows that with 40 machines processing 100K tweets per second, Kineograph is able to continuously compute global properties, such as user ranks, with less than 2.5-minute timeliness guarantees. This rate of traffic is more than 10 times the reported peak rate of Twitter as of October 2011.

References

R. Angles and C. Gutierrez. Survey of graph database models. ACM Computing Surveys (CSUR), 40 (1): 1--39, 2008. Google ScholarDigital Library
P. Bhatotia, A. Wieder, R. Rodrigues, U. Acar, and R. Pasquini. Incoop: MapReduce for incremental computations. In ACM SoCC, 2011. Google ScholarDigital Library
Y. Bu, B. Howe, M. Balazinska, and M. Ernst. HaLoop: Efficient iterative data processing on large clusters. In VLDB, 2010. Google ScholarDigital Library
M. Burrows. The Chubby lock service for loosely-coupled distributed systems. In OSDI, 2006. Google ScholarDigital Library
D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. Zdonik. Monitoring streams -- a new class of data management applications. In VLDB, 2002. Google ScholarDigital Library
S. Chandrasekaran, O. Cooper, A. Deshpande, M. Franklin, J. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, and M. Shah. TelegraphCQ: Continuous dataflow processing for an uncertain world. In CIDR, 2003. Google ScholarDigital Library
J. Chen, D. J. Dewitt, F. Tian, and Y. Wang. NiagaraCQ: A scalable continuous query system for internet databases. In SIGMOD, 2000. Google ScholarDigital Library
T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction to Algorithms. MIT Press and McGraw-Hill, 2nd. edition, 2001. Google ScholarDigital Library
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51 (1): 107--113, 2008. Google ScholarDigital Library
P. Gunda, L. Ravindranath, C. Thekkath, Y. Yu, and L. Zhuang. Nectar: Automatic management of data and computation in datacenters. In OSDI, 2010. Google ScholarDigital Library
B. He, M. Yang, Z. Guo, R. Chen, B. Su, W. Lin, and L. Zhou. Comet: Batched stream processing in data intensive distributed computing. In ACM SoCC, 2010. Google ScholarDigital Library
P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free coordination for internet-scale systems. In USENIX ATC, 2010. Google ScholarDigital Library
R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. C. Jones, S. Madden, M. Stonebraker, Y. Zhang, J. Hugg, and D. J. Abadi. H-Store: A high-performance, distributed main memory transaction processing system. In VLDB, 2008. Google ScholarDigital Library
U. Kang, C. E. Tsourakakis, and C. Faloutsos. Pegasus: A peta-scale graph mining system. In IEEE International Conference on Data Mining, 2009. Google ScholarDigital Library
L. Lamport. The part-time parliament. ACM Trans. Comput. Syst., 16 (2): 133--169, 1998. Google ScholarDigital Library
D. Logothetis, C. Olston, B. Reed, K. Webb, and K. Yocum. Stateful bulk processing for incremental analytics. In ACM SoCC, 2010. Google ScholarDigital Library
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. Hellerstein. GraphLab: A new parallel framework for machine learning. In Conference on Uncertainty in Artificial Intelligence(UAI), 2010.Google Scholar
G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In SIGMOD, 2010. Google ScholarDigital Library
memcached. Memcached: A distributed memory object caching system, 2011. http://memcached.org.Google Scholar
Neo4j. Neo4j: The graph database, 2011. http://neo4j.org. Accessed October, 2011.Google Scholar
D. Ongaro, S. M. Rumble, R. Stutsman, J. Ousterhout, and M. Rosenblum. Fast crash recovery in ramcloud. In SOSP, 2011. Google ScholarDigital Library
L. Page, S. Brin, R. Motwani, and T. Winograd. The pagerank citation ranking: Bringing order to the web. Stanford Technical Report, 1999.Google Scholar
R. Pearce, M. Gokhale, and N. Amato. Multithreaded asynchronous graph traversal for in-memory and semi-external memory. In ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2010. Google ScholarDigital Library
D. Peng and F. Dabek. Large-scale incremental processing using distributed transactions and notifications. In OSDI, 2010. Google ScholarDigital Library
L. Popa, M. Budiu, Y. Yu, and M. Isard. Dryadinc: Reusing work in large-scale computations. In HotCloud, 2009. Google ScholarDigital Library
R. Power and J. Li. Piccolo: Building fast, distributed programs with partitioned tables. In OSDI, 2010. Google ScholarDigital Library
D. Romero, B. Meeder, and J. Kleinberg. Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on twitter. In WWW, 2011. Google ScholarDigital Library
A. Sarma, S. Gollapudi, M. Najork, and R. Panigrahy. A sketch-based distance oracle for web-scale graphs. In WSDM, 2010. Google ScholarDigital Library
D. Sullivan. Tweets about steve jobs spike but don't break twitter peak record, 2011. http://searchengineland.com/tweets-about-steve-jobs-spike-but-dont-break-twitter-record-96048.Google Scholar
A. Tanenbaum. Distributed Operating Systems. Prentice Hall, 1995. Google ScholarDigital Library
D. Tunkelang. A twitter analog to pagerank. Retrieved from http://thenoisychannel. com/2009/01/13/a-twitter-analog-to-pagerank, 2009.Google Scholar

Index Terms

Kineograph: taking the pulse of a fast-changing and connected world
1. Computing methodologies
  1. Distributed computing methodologies
    1. Distributed programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Distributed programming languages
  2. Software organization and properties
    1. Software system structures
      1. Distributed systems organizing principles

Recommendations

Large-Scale Stream Graph Processing: Doctoral Symposium
DEBS '17: Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems

Dynamically changing graphs are a powerful abstraction used to represent temporal relationships and connections occurring between data entities in various real-world organizations, such as social and telecommunication networks. The increasing volume, ...
Read More
Graphtides: a framework for evaluating stream-based graph processing platforms
GRADES-NDA '18: Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)

Stream-based graph systems continuously ingest graph-changing events via an established input stream, performing the required computation on the corresponding graph. While there are various benchmarking and evaluation approaches for traditional, batch-...
Read More
KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations
Asplos'17

Continuous processing of a streaming graph maintains an approximate result of the iterative computation on a recent version of the graph. Upon a user query, the accurate result on the current graph can be quickly computed by feeding the approximate ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
EuroSys '12: Proceedings of the 7th ACM european conference on Computer Systems
April 2012
394 pages
ISBN:9781450312233
DOI:10.1145/2168836
General Chair:
Pascal Felber
University of Neuchâtel, Switzerland
,
Program Chairs:
Frank Bellosa
KIT, Germany
,
Herbert Bos
Vrije Universiteit Amsterdam, The Netherlands
Copyright © 2012 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 10 April 2012
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
distributed storage
graph processing
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate241of1,308submissions,18%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 180
  Total Citations
  View Citations
- 2,029
  Total Downloads
- Downloads (Last 12 months)73
- Downloads (Last 6 weeks)6
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Kineograph: taking the pulse of a fast-changing and connected world

EuroSys '12: Proceedings of the 7th ACM european conference on Computer Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Large-Scale Stream Graph Processing: Doctoral Symposium

Graphtides: a framework for evaluating stream-based graph processing platforms

KickStarter: Fast and Accurate Computations on Streaming Graphs via Trimmed Approximations