Abstract
Massive graphs arise in any application where there is data about both basic entities and the relationships between these entities, e.g., web-pages and hyperlinks; neurons and synapses; papers and citations; IP addresses and network flows; people and their friendships. Graphs have also become the de facto standard for representing many types of highly structured data. However, the sheer size of many of these graphs renders classical algorithms inapplicable when it comes to analyzing such graphs. In addition, these existing algorithms are typically ill-suited to processing distributed or stream data.
Various platforms have been developed for processing large data sets. At the same time, there is the need to develop new algorithmic ideas and paradigms. In the case of graph processing, a lot of recent work has focused on understanding the important algorithmic issues. An central aspect of this is the question of how to construct and leverage small-space synopses in graph processing. The goal of this tutorial is to survey recent work on this question and highlight interesting directions for future research.
- K. J. Ahn, S. Guha, and A. McGregor. Analyzing graph structure via linear measurements. In ACM-SIAM Symposium on Discrete Algorithms, 2012. Google Scholar
- K. J. Ahn, S. Guha, and A. McGregor. Graph sketching: Sparsification, spanners and subgraphs. In ACM Symposium on Principles of Database Systems, 2012. Google Scholar
- J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theor. Comput. Sci., 348(2):207--216, 2005. Google Scholar
- Giraph. http://incubator.apache.org/giraph/.Google Scholar
- Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new parallel framework for machine learning. In Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, July 2010.Google Scholar
- G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD Conference, pages 135--146, 2010. Google Scholar
- S. Muthukrishnan. Data Streams: Algorithms and Applications. Now Publishers, 2006.Google Scholar
- D. Olteanu. Evaluation of XPath Queries against XML Streams. PhD thesis, University of Munich, 2004.Google Scholar
Index Terms
- Graph synopses, sketches, and streams: a survey
Recommendations
Graph sketches: sparsification, spanners, and subgraphs
PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database SystemsWhen processing massive data sets, a core task is to construct synopses of the data. To be useful, a synopsis data structure should be easy to construct while also yielding good approximations of the relevant properties of the data set. A particularly ...
Data streams and data synopses for massive data sets
ECMLPKDD'05: Proceedings of the 9th European Conference on European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in DatabasesWith the proliferation of data intensive applications, it has become necessary to develop new techniques to handle massive data sets. Traditional algorithmic techniques and data structures are not always suitable to handle the amount of data that is ...
Data streams and data synopses for massive data sets
ECML'05: Proceedings of the 16th European conference on Machine LearningWith the proliferation of data intensive applications, it has become necessary to develop new techniques to handle massive data sets. Traditional algorithmic techniques and data structures are not always suitable to handle the amount of data that is ...
Comments