ABSTRACT
Graphs are an intuitive way to model complex relationships between real-world data objects, so graph analytics plays an important role in both research and industry. Because graphs often reflect heterogeneous domain data, representing them requires an expressive data model that includes the abstraction of graph collections, for example, to analyze communities inside a social network. Furthermore, answering complex analytical questions about such graphs entails combining multiple analytical operations. To satisfy these requirements, we propose the Extended Property Graph Model, which is semantically rich, schema-free and supports multiple distinct graphs. Based on this representation, the model provides declarative and combinable operators to analyze both single graphs and graph collections. Our current implementation is based on the distributed dataflow framework Apache Flink. We present the results of a first experimental study showing the scalability of our implementation on social network data with up to 11 billion edges.
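To make the idea of combinable operators over single graphs and graph collections more concrete, the following is a minimal in-memory sketch in plain Python. It is not the Apache Flink implementation described above, and every name in it (`GraphStore`, `LogicalGraph`, `combine`, `aggregate`, `select`) is a hypothetical choice made for illustration only. It assumes that vertices and edges carry a label and free-form properties, and that a graph is a labeled subset of the shared vertices and edges.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A labeled, schema-free element: any property keys are allowed."""
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class LogicalGraph:
    """A graph as a labeled subset of the store's vertex and edge ids (hypothetical)."""
    label: str
    vertex_ids: frozenset
    edge_ids: frozenset
    props: dict = field(default_factory=dict)


class GraphStore:
    """Holds all vertices and edges; several logical graphs may share them."""

    def __init__(self):
        self.vertices = {}  # vertex id -> Element
        self.edges = {}     # edge id -> (source id, target id, Element)

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = Element(label, props)

    def add_edge(self, eid, src, dst, label, **props):
        self.edges[eid] = (src, dst, Element(label, props))


def combine(a, b, label="Combined"):
    """Binary graph operator: union of the vertex and edge sets of two graphs."""
    return LogicalGraph(label, a.vertex_ids | b.vertex_ids, a.edge_ids | b.edge_ids)


def aggregate(graph, key, fn):
    """Unary graph operator: store fn(graph) as a new graph property."""
    return LogicalGraph(graph.label, graph.vertex_ids, graph.edge_ids,
                        {**graph.props, key: fn(graph)})


def select(collection, predicate):
    """Collection operator: keep only the graphs that satisfy the predicate."""
    return [g for g in collection if predicate(g)]


# A tiny social network with two overlapping communities (made-up data).
store = GraphStore()
for vid, name in enumerate(["Alice", "Bob", "Carol", "Dave"]):
    store.add_vertex(vid, "Person", name=name)
store.add_edge(0, 0, 1, "knows")
store.add_edge(1, 1, 2, "knows")
store.add_edge(2, 2, 3, "knows")

community_a = LogicalGraph("Community", frozenset({0, 1, 2}), frozenset({0, 1}))
community_b = LogicalGraph("Community", frozenset({2, 3}), frozenset({2}))

# Operators compose: aggregate each graph, then select over the collection.
collection = [aggregate(g, "vertexCount", lambda g: len(g.vertex_ids))
              for g in (community_a, community_b)]
large = select(collection, lambda g: g.props["vertexCount"] > 2)
merged = combine(community_a, community_b)
```

In this sketch each operator returns a new graph or collection, which is what allows the calls to be chained. A distributed implementation would express the same steps as dataflow transformations instead of Python set operations.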
Index Terms
- Analyzing extended property graphs with Apache Flink