Abstract
This article studies discovery of Graph Functional Dependencies (GFDs), a class of functional dependencies defined on graphs. We investigate the fixed-parameter tractability of three fundamental problems related to GFD discovery. We show that the implication and satisfiability problems are fixed-parameter tractable, but the validation problem is co-W[1]-hard in general. We introduce notions of reduced GFDs and their topological support, and formalize the discovery problem for GFDs. We develop algorithms for discovering GFDs and computing their covers. Moreover, we show that GFD discovery is feasible over large-scale graphs, by providing parallel scalable algorithms that guarantee to reduce running time when more processors are used. Using real-life and synthetic data, we experimentally verify the effectiveness and scalability of the algorithms.
- IMDB. 2008. IMDB. Retrieved from http://www.imdb.com/interfaces.Google Scholar
- DBpedia. 2015. DBpedia. Retrieved from http://wiki.dbpedia.org/Datasets.Google Scholar
- Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley. Google ScholarDigital Library
- Gagan Aggarwal, Rajeev Motwani, and An Zhu. 2003. The load rebalancing problem. In Proceedings of the SPAA. Google ScholarDigital Library
- Waseem Akhtar, Alvaro Cortés-Calabuig, and Jan Paredaens. 2010. Constraints in RDF. In Proceedings of the SDKB. 2339. Google ScholarDigital Library
- Khaled Ammar and M. Tamer Özsu. 2018. Experimental analysis of distributed graph systems. Proc. Very Large Data Base 11, 10 (2018), 11511164. Google ScholarDigital Library
- Anonymous. 2018. Details omitted due to double-blind reviewing.Google Scholar
- Ismailcem Budak Arpinar, Karthikeyan Giriloganathan, and Boanerges Aleman-Meza. 2006. Ontology quality by detection of conflicts in metadata. In Proceedings of the EON.Google Scholar
- Vladimir Batagelj and Andrej Mrvar. 2001. A subquadratic triad census algorithm for large sparse networks with small maximum degree. Soc. Netw. 23, 3 (2001), 237243.Google ScholarCross Ref
- Angela Bonifati, Wim Martens, and Thomas Timm. 2017. An analytical study of large SPARQL query logs. Proc. Very Large Data Base 11, 2 (2017), 149161. Google ScholarDigital Library
- Diego Calvanese, Wolfgang Fischl, Reinhard Pichler, Emanuel Sallinger, and Mantas Simkus. 2014. Capturing relational schemas and functional dependencies in RDFS. In Proceedings of the AAAI. Google ScholarDigital Library
- Yang Chen, Sean Goldberg, Daisy Zhe Wang, and Soumitra Siddharth Johri. 2016. Ontological pathfinding. In Proceedings of the SIGMOD. 835846. Google ScholarDigital Library
- Fei Chiang and Renee Miller. 2008. Discovering data quality rules. In Proceedings of the VLDB. Google ScholarDigital Library
- Xu Chu, Ihab F. Ilyas, and Paolo Papotti. 2013. Discovering denial constraints. Proc. Very Large Data Base 6, 13 (2013), 14981509. Google ScholarDigital Library
- Alvaro Cortés-Calabuig and Jan Paredaens. 2012. Semantics of constraints in RDFS. In Proceedings of the AMW. 7590.Google Scholar
- Rodney G. Downey and Michael R. Fellows. 1995. Fixed-parameter tractability and completeness II: On completeness for W[1]. Theor. Comput. Sci. 141, 182 (1995), 109131. Google ScholarDigital Library
- Rodney G. Downey and Michael R. Fellows. 2013. Fundamentals of Parameterized Complexity. Springer. Google ScholarDigital Library
- Mohammed Elseidy, Ehab Abdelhamid, Spiros Skiadopoulos, and Panos Kalnis. 2014. GRAMI: Frequent subgraph and pattern mining in a single large graph. Proc. Very Large Data Base 7, 7 (2014), 517528. Google ScholarDigital Library
- David Eppstein. 1995. Subgraph isomorphism in planar graphs and related problems. In Proceedings of the SODA, Vol. 95. 632640. Google ScholarDigital Library
- Wenfei Fan, Zhe Fan, Chao Tian, and Xin Luna Dong. 2015. Keys for graphs. Proc. Very Large Data Base 8, 12 (2015), 15901601. Google ScholarDigital Library
- Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. 2008. Conditional functional dependencies for capturing data inconsistencies. Trans. Database Syst. 33, 1 (2008). Google ScholarDigital Library
- Wenfei Fan, Floris Geerts, Jianzhong Li, and Ming Xiong. 2011. Discovering conditional functional dependencies. IEEE Trans. Knowl. Data Eng. 23, 5 (2011), 683698. Google ScholarDigital Library
- Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and Yinghui Wu. 2011. Adding regular expressions to graph reachability and pattern queries. In Proceedings of the ICDE. 3950. Google ScholarDigital Library
- Wenfei Fan, Xueli Liu, Ping Lu, and Chao Tian. 2018. Catching numeric inconsistencies in graphs. In Proceedings of the SIGMOD. Google ScholarDigital Library
- Wenfei Fan and Ping Lu. 2017. Dependencies for graphs. In Proceedings of the PODS. Google ScholarDigital Library
- Wenfei Fan, Xin Wang, Yinghui Wu, and Jingbo Xu. 2015. Association rules with graph patterns. Proc. Very Large Data Base 8, 12 (2015), 15021513. Google ScholarDigital Library
- Wenfei Fan, Yinghui Wu, and Jingbo Xu. 2016. Functional dependencies for graphs. In Proceedings of the SIGMOD. Google ScholarDigital Library
- Jörg Flum and Martin Grohe. 2006. Parameterized Complexity Theory. Springer. Google ScholarDigital Library
- Luis Antonio Galárraga, Christina Teflioudi, Katja Hose, and Fabian Suchanek. 2013. AMIE: Association rule mining under incomplete evidence in ontological knowledge bases. In Proceedings of the WWW. Google ScholarDigital Library
- Mario Arias Gallego, Javier D. Fernández, Miguel A. Martínez-Prieto, and Pablo de la Fuente. 2011. An empirical study of real-world SPARQL queries. In USEWOD Workshop.Google Scholar
- R. L. Graham. 1966. Bounds for certain multiprocessing anomalies. Bell Syst. Tech. J. 45, 9 (1966), 15631581.Google ScholarCross Ref
- Binbin He, Lei Zou, and Dongyan Zhao. 2014. Using conditional functional dependency to discover abnormal data in RDF graphs. In Proceedings of the SWIM. 17. Google ScholarDigital Library
- Jelle Hellings, Marc Gyssens, Jan Paredaens, and Yuqing Wu. 2014. Implication and axiomatization of functional constraints on patterns with an application to the RDF data model. In Proceedings of the FoIKS. Google ScholarDigital Library
- Jun Huan, Wei Wang, Jan Prins, and Jiong Yang. 2004. Spin: Mining maximal frequent subgraphs from graph databases. In Proceedings of the SIGKDD. Google ScholarDigital Library
- Ykä Huhtala, Juha Kärkkäinen, Pasi Porkka, and Hannu Toivonen. 1999. TANE: An efficient algorithm for discovering functional and approximate dependencies. Comput. J. 42, 2 (1999), 100111.Google ScholarCross Ref
- Akihiro Inokuchi, Takashi Washio, and Hiroshi Motoda. 2000. An Apriori-based algorithm for mining frequent substructures from graph data. In Proceedings of the PKDD. Google ScholarDigital Library
- Takehiro Ito, Marcin Kamiński, Hirotaka Ono, Akira Suzuki, Ryuhei Uehara, and Katsuhisa Yamanaka. 2014. On the parameterized complexity for token jumping on graphs. In Proceedings of the TAMC. Springer, 341351.Google ScholarCross Ref
- Chuntao Jiang, Frans Coenen, and Michele Zito. 2013. A survey of frequent subgraph mining algorithms. Knowledge Eng. Rev. 28, 01 (2013), 75105.Google ScholarCross Ref
- Yiping Ke, James Cheng, and Jeffrey Xu Yu. 2009. Efficient discovery of frequent correlated subgraph pairs. In Proceedings of the ICDM. Google ScholarDigital Library
- Mijung Kim and K. Selçuk Candan. 2012. SBV-Cut: Vertex-cut based graph partitioning using structural balance vertices. Data Knowledge Eng. 72 (2012), 285303. Google ScholarDigital Library
- Myung-Sup Kim, Hun-Jeong Kong, Seong-Cheol Hong, Seung-Hwa Chung, and James W. Hong. 2004. A flow-based method for abnormal network traffic detection. In Proceedings of the NOMS, Vol. 1. IEEE, 599612.Google Scholar
- Christian Komusiewicz and Manuel Sorge. 2012. Finding dense subgraphs of sparse graphs. In Proceedings of the IPEC. Springer, 242251. Google ScholarDigital Library
- Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, and Amrapali Zaveri. 2014. Test-driven evaluation of linked data quality. In Proceedings of the WWW. 747758. Google ScholarDigital Library
- Richard E. Korf. 2009. Multi-Way Number Partitioning. In Proceedings of the IJCAI. Google ScholarDigital Library
- Clyde P. Kruskal, Larry Rudolph, and Marc Snir. 1990. A complexity theory of efficient parallel algorithms. Trans. Comput. Sci. 71, 1 (1990), 95132. Google ScholarDigital Library
- Georg Lausen, Michael Meier, and Michael Schmidt. 2008. SPARQLing constraints for RDF. In Proceedings of the EDBT. ACM, 499509. Google ScholarDigital Library
- Wenqing Lin, Xiaokui Xiao, and Gabriel Ghinita. 2014. Large-scale frequent subgraph mining in MapReduce. In Proceedings of the ICDE.Google ScholarCross Ref
- Farzaneh Mahdisoltani, Joanna Biega, and Fabian Suchanek. 2015. Yago3: A knowledge base from multilingual wikipedias. In Proceedings of the CIDR.Google Scholar
- Dániel Marx. 2006. Parameterized graph separation problems. Theoret. Comput. Sci. 351, 3 (2006), 394406. Google ScholarDigital Library
- Luke Mathieson and Stefan Szeider. 2008. Parameterized graph editing with chosen vertex degrees. In Proceedings of the COCOA. Springer, 1322. Google ScholarDigital Library
- Linsey Xiaolin Pang, Sanjay Chawla, Wei Liu, and Yu Zheng. 2011. On mining anomalous patterns in road traffic streams. In Proceedings of the ADMA. Springer, 237251. Google ScholarDigital Library
- François Picalausa and Stijn Vansummeren. 2011. What are real SPARQL queries like? In Proceedings of the SWIM. Google ScholarDigital Library
- Prakash Shelokar, Arnaud Quirin, and Óscar Cordón. 2014. Three-objective subgraph mining using multiobjective evolutionary programming. J. Comput. Syst. Sci. 80, 1 (2014), 1626. Google ScholarDigital Library
- Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: A core of semantic knowledge. In Proceedings of the WWW. 697706. Google ScholarDigital Library
- Carlos H. C. Teixeira, Alexandre J. Fonseca, Marco Serafini, Georgos Siganos, Mohammed J. Zaki, and Ashraf Aboulnaga. 2015. Arabesque: A system for distributed graph mining. In Proceedings of the SOSP. 425440. Google ScholarDigital Library
- David P. Woodruff and Qin Zhang. 2013. When distributed computation is communication expensive. In Proceedings of the DISC. Google ScholarDigital Library
- Catharine Wyss, Chris Giannella, and Edward Robertson. 2001. FastFDs: A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances extended abstract. In Proceedings of the DaWaK. Google ScholarDigital Library
- Yang Yu and Jeff Heflin. 2011. Extending functional dependency to detect abnormal data in RDF graphs. In Proceedings of the ISWC. 794809. Google ScholarDigital Library
- Amrapali Zaveri, Dimitris Kontokostas, Mohamed Ahmed Sherif, Lorenz Bühmann, Mohamed Morsey, Sören Auer, and Jens Lehmann. 2013. User-driven quality evaluation of DBpedia. In Proceedings of the ISEM. 97104. Google ScholarDigital Library
Index Terms
- Discovering Graph Functional Dependencies
Recommendations
Functional Dependencies for Graphs
SIGMOD '16: Proceedings of the 2016 International Conference on Management of DataWe propose a class of functional dependencies for graphs, referred to as GFDs. GFDs capture both attribute-value dependencies and topological structures of entities, and subsume conditional functional dependencies (CFDs) as a special case. We show that ...
Discovering Graph Functional Dependencies
SIGMOD '18: Proceedings of the 2018 International Conference on Management of DataThis paper studies discovery of GFDs, a class of functional dependencies defined on graphs. We investigate the fixed-parameter tractability of three fundamental problems related to GFD discovery. We show that the implication and satisfiability problems ...
Tree-width and functional dependencies in databases
PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsConjunctive query (CQ) evaluation on relational databases is NP-complete in general. Several restrictions, like bounded tree-width and bounded hypertree-width, allow polynomial time evaluations.We extend the framework in the presence of functional ...
Comments