skip to main content
research-article

Discovering Graph Functional Dependencies

Authors Info & Claims
Published:11 September 2020Publication History
Skip Abstract Section

Abstract

This article studies discovery of Graph Functional Dependencies (GFDs), a class of functional dependencies defined on graphs. We investigate the fixed-parameter tractability of three fundamental problems related to GFD discovery. We show that the implication and satisfiability problems are fixed-parameter tractable, but the validation problem is co-W[1]-hard in general. We introduce notions of reduced GFDs and their topological support, and formalize the discovery problem for GFDs. We develop algorithms for discovering GFDs and computing their covers. Moreover, we show that GFD discovery is feasible over large-scale graphs, by providing parallel scalable algorithms that guarantee to reduce running time when more processors are used. Using real-life and synthetic data, we experimentally verify the effectiveness and scalability of the algorithms.

References

  1. IMDB. 2008. IMDB. Retrieved from http://www.imdb.com/interfaces.Google ScholarGoogle Scholar
  2. DBpedia. 2015. DBpedia. Retrieved from http://wiki.dbpedia.org/Datasets.Google ScholarGoogle Scholar
  3. Serge Abiteboul, Richard Hull, and Victor Vianu. 1995. Foundations of Databases. Addison-Wesley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Gagan Aggarwal, Rajeev Motwani, and An Zhu. 2003. The load rebalancing problem. In Proceedings of the SPAA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Waseem Akhtar, Alvaro Cortés-Calabuig, and Jan Paredaens. 2010. Constraints in RDF. In Proceedings of the SDKB. 2339. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Khaled Ammar and M. Tamer Özsu. 2018. Experimental analysis of distributed graph systems. Proc. Very Large Data Base 11, 10 (2018), 11511164. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Anonymous. 2018. Details omitted due to double-blind reviewing.Google ScholarGoogle Scholar
  8. Ismailcem Budak Arpinar, Karthikeyan Giriloganathan, and Boanerges Aleman-Meza. 2006. Ontology quality by detection of conflicts in metadata. In Proceedings of the EON.Google ScholarGoogle Scholar
  9. Vladimir Batagelj and Andrej Mrvar. 2001. A subquadratic triad census algorithm for large sparse networks with small maximum degree. Soc. Netw. 23, 3 (2001), 237243.Google ScholarGoogle ScholarCross RefCross Ref
  10. Angela Bonifati, Wim Martens, and Thomas Timm. 2017. An analytical study of large SPARQL query logs. Proc. Very Large Data Base 11, 2 (2017), 149161. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Diego Calvanese, Wolfgang Fischl, Reinhard Pichler, Emanuel Sallinger, and Mantas Simkus. 2014. Capturing relational schemas and functional dependencies in RDFS. In Proceedings of the AAAI. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Yang Chen, Sean Goldberg, Daisy Zhe Wang, and Soumitra Siddharth Johri. 2016. Ontological pathfinding. In Proceedings of the SIGMOD. 835846. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Fei Chiang and Renee Miller. 2008. Discovering data quality rules. In Proceedings of the VLDB. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Xu Chu, Ihab F. Ilyas, and Paolo Papotti. 2013. Discovering denial constraints. Proc. Very Large Data Base 6, 13 (2013), 14981509. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Alvaro Cortés-Calabuig and Jan Paredaens. 2012. Semantics of constraints in RDFS. In Proceedings of the AMW. 7590.Google ScholarGoogle Scholar
  16. Rodney G. Downey and Michael R. Fellows. 1995. Fixed-parameter tractability and completeness II: On completeness for W[1]. Theor. Comput. Sci. 141, 182 (1995), 109131. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Rodney G. Downey and Michael R. Fellows. 2013. Fundamentals of Parameterized Complexity. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Mohammed Elseidy, Ehab Abdelhamid, Spiros Skiadopoulos, and Panos Kalnis. 2014. GRAMI: Frequent subgraph and pattern mining in a single large graph. Proc. Very Large Data Base 7, 7 (2014), 517528. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. David Eppstein. 1995. Subgraph isomorphism in planar graphs and related problems. In Proceedings of the SODA, Vol. 95. 632640. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Wenfei Fan, Zhe Fan, Chao Tian, and Xin Luna Dong. 2015. Keys for graphs. Proc. Very Large Data Base 8, 12 (2015), 15901601. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. 2008. Conditional functional dependencies for capturing data inconsistencies. Trans. Database Syst. 33, 1 (2008). Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Wenfei Fan, Floris Geerts, Jianzhong Li, and Ming Xiong. 2011. Discovering conditional functional dependencies. IEEE Trans. Knowl. Data Eng. 23, 5 (2011), 683698. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and Yinghui Wu. 2011. Adding regular expressions to graph reachability and pattern queries. In Proceedings of the ICDE. 3950. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Wenfei Fan, Xueli Liu, Ping Lu, and Chao Tian. 2018. Catching numeric inconsistencies in graphs. In Proceedings of the SIGMOD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Wenfei Fan and Ping Lu. 2017. Dependencies for graphs. In Proceedings of the PODS. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Wenfei Fan, Xin Wang, Yinghui Wu, and Jingbo Xu. 2015. Association rules with graph patterns. Proc. Very Large Data Base 8, 12 (2015), 15021513. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Wenfei Fan, Yinghui Wu, and Jingbo Xu. 2016. Functional dependencies for graphs. In Proceedings of the SIGMOD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Jörg Flum and Martin Grohe. 2006. Parameterized Complexity Theory. Springer. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Luis Antonio Galárraga, Christina Teflioudi, Katja Hose, and Fabian Suchanek. 2013. AMIE: Association rule mining under incomplete evidence in ontological knowledge bases. In Proceedings of the WWW. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Mario Arias Gallego, Javier D. Fernández, Miguel A. Martínez-Prieto, and Pablo de la Fuente. 2011. An empirical study of real-world SPARQL queries. In USEWOD Workshop.Google ScholarGoogle Scholar
  31. R. L. Graham. 1966. Bounds for certain multiprocessing anomalies. Bell Syst. Tech. J. 45, 9 (1966), 15631581.Google ScholarGoogle ScholarCross RefCross Ref
  32. Binbin He, Lei Zou, and Dongyan Zhao. 2014. Using conditional functional dependency to discover abnormal data in RDF graphs. In Proceedings of the SWIM. 17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Jelle Hellings, Marc Gyssens, Jan Paredaens, and Yuqing Wu. 2014. Implication and axiomatization of functional constraints on patterns with an application to the RDF data model. In Proceedings of the FoIKS. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Jun Huan, Wei Wang, Jan Prins, and Jiong Yang. 2004. Spin: Mining maximal frequent subgraphs from graph databases. In Proceedings of the SIGKDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Ykä Huhtala, Juha Kärkkäinen, Pasi Porkka, and Hannu Toivonen. 1999. TANE: An efficient algorithm for discovering functional and approximate dependencies. Comput. J. 42, 2 (1999), 100111.Google ScholarGoogle ScholarCross RefCross Ref
  36. Akihiro Inokuchi, Takashi Washio, and Hiroshi Motoda. 2000. An Apriori-based algorithm for mining frequent substructures from graph data. In Proceedings of the PKDD. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Takehiro Ito, Marcin Kamiński, Hirotaka Ono, Akira Suzuki, Ryuhei Uehara, and Katsuhisa Yamanaka. 2014. On the parameterized complexity for token jumping on graphs. In Proceedings of the TAMC. Springer, 341351.Google ScholarGoogle ScholarCross RefCross Ref
  38. Chuntao Jiang, Frans Coenen, and Michele Zito. 2013. A survey of frequent subgraph mining algorithms. Knowledge Eng. Rev. 28, 01 (2013), 75105.Google ScholarGoogle ScholarCross RefCross Ref
  39. Yiping Ke, James Cheng, and Jeffrey Xu Yu. 2009. Efficient discovery of frequent correlated subgraph pairs. In Proceedings of the ICDM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Mijung Kim and K. Selçuk Candan. 2012. SBV-Cut: Vertex-cut based graph partitioning using structural balance vertices. Data Knowledge Eng. 72 (2012), 285303. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Myung-Sup Kim, Hun-Jeong Kong, Seong-Cheol Hong, Seung-Hwa Chung, and James W. Hong. 2004. A flow-based method for abnormal network traffic detection. In Proceedings of the NOMS, Vol. 1. IEEE, 599612.Google ScholarGoogle Scholar
  42. Christian Komusiewicz and Manuel Sorge. 2012. Finding dense subgraphs of sparse graphs. In Proceedings of the IPEC. Springer, 242251. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens Lehmann, Roland Cornelissen, and Amrapali Zaveri. 2014. Test-driven evaluation of linked data quality. In Proceedings of the WWW. 747758. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Richard E. Korf. 2009. Multi-Way Number Partitioning. In Proceedings of the IJCAI. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Clyde P. Kruskal, Larry Rudolph, and Marc Snir. 1990. A complexity theory of efficient parallel algorithms. Trans. Comput. Sci. 71, 1 (1990), 95132. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Georg Lausen, Michael Meier, and Michael Schmidt. 2008. SPARQLing constraints for RDF. In Proceedings of the EDBT. ACM, 499509. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Wenqing Lin, Xiaokui Xiao, and Gabriel Ghinita. 2014. Large-scale frequent subgraph mining in MapReduce. In Proceedings of the ICDE.Google ScholarGoogle ScholarCross RefCross Ref
  48. Farzaneh Mahdisoltani, Joanna Biega, and Fabian Suchanek. 2015. Yago3: A knowledge base from multilingual wikipedias. In Proceedings of the CIDR.Google ScholarGoogle Scholar
  49. Dániel Marx. 2006. Parameterized graph separation problems. Theoret. Comput. Sci. 351, 3 (2006), 394406. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Luke Mathieson and Stefan Szeider. 2008. Parameterized graph editing with chosen vertex degrees. In Proceedings of the COCOA. Springer, 1322. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Linsey Xiaolin Pang, Sanjay Chawla, Wei Liu, and Yu Zheng. 2011. On mining anomalous patterns in road traffic streams. In Proceedings of the ADMA. Springer, 237251. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. François Picalausa and Stijn Vansummeren. 2011. What are real SPARQL queries like? In Proceedings of the SWIM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Prakash Shelokar, Arnaud Quirin, and Óscar Cordón. 2014. Three-objective subgraph mining using multiobjective evolutionary programming. J. Comput. Syst. Sci. 80, 1 (2014), 1626. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: A core of semantic knowledge. In Proceedings of the WWW. 697706. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Carlos H. C. Teixeira, Alexandre J. Fonseca, Marco Serafini, Georgos Siganos, Mohammed J. Zaki, and Ashraf Aboulnaga. 2015. Arabesque: A system for distributed graph mining. In Proceedings of the SOSP. 425440. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. David P. Woodruff and Qin Zhang. 2013. When distributed computation is communication expensive. In Proceedings of the DISC. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Catharine Wyss, Chris Giannella, and Edward Robertson. 2001. FastFDs: A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances extended abstract. In Proceedings of the DaWaK. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Yang Yu and Jeff Heflin. 2011. Extending functional dependency to detect abnormal data in RDF graphs. In Proceedings of the ISWC. 794809. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Amrapali Zaveri, Dimitris Kontokostas, Mohamed Ahmed Sherif, Lorenz Bühmann, Mohamed Morsey, Sören Auer, and Jens Lehmann. 2013. User-driven quality evaluation of DBpedia. In Proceedings of the ISEM. 97104. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Discovering Graph Functional Dependencies

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Transactions on Database Systems
      ACM Transactions on Database Systems  Volume 45, Issue 3
      Best of ICDT 2019 and Regular Papers
      September 2020
      213 pages
      ISSN:0362-5915
      EISSN:1557-4644
      DOI:10.1145/3420008
      Issue’s Table of Contents

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 11 September 2020
      • Online AM: 7 May 2020
      • Accepted: 1 April 2020
      • Revised: 1 February 2020
      • Received: 1 March 2019
      Published in tods Volume 45, Issue 3

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    HTML Format

    View this article in HTML Format .

    View HTML Format