skip to main content
10.1145/2938503.2938519acmotherconferencesArticle/Chapter ViewAbstractPublication PagesideasConference Proceedingsconference-collections
research-article

On the Discovery of Relaxed Functional Dependencies

Published:11 July 2016Publication History

ABSTRACT

Functional dependencies (fds) express important relationships among data, which can be used for several goals, including schema normalization and data cleansing. However, to solve several issues in emerging application domains, such as the identification of data inconsistencies or patterns of semantically related data, it has been necessary to relax the fd definition through the introduction of approximations in data comparison and/or validity. Moreover, while fds were originally specified at design time, with the availability of massive data and computational power many algorithms have been devised to automatically discover them from data, including algorithms for discovering some types of relaxed fds. In this paper we present a technique that exploits lattice-based algorithms for the discovery of fds from data, in order to detect relaxed fds. Moreover, we introduce an algorithm to determine a proper distance threshold for a given relaxed fd holding over the entire database.

References

  1. Z. Abedjan, P. Schulze, and F. Naumann. DFD: Efficient functional dependency discovery. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, CIKM '14, pages 949--958, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M. Arenas and L. Libkin. A normal form for XML documents. ACM Transactions on Database Systems, 29(1):195--232, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Bohannon, W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis. Conditional functional dependencies for data cleaning. In Proceedings of the 25th International Conference on Data Engineering, ICDE '07, pages 746--755, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  4. L. Caruccio, V. Deufemia, and G. Polese. Visual data integration based on description logic reasoning. In Proceedings of 18th International Database Engineering & Applications Symposium, IDEAS '14, pages 19--28, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. L. Caruccio, V. Deufemia, and G. Polese. Relaxed functional dependencies -- A survey of approaches. IEEE Transactions on Knowledge and Data Engineering, 28(1):147--165, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. S.-K. Chang, V. Deufemia, G. Polese, and M. Vacca. A normalization framework for multimedia databases. IEEE Transactions on Knowledge and Data Engineering, 19(12):1666--1679, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. F. Chiang and R. J. Miller. Discovering data quality rules. Proc. VLDB Endow., 1(1):1166--1177, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. E. F. Codd. A relational model of data for large shared data banks. Comm. ACM, 13(6):377--387, 1970. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios. Duplicate record detection: A survey. IEEE Transactions on Knowledge and Data Engineering, 19(1):1--16, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. W. Fan, H. Gao, X. Jia, J. Li, and S. Ma. Dynamic constraints for record matching. The VLDB Journal, 20:495--520, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. W. Fan, F. Geerts, L. V. S. Lakshmanan, and M. Xiong. Discovering conditional functional dependencies. In Proceedings of the 25th International Conference on Data Engineering, ICDE'09, pages 1231--1234, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. P. A. Flach and I. Savnik. Database dependency discovery: A machine learning approach. AI Commun., 12(3):139--160, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. C. Giannella and E. Robertson. On approximation measures for functional dependencies. Inform. Syst., 29(6):483--507, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. L. Golab, H. Karloff, F. Korn, D. Srivastava, and B. Yu. On generating near-optimal tableaux for conditional functional dependencies. PVLDB, 1(1):376--390, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Y. Huhtala, J. Kärkkäinen, P. Porkka, and H. Toivonen. TANE: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 42(2):100--111, 1999.Google ScholarGoogle ScholarCross RefCross Ref
  16. I. F. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. CORDS: Automatic discovery of correlations and soft functional dependencies. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, SIGMOD '04, pages 647--658, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. R. King and J. Oil. Discovery of functional and approximate functional dependencies in relational databases. J. Applied Math. and Decision Sciences, 7(1):49--59, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  18. J. Kivinen and H. Mannila. Approximate inference of functional dependencies from relations. Theor. Comput. Sci., 149(1):129--149, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. S. Kwashie, J. Liu, J. Li, and F. Ye. Mining differential dependencies: A subspace clustering approach. In H. Wang and M. A. Sharaf, editors, Proceedings of Australasian Database Conference, ADC '14, pages 50--61, 2014.Google ScholarGoogle Scholar
  20. S. Kwashie, J. Liu, J. Li, and F. Ye. Efficient discovery of differential dependencies through association rules mining. In Proceedings of Australasian Database Conference, ADC '15, pages 3--15, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  21. M.-L. Lee, T. W. Ling, and W. L. Low. Designing functional dependencies for XML. In Proceedings of the 8th International Conference on Extending Database Technology, EDBT '02, pages 124--141, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Li. On optimal rule discovery. IEEE Transactions on Knowledge and Data Engineering, 18(4):460--471, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. J. Liu, J. Li, C. Liu, and Y. Chen. Discover dependencies from data - A review. IEEE Transactions on Knowledge and Data Engineering, 24(2):251--264, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. S. Lopes, J.-M. Petit, and L. Lakhal. Efficient discovery of functional dependencies and armstrong relations. In Proceedings of the 7th International Conference on Extending Database Technology, EDBT '00, pages 350--364, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. U. Nambiar and S. Kambhampati. Mining approximate functional dependencies and concept similarities to answer imprecise queries. In Proceedings of International Workshop on the Web and Databases, WebDB '04, pages 73--78, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. N. Novelli and R. Cicchetti. Fun: An efficient algorithm for mining functional and embedded dependencies. In Proceedings of 8th International Conference Database Theory, ICDT '01, pages 189--203. 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. T. Papenbrock, J. Ehrlich, J. Marten, T. Neubert, J.-P. Rudolph, M. Schönberg, J. Zwiener, and F. Naumann. Functional dependency discovery: An experimental evaluation of seven algorithms. Proc. VLDB Endow., 8(10):1082--1093, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. K. V. S. V. N. Raju and A. K. Majumdar. Fuzzy functional dependencies and lossless join decomposition of fuzzy relational database systems. ACM Transactions on Database Systems, 13(2):129--166, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. L. Seligman, P. Mork, A. Halevy, K. Smith, M. J. Carey, K. Chen, C. Wolf, J. Madhavan, A. Kannan, and D. Burdick. OpenII: An open source information integration toolkit. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, pages 1057--1060, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. S. Song and L. Chen. Differential dependencies: Reasoning and discovery. ACM Transactions on Database Systems, 36:16, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. S. Song and L. Chen. Efficient discovery of similarity constraints for matching dependencies. Data & Knowledge Engineering, 87:146--166, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. S. Song, L. Chen, and H. Cheng. Efficient determination of distance thresholds for differential dependencies. IEEE Transactions on Knowledge and Data Engineering, 26(9):2179--2192, 2014.Google ScholarGoogle ScholarCross RefCross Ref
  33. V. Vianu. Dynamic functional dependencies and database aging. Journal of the ACM, 34(1):28--59, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. C. Wyss, C. Giannella, and E. Robertson. FastFDs: A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances extended abstract. In Proceedings of Third International Conference on Data Warehousing and Knowledge Discovery, DaWaK '01, pages 101--110, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. H. Yao and H. J. Hamilton. Mining functional dependencies from data. Data Mining and Knowledge Discovery, 16(2):197--219, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. H. Yao, H. J. Hamilton, and C. J. Butz. FD Mine: Discovering functional dependencies in a database using equivalences. In Proceedings of IEEE International Conference on Data Mining, ICDM '02, pages 729--732, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. On the Discovery of Relaxed Functional Dependencies

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Other conferences
        IDEAS '16: Proceedings of the 20th International Database Engineering & Applications Symposium
        July 2016
        420 pages
        ISBN:9781450341189
        DOI:10.1145/2938503

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 11 July 2016

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall Acceptance Rate74of210submissions,35%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader