ABSTRACT
Functional dependencies (fds) express important relationships among data, which can be used for several goals, including schema normalization and data cleansing. However, to solve several issues in emerging application domains, such as the identification of data inconsistencies or patterns of semantically related data, it has been necessary to relax the fd definition through the introduction of approximations in data comparison and/or validity. Moreover, while fds were originally specified at design time, with the availability of massive data and computational power many algorithms have been devised to automatically discover them from data, including algorithms for discovering some types of relaxed fds. In this paper we present a technique that exploits lattice-based algorithms for the discovery of fds from data, in order to detect relaxed fds. Moreover, we introduce an algorithm to determine a proper distance threshold for a given relaxed fd holding over the entire database.
- Z. Abedjan, P. Schulze, and F. Naumann. DFD: Efficient functional dependency discovery. In Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, CIKM '14, pages 949--958, 2014. Google ScholarDigital Library
- M. Arenas and L. Libkin. A normal form for XML documents. ACM Transactions on Database Systems, 29(1):195--232, 2004. Google ScholarDigital Library
- P. Bohannon, W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis. Conditional functional dependencies for data cleaning. In Proceedings of the 25th International Conference on Data Engineering, ICDE '07, pages 746--755, 2007.Google ScholarCross Ref
- L. Caruccio, V. Deufemia, and G. Polese. Visual data integration based on description logic reasoning. In Proceedings of 18th International Database Engineering & Applications Symposium, IDEAS '14, pages 19--28, 2014. Google ScholarDigital Library
- L. Caruccio, V. Deufemia, and G. Polese. Relaxed functional dependencies -- A survey of approaches. IEEE Transactions on Knowledge and Data Engineering, 28(1):147--165, 2016. Google ScholarDigital Library
- S.-K. Chang, V. Deufemia, G. Polese, and M. Vacca. A normalization framework for multimedia databases. IEEE Transactions on Knowledge and Data Engineering, 19(12):1666--1679, 2007. Google ScholarDigital Library
- F. Chiang and R. J. Miller. Discovering data quality rules. Proc. VLDB Endow., 1(1):1166--1177, 2008. Google ScholarDigital Library
- E. F. Codd. A relational model of data for large shared data banks. Comm. ACM, 13(6):377--387, 1970. Google ScholarDigital Library
- A. K. Elmagarmid, P. G. Ipeirotis, and V. S. Verykios. Duplicate record detection: A survey. IEEE Transactions on Knowledge and Data Engineering, 19(1):1--16, 2007. Google ScholarDigital Library
- W. Fan, H. Gao, X. Jia, J. Li, and S. Ma. Dynamic constraints for record matching. The VLDB Journal, 20:495--520, 2011. Google ScholarDigital Library
- W. Fan, F. Geerts, L. V. S. Lakshmanan, and M. Xiong. Discovering conditional functional dependencies. In Proceedings of the 25th International Conference on Data Engineering, ICDE'09, pages 1231--1234, 2009. Google ScholarDigital Library
- P. A. Flach and I. Savnik. Database dependency discovery: A machine learning approach. AI Commun., 12(3):139--160, 1999. Google ScholarDigital Library
- C. Giannella and E. Robertson. On approximation measures for functional dependencies. Inform. Syst., 29(6):483--507, 2004. Google ScholarDigital Library
- L. Golab, H. Karloff, F. Korn, D. Srivastava, and B. Yu. On generating near-optimal tableaux for conditional functional dependencies. PVLDB, 1(1):376--390, 2008. Google ScholarDigital Library
- Y. Huhtala, J. Kärkkäinen, P. Porkka, and H. Toivonen. TANE: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 42(2):100--111, 1999.Google ScholarCross Ref
- I. F. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. CORDS: Automatic discovery of correlations and soft functional dependencies. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, SIGMOD '04, pages 647--658, 2004. Google ScholarDigital Library
- R. King and J. Oil. Discovery of functional and approximate functional dependencies in relational databases. J. Applied Math. and Decision Sciences, 7(1):49--59, 2003.Google ScholarCross Ref
- J. Kivinen and H. Mannila. Approximate inference of functional dependencies from relations. Theor. Comput. Sci., 149(1):129--149, 1995. Google ScholarDigital Library
- S. Kwashie, J. Liu, J. Li, and F. Ye. Mining differential dependencies: A subspace clustering approach. In H. Wang and M. A. Sharaf, editors, Proceedings of Australasian Database Conference, ADC '14, pages 50--61, 2014.Google Scholar
- S. Kwashie, J. Liu, J. Li, and F. Ye. Efficient discovery of differential dependencies through association rules mining. In Proceedings of Australasian Database Conference, ADC '15, pages 3--15, 2015.Google ScholarCross Ref
- M.-L. Lee, T. W. Ling, and W. L. Low. Designing functional dependencies for XML. In Proceedings of the 8th International Conference on Extending Database Technology, EDBT '02, pages 124--141, 2002. Google ScholarDigital Library
- J. Li. On optimal rule discovery. IEEE Transactions on Knowledge and Data Engineering, 18(4):460--471, 2006. Google ScholarDigital Library
- J. Liu, J. Li, C. Liu, and Y. Chen. Discover dependencies from data - A review. IEEE Transactions on Knowledge and Data Engineering, 24(2):251--264, 2012. Google ScholarDigital Library
- S. Lopes, J.-M. Petit, and L. Lakhal. Efficient discovery of functional dependencies and armstrong relations. In Proceedings of the 7th International Conference on Extending Database Technology, EDBT '00, pages 350--364, 2000. Google ScholarDigital Library
- U. Nambiar and S. Kambhampati. Mining approximate functional dependencies and concept similarities to answer imprecise queries. In Proceedings of International Workshop on the Web and Databases, WebDB '04, pages 73--78, 2004. Google ScholarDigital Library
- N. Novelli and R. Cicchetti. Fun: An efficient algorithm for mining functional and embedded dependencies. In Proceedings of 8th International Conference Database Theory, ICDT '01, pages 189--203. 2001. Google ScholarDigital Library
- T. Papenbrock, J. Ehrlich, J. Marten, T. Neubert, J.-P. Rudolph, M. Schönberg, J. Zwiener, and F. Naumann. Functional dependency discovery: An experimental evaluation of seven algorithms. Proc. VLDB Endow., 8(10):1082--1093, 2015. Google ScholarDigital Library
- K. V. S. V. N. Raju and A. K. Majumdar. Fuzzy functional dependencies and lossless join decomposition of fuzzy relational database systems. ACM Transactions on Database Systems, 13(2):129--166, 1988. Google ScholarDigital Library
- L. Seligman, P. Mork, A. Halevy, K. Smith, M. J. Carey, K. Chen, C. Wolf, J. Madhavan, A. Kannan, and D. Burdick. OpenII: An open source information integration toolkit. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, pages 1057--1060, 2010. Google ScholarDigital Library
- S. Song and L. Chen. Differential dependencies: Reasoning and discovery. ACM Transactions on Database Systems, 36:16, 2011. Google ScholarDigital Library
- S. Song and L. Chen. Efficient discovery of similarity constraints for matching dependencies. Data & Knowledge Engineering, 87:146--166, 2013. Google ScholarDigital Library
- S. Song, L. Chen, and H. Cheng. Efficient determination of distance thresholds for differential dependencies. IEEE Transactions on Knowledge and Data Engineering, 26(9):2179--2192, 2014.Google ScholarCross Ref
- V. Vianu. Dynamic functional dependencies and database aging. Journal of the ACM, 34(1):28--59, 1987. Google ScholarDigital Library
- C. Wyss, C. Giannella, and E. Robertson. FastFDs: A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances extended abstract. In Proceedings of Third International Conference on Data Warehousing and Knowledge Discovery, DaWaK '01, pages 101--110, 2001. Google ScholarDigital Library
- H. Yao and H. J. Hamilton. Mining functional dependencies from data. Data Mining and Knowledge Discovery, 16(2):197--219, 2007. Google ScholarDigital Library
- H. Yao, H. J. Hamilton, and C. J. Butz. FD Mine: Discovering functional dependencies in a database using equivalences. In Proceedings of IEEE International Conference on Data Mining, ICDM '02, pages 729--732, 2002. Google ScholarDigital Library
Index Terms
- On the Discovery of Relaxed Functional Dependencies
Recommendations
Incremental Discovery of Imprecise Functional Dependencies
Special Issue on Metadata Discovery for Assessing Data QualityFunctional dependencies (fds) are one of the metadata used to assess data quality and to perform data cleaning operations. However, to pursue robustness with respect to data errors, it has been necessary to devise imprecise versions of functional ...
Mining relaxed functional dependencies from data
AbstractRelaxed functional dependencies (rfds) are properties expressing important relationships among data. Thanks to the introduction of approximations in data comparison and/or validity, they can capture constraints useful for several purposes, such as ...
On Defining Functional Dependency for XML
ICSC '09: Proceedings of the 2009 IEEE International Conference on Semantic ComputingFunctional dependency (FD) is one of the integrity constraints for any data model. In relational data model, FDs are well studied and are widely used in normalization theory and in key algorithm. In recent years, XML has emerged as an widely used data ...
Comments