
2015 | OriginalPaper | Chapter

Schema Matching Based on Source Codes

Authors: Guohui Ding, Guoren Wang, Chunlong Fan, Shuo Chen

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing


Abstract

Schema matching is a critical step in numerous database applications, such as integrating web data sources, loading data warehouses, and exchanging information among several authorities. Existing techniques for schema matching are classified as schema-based, instance-based, or a combination of both. In this paper, we propose a new class of techniques, called schema matching based on source codes. The idea is to exploit the exterior schema extracted from the source code to find semantic correspondences between attributes in the schemas to be matched. Essentially, the exterior schema is the schema that is exposed to end users and sits in the outermost shell of an application. Thus, it typically carries the complete semantics of the data, which is very helpful for solving the schema matching problem. We present a framework for schema matching based on source codes, which includes three key components: extracting the exterior schema, evaluating the quality of a matching, and finding the optimal mapping. We also present some helpful features and rules of the source code for implementing each component, and address the corresponding challenges in detail.
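As a rough illustration of the three components named in the abstract, the following Python sketch may help. It is not the authors' method, which this page does not describe. It assumes, purely for illustration, that the exterior schema can be read from user-facing form labels bound to attribute names, that matching quality is a word-level Jaccard similarity between labels, and that the optimal mapping is the one-to-one assignment with the highest total similarity. The names `extract_exterior_schema`, `label_similarity`, and `best_mapping`, as well as the sample sources, are hypothetical.

```python
import re
from itertools import permutations

# Assumption: user-facing labels appear as <label for="attr">Text</label>.
LABEL_RE = re.compile(r'<label\s+for="([^"]+)">([^<]+)</label>')

def extract_exterior_schema(source):
    """Component 1 (sketch): map each attribute name to its user-facing label."""
    return {attr: text.strip() for attr, text in LABEL_RE.findall(source)}

def label_similarity(a, b):
    """Component 2 (sketch): Jaccard similarity over lowercase word tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def best_mapping(ext_a, ext_b):
    """Component 3 (sketch): exhaustive search over one-to-one mappings.

    Only feasible for tiny schemas; a real system would need a scalable
    assignment algorithm.
    """
    attrs_a, attrs_b = list(ext_a), list(ext_b)
    if len(attrs_a) > len(attrs_b):
        raise ValueError("this sketch expects len(ext_a) <= len(ext_b)")
    best, best_score = {}, -1.0
    for perm in permutations(attrs_b, len(attrs_a)):
        score = sum(label_similarity(ext_a[x], ext_b[y])
                    for x, y in zip(attrs_a, perm))
        if score > best_score:
            best, best_score = dict(zip(attrs_a, perm)), score
    return best, best_score

# Hypothetical application sources with opaque attribute names.
SOURCE_A = '''
<label for="c_nm">Customer name</label>
<label for="c_tel">Phone number</label>
'''
SOURCE_B = '''
<label for="f2">Contact phone number</label>
<label for="f1">Name of customer</label>
'''

mapping, score = best_mapping(extract_exterior_schema(SOURCE_A),
                              extract_exterior_schema(SOURCE_B))
print(mapping)
```

In this toy case the attribute names (`c_nm`, `f1`, and so on) carry little meaning on their own, while the labels shown to end users do, which is the intuition the abstract gives for using the exterior schema.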


Metadata
Title: Schema Matching Based on Source Codes
Authors: Guohui Ding, Guoren Wang, Chunlong Fan, Shuo Chen
Copyright Year: 2015
DOI: https://doi.org/10.1007/978-3-319-22324-7_8
