2017 | OriginalPaper | Chapter

Semantics and Verification of Entity Resolution and Data Fusion Operations via Transformation into a Formal Notation

Abstract

Questions of formal semantics definition and verification have arisen throughout the development of data integration methods and tools. Three levels of integration can be distinguished: data model integration, schema matching and integration, and data integration proper. This paper aims to develop methods and tools for formal semantics definition and verification at the third level, that of the data proper. It proposes an approach to defining formal semantics for high-level data integration programs. The semantics is defined by a transformation into a formal specification language supported by automatic and interactive provers, and is applied to the verification of structured data integration workflows. The workflow properties to be verified are expressed in the chosen specification language, and the semantic specification of the data integration workflow is then verified with respect to these properties. The practical aim of the work is to provide a basis for formal verification of data integration workflows during problem solving in various integration environments.

Metadata
Title
Semantics and Verification of Entity Resolution and Data Fusion Operations via Transformation into a Formal Notation
Author
Sergey Stupnikov
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-57135-5_11
