
2016 | OriginalPaper | Chapter

8. Object Identification

Authors: Carlo Batini, Monica Scannapieco

Published in: Data and Information Quality

Publisher: Springer International Publishing


Abstract

In this chapter we address object identification, the most important and most extensively investigated information quality (IQ) activity. Given its importance, we dedicate two chapters of the book to object identification: this chapter focuses on consolidated techniques, and the next one on recent advancements.


Metadata
Title: Object Identification
Authors: Carlo Batini, Monica Scannapieco
Copyright Year: 2016
DOI: https://doi.org/10.1007/978-3-319-24106-7_8
