DOI: 10.1145/2588555.2588560
research-article

Modeling entity evolution for temporal record matching

Published: 18 June 2014

ABSTRACT

Temporal record matching starts from the observation that when the entities represented by records change over time, approaches that use temporal information may outperform approaches that do not. Any such temporal matching method relies at its heart on a temporal model that captures how entities evolve. In their pioneering work, Li et al. used an efficiently computable model that simply tries to predict whether an attribute is expected to change over a given time interval. In our work, we propose and evaluate a more detailed model that focuses on the probability that a given attribute value reappears over time. The intuition is that the way an entity changes its attribute value may depend on its past values. In addition, our model considers sets of records (rather than simply pairs of records) to improve robustness and accuracy. Experimental results show that the resulting approach improves both accuracy and resistance to noise while incurring minimal overhead.
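
To make the idea of a value "reappearing over time" concrete, the following is a minimal sketch, not the model proposed in the paper. It assumes labeled per-entity histories of (timestamp, value) pairs for a single attribute and estimates, for each time-gap bucket, the fraction of record pairs of the same entity in which the later record carries the same value as the earlier one. The function name `reappearance_rates`, the bucketing scheme, and the sample data are all hypothetical choices made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def reappearance_rates(histories, bucket_edges):
    """Empirical rate at which a value is seen again, per time-gap bucket.

    histories: one list per entity of (timestamp, value) pairs, sorted by
        timestamp (hypothetical training data for a single attribute).
    bucket_edges: ascending gap thresholds; a gap g is assigned to bucket
        bisect_right(bucket_edges, g).
    """
    same = defaultdict(int)   # pairs whose later value equals the earlier one
    total = defaultdict(int)  # all pairs observed in the bucket
    for history in histories:
        for i, (t_i, v_i) in enumerate(history):
            for t_j, v_j in history[i + 1:]:
                bucket = bisect_right(bucket_edges, t_j - t_i)
                total[bucket] += 1
                if v_i == v_j:
                    same[bucket] += 1
    return {bucket: same[bucket] / total[bucket] for bucket in total}


# Illustrative data: the first entity returns to value "A" after holding "B".
histories = [
    [(2000, "A"), (2002, "B"), (2005, "A")],
    [(2001, "X"), (2002, "X")],
]
rates = reappearance_rates(histories, bucket_edges=[2, 4])
```

This sketch counts a value as reappearing whenever the later record carries it, whether or not the entity held a different value in between. It also works on pairs of records only, whereas the abstract states that the proposed model considers sets of records.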

References

  1. Academic patenting in Europe (APE-INV). http://www.esf-ape-inv.eu/.
  2. The DBLP computer science bibliography. http://www.informatik.uni-trier.de/ley/db/.
  3. Fec-standardizer - an experiment to standardize individual donor names in campaign finance data. https://github.com/cjdd3b/fec-standardizer.
  4. Twitter - an online social networking and microblogging service. https://twitter.com/.
  5. E. Cohen and M. Strauss. Maintaining time-decaying stream aggregates. Journal of Algorithms, 59(1):19-36, 2006.
  6. G. Cormode, V. Shkapenyuk, D. Srivastava, and B. Xu. Forward decay: A practical time decay model for streaming systems. In IEEE 25th International Conference on Data Engineering (ICDE), pages 138-149. IEEE, 2009.
  7. P. Domingos. Multi-relational record linkage. In Proc. of the KDD-2004 Workshop on Multi-Relational Data Mining. KDD, 2004.
  8. A. Elmagarmid, P. Ipeirotis, and V. Verykios. Duplicate record detection: A survey. IEEE Transactions on Knowledge and Data Engineering, 19(1):1-16, 2007.
  9. I. Fellegi and A. Sunter. A theory for record linkage. Journal of the American Statistical Association, pages 1183-1210, 1969.
  10. O. Hassanzadeh, F. Chiang, H. Lee, and R. Miller. Framework for evaluating clustering algorithms in duplicate detection. Proceedings of the VLDB Endowment, 2(1):1282-1293, 2009.
  11. P. Jaccard. Distribution de la Flore Alpine: dans le Bassin des Dranses et dans quelques régions voisines. Rouge, 1901.
  12. N. Koudas, S. Sarawagi, and D. Srivastava. Record linkage: similarity measures and algorithms. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pages 802-803. ACM, 2006.
  13. V. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. In Soviet Physics Doklady, volume 10, pages 707-710, 1966.
  14. P. Li, X. Dong, A. Maurino, and D. Srivastava. Linking temporal records. Proceedings of the VLDB Endowment, 4(11):956-967, 2011.
  15. G. Ozsoyoglu and R. Snodgrass. Temporal and real-time databases: A survey. IEEE Transactions on Knowledge and Data Engineering, 7(4):513-532, 1995.
  16. J. Roddick and M. Spiliopoulou. A survey of temporal knowledge discovery paradigms and methods. IEEE Transactions on Knowledge and Data Engineering, 14(4):750-767, 2002.
  17. D. Wang and M. A. Arbib. Complex temporal sequence learning based on short-term memory. Proceedings of the IEEE, 78(9):1536-1543, 1990.
  18. W. Winkler. Methods for record linkage and Bayesian networks. Technical report, Statistical Research Division, US Census Bureau, Washington, DC, 2002.
  19. M. Yakout, A. Elmagarmid, H. Elmeleegy, M. Ouzzani, and A. Qi. Behavior based record linkage. Proceedings of the VLDB Endowment, 3(1-2):439-448, 2010.

Published in

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
June 2014
1645 pages
ISBN: 9781450323765
DOI: 10.1145/2588555

Copyright © 2014 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

SIGMOD '14 paper acceptance rate: 107 of 421 submissions, 25%. Overall acceptance rate: 785 of 4,003 submissions, 20%.
