DOI: 10.1145/3394486.3403115
research-article

Representing Temporal Attributes for Schema Matching

Published: 20 August 2020

ABSTRACT

Temporal data are prevalent: a dataset often contains one or several time attributes. Identifying temporal attributes across heterogeneous sources is challenging, because the same attribute can take distinct values in different time spans, whereas different attributes may have highly similar timestamps and values. Existing studies on schema matching seldom exploit temporal information when matching attributes. In this paper, we propose to order the values of an attribute A by a time attribute T, forming a time series. To learn deep temporal features of the attribute pair (T, A), we devise an auto-encoder that embeds the transitions of values in the time series into a vector. Temporal attribute matching (TAM) then evaluates the matching distance between two temporal attribute pairs by comparing their transition vectors. We show that computing the optimal matching distance is NP-hard and present an approximation algorithm. Experiments on real datasets demonstrate that our proposal outperforms generic schema matching approaches in matching temporal attributes.
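To make the pipeline described above more concrete, here is a minimal Python sketch under strong assumptions: attribute values are numeric, each source has one known time attribute, and all helper names (`to_time_series`, `transition_vector`, `greedy_match`) are hypothetical. The paper embeds value transitions with a learned auto-encoder; this sketch swaps in three hand-crafted transition statistics (the fractions of consecutive steps that stay equal, rise, or fall). Its greedy one-to-one assignment is a generic heuristic, not the authors' approximation algorithm.

```python
import math

def to_time_series(rows, t_attr, a_attr):
    """Order the values of attribute a_attr by time attribute t_attr.

    sorted() is stable, so rows with equal timestamps keep their input order.
    """
    return [r[a_attr] for r in sorted(rows, key=lambda r: r[t_attr])]

def transition_vector(series):
    """Stand-in for the learned embedding (assumption, not the paper's auto-encoder):
    fractions of consecutive transitions that stay equal, go up, or go down."""
    steps = list(zip(series, series[1:]))
    n = len(steps)
    if n == 0:
        return [0.0, 0.0, 0.0]
    same = sum(1 for u, v in steps if v == u)
    up = sum(1 for u, v in steps if v > u)
    down = n - same - up
    return [same / n, up / n, down / n]

def distance(x, y):
    """Euclidean distance between two transition vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def greedy_match(left, right):
    """Hypothetical one-to-one matching: repeatedly take the closest unused pair.
    A simple heuristic, not the approximation algorithm from the paper."""
    candidates = sorted(
        (distance(lv, rv), ln, rn)
        for ln, lv in left.items()
        for rn, rv in right.items()
    )
    used_left, used_right, matches = set(), set(), {}
    for _, ln, rn in candidates:
        if ln not in used_left and rn not in used_right:
            matches[ln] = rn
            used_left.add(ln)
            used_right.add(rn)
    return matches

# Toy data: two sources with different time-attribute and value-attribute names.
rows_l = [
    {"t": 3, "counter": 30, "flag": 1},
    {"t": 1, "counter": 10, "flag": 1},
    {"t": 2, "counter": 20, "flag": 0},
    {"t": 4, "counter": 40, "flag": 0},
]
rows_r = [
    {"time": 1, "x": 5, "y": 0},
    {"time": 2, "x": 6, "y": 1},
    {"time": 3, "x": 9, "y": 0},
    {"time": 4, "x": 12, "y": 1},
]

left = {a: transition_vector(to_time_series(rows_l, "t", a)) for a in ("counter", "flag")}
right = {a: transition_vector(to_time_series(rows_r, "time", a)) for a in ("x", "y")}
print(greedy_match(left, right))  # {'counter': 'x', 'flag': 'y'}
```

Because these features depend only on how values change over time, the sketch pairs `flag` with `y` even though the two sources use different attribute names. A learned embedding would be expected to capture far richer transition patterns than these three fractions.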


      Published in: KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (ACM Conferences)
      August 2020, 3664 pages
      ISBN: 9781450379984
      DOI: 10.1145/3394486

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher: Association for Computing Machinery, New York, NY, United States


      Overall acceptance rate: 1,133 of 8,635 submissions (13%)
