research-article

Representing Temporal Attributes for Schema Matching

Authors:
Yinan Mei (BNRist, Tsinghua University, Beijing, China)
Shaoxu Song (BNRist, Tsinghua University, Beijing, China)
Yunsu Lee (Samsung Research, Seoul, South Korea)
Jungho Park (Samsung Research, Seoul, South Korea)
Soo-Hyung Kim (Samsung Research, Seoul, South Korea)
Sungmin Yi (Samsung Research, Seoul, South Korea)

KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 2020, Pages 709–719, https://doi.org/10.1145/3394486.3403115

Published: 20 August 2020

ABSTRACT

Temporal data, in which one or more time attributes are present, are prevalent. Matching temporal attributes across heterogeneous sources is challenging, because the same attribute can contain distinct values in different time spans, whereas different attributes may have highly similar timestamps and similar values. Existing studies on schema matching seldom exploit temporal information when matching attributes. In this paper, we propose ordering the values of an attribute A by some time attribute T to form a time series. To learn deep temporal features of the attribute pair (T, A), we devise an auto-encoder that embeds the transitions of values in the time series into a vector. Temporal attribute matching (TAM) then evaluates the matching distance between two temporal attribute pairs by comparing their transition vectors. We show that computing the optimal matching distance is NP-hard, and we present an approximation algorithm. Experiments on real datasets demonstrate that our proposal outperforms generic schema matching approaches in matching temporal attributes.
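The pipeline described in the abstract (order A by T, summarize the value transitions as a vector, compare vectors) can be illustrated with a minimal sketch. This is not the paper's model. In place of the learned auto-encoder embedding it uses a hand-built, hypothetical stand-in (the fractions of decreasing, unchanged, and increasing steps), and it compares two (T, A) pairs with a plain Euclidean distance rather than the TAM matching distance. All function names, column names, and sample rows are assumptions made up for illustration.

```python
import math
from collections import Counter


def to_time_series(rows, time_key, attr_key):
    """Order the values of attribute `attr_key` by the time attribute `time_key`."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    return [r[attr_key] for r in ordered]


def transition_vector(series):
    """Hypothetical stand-in for a learned embedding: the fractions of
    decreasing, unchanged, and increasing consecutive steps."""
    diffs = [b - a for a, b in zip(series, series[1:])]
    if not diffs:
        return [0.0, 0.0, 0.0]
    # (d > 0) - (d < 0) is the sign of d: -1, 0, or 1.
    signs = Counter((d > 0) - (d < 0) for d in diffs)
    return [signs[s] / len(diffs) for s in (-1, 0, 1)]


def distance(u, v):
    """Euclidean distance between two transition vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Invented sample data: two sources record the same quantity in different
# units and under different column names; a third records an unrelated counter.
rows_a = [{"t": 3, "temp": 22.0}, {"t": 1, "temp": 20.0},
          {"t": 2, "temp": 21.0}, {"t": 4, "temp": 21.5}]
rows_b = [{"ts": 10, "reading": 68.0}, {"ts": 30, "reading": 71.6},
          {"ts": 20, "reading": 69.8}, {"ts": 40, "reading": 70.7}]
rows_c = [{"ts": 10, "count": 5}, {"ts": 20, "count": 5},
          {"ts": 30, "count": 5}, {"ts": 40, "count": 6}]

vec_a = transition_vector(to_time_series(rows_a, "t", "temp"))
vec_b = transition_vector(to_time_series(rows_b, "ts", "reading"))
vec_c = transition_vector(to_time_series(rows_c, "ts", "count"))

print("a vs b:", distance(vec_a, vec_b))
print("a vs c:", distance(vec_a, vec_c))
```

In this toy setting the two temperature-like attributes yield identical transition vectors even though their raw values and names differ, while the counter attribute lies farther away. A learned embedding, as proposed in the paper, would presumably capture much richer transition patterns than this three-component summary.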

Index Terms

Representing Temporal Attributes for Schema Matching
1. Information systems
  1. Data management systems
    1. Information integration
      1. Mediators and data integration

Published in
KDD '20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
August 2020
3664 pages
ISBN:9781450379984
DOI:10.1145/3394486
General Chairs: Rajesh Gupta (UC San Diego, USA), Yan Liu (USC, USA)
Program Chairs: Mohak Shah (LG Electronics, USA), Suju Rajan (Linkedin, USA)
Publications Chairs: Jiliang Tang (Michigan State, USA), B. Aditya Prakash (Georgia Tech, USA)
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data integration
schema matching
temporal data
Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%