ABSTRACT
Temporal data, in which one or several time attributes are present, are prevalent. Identifying corresponding temporal attributes across heterogeneous sources is challenging, because the same attribute may contain distinct values in different time spans, whereas different attributes may have highly similar timestamps and similar values. Existing studies on schema matching seldom exploit temporal information when matching attributes. In this paper, we propose to order the values in an attribute A by some time attribute T, forming a time series. To learn deep temporal features of the attribute pair (T, A), we devise an auto-encoder that embeds the transitions of values in the time series into a vector. Temporal attribute matching (TAM) then evaluates the matching distance of two temporal attribute pairs by comparing their transition vectors. We show that computing the optimal matching distance is NP-hard, and present an approximation algorithm. Experiments on real datasets demonstrate the superiority of our proposal over generic schema matching approaches in matching temporal attributes.
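The idea of ordering an attribute by a time attribute and comparing transition vectors can be illustrated with a minimal sketch. It is not the auto-encoder described in the abstract: the learned embedding is replaced by a hand-crafted stand-in (the fractions of downward, flat, and upward steps), and the distance is plain Euclidean distance. All function names, column names, and data values are hypothetical.

```python
from math import sqrt


def to_time_series(rows, time_key, attr_key):
    """Order the values of one attribute by a time attribute."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    return [r[attr_key] for r in ordered]


def transition_vector(series):
    """Hand-crafted stand-in for a learned embedding (assumption):
    fractions of downward, flat, and upward steps between consecutive values."""
    steps = [b - a for a, b in zip(series, series[1:])]
    if not steps:
        return [0.0, 0.0, 0.0]
    n = len(steps)
    down = sum(1 for s in steps if s < 0)
    flat = sum(1 for s in steps if s == 0)
    up = sum(1 for s in steps if s > 0)
    return [down / n, flat / n, up / n]


def distance(u, v):
    """Euclidean distance between two transition vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical toy sources: the same quantity in different units,
# recorded under different attribute names and in unsorted order.
source_a = [
    {"ts": 3, "temp": 22.0},
    {"ts": 1, "temp": 20.0},
    {"ts": 2, "temp": 21.0},
    {"ts": 4, "temp": 23.0},
]
source_b = [
    {"time": 20, "reading": 69.8, "count": 7},
    {"time": 10, "reading": 68.0, "count": 9},
    {"time": 40, "reading": 73.4, "count": 4},
    {"time": 30, "reading": 71.6, "count": 7},
]

vec_a = transition_vector(to_time_series(source_a, "ts", "temp"))
candidates = {
    attr: transition_vector(to_time_series(source_b, "time", attr))
    for attr in ("reading", "count")
}
# Pick the candidate attribute whose transition vector is closest.
best = min(candidates, key=lambda attr: distance(vec_a, candidates[attr]))
print(best)
```

In this toy case the raw values of `temp` and `reading` never coincide, yet their transition patterns agree, which is the kind of signal a value-overlap matcher would miss. The nearest-vector choice here is only a per-attribute comparison, not the approximation algorithm the abstract refers to.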
Index Terms
- Representing Temporal Attributes for Schema Matching