DOI: 10.1145/502585.502612
Article

Induction of integrated view for XML data with heterogeneous DTDs

Published: 05 October 2001

ABSTRACT

This paper proposes a novel approach to integrating heterogeneous XML DTDs. With this approach, an information agent can be easily extended to integrate heterogeneous XML-based content and perform federated search. Based on a tree grammar inference technique, the approach derives an integrated view of XML DTDs within an information integration framework. The derivation takes advantage of naming and structural similarities among DTDs in similar domains. The complete approach consists of three main steps. (1) DTD clustering groups DTDs in similar domains into classes. (2) Schema learning applies a tree grammar inference technique to generate a set of tree grammar rules from the DTDs in a class produced by the previous step. (3) Minimization optimizes the rules generated in the previous step and transforms them into an integrated view. We have implemented the proposed approach in a system called DEEP and tested the system on artificial and real domains. The experimental results reveal that this system can effectively and efficiently integrate radically different DTDs.
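To make step (1) concrete, the following is a minimal sketch of clustering DTDs by naming similarity. It is not the DEEP implementation: the abstract does not specify the similarity measure or the clustering algorithm, so the Jaccard measure over element names, the single-link grouping, the 0.5 threshold, and all function names (`element_names`, `jaccard`, `cluster_dtds`) are assumptions chosen for illustration. Structural similarity, which the abstract also mentions, is ignored here.

```python
import re
from itertools import combinations

# Matches the element name in declarations such as <!ELEMENT book (title, author)>
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([A-Za-z_][\w.\-]*)")


def element_names(dtd_text):
    """Return the set of lower-cased element names declared in a DTD string."""
    return {name.lower() for name in ELEMENT_DECL.findall(dtd_text)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_dtds(dtds, threshold=0.5):
    """Group DTDs into classes by single-link clustering on name similarity.

    dtds maps a label to DTD text. Two DTDs are linked when the Jaccard
    similarity of their element-name sets is at least `threshold`
    (an assumed value, not taken from the paper).
    """
    names = {label: element_names(text) for label, text in dtds.items()}
    parent = {label: label for label in dtds}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(dtds, 2):
        if jaccard(names[a], names[b]) >= threshold:
            parent[find(a)] = find(b)

    classes = {}
    for label in dtds:
        classes.setdefault(find(label), []).append(label)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical toy DTDs, invented for this example.
dtds = {
    "books_a": "<!ELEMENT book (title, author, price)> <!ELEMENT title (#PCDATA)> "
               "<!ELEMENT author (#PCDATA)> <!ELEMENT price (#PCDATA)>",
    "books_b": "<!ELEMENT book (title, author, isbn)> <!ELEMENT title (#PCDATA)> "
               "<!ELEMENT author (#PCDATA)> <!ELEMENT isbn (#PCDATA)>",
    "movies": "<!ELEMENT movie (name, director)> <!ELEMENT name (#PCDATA)> "
              "<!ELEMENT director (#PCDATA)>",
}
classes = cluster_dtds(dtds)
print(classes)
```

In this toy input the two book DTDs share three of five distinct element names (similarity 0.6) and land in one class, while the movie DTD shares none and forms its own class. Each resulting class would then be the input to the schema learning step.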


Published in

CIKM '01: Proceedings of the tenth international conference on Information and knowledge management
October 2001
616 pages
ISBN: 1581134363
DOI: 10.1145/502585

Copyright © 2001 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,861 of 8,427 submissions, 22%
