DOI:10.1145/1871437.1871692
poster

Mapping web pages to database records via link paths

Published: 26 October 2010

ABSTRACT

In this paper we propose a new knowledge management task: mapping Web pages to their corresponding records in a structured database. For example, the DBLP database contains records for many computer scientists, and most of these people have public Web pages; if we can map a database record to the appropriate Web page, the information on that page can be used to enrich the person's database record. To accomplish this goal we employ link paths, which collect the anchor texts along multiple paths through the Web that end at the Web page in question. We hypothesize that the information from these link paths can be used to generate an accurate mapping from Web pages to database records. Experiments on two large, real-world data sets, with DBLP and IMDB as the structured data and computer science faculty members' Web pages and official movie homepages as the Web page data, show that our method does provide an accurate mapping. Finally, we conclude by issuing a call for further research on this promising new task.
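
The abstract does not spell out how link paths are built or scored, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a small in-memory link graph, enumerates loop-free paths of bounded length that end at a target page, and ranks candidate records by a simple token-overlap score. The function names, the `max_len` bound, the scoring rule, and the example URLs and names are all hypothetical.

```python
from collections import deque


def link_paths(graph, target, max_len=3):
    """Enumerate loop-free paths of at most max_len links that end at target.

    graph maps a source URL to a list of (anchor_text, destination_url) pairs.
    Each returned path is a list of anchor texts ordered from start to target.
    """
    # Build reverse edges so the search can walk backwards from the target.
    incoming = {}
    for src, links in graph.items():
        for anchor, dst in links:
            incoming.setdefault(dst, []).append((anchor, src))

    paths = []
    queue = deque([(target, [], {target})])
    while queue:
        page, anchors, seen = queue.popleft()
        if anchors:
            paths.append(anchors)
        if len(anchors) == max_len:
            continue  # assumed length bound reached; do not extend further
        for anchor, src in incoming.get(page, []):
            if src not in seen:  # skip pages already on this path
                queue.append((src, [anchor] + anchors, seen | {src}))
    return paths


def tokens(text):
    """Lowercased whitespace tokens of a string, as a set."""
    return set(text.lower().split())


def score(record_name, paths):
    """Assumed score: share of record-name tokens seen on a path, averaged over paths."""
    name = tokens(record_name)
    if not paths or not name:
        return 0.0
    total = 0.0
    for anchors in paths:
        path_tokens = set()
        for anchor in anchors:
            path_tokens |= tokens(anchor)
        total += len(name & path_tokens) / len(name)
    return total / len(paths)


def best_record(records, graph, target):
    """Return the candidate record name with the highest link-path score."""
    paths = link_paths(graph, target)
    return max(records, key=lambda r: score(r, paths))


# Hypothetical toy graph: a university site linking down to two faculty pages.
graph = {
    "http://example.edu/": [("Computer Science Department", "http://example.edu/cs")],
    "http://example.edu/cs": [("Faculty", "http://example.edu/cs/faculty")],
    "http://example.edu/cs/faculty": [
        ("Jane Doe", "http://example.edu/~jdoe"),
        ("John Roe", "http://example.edu/~jroe"),
    ],
}

match = best_record(["Jane Doe", "John Roe"], graph, "http://example.edu/~jdoe")
```

In this toy graph, the anchor text "Jane Doe" on the last link dominates the score; a realistic system would need to weigh anchors by position on the path and handle name variants, which this sketch deliberately leaves out.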

      • Published in

        CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
        October 2010
        2036 pages
        ISBN:9781450300995
        DOI:10.1145/1871437

        Copyright © 2010 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
