2004 | Original Paper | Book Chapter
Discovering Aspects of Web Pages from Their Referential Contexts in the Web
Written by: Koji Zettsu, Yutaka Kidawara, Katsumi Tanaka
Published in: Database Systems for Advanced Applications
Publisher: Springer Berlin Heidelberg
Included in: Professional Book Archive
The Web contains an enormous number of pages of unknown authorship, and even though Web search engines precisely evaluate the relevance of page contents, a user cannot be sure whether a search result shows credible information. Because a Web page is referred to by other pages in various contexts through links, these contexts indicate the reputation of the page. For example, some pages may refer to a company’s page as “an excellent local company” and others may refer to it as “a member of a certain research project”, while the company’s page itself might contain only product and service information. Such references are called “aspects” of the Web page, as distinguished from the content of the page. In this paper, we propose an approach for discovering aspects that characterize Web pages based on their contexts. We define criteria for selecting “aspectual” Web content based on (1) the strength of its association with the page, derived from the logical structure of the Web (i.e., Web document structure and link structure), (2) the novelty of its content relative to the page, and (3) its typicality among multiple contexts. We evaluate how these criteria affect aspect discovery results. We also explain how Web pages can be grouped based on aspect similarity. This grouping helps us find Web pages that are referred to in the same way even though their content is different.
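The three criteria can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give their formulas, so the scoring below is an assumption made only for illustration. Each referring context carries a hypothetical `association` weight standing in for criterion (1), terms already present on the target page are discarded for criterion (2), and the weights are averaged over all contexts for criterion (3). The function names `tokenize`, `aspect_scores`, and `jaccard` are likewise hypothetical, and Jaccard overlap is just one plausible way to compare aspect sets when grouping pages.

```python
from collections import Counter

PUNCT = '.,;:!?"()'

def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def aspect_scores(page_text, contexts):
    """Score candidate aspect terms found in referring contexts.

    contexts: list of (context_text, association) pairs. `association`
    is an assumed weight in (0, 1] standing in for criterion (1).
    """
    if not contexts:
        return {}
    page_terms = set(tokenize(page_text))
    weighted = Counter()
    for text, association in contexts:
        # Count each term once per context, so the total reflects
        # how many contexts share it (criterion 3, typicality).
        for term in set(tokenize(text)):
            if term not in page_terms:  # criterion 2, novelty
                weighted[term] += association
    return {term: w / len(contexts) for term, w in weighted.items()}

def jaccard(a, b):
    """Assumed similarity between two pages' aspect term sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

page = "We sell products and services"
contexts = [
    ("an excellent local company", 1.0),
    ("excellent products from a local company", 0.5),
    ("member of a research project", 1.0),
]
scores = aspect_scores(page, contexts)
```

In this toy input, "excellent" appears in two contexts and therefore scores higher than "member", which appears in one, while "products" is dropped because the page itself already contains it. A real system would also need stop-word removal and a principled association measure derived from document and link structure.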