ABSTRACT
In contrast with current Web search methods, which essentially perform document-level ranking and retrieval, we explore a new paradigm that enables Web search at the object level. We collect Web information for objects relevant to a specific application domain and rank these objects by their relevance and popularity to answer user queries. The traditional PageRank model is no longer valid for object popularity calculation because objects are connected by heterogeneous relationships. This paper introduces PopRank, a domain-independent object-level link analysis model for ranking the objects within a specific domain. Specifically, we assign a popularity propagation factor to each type of object relationship, study how different popularity propagation factors for these heterogeneous relationships affect the popularity ranking, and propose efficient approaches to automatically decide these factors. Our experiments use 1 million CS papers, and the results show that PopRank can achieve significantly better ranking results than naively applying PageRank to the object graph.
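The idea of assigning a propagation factor to each relationship type can be illustrated with a small sketch. This is not the paper's formulation: it assumes a plain PageRank-style power iteration in which each typed link passes on a share of its source's score proportional to the factor of its link type. The function name `propagate_popularity`, the damping value, the normalization scheme, and the example link types are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def propagate_popularity(nodes, edges, factors, damping=0.85, iterations=50):
    """Illustrative typed-link popularity propagation (assumed scheme, not PopRank itself).

    nodes:   list of object identifiers
    edges:   list of (source, target, link_type) tuples
    factors: dict mapping link_type -> propagation factor (non-negative)
    """
    # Total factor weight leaving each source, used to normalize its outgoing shares.
    out_weight = defaultdict(float)
    for src, _dst, link_type in edges:
        out_weight[src] += factors[link_type]

    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Uniform random-jump term, as in standard PageRank.
        nxt = {node: (1.0 - damping) / n for node in nodes}
        # Objects with no weighted out-links spread their score uniformly.
        dangling = sum(score[node] for node in nodes if out_weight[node] == 0.0)
        for node in nodes:
            nxt[node] += damping * dangling / n
        # Each typed link forwards a share proportional to its type's factor.
        for src, dst, link_type in edges:
            if out_weight[src] > 0.0:
                share = factors[link_type] / out_weight[src]
                nxt[dst] += damping * score[src] * share
        score = nxt
    return score


# Hypothetical toy graph: one object links to two others via different link types.
nodes = ["src", "a", "b"]
edges = [("src", "a", "cites"), ("src", "b", "mentions")]
scores = propagate_popularity(nodes, edges, {"cites": 0.9, "mentions": 0.1})
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph, swapping the two factors reverses the relative order of `a` and `b`, which is the kind of sensitivity to propagation factors that the abstract sets out to study.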
Index Terms
- Object-level ranking: bringing order to Web objects