Article

Web object retrieval

Authors:
Zaiqing Nie, Yunxiao Ma, Shuming Shi, Ji-Rong Wen, Wei-Ying Ma

Microsoft Research Asia, Beijing, China

WWW '07: Proceedings of the 16th international conference on World Wide Web
May 2007, Pages 81–90
https://doi.org/10.1145/1242572.1242584

Published: 08 May 2007

ABSTRACT

Current Web search engines essentially perform relevance ranking at the document level. However, a large amount of structured information about real-world objects is embedded in static Web pages and online Web databases, and document-level information retrieval can lead to highly inaccurate relevance ranking when answering object-oriented queries. In this paper, we propose a paradigm shift to enable searching at the object level. Traditional information retrieval models take documents as the retrieval units and consider the content of a document reliable. This reliability assumption no longer holds in the object retrieval context, where multiple copies of information about the same object typically exist. These copies may be inconsistent because Web sites vary in quality and current information extraction techniques have limited accuracy. Simply combining the noisy and inaccurate attribute information extracted from different sources may not achieve satisfactory retrieval performance. We propose several language models for Web object retrieval: an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features. We test these models on a paper search engine and compare their performance. We conclude that the hybrid model is superior because it takes into account extraction errors at varying levels.
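The abstract does not give the models' formulas, so the following is only an illustrative sketch of the general idea, not the paper's formulation. It scores an object by query likelihood, treating each extracted record as a bag of words with Dirichlet smoothing and mixing the record-level models by a per-record accuracy weight. The function names, the `accuracy` weights, the smoothing parameter `mu`, and the toy data are all assumptions introduced for illustration.

```python
import math
from collections import Counter


def record_term_prob(term, record_tokens, collection_tf, collection_len, mu=10.0):
    """Dirichlet-smoothed estimate of P(term | record) (hypothetical helper)."""
    tf = Counter(record_tokens)[term]
    p_coll = collection_tf.get(term, 0) / collection_len
    return (tf + mu * p_coll) / (len(record_tokens) + mu)


def score_object(query_tokens, records, collection_tf, collection_len, mu=10.0):
    """Log query likelihood of one object.

    records: list of (tokens, accuracy) pairs, one per source the object was
    extracted from. `accuracy` is an assumed, externally estimated reliability
    of that record; it is normalized here into a mixture weight.
    """
    total = sum(acc for _, acc in records)
    if total <= 0:
        raise ValueError("at least one record needs a positive accuracy weight")
    log_score = 0.0
    for term in query_tokens:
        p = sum(
            (acc / total)
            * record_term_prob(term, toks, collection_tf, collection_len, mu)
            for toks, acc in records
        )
        if p <= 0.0:
            # Term unseen in both the object and the collection.
            return float("-inf")
        log_score += math.log(p)
    return log_score


# Toy data: one clean record and one noisy record describing the same object.
rec_good = ["web", "object", "retrieval"]
rec_noisy = ["web", "page", "design"]
collection_tf = Counter(rec_good + rec_noisy)
collection_len = sum(collection_tf.values())
query = ["object", "retrieval"]

trusted = score_object(query, [(rec_good, 0.9), (rec_noisy, 0.3)],
                       collection_tf, collection_len)
distrusted = score_object(query, [(rec_good, 0.3), (rec_noisy, 0.9)],
                          collection_tf, collection_len)
print(trusted > distrusted)
```

In this toy setup, giving more weight to the record that actually contains the query terms raises the object's score, which is the intuition behind accounting for source and extraction quality instead of pooling all extracted text equally.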

Index Terms

Web object retrieval
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published in
WWW '07: Proceedings of the 16th international conference on World Wide Web
May 2007
1382 pages
ISBN:9781595936547
DOI:10.1145/1242572
General Chairs:
Carey Williamson, University of Calgary, Canada
Mary Ellen Zurko, IBM, USA
Program Chairs:
Peter Patel-Schneider, Bell Labs Research, USA
Prashant Shenoy, University of Massachusetts at Amherst, USA
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
information extraction
information retrieval
language model
web objects
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%