Overview

Authors:

ChengXiang Zhai
Department of Computer Science and Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, USA
Department of Statistics and Institute for Genomic Biology, University of Illinois at Urbana-Champaign, USA

Part of the book series: Synthesis Lectures on Human Language Technologies (SLHLT)

Table of contents (8 chapters)

Introduction
Overview of Information Retrieval Models
Simple Query Likelihood Retrieval Model
Complex Query Likelihood Retrieval Model
Probabilistic Distance Retrieval Model
Language Models for Special Retrieval Tasks
Language Models for Latent Topic Analysis
Conclusions

About this book

As online information grows dramatically, search engines such as Google play an increasingly important role in our lives. Critical to all search engines is the problem of designing an effective retrieval model that can rank documents accurately for a given query. This has been a central research problem in information retrieval for several decades. In the past ten years, a new generation of retrieval models, often referred to as statistical language models, has been successfully applied to many different information retrieval problems. Compared with traditional models such as the vector space model, these new models have a sounder statistical foundation and can leverage statistical estimation to optimize retrieval parameters. They can also be adapted more easily to non-traditional and complex retrieval problems. Empirically, they tend to achieve comparable or better performance than traditional models with less effort on parameter tuning. This book systematically reviews the large body of literature on applying statistical language models to information retrieval, with an emphasis on the underlying principles, empirically effective language models, and language models developed for non-traditional retrieval tasks. All the relevant literature has been synthesized to make it easy for a reader to digest the research progress achieved so far and see the frontier of research in this area. The book also offers practitioners an informative introduction to a set of practically useful language models that can effectively solve a variety of retrieval problems. No prior knowledge of information retrieval is required, but some basic knowledge of probability and statistics would be useful for fully digesting all the details.
Table of Contents: Introduction / Overview of Information Retrieval Models / Simple Query Likelihood Retrieval Model / Complex Query Likelihood Model / Probabilistic Distance Retrieval Model / Language Models for Special Retrieval Tasks / Language Models for Latent Topic Analysis / Conclusions

About the author

ChengXiang Zhai is a Professor of Computer Science and Willett Faculty Scholar at the University of Illinois at Urbana-Champaign, where he is also affiliated with the Graduate School of Library and Information Science, the Institute for Genomic Biology, and the Department of Statistics. He received a Ph.D. in Computer Science from Nanjing University in 1990 and a Ph.D. in Language and Information Technologies from Carnegie Mellon University in 2002. He worked at Clairvoyance Corp. as a Research Scientist and then Senior Research Scientist from 1997 to 2000. His research interests include information retrieval, text mining, natural language processing, machine learning, biomedical and health informatics, and intelligent education information systems. He has published over 200 research papers in major conferences and journals. He is an Associate Editor of Information Processing and Management, previously served as an Associate Editor of ACM Transactions on Information Systems, and has served on the editorial board of Information Retrieval Journal. He served as conference program co-chair for ACM CIKM 2004, NAACL HLT 2007, ACM SIGIR 2009, ECIR 2014, ICTIR 2015, and WWW 2015, and as conference general co-chair for ACM CIKM 2016. He is an ACM Distinguished Scientist and a recipient of multiple awards, including the ACM SIGIR 2004 Best Paper Award, the ACM SIGIR 2014 Test of Time Paper Award, an Alfred P. Sloan Research Fellowship, an IBM Faculty Award, an HP Innovation Research Program Award, a Microsoft Beyond Search Research Award, and the Presidential Early Career Award for Scientists and Engineers (PECASE).

Bibliographic Information

Book Title: Statistical Language Models for Information Retrieval
Authors: ChengXiang Zhai
Series Title: Synthesis Lectures on Human Language Technologies
DOI: https://doi.org/10.1007/978-3-031-02130-5
Publisher: Springer, Cham
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 2
Copyright Information: Springer Nature Switzerland AG 2009
Softcover ISBN: 978-3-031-01002-6 (published 05 December 2008)
eBook ISBN: 978-3-031-02130-5 (published 31 May 2022)
Series ISSN: 1947-4040
Series E-ISSN: 1947-4059
Edition Number: 1
Number of Pages: XII, 132
Topics: Artificial Intelligence, Natural Language Processing (NLP), Computational Linguistics
