Article

A study of parameter tuning for term frequency normalization

Authors:
Ben He, University of Glasgow, Glasgow, UK
Iadh Ounis, University of Glasgow, Glasgow, UK

CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management, November 2003, pages 10–16. https://doi.org/10.1145/956863.956867

Published: 3 November 2003

ABSTRACT

Most current term frequency normalization approaches for information retrieval involve the use of parameters. The tuning of these parameters has an important impact on the overall performance of an information retrieval system: a small variation in the parameter value(s) can lead to a large variation in precision and recall. Most current tuning approaches depend on the document collection. As a consequence, an effective parameter value cannot be obtained for a new collection without extensive training data. In this paper, we propose a novel and robust method for tuning the term frequency normalization parameter(s) by measuring the normalization effect on the within-document frequency of the query terms. As an illustration, we apply our method to Amati & van Rijsbergen's so-called normalization 2. Experiments on the TREC-6, 7 and 8 ad hoc tasks and the TREC-8, 9 and 10 Web tracks show that the new method is independent of the collection and provides reliable, good retrieval performance.
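For concreteness, the sketch below implements normalization 2 as defined by Amati & van Rijsbergen, tfn = tf · log2(1 + c · avg_l / l), where l is the document length, avg_l is the average document length in the collection and c is the free parameter being tuned. It also computes one possible, hypothetical measure of a "normalization effect" for a query term: the coefficient of variation of the factor tfn/tf over the documents that contain the term. This measure, the function names and the example document lengths are illustrative assumptions chosen to show how the effect of normalization weakens as c grows; they are not the paper's exact definition or tuning procedure.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def normalization2(tf: float, doc_len: float, avg_doc_len: float, c: float) -> float:
    """Normalization 2: tfn = tf * log2(1 + c * avg_doc_len / doc_len)."""
    if doc_len <= 0 or avg_doc_len <= 0 or c <= 0:
        raise ValueError("doc_len, avg_doc_len and c must be positive")
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


def normalization_effect(doc_lens: Sequence[float], avg_doc_len: float, c: float) -> float:
    """Hypothetical effect measure (an assumption, not the paper's definition):
    coefficient of variation of the factor tfn / tf over the documents that
    contain a query term. Under normalization 2 this factor depends only on
    document length, so tf is fixed to 1 here."""
    factors = [normalization2(1.0, l, avg_doc_len, c) for l in doc_lens]
    return pstdev(factors) / mean(factors)


if __name__ == "__main__":
    # Hypothetical lengths of the documents containing one query term.
    doc_lens = [100, 250, 1000]
    avg_doc_len = 250.0
    for c in (0.5, 1.0, 2.0, 7.0, 10.0):
        tfn = normalization2(3, 500, avg_doc_len, c)
        effect = normalization_effect(doc_lens, avg_doc_len, c)
        print(f"c={c:5.1f}  tfn(tf=3, l=500)={tfn:.3f}  effect={effect:.3f}")
```

Sweeping c in this way shows how sensitive the normalized frequencies are to the parameter: as c increases, tfn/tf varies less from one document length to another, so the hypothetical effect measure shrinks.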

References

G. Amati. Probabilistic Models for Information Retrieval based on Divergence from Randomness. PhD thesis, Department of Computing Science, University of Glasgow, 2003.
G. Amati and C. J. van Rijsbergen. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Transactions on Information Systems (TOIS), 20(4):357–389, October 2002.
G. Amati and C. J. van Rijsbergen. Term frequency normalization via Pareto distributions. In Advances in Information Retrieval, 24th BCS-IRSG European Colloquium on IR Research, Glasgow, UK, March 25–27, 2002, Proceedings, volume 2291 of Lecture Notes in Computer Science, pages 183–192. Springer, 2002.
A. Chowdhury, M. C. McCabe, D. Grossman, and O. Frieder. Document normalization revisited. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 381–382, 2002.
S. Robertson and S. Walker. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 232–241, 1994.
S. Robertson, S. Walker, M. M. Beaulieu, M. Gatford, and A. Payne. Okapi at TREC-4. In NIST Special Publication 500-236: The Fourth Text REtrieval Conference (TREC-4), pages 73–96, 1995.
G. Salton, A. Wong, and C. S. Yang. A vector space model for automatic indexing. Communications of the ACM, 18(11):613–620, November 1975.
A. Singhal, C. Buckley, and M. Mitra. Pivoted document length normalization. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 21–29, 1996.
K. Sparck Jones, S. Walker, and S. E. Robertson. A probabilistic model of information retrieval: Development and comparative experiments. Information Processing and Management, 36(6):779–840, 2000.
C. Zhai and J. Lafferty. A study of smoothing methods for language models applied to ad hoc information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 334–342, 2001.

Index Terms

A study of parameter tuning for term frequency normalization
1. Information systems
  1. Information retrieval
  2. Information storage systems

Recommendations

Lower-bounding term frequency normalization
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

In this paper, we reveal a common deficiency of the current retrieval models: the component of term frequency (TF) normalization by document length is not lower-bounded properly; as a result, very long documents tend to be overly penalized. In order to ...
On setting the hyper-parameters of term frequency normalization for information retrieval

The setting of the term frequency normalization hyper-parameter suffers from the query dependence and collection dependence problems, which remarkably hurt the robustness of the retrieval performance. Our study in this article investigates three term ...
Term frequency normalisation tuning for BM25 and DFR models
ECIR'05: Proceedings of the 27th European conference on Advances in Information Retrieval Research

The term frequency normalisation parameter tuning is a crucial issue in information retrieval (IR), which has an important impact on the retrieval performance. The classical pivoted normalisation approach suffers from the collection-dependence problem. ...
Published in
CIKM '03: Proceedings of the twelfth international conference on Information and knowledge management
November 2003
592 pages
ISBN: 1581137230
DOI: 10.1145/956863
General Chair: Donald Kraft, Louisiana State University
Program Chairs: Ophir Frieder, Illinois Institute of Technology; Joachim Hammer, University of Florida; Sajda Qureshi, University of Nebraska, Omaha; Len Seligman, The MITRE Corporation
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 November 2003
Author Tags
document length
information retrieval
parameter tuning
term frequency normalization
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
Article Metrics
- Total Citations: 45
- Total Downloads: 968
- Downloads (last 12 months): 15
- Downloads (last 6 weeks): 0