research-article

A boosting algorithm for learning bipartite ranking functions with partially labeled data

Authors:
Massih Reza Amini

Université Pierre et Marie Curie, Paris, France

Université Pierre et Marie Curie, Paris, France
View Profile

,
Tuong Vinh Truong

Université Pierre et Marie Curie, Paris, France

Université Pierre et Marie Curie, Paris, France
View Profile

,
Cyril Goutte

National Research Council Canada, Gatineau, Canada

National Research Council Canada, Gatineau, Canada
View Profile

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrievalJuly 2008Pages 99–106https://doi.org/10.1145/1390334.1390354

Published:20 July 2008Publication History

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Pages 99–106

ABSTRACT

This paper presents a boosting based algorithm for learning a bipartite ranking function (BRF) with partially labeled data. Until now different attempts had been made to build a BRF in a transductive setting, in which the test points are given to the methods in advance as unlabeled data. The proposed approach is a semi-supervised inductive ranking algorithm which, as opposed to transductive algorithms, is able to infer an ordering on new examples that were not used for its training. We evaluate our approach using the TREC-9 Ohsumed and the Reuters-21578 data collections, comparing against two semi-supervised classification algorithms for ROCArea (AUC), uninterpolated average precision (AUP), mean precision@50 (TP) and Precision-Recall (PR) curves. In the most interesting cases where there are an unbalanced number of irrelevant examples over relevant ones, we show our method to produce statistically significant improvements with respect to these ranking measures.

References

S. Agarwal. Ranking on Graph Data. In Proceedings of the 23rd International Conference on Machine Learning, pages 25--32, 2006. Google ScholarDigital Library
M.-R. Amini and P. Gallinari. The Use of Unlabeled Data to Improve Supervised Learning for Text Summarization. In Proceedings of the 25th annual international ACM SIGIR, pages 105--112, 2002. Google ScholarDigital Library
M.-R. Amini and P. Gallinari. Semi-Supervised Learning with Explicit Misclassification Modeling. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, pages 555--560, 2003. Google ScholarDigital Library
P. L. Bartlett, M. I. Jordan and Jon D. McAuliffe. Large Margin Classifiers: convex Loss, Low Noise and Convergence Rates. In Advances in Neural Information Processing Systems 16, pages 1173--1180, 2004.Google Scholar
A. Blum and T. Mitchell. Combining labeled and unlabeled data with Co-training. In Proceedings of the Workshop on Computational Learning Theory, pages 92--100, Wisconsin, USA, 1998. Google ScholarDigital Library
A.P. Bradley. The use of the Area under the ROC curve in the Evaluation of Machine Learning Algorithms. Pattern Recognition, 30:1145--1159, 1997. Google ScholarDigital Library
R. Caruana and A. Niculescu-Mizil. Data mining in metric space: an empirical analysis of supervised learning performance criteria. In Proceedings of the 10th ACM SIGKDD, pages 69--78, 2004. Google ScholarDigital Library
O. Chapelle, B. Schölkopf and A. Zien. Semi-Supervised Learning, MIT Press, Cambridge, MA, 2006 Google ScholarDigital Library
C. Cortes and M. Mohri. AUC optimization vs. error rate minimization. In Advances in Neural Information Processing Systems 16, pages 313--320, 2004.Google Scholar
Y. Freund, R. Iyer, R.E. Schapire and Y. Singer. An efficient Boosting Algorithm for Combining Preferences. Journal of Machine Learning Research, 4:933--969, 2004. Google ScholarDigital Library
E. Gaussier and C. Goutte. Learning with Partially Labelled Data - with Confidence. In ICML'05 Workshop on Learning from Partially Classified Training Data (ICML'05-LPCT), pages 29--36, 2005.Google Scholar
W. Hersh, C. Buckley, T. J. Leone and David Hickam. OHSUMED: an interactive Retrieval Evaluation and new Large Test Collection for Research. In Proceedings of the 17th annual international ACM SIGIR, pages 192--201, 1994. Google ScholarDigital Library
T. Joachims. Transductive Inference for Text Classification using Support Vector Machines. In Proceedings of 16th International Conference on Machine Learning, pages 200--209, 1999. Google ScholarDigital Library
E.L. Lehmann. Nonparametric Statistical Methods Based on Ranks. McGraw-Hill, New York, 1975.Google Scholar
D. D. Lewis. Reuters-21578, distribution 1.0 http://www.daviddlewis.com/resources/testcollections/reuters21578/. January 1997.Google Scholar
K. Nigam, A.K. McCallum, S. Thrun and Tom Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning 39(2/3):127--163, 2000. Google ScholarDigital Library
S. Robertson and D.A. Hull. The TREC-9 Filtering Track Final Report. In Proceedings of the 9th Text REtrieval Conference (TREC-9). pages 25--40, 2001.Google Scholar
I. Soboroff and S. Robertson. Building a Filtering Test Collection for TREC 2002. In Proceedings of the 26th annual international ACM SIGIR. pages 243--250, 2003. Google ScholarDigital Library
C. van Rijsbergen, Information Retrieval, Butterworths, London, 1979. Google ScholarDigital Library
J.-N. Vittaut, M.-R. Amini and P. Gallinari, Learning Classification with Both Labeled and Unlabeled Data. In Proceedings of the 13th European Conference on Machine Learning. pages 468--476, 2002. Google ScholarDigital Library
C.C. Vogt and G.W. Cottrell, Fusion Via a Linear Combination of Scores. Information Retrieval, 1(3):151--173, 1999. Google ScholarDigital Library
J. Weston, R. Kuang, C. Leslie and W.S. Noble. Protein ranking by semi-supervised network propagation. BMC Bioinformatics, special issue, 2006.Google Scholar
D. Zhou, J. Weston, A. Gretton, O. Bousquet and B. Schölkopf. Ranking on Data Manifolds. In Advances in Neural Information Processing Systems 16, pages 169--176, 2004.Google Scholar
D. Zhou , C.J.C. Burges and T. Tao. Transductive link spam detection. In Proceedings of the 3rd international workshop on Adversarial information retrieval on the web. pages 21--28, 2007. Google ScholarDigital Library

Index Terms

A boosting algorithm for learning bipartite ranking functions with partially labeled data
1. Information systems
  1. Information retrieval
2. Mathematics of computing
  1. Information theory

Recommendations

Learning to rank with partially-labeled data
SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Ranking algorithms, whose goal is to appropriately order a set of objects/documents, are an important component of information retrieval systems. Previous work on ranking algorithms has focused on cases where only labeled data is available for training (...
Read More
Boosting for multiclass semi-supervised learning

We present an algorithm for multiclass semi-supervised learning, which is learning from a limited amount of labeled data and plenty of unlabeled data. Existing semi-supervised learning algorithms use approaches such as one-versus-all to convert the ...
Read More
Non-linear dictionary learning with partially labeled data

While recent techniques for discriminative dictionary learning have demonstrated tremendous success in image analysis applications, their performance is often limited by the amount of labeled data available for training. Even though labeling images is ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
July 2008
934 pages
ISBN:9781605581644
DOI:10.1145/1390334
General Chairs:
Tat-Seng Chua
National University of Singapore
,
Mun-Kew Leong
National Library Board, Singapore
,
Program Chairs:
Syung Hyon Myaeng
Information and Communications University, Korea
,
Douglas W. Oard
University of Maryland, College Park, USA
,
Fabrizio Sebastiani
Consiglio Nazionale delle Ricerche, Italy
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 July 2008
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
boosting
information routing
learning to rank with partially labeled data
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate792of3,983submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 48
  Total Citations
  View Citations
- 805
  Total Downloads
- Downloads (Last 12 months)3
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A boosting algorithm for learning bipartite ranking functions with partially labeled data

SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

ABSTRACT

References

Cited By

Index Terms

Recommendations

Learning to rank with partially-labeled data

Boosting for multiclass semi-supervised learning

Non-linear dictionary learning with partially labeled data