skip to main content
10.1145/1390334.1390354acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
research-article

A boosting algorithm for learning bipartite ranking functions with partially labeled data

Published:20 July 2008Publication History

ABSTRACT

This paper presents a boosting based algorithm for learning a bipartite ranking function (BRF) with partially labeled data. Until now different attempts had been made to build a BRF in a transductive setting, in which the test points are given to the methods in advance as unlabeled data. The proposed approach is a semi-supervised inductive ranking algorithm which, as opposed to transductive algorithms, is able to infer an ordering on new examples that were not used for its training. We evaluate our approach using the TREC-9 Ohsumed and the Reuters-21578 data collections, comparing against two semi-supervised classification algorithms for ROCArea (AUC), uninterpolated average precision (AUP), mean precision@50 (TP) and Precision-Recall (PR) curves. In the most interesting cases where there are an unbalanced number of irrelevant examples over relevant ones, we show our method to produce statistically significant improvements with respect to these ranking measures.

References

  1. S. Agarwal. Ranking on Graph Data. In Proceedings of the 23rd International Conference on Machine Learning, pages 25--32, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. M.-R. Amini and P. Gallinari. The Use of Unlabeled Data to Improve Supervised Learning for Text Summarization. In Proceedings of the 25th annual international ACM SIGIR, pages 105--112, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. M.-R. Amini and P. Gallinari. Semi-Supervised Learning with Explicit Misclassification Modeling. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, pages 555--560, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. P. L. Bartlett, M. I. Jordan and Jon D. McAuliffe. Large Margin Classifiers: convex Loss, Low Noise and Convergence Rates. In Advances in Neural Information Processing Systems 16, pages 1173--1180, 2004.Google ScholarGoogle Scholar
  5. A. Blum and T. Mitchell. Combining labeled and unlabeled data with Co-training. In Proceedings of the Workshop on Computational Learning Theory, pages 92--100, Wisconsin, USA, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A.P. Bradley. The use of the Area under the ROC curve in the Evaluation of Machine Learning Algorithms. Pattern Recognition, 30:1145--1159, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. Caruana and A. Niculescu-Mizil. Data mining in metric space: an empirical analysis of supervised learning performance criteria. In Proceedings of the 10th ACM SIGKDD, pages 69--78, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. O. Chapelle, B. Schölkopf and A. Zien. Semi-Supervised Learning, MIT Press, Cambridge, MA, 2006 Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Cortes and M. Mohri. AUC optimization vs. error rate minimization. In Advances in Neural Information Processing Systems 16, pages 313--320, 2004.Google ScholarGoogle Scholar
  10. Y. Freund, R. Iyer, R.E. Schapire and Y. Singer. An efficient Boosting Algorithm for Combining Preferences. Journal of Machine Learning Research, 4:933--969, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. E. Gaussier and C. Goutte. Learning with Partially Labelled Data - with Confidence. In ICML'05 Workshop on Learning from Partially Classified Training Data (ICML'05-LPCT), pages 29--36, 2005.Google ScholarGoogle Scholar
  12. W. Hersh, C. Buckley, T. J. Leone and David Hickam. OHSUMED: an interactive Retrieval Evaluation and new Large Test Collection for Research. In Proceedings of the 17th annual international ACM SIGIR, pages 192--201, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. T. Joachims. Transductive Inference for Text Classification using Support Vector Machines. In Proceedings of 16th International Conference on Machine Learning, pages 200--209, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. E.L. Lehmann. Nonparametric Statistical Methods Based on Ranks. McGraw-Hill, New York, 1975.Google ScholarGoogle Scholar
  15. D. D. Lewis. Reuters-21578, distribution 1.0 http://www.daviddlewis.com/resources/testcollections/reuters21578/. January 1997.Google ScholarGoogle Scholar
  16. K. Nigam, A.K. McCallum, S. Thrun and Tom Mitchell. Text classification from labeled and unlabeled documents using EM. Machine Learning 39(2/3):127--163, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Robertson and D.A. Hull. The TREC-9 Filtering Track Final Report. In Proceedings of the 9th Text REtrieval Conference (TREC-9). pages 25--40, 2001.Google ScholarGoogle Scholar
  18. I. Soboroff and S. Robertson. Building a Filtering Test Collection for TREC 2002. In Proceedings of the 26th annual international ACM SIGIR. pages 243--250, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. C. van Rijsbergen, Information Retrieval, Butterworths, London, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J.-N. Vittaut, M.-R. Amini and P. Gallinari, Learning Classification with Both Labeled and Unlabeled Data. In Proceedings of the 13th European Conference on Machine Learning. pages 468--476, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C.C. Vogt and G.W. Cottrell, Fusion Via a Linear Combination of Scores. Information Retrieval, 1(3):151--173, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Weston, R. Kuang, C. Leslie and W.S. Noble. Protein ranking by semi-supervised network propagation. BMC Bioinformatics, special issue, 2006.Google ScholarGoogle Scholar
  23. D. Zhou, J. Weston, A. Gretton, O. Bousquet and B. Schölkopf. Ranking on Data Manifolds. In Advances in Neural Information Processing Systems 16, pages 169--176, 2004.Google ScholarGoogle Scholar
  24. D. Zhou , C.J.C. Burges and T. Tao. Transductive link spam detection. In Proceedings of the 3rd international workshop on Adversarial information retrieval on the web. pages 21--28, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A boosting algorithm for learning bipartite ranking functions with partially labeled data

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGIR '08: Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval
        July 2008
        934 pages
        ISBN:9781605581644
        DOI:10.1145/1390334

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 20 July 2008

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate792of3,983submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader