A Critical Reassessment of the Saerens-Latinne-Decaestecker Algorithm for Posterior Probability Adjustment

Published: 31 December 2020

Abstract

We critically re-examine the Saerens-Latinne-Decaestecker (SLD) algorithm, a well-known method for estimating class prior probabilities (“priors”) and adjusting posterior probabilities (“posteriors”) in scenarios characterized by distribution shift, i.e., a difference between the distribution of the priors in the training documents and in the unlabelled documents. Given a machine-learned classifier and a set of unlabelled documents for which the classifier has returned posterior probabilities, together with estimates of the prior probabilities, SLD updates both in an iterative, mutually recursive way, with the goal of making both more accurate; this is of key importance in downstream tasks such as single-label multiclass classification and cost-sensitive text classification. Since its publication, SLD has become the standard algorithm for improving the quality of the posteriors in the presence of distribution shift, and SLD is still considered a top contender when we need to estimate the priors (a task that has become known as “quantification”). However, its real effectiveness in improving the quality of the posteriors has been questioned. We here present the results of systematic experiments conducted on a large, publicly available dataset, across multiple amounts of distribution shift and multiple learners. Our experiments show that SLD improves the quality of the posterior probabilities and of the estimates of the prior probabilities, but only when the number of classes in the classification scheme is very small and the classifier is calibrated. As the number of classes grows, or as we use non-calibrated classifiers, SLD converges more slowly (and often does not converge at all), performance degrades rapidly, and the impact of SLD on the quality of the prior estimates and of the posteriors becomes negative rather than positive.
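The iterative, mutually recursive update that the abstract describes is an expectation-maximization procedure: each round rescales the classifier's posteriors by the ratio of the current prior estimates to the training priors, renormalizes them, and then re-estimates the priors as the mean of the rescaled posteriors. The following is a minimal sketch of that procedure, with illustrative function and variable names (not taken from the paper):

```python
import numpy as np

def sld_adjust(posteriors, train_priors, epsilon=1e-6, max_iter=1000):
    """EM-style prior estimation and posterior adjustment, in the spirit of
    Saerens, Latinne, and Decaestecker (2002).

    posteriors:   (N, C) array of classifier posteriors P(c|x) on unlabelled items
    train_priors: (C,) array of class priors observed in the training set
    Returns (estimated_priors, adjusted_posteriors).
    """
    priors = train_priors.copy()  # step 0: initialise with the training priors
    adjusted = posteriors
    for _ in range(max_iter):
        # E-step: rescale each posterior by the ratio of current to training priors
        scaled = posteriors * (priors / train_priors)
        adjusted = scaled / scaled.sum(axis=1, keepdims=True)
        # M-step: re-estimate the priors as the mean of the adjusted posteriors
        new_priors = adjusted.mean(axis=0)
        if np.abs(new_priors - priors).sum() < epsilon:  # converged
            priors = new_priors
            break
        priors = new_priors
    return priors, adjusted
```

For example, on a small binary problem with uniform training priors, the returned priors shift toward the class distribution implied by the unlabelled posteriors, while each adjusted posterior still sums to one per item. The paper's finding is precisely that this loop behaves well only for few, well-calibrated classes.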

References

  1. Tuomo Alasalmi, Jaakko Suutala, Heli Koskimäki, and Juha Röning. 2020. Better classifier calibration for small data sets. ACM Trans. Knowl. Discov. Data 14, 3 (2020), 1--19. DOI: https://doi.org/10.1145/3385656
  2. Antonio Bella, Cèsar Ferri, José Hernández-Orallo, and María José Ramírez-Quintana. 2014. Aggregative quantification for regression. Data Mining Knowl. Discov. 28, 2 (2014), 475--518. DOI: https://doi.org/10.1007/s10618-013-0308-z
  3. Artem Bequé, Kristof Coussement, Ross W. Gayler, and Stefan Lessmann. 2017. Approaches for credit scorecard calibration: An empirical analysis. Knowl.-based Syst. 134 (2017), 213--227. DOI: https://doi.org/10.1016/j.knosys.2017.07.034
  4. Glenn W. Brier. 1950. Verification of forecasts expressed in terms of probability. Month. Weath. Rev. 78, 1 (1950), 1--3. DOI: https://doi.org/10.1175/1520-0493(1950)078<0001:vofeit>2.0.co;2
  5. Gordon V. Cormack. 2008. Email spam filtering: A systematic review. Found. Trends Inf. Retr. 1, 4 (2008), 335--455. DOI: https://doi.org/10.1561/9781601981479
  6. Kristof Coussement and Wouter Buckinx. 2011. A probability-mapping algorithm for calibrating the posterior probabilities: A direct marketing application. Eur. J. Op. Res. 214, 3 (2011), 732--738. DOI: https://doi.org/10.1016/j.ejor.2011.05.027
  7. Morris H. DeGroot and Stephen E. Fienberg. 1983. The comparison and evaluation of forecasters. The Statistician 32, 1/2 (1983), 12--22. DOI: https://doi.org/10.2307/2987588
  8. Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B 39, 1 (1977), 1--38.
  9. Pedro M. Domingos and Michael J. Pazzani. 1996. Beyond independence: Conditions for the optimality of the simple Bayesian classifier. In Proceedings of the 13th International Conference on Machine Learning (ICML’96). 105--112.
  10. Andrea Esuli, Alejandro Moreo, and Fabrizio Sebastiani. 2018. A recurrent neural network for sentiment quantification. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM’18). 1775--1778. DOI: https://doi.org/10.1145/3269206.3269287
  11. Andrea Esuli, Alejandro Moreo, and Fabrizio Sebastiani. 2020. Cross-lingual sentiment quantification. IEEE Intell. Syst. 35, 3 (2020), 106--114. DOI: https://doi.org/10.1109/MIS.2020.2979203
  12. Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recog. Lett. 27 (2006), 861--874.
  13. Tom Fawcett and Peter Flach. 2005. A response to Webb and Ting’s “On the application of ROC analysis to predict classification performance under varying class distributions.” Mach. Learn. 58, 1 (2005), 33--38. DOI: https://doi.org/10.1007/s10994-005-5256-4
  14. Afonso Fernandes Vaz, Rafael Izbicki, and Rafael Bassi Stern. 2019. Quantification under prior probability shift: The ratio estimator and its extensions. J. Mach. Learn. Res. 20 (2019), 79:1--79:33.
  15. Peter A. Flach. 2017. Classifier calibration. In Encyclopedia of Machine Learning (2nd ed.), Claude Sammut and Geoffrey I. Webb (Eds.). Springer, DE, 212--219.
  16. George Forman. 2008. Quantifying counts and costs via classification. Data Mining Knowl. Discov. 17, 2 (2008), 164--206. DOI: https://doi.org/10.1007/s10618-008-0097-y
  17. Wei Gao and Fabrizio Sebastiani. 2016. From classification to quantification in tweet sentiment analysis. Soc. Netw. Anal. Mining 6, 19 (2016), 1--22. DOI: https://doi.org/10.1007/s13278-016-0327-z
  18. Tilmann Gneiting and Adrian E. Raftery. 2007. Strictly proper scoring rules, prediction, and estimation. J. Amer. Statist. Assoc. 102, 477 (2007), 359--378. DOI: https://doi.org/10.1198/016214506000001437
  19. Pablo González, Alberto Castaño, Nitesh V. Chawla, and Juan José del Coz. 2017. A review on quantification learning. Comput. Surveys 50, 5 (2017), 74:1--74:40. DOI: https://doi.org/10.1145/3117807
  20. Moshe Koppel, Jonathan Schler, and Shlomo Argamon. 2009. Computational methods in authorship attribution. J. Amer. Soc. Inf. Sci. Technol. 60, 1 (2009), 9--26. DOI: https://doi.org/10.1002/asi.20961
  21. David D. Lewis and William A. Gale. 1994. A sequential algorithm for training text classifiers. In Proceedings of the 17th ACM International Conference on Research and Development in Information Retrieval (SIGIR’94). 3--12. DOI: https://doi.org/10.1007/978-1-4471-2099-5_1
  22. Alessio Molinari. 2019. Leveraging the transductive nature of e-discovery in cost-sensitive technology-assisted review. In Proceedings of the 8th BCS-IRSG Symposium on Future Directions in Information Access (FDIA’19). 72--78.
  23. Alessio Molinari. 2019. Risk Minimization Models for Technology-assisted Review and Their Application to e-discovery. Master’s thesis. Department of Computer Science, University of Pisa, Pisa, IT.
  24. Jose G. Moreno-Torres, Troy Raeder, Rocío Alaíz-Rodríguez, Nitesh V. Chawla, and Francisco Herrera. 2012. A unifying view on dataset shift in classification. Pattern Recog. 45, 1 (2012), 521--530. DOI: https://doi.org/10.1016/j.patcog.2011.06.019
  25. Alejandro Moreo and Fabrizio Sebastiani. 2020. Tweet sentiment quantification: An experimental re-evaluation. Submitted for publication. https://arxiv.org/abs/2011.08091.
  26. Allan H. Murphy. 1973. A new vector partition of the probability score. J. Appl. Meteorol. 12, 4 (1973), 595--600.
  27. Mahdi P. Naeini, Gregory F. Cooper, and Milos Hauskrecht. 2015. Obtaining well-calibrated probabilities using Bayesian binning. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI’15). 2901--2907.
  28. Alexandru Niculescu-Mizil and Rich Caruana. 2005. Obtaining calibrated probabilities from boosting. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI’05). 413--420.
  29. Alexandru Niculescu-Mizil and Rich Caruana. 2005. Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning (ICML’05). 625--632. DOI: https://doi.org/10.1145/1102351.1102430
  30. Douglas W. Oard, Fabrizio Sebastiani, and Jyothi K. Vinjumur. 2018. Jointly minimizing the expected costs of review for responsiveness and privilege in e-discovery. ACM Trans. Inf. Syst. 37, 1 (2018), 11:1--11:35. DOI: https://doi.org/10.1145/3268928
  31. John C. Platt. 2000. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In Advances in Large Margin Classifiers, Alexander Smola, Peter Bartlett, Bernard Schölkopf, and Dale Schuurmans (Eds.). The MIT Press, Cambridge, MA, 61--74.
  32. Pablo Pérez-Gállego, Alberto Castaño, José Ramón Quevedo, and Juan José del Coz. 2019. Dynamic ensemble selection for quantification tasks. Inf. Fusion 45 (2019), 1--15. DOI: https://doi.org/10.1016/j.inffus.2018.01.001
  33. Joaquin Quiñonero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D. Lawrence (Eds.). 2009. Dataset Shift in Machine Learning. The MIT Press, Cambridge, MA. DOI: https://doi.org/10.7551/mitpress/9780262170055.001.0001
  34. Marco Saerens, Patrice Latinne, and Christine Decaestecker. 2002. Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure. Neur. Comput. 14, 1 (2002), 21--41. DOI: https://doi.org/10.1162/089976602753284446
  35. Fabrizio Sebastiani. 2020. Evaluation measures for quantification: An axiomatic approach. Inf. Retr. J. 23, 3 (2020), 255--288. DOI: https://doi.org/10.1007/s10791-019-09363-y
  36. David Spence, Christopher Inskip, Novi Quadrianto, and David Weir. 2019. Quantification under class-conditional dataset shift. In Proceedings of the 11th International Conference on Advances in Social Networks Analysis and Mining (ASONAM’19). 528--529. DOI: https://doi.org/10.1145/3341161.3342948
  37. D. B. Stephenson, C. A. S. Coelho, and I. T. Jolliffe. 2008. Two extra components in the Brier score decomposition. Weath. Forecast. 23, 4 (2008), 752--757. DOI: https://doi.org/10.1175/2007WAF2006116.1
  38. Meesun Sun and Sungzoon Cho. 2018. Obtaining calibrated probability using ROC binning. Pattern Anal. Applic. 21, 2 (2018), 307--322. DOI: https://doi.org/10.1007/s10044-016-0578-3
  39. Vladimir Vapnik. 1998. Statistical Learning Theory. Wiley, New York, NY.
  40. Ting-Fan Wu, Chih-Jen Lin, and Ruby C. Weng. 2004. Probability estimates for multi-class classification by pairwise coupling. J. Mach. Learn. Res. 5 (2004), 975--1005.
  41. Bianca Zadrozny and Charles Elkan. 2002. Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the 8th ACM International Conference on Knowledge Discovery and Data Mining (KDD’02). 694--699. DOI: https://doi.org/10.1145/775107.775151

• Published in

  ACM Transactions on Information Systems, Volume 39, Issue 2
  April 2021
  391 pages
  ISSN: 1046-8188
  EISSN: 1558-2868
  DOI: 10.1145/3444752
        Copyright © 2020 ACM

        Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 31 December 2020
        • Accepted: 1 November 2020
        • Revised: 1 September 2020
        • Received: 1 May 2020
Published in TOIS Volume 39, Issue 2


        Qualifiers

        • research-article
        • Research
        • Refereed
