Abstract
We critically re-examine the Saerens-Latinne-Decaestecker (SLD) algorithm, a well-known method for estimating class prior probabilities (“priors”) and adjusting posterior probabilities (“posteriors”) in scenarios characterized by distribution shift, i.e., a difference in the distribution of the priors between the training documents and the unlabelled documents. Given a machine-learned classifier and a set of unlabelled documents for which the classifier has returned posterior probabilities and estimates of the prior probabilities, SLD updates both in an iterative, mutually recursive way, with the goal of making both more accurate; this is of key importance in downstream tasks such as single-label multiclass classification and cost-sensitive text classification. Since its publication, SLD has become the standard algorithm for improving the quality of the posteriors in the presence of distribution shift, and it is still considered a top contender when we need to estimate the priors (a task that has become known as “quantification”). However, its real effectiveness in improving the quality of the posteriors has been questioned. We here present the results of systematic experiments conducted on a large, publicly available dataset, across multiple amounts of distribution shift and multiple learners. Our experiments show that SLD improves the quality of the posterior probabilities and of the estimates of the prior probabilities, but only when the number of classes in the classification scheme is very small and the classifier is calibrated. As the number of classes grows, or as we use non-calibrated classifiers, SLD converges more slowly (and often does not converge at all), performance degrades rapidly, and the impact of SLD on the quality of the prior estimates and of the posteriors becomes negative rather than positive.
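The iterative, mutually recursive update that the abstract describes is an instance of expectation maximization: at each step the posteriors are rescaled by the ratio of the current prior estimates to the training priors, and the new prior estimates are obtained by averaging the rescaled posteriors. The following NumPy sketch illustrates this scheme; the function name, signature, and stopping criterion are our own choices for illustration, not taken from the original paper.

```python
import numpy as np

def sld(posteriors, train_priors, max_iter=1000, tol=1e-6):
    """EM-style prior/posterior adjustment in the spirit of SLD.

    posteriors   : (n_docs, n_classes) posteriors output by a classifier
    train_priors : (n_classes,) class prevalences observed in training
                   (assumed strictly positive)
    Returns (adjusted_priors, adjusted_posteriors).
    """
    priors = train_priors.copy()
    post = posteriors
    for _ in range(max_iter):
        # E-step: rescale the *original* posteriors by the ratio of the
        # current prior estimates to the training priors, then renormalize
        # each row so it is again a probability distribution.
        post = posteriors * (priors / train_priors)
        post = post / post.sum(axis=1, keepdims=True)
        # M-step: the new prior estimate is the mean of the adjusted
        # posteriors across the unlabelled documents.
        new_priors = post.mean(axis=0)
        converged = np.max(np.abs(new_priors - priors)) < tol
        priors = new_priors
        if converged:
            break
    return priors, post
```

As the abstract notes, convergence of this loop is not guaranteed in general, which is why the sketch caps the number of iterations rather than looping until the tolerance is met.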
- Tuomo Alasalmi, Jaakko Suutala, Heli Koskimäki, and Juha Röning. 2020. Better classifier calibration for small data sets. ACM Trans. Knowl. Discov. Data 14, 3 (2020), 1--19. DOI: https://doi.org/10.1145/3385656
- Antonio Bella, Cèsar Ferri, José Hernández-Orallo, and María José Ramírez-Quintana. 2014. Aggregative quantification for regression. Data Mining Knowl. Discov. 28, 2 (2014), 475--518. DOI: https://doi.org/10.1007/s10618-013-0308-z
- Artem Bequé, Kristof Coussement, Ross W. Gayler, and Stefan Lessmann. 2017. Approaches for credit scorecard calibration: An empirical analysis. Knowl.-based Syst. 134 (2017), 213--227. DOI: https://doi.org/10.1016/j.knosys.2017.07.034
- Glenn W. Brier. 1950. Verification of forecasts expressed in terms of probability. Month. Weath. Rev. 78, 1 (1950), 1--3. DOI: https://doi.org/10.1175/1520-0493(1950)078<0001:vofeit>2.0.co;2
- Gordon V. Cormack. 2008. Email spam filtering: A systematic review. Found. Trends Inf. Retr. 1, 4 (2008), 335--455. DOI: https://doi.org/10.1561/9781601981479
- Kristof Coussement and Wouter Buckinx. 2011. A probability-mapping algorithm for calibrating the posterior probabilities: A direct marketing application. Eur. J. Op. Res. 214, 3 (2011), 732--738. DOI: https://doi.org/10.1016/j.ejor.2011.05.027
- Morris H. DeGroot and Stephen E. Fienberg. 1983. The comparison and evaluation of forecasters. The Statistician 32, 1/2 (1983), 12--22. DOI: https://doi.org/10.2307/2987588
- Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. B 39, 1 (1977), 1--38.
- Pedro M. Domingos and Michael J. Pazzani. 1996. Beyond independence: Conditions for the optimality of the simple Bayesian classifier. In Proceedings of the 13th International Conference on Machine Learning (ICML'96). 105--112.
- Andrea Esuli, Alejandro Moreo, and Fabrizio Sebastiani. 2018. A recurrent neural network for sentiment quantification. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (CIKM'18). 1775--1778. DOI: https://doi.org/10.1145/3269206.3269287
- Andrea Esuli, Alejandro Moreo, and Fabrizio Sebastiani. 2020. Cross-lingual sentiment quantification. IEEE Intell. Syst. 35, 3 (2020), 106--114. DOI: https://doi.org/10.1109/MIS.2020.2979203
- Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recog. Lett. 27 (2006), 861--874.
- Tom Fawcett and Peter Flach. 2005. A response to Webb and Ting's "On the application of ROC analysis to predict classification performance under varying class distributions." Mach. Learn. 58, 1 (2005), 33--38. DOI: https://doi.org/10.1007/s10994-005-5256-4
- Afonso Fernandes Vaz, Rafael Izbicki, and Rafael Bassi Stern. 2019. Quantification under prior probability shift: The ratio estimator and its extensions. J. Mach. Learn. Res. 20 (2019), 79:1--79:33.
- Peter A. Flach. 2017. Classifier calibration. In Encyclopedia of Machine Learning (2nd ed.), Claude Sammut and Geoffrey I. Webb (Eds.). Springer, DE, 212--219.
- George Forman. 2008. Quantifying counts and costs via classification. Data Mining Knowl. Discov. 17, 2 (2008), 164--206. DOI: https://doi.org/10.1007/s10618-008-0097-y
- Wei Gao and Fabrizio Sebastiani. 2016. From classification to quantification in tweet sentiment analysis. Soc. Netw. Anal. Mining 6, 19 (2016), 1--22. DOI: https://doi.org/10.1007/s13278-016-0327-z
- Tilmann Gneiting and Adrian E. Raftery. 2007. Strictly proper scoring rules, prediction, and estimation. J. Amer. Statist. Assoc. 102, 477 (2007), 359--378. DOI: https://doi.org/10.1198/016214506000001437
- Pablo González, Alberto Castaño, Nitesh V. Chawla, and Juan José del Coz. 2017. A review on quantification learning. Comput. Surveys 50, 5 (2017), 74:1--74:40. DOI: https://doi.org/10.1145/3117807
- Moshe Koppel, Jonathan Schler, and Shlomo Argamon. 2009. Computational methods in authorship attribution. J. Amer. Soc. Inf. Sci. Technol. 60, 1 (2009), 9--26. DOI: https://doi.org/10.1002/asi.20961
- David D. Lewis and William A. Gale. 1994. A sequential algorithm for training text classifiers. In Proceedings of the 17th ACM International Conference on Research and Development in Information Retrieval (SIGIR'94). 3--12. DOI: https://doi.org/10.1007/978-1-4471-2099-5_1
- Alessio Molinari. 2019. Leveraging the transductive nature of e-discovery in cost-sensitive technology-assisted review. In Proceedings of the 8th BCS-IRSG Symposium on Future Directions in Information Access (FDIA'19). 72--78.
- Alessio Molinari. 2019. Risk Minimization Models for Technology-assisted Review and Their Application to e-discovery. Master's thesis. Department of Computer Science, University of Pisa, Pisa, IT.
- Jose G. Moreno-Torres, Troy Raeder, Rocío Alaíz-Rodríguez, Nitesh V. Chawla, and Francisco Herrera. 2012. A unifying view on dataset shift in classification. Pattern Recog. 45, 1 (2012), 521--530. DOI: https://doi.org/10.1016/j.patcog.2011.06.019
- Alejandro Moreo and Fabrizio Sebastiani. 2020. Tweet sentiment quantification: An experimental re-evaluation. Submitted for publication. https://arxiv.org/abs/2011.08091.
- Allan H. Murphy. 1973. A new vector partition of the probability score. J. Appl. Meteorol. 12, 4 (1973), 595--600.
- Mahdi P. Naeini, Gregory F. Cooper, and Milos Hauskrecht. 2015. Obtaining well-calibrated probabilities using Bayesian binning. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI'15). 2901--2907.
- Alexandru Niculescu-Mizil and Rich Caruana. 2005. Obtaining calibrated probabilities from boosting. In Proceedings of the 21st Annual Conference on Uncertainty in Artificial Intelligence (UAI'05). 413--420.
- Alexandru Niculescu-Mizil and Rich Caruana. 2005. Predicting good probabilities with supervised learning. In Proceedings of the 22nd International Conference on Machine Learning (ICML'05). 625--632. DOI: https://doi.org/10.1145/1102351.1102430
- Douglas W. Oard, Fabrizio Sebastiani, and Jyothi K. Vinjumur. 2018. Jointly minimizing the expected costs of review for responsiveness and privilege in e-discovery. ACM Trans. Inf. Syst. 37, 1 (2018), 11:1--11:35. DOI: https://doi.org/10.1145/3268928
- John C. Platt. 2000. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In Advances in Large Margin Classifiers, Alexander Smola, Peter Bartlett, Bernhard Schölkopf, and Dale Schuurmans (Eds.). The MIT Press, Cambridge, MA, 61--74.
- Pablo Pérez-Gállego, Alberto Castaño, José Ramón Quevedo, and Juan José del Coz. 2019. Dynamic ensemble selection for quantification tasks. Inf. Fusion 45 (2019), 1--15. DOI: https://doi.org/10.1016/j.inffus.2018.01.001
- Joaquin Quiñonero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D. Lawrence (Eds.). 2009. Dataset Shift in Machine Learning. The MIT Press, Cambridge, MA. DOI: https://doi.org/10.7551/mitpress/9780262170055.001.0001
- Marco Saerens, Patrice Latinne, and Christine Decaestecker. 2002. Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure. Neur. Comput. 14, 1 (2002), 21--41. DOI: https://doi.org/10.1162/089976602753284446
- Fabrizio Sebastiani. 2020. Evaluation measures for quantification: An axiomatic approach. Inf. Retr. J. 23, 3 (2020), 255--288. DOI: https://doi.org/10.1007/s10791-019-09363-y
- David Spence, Christopher Inskip, Novi Quadrianto, and David Weir. 2019. Quantification under class-conditional dataset shift. In Proceedings of the 11th International Conference on Advances in Social Networks Analysis and Mining (ASONAM'19). 528--529. DOI: https://doi.org/10.1145/3341161.3342948
- D. B. Stephenson, C. A. S. Coelho, and I. T. Jolliffe. 2008. Two extra components in the Brier score decomposition. Weath. Forecast. 23, 4 (2008), 752--757. DOI: https://doi.org/10.1175/2007WAF2006116.1
- Meesun Sun and Sungzoon Cho. 2018. Obtaining calibrated probability using ROC binning. Pattern Anal. Applic. 21, 2 (2018), 307--322. DOI: https://doi.org/10.1007/s10044-016-0578-3
- Vladimir Vapnik. 1998. Statistical Learning Theory. Wiley, New York, NY.
- Ting-Fan Wu, Chih-Jen Lin, and Ruby C. Weng. 2004. Probability estimates for multi-class classification by pairwise coupling. J. Mach. Learn. Res. 5 (2004), 975--1005.
- Bianca Zadrozny and Charles Elkan. 2002. Transforming classifier scores into accurate multiclass probability estimates. In Proceedings of the 8th ACM International Conference on Knowledge Discovery and Data Mining (KDD'02). 694--699. DOI: https://doi.org/10.1145/775107.775151
Index Terms
- A Critical Reassessment of the Saerens-Latinne-Decaestecker Algorithm for Posterior Probability Adjustment