
2023 | OriginalPaper | Chapter

Label Shift Quantification with Robustness Guarantees via Distribution Feature Matching

Authors: Bastien Dussap, Gilles Blanchard, Badr-Eddine Chérief-Abdellatif

Published in: Machine Learning and Knowledge Discovery in Databases: Research Track

Publisher: Springer Nature Switzerland


Abstract

Quantification learning deals with the task of estimating the target label distribution under label shift. In this paper, we first present a unifying framework, distribution feature matching (DFM), that recovers as particular instances various estimators introduced in previous literature. We derive a general performance bound for DFM procedures, improving in several key aspects upon previous bounds derived in particular cases. We then extend this analysis to study robustness of DFM procedures in the misspecified setting under departure from the exact label shift hypothesis, in particular in the case of contamination of the target by an unknown distribution. These theoretical findings are confirmed by a detailed numerical study on simulated and real-world datasets. We also introduce an efficient, scalable and robust version of kernel-based DFM using Random Fourier Features.
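To make the DFM idea concrete, the sketch below estimates target class proportions by matching the mean feature embedding of the target sample against a mixture of per-class source embeddings, using Random Fourier Features as the feature map. This is a minimal illustration under assumptions of my own (Gaussian kernel, projected-gradient solver, the helper names `rff_map`, `dfm_estimate` and `project_simplex`), not the authors' exact estimator or guarantees.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection onto the probability simplex (Duchi et al. style).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def rff_map(X, W, b):
    # Random Fourier Features approximating a Gaussian kernel:
    # phi(x) = sqrt(2/D) * cos(W x + b).
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def dfm_estimate(X_source, y_source, X_target, D=500, sigma=1.0, seed=0):
    """Estimate target label proportions pi by solving
    min_{pi in simplex} || sum_c pi_c * mu_c - mu_target ||^2,
    where mu_c is the RFF mean embedding of source class c."""
    rng = np.random.default_rng(seed)
    d = X_source.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(D, d))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    classes = np.unique(y_source)
    # Stack per-class source mean embeddings as columns of M (shape D x K).
    M = np.stack([rff_map(X_source[y_source == c], W, b).mean(axis=0)
                  for c in classes], axis=1)
    mu = rff_map(X_target, W, b).mean(axis=0)
    # Projected gradient descent on the simplex.
    K = len(classes)
    pi = np.full(K, 1.0 / K)
    step = 1.0 / np.linalg.norm(M.T @ M, 2)  # 1/Lipschitz constant
    for _ in range(2000):
        pi = project_simplex(pi - step * (M.T @ (M @ pi - mu)))
    return classes, pi
```

On well-separated classes this recovers the target mixture weights closely; the feature dimension `D` controls the trade-off between kernel-approximation quality and cost, which is what makes the RFF variant scalable.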


Footnotes
1
There can be large variability between samples coming from different laboratories, while there is homogeneity within each lab. The label shift hypothesis is therefore reasonable when keeping source and target from the same lab.
 
Metadata
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-43424-2_5
