
2021 | OriginalPaper | Chapter

Learning from Noisy Similar and Dissimilar Data

Authors : Soham Dan, Han Bao, Masashi Sugiyama

Published in: Machine Learning and Knowledge Discovery in Databases. Research Track

Publisher: Springer International Publishing


Abstract

With the widespread use of machine learning for classification, it becomes increasingly important to be able to use weaker kinds of supervision for tasks in which it is hard to obtain standard labeled data. One such kind of supervision is provided pairwise in the form of Similar (S) pairs (if two examples belong to the same class) and Dissimilar (D) pairs (if two examples belong to different classes). This kind of supervision is realistic in privacy-sensitive domains. Although the basic version of this problem has been studied recently, it is still unclear how to learn from such supervision under label noise, which is very common when the supervision is, for instance, crowd-sourced. In this paper, we close this gap and demonstrate how to learn a classifier from noisy S and D labeled pairs. We perform a detailed investigation of this problem under two realistic noise models and propose two algorithms to learn from noisy SD data. We also show important connections between learning from such pairwise supervision data and learning from ordinary class-labeled data. Finally, we perform experiments on synthetic and real-world datasets and show our noise-informed algorithms outperform existing baselines in learning from noisy pairwise data.
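The setting described above can be illustrated with a small, hypothetical sketch: pairwise Similar/Dissimilar labels are derived from (unobserved) class labels and then corrupted by symmetric label noise with flip rate `rho`. This is only an illustration of the data-generating process, not the paper's algorithms; the function names and the simple moment-inversion at the end are assumptions made for the example.

```python
import random


def make_sd_pairs(labels, n_pairs, noise_rate, rng):
    """Sample noisy Similar (1) / Dissimilar (0) pair labels.

    A pair is Similar if both examples share a class; each pair label
    is then flipped with probability noise_rate (symmetric noise).
    """
    pairs = []
    for _ in range(n_pairs):
        i = rng.randrange(len(labels))
        j = rng.randrange(len(labels))
        s = 1 if labels[i] == labels[j] else 0
        if rng.random() < noise_rate:
            s = 1 - s  # noise corrupts the pairwise label
        pairs.append((i, j, s))
    return pairs


def debias_similar_fraction(noisy_frac, noise_rate):
    """Recover the clean Similar fraction from the noisy one.

    Under symmetric flipping: noisy = clean*(1-rho) + (1-clean)*rho,
    which is invertible whenever rho != 1/2.
    """
    return (noisy_frac - noise_rate) / (1.0 - 2.0 * noise_rate)


# Hypothetical usage: binary labels with an 80/20 class split, so the
# clean Similar probability is 0.8**2 + 0.2**2 = 0.68.
rng = random.Random(0)
labels = [0] * 800 + [1] * 200
pairs = make_sd_pairs(labels, 100_000, noise_rate=0.2, rng=rng)
noisy_frac = sum(s for _, _, s in pairs) / len(pairs)
clean_estimate = debias_similar_fraction(noisy_frac, 0.2)
```

The point of the sketch is that naive statistics computed on noisy S/D pairs are biased, but simple noise models make the bias correctable in expectation, which is the kind of structure noise-informed methods can exploit.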


Footnotes
1. The relationship between relative comparisons and a single hypothesis on stimuli was studied in [24], and is known as the law of comparative judgement.
2. This bias is known as social desirability bias [9]: respondents are unconsciously led toward a socially desirable opinion when asked to reveal their opinions directly. This tendency is observed especially when answering questions about sensitive matters such as criminal records.
Literature
1. Bao, H., Niu, G., Sugiyama, M.: Classification from pairwise similarity and unlabeled data. In: International Conference on Machine Learning, pp. 461–470 (2018)
2. Bartlett, P.L., Bousquet, O., Mendelson, S.: Local Rademacher complexities. Ann. Stat. 33(4), 1497–1537 (2005)
3. Bartlett, P.L., Jordan, M.I., McAuliffe, J.D.: Convexity, classification, and risk bounds. J. Am. Stat. Assoc. 101(473), 138–156 (2006)
4. Bartlett, P.L., Mendelson, S.: Rademacher and Gaussian complexities: risk bounds and structural results. J. Mach. Learn. Res. 3(Nov), 463–482 (2002)
5. Basu, S., Davidson, I., Wagstaff, K.: Constrained Clustering: Advances in Algorithms, Theory, and Applications. CRC Press, Boca Raton (2008)
6. Du, S.S., Zhai, X., Poczos, B., Singh, A.: Gradient descent provably optimizes over-parameterized neural networks. arXiv preprint arXiv:1810.02054 (2018)
7. Elkan, C.: The foundations of cost-sensitive learning. In: International Joint Conference on Artificial Intelligence, vol. 17, pp. 973–978 (2001)
8. Eric, B., Freitas, N.D., Ghosh, A.: Active preference learning with discrete choice data. In: Advances in Neural Information Processing Systems, pp. 409–416 (2008)
9. Fisher, R.J.: Social desirability bias and the validity of indirect questioning. J. Consum. Res. 20(2), 303–315 (1993)
11. Gomes, R., Welinder, P., Krause, A., Perona, P.: Crowdclustering. In: NIPS (2011)
12. Han, B., et al.: Co-teaching: robust training of deep neural networks with extremely noisy labels. In: Advances in Neural Information Processing Systems, pp. 8527–8537 (2018)
13. Hsu, Y.C., Lv, Z., Schlosser, J., Odom, P., Kira, Z.: Multiclass classification without multiclass labels. In: International Conference on Learning Representations (2018)
14. Jamieson, K.G., Nowak, R.: Active ranking using pairwise comparisons. In: Advances in Neural Information Processing Systems, pp. 2240–2248 (2011)
15. Jiang, L., Zhou, Z., Leung, T., Li, L.J., Fei-Fei, L.: MentorNet: regularizing very deep neural networks on corrupted labels. arXiv preprint arXiv:1712.05055 (2017)
16. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, pp. 281–297 (1967)
17. Menon, A.K., Van Rooyen, B., Natarajan, N.: Learning from binary labels with instance-dependent corruption. arXiv preprint arXiv:1605.00751 (2016)
18. Mohri, M., Rostamizadeh, A., Talwalkar, A.: Foundations of Machine Learning. MIT Press, Cambridge (2018)
19. Natarajan, N., Dhillon, I.S., Ravikumar, P.K., Tewari, A.: Learning with noisy labels. In: Advances in Neural Information Processing Systems, pp. 1196–1204 (2013)
20. Patrini, G., Rozza, A., Krishna Menon, A., Nock, R., Qu, L.: Making deep neural networks robust to label noise: a loss correction approach. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)
21. Saaty, T.L.: Decision Making for Leaders: The Analytic Hierarchy Process for Decisions in a Complex World. RWS Publications (1990)
23. Shimada, T., Bao, H., Sato, I., Sugiyama, M.: Classification from pairwise similarities/dissimilarities and unlabeled data via empirical risk minimization. arXiv preprint arXiv:1904.11717 (2019)
24. Thurstone, L.L.: A law of comparative judgment. Psychol. Rev. 34(4) (1927)
25. Wagstaff, K., Cardie, C., Rogers, S., Schrödl, S., et al.: Constrained k-means clustering with background knowledge. In: ICML, vol. 1, pp. 577–584 (2001)
26. Yi, J., Jin, R., Jain, A.K., Jain, S.: Crowdclustering with sparse pairwise labels: a matrix completion approach. In: HCOMP @ AAAI. Citeseer (2012)
Metadata
Title
Learning from Noisy Similar and Dissimilar Data
Authors
Soham Dan
Han Bao
Masashi Sugiyama
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-86520-7_15
