2021 | OriginalPaper | Chapter

Finding High-Value Training Data Subset Through Differentiable Convex Programming

Authors: Soumi Das, Arshdeep Singh, Saptarshi Chatterjee, Suparna Bhattacharya, Sourangshu Bhattacharya

Published in: Machine Learning and Knowledge Discovery in Databases. Research Track

Publisher: Springer International Publishing

Abstract

Finding valuable training data points for deep neural networks has been a core research challenge with many applications. In recent years, various techniques have been proposed for computing the "value" of individual training data points in order to explain trained models. However, the value of a training data point also depends on the other data points selected for training, a notion that existing methods do not capture explicitly. In this paper, we study the problem of selecting high-value subsets of training data. The key idea is a learnable framework for online subset selection that can be trained using mini-batches of training data, which makes our method scalable. This leads to a parameterised convex subset-selection problem that is amenable to the differentiable convex programming paradigm, allowing us to learn the parameters of the selection model end-to-end. Using this framework, we design an online alternating-minimisation algorithm that jointly learns the parameters of the selection model and the ML model. Extensive evaluation on a synthetic dataset and three standard datasets shows that our algorithm consistently finds higher-value subsets of training data than recent state-of-the-art methods, sometimes \(\sim 20\%\) higher. The selected subsets are also useful for identifying mislabelled training data. The running time of our algorithm is comparable to that of existing valuation functions.
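
The core mechanism described in the abstract (a parameterised convex subset-selection problem embedded as a differentiable layer, so that gradients from the downstream loss can update the selection model) can be sketched with the cvxpylayers library for differentiable convex optimization layers. The quadratic objective, the budget k, and the per-example score parameter s below are illustrative assumptions for a minimal sketch, not the paper's exact formulation.

```python
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n = 32  # mini-batch size (hypothetical)
k = 8   # subset budget (hypothetical)

# Convex relaxation of subset selection: pick weights w in [0, 1]^n summing to k
# that trade off a learned per-example score s against a quadratic regulariser
# (strong convexity keeps the solution unique and differentiable in s).
w = cp.Variable(n)
s = cp.Parameter(n)  # scores produced by the selection model
objective = cp.Minimize(-s @ w + cp.sum_squares(w))
constraints = [w >= 0, w <= 1, cp.sum(w) == k]
problem = cp.Problem(objective, constraints)

# Wrap the problem as a differentiable layer: scores in, selection weights out.
layer = CvxpyLayer(problem, parameters=[s], variables=[w])

# Forward pass with PyTorch autograd; gradients flow back through the solver.
scores = torch.randn(n, requires_grad=True)
(weights,) = layer(scores)
loss = weights.sum()  # placeholder for the downstream loss on the ML model
loss.backward()       # d(loss)/d(scores) is now available to train the selector
```

In an alternating scheme such as the one the abstract describes, gradients like these would update the selection model in one step while the ML model is trained on the (weighted) selected mini-batch in the other; the placeholder loss above only demonstrates that the layer is end-to-end differentiable.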

Metadata
Title
Finding High-Value Training Data Subset Through Differentiable Convex Programming
Authors
Soumi Das
Arshdeep Singh
Saptarshi Chatterjee
Suparna Bhattacharya
Sourangshu Bhattacharya
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-86520-7_41
