2016 | OriginalPaper | Book chapter

Ballpark Learning: Estimating Labels from Rough Group Comparisons

Authors: Tom Hope, Dafna Shahaf

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing

Abstract

We are interested in estimating individual labels given only a coarse, aggregated signal over the data points. In our setting, we receive sets ("bags") of unlabeled instances with constraints on label proportions. We relax the unrealistic assumption of known label proportions made in previous work; instead, we assume only upper and lower bounds on the proportions, together with constraints on the differences between bags. We motivate the problem, propose an intuitive formulation and algorithm, and apply our methods to real-world scenarios. Across several domains, we show that proportion constraints alone, with no labeled examples, can yield surprisingly high accuracy. In particular, we demonstrate how to predict income level using rough stereotypes and how to perform sentiment analysis using very little information. We also apply our method to guide exploratory analysis, recovering geographical differences in Twitter dialect.
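To make the setting concrete, the following is a minimal sketch of the two kinds of constraints the abstract mentions: per-bag lower and upper bounds on the proportion of positive labels, and a minimum gap between the proportions of two bags. It only checks whether a candidate labeling satisfies such constraints; it is not the authors' formulation or algorithm. The function names, the bag names "A" and "B", and all numbers are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple


def bag_proportion(labels: List[int]) -> float:
    """Fraction of instances in a bag that are labeled positive (1)."""
    return sum(labels) / len(labels)


def violated_constraints(
    bags: Dict[str, List[int]],
    bounds: Dict[str, Tuple[float, float]],
    differences: List[Tuple[str, str, float]],
) -> List[str]:
    """Describe every rough ("ballpark") constraint that a labeling breaks.

    bags:        candidate 0/1 labels for the instances of each bag
    bounds:      (lower, upper) bound on the positive proportion of a bag
    differences: (a, b, min_gap) meaning proportion(a) - proportion(b) >= min_gap
    """
    props = {name: bag_proportion(labels) for name, labels in bags.items()}
    violations = []

    # Per-bag bounds: the proportion must fall inside [lower, upper].
    for name, (lower, upper) in bounds.items():
        if not lower <= props[name] <= upper:
            violations.append(
                f"{name}: proportion {props[name]:.2f} outside [{lower}, {upper}]"
            )

    # Bag-difference constraints: bag a must exceed bag b by at least min_gap.
    for a, b, min_gap in differences:
        gap = props[a] - props[b]
        if gap < min_gap:
            violations.append(f"{a} - {b}: gap {gap:.2f} below {min_gap}")

    return violations


# Hypothetical example: two bags with candidate labels.
bags = {"A": [1, 1, 1, 0], "B": [0, 0, 1, 0]}
bounds = {"A": (0.5, 1.0), "B": (0.0, 0.2)}
differences = [("A", "B", 0.3)]

result = violated_constraints(bags, bounds, differences)
for line in result:
    print(line)
```

In this toy labeling, bag "A" stays within its bounds and exceeds "B" by the required gap, while bag "B" has one positive too many for its upper bound. A learning method in this setting would search for a classifier whose predicted labels leave such a list empty, or nearly so.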

Metadata
Title
Ballpark Learning: Estimating Labels from Rough Group Comparisons
Authors
Tom Hope
Dafna Shahaf
Copyright year
2016
DOI
https://doi.org/10.1007/978-3-319-46227-1_19