Published in: Progress in Artificial Intelligence 1/2017

30.09.2016 | Regular Paper

Why is quantification an interesting learning problem?

Authors: Pablo González, Jorge Díez, Nitesh Chawla, Juan José del Coz


Abstract

Some real-world applications do not require classifying or making predictions about individual objects; instead, they require estimating some magnitude for a group of them. Sentiment analysis and opinion mining offer one example. Some applications need to classify opinions as positive or negative, but others, sometimes even more useful, only need an estimate of the proportion of each class during a given period of time: “How many tweets about our new product were positive yesterday?” Practitioners should apply quantification algorithms to tackle this kind of problem, instead of just using off-the-shelf classification methods, because classifiers are suboptimal for quantification tasks. Unfortunately, quantification learning is still a relatively underexplored area of machine learning. The goal of this paper is to show that quantification learning is an interesting open problem. To support its benefits, we present an application that analyzes Twitter comments, in which even the simplest quantification methods outperform classification approaches.
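
The abstract does not name the methods the paper evaluates. Purely as an illustration of why counting a classifier's predictions can misestimate a class proportion, the sketch below contrasts two simple approaches described by Forman [8]: Classify & Count, which reports the fraction of items predicted positive, and Adjusted Count, which corrects that fraction using the classifier's true positive rate and false positive rate. The sample sizes, the rates, and all function names are hypothetical, and the rates are assumed to be known exactly, whereas in practice they would have to be estimated.

```python
def classify_and_count(predictions):
    """Estimate positive prevalence as the fraction of items predicted positive."""
    return sum(predictions) / len(predictions)


def adjusted_count(predictions, tpr, fpr):
    """Correct the Classify & Count estimate with the classifier's TPR and FPR.

    Rests on the identity  cc = p * tpr + (1 - p) * fpr,  solved for p.
    """
    cc = classify_and_count(predictions)
    if tpr == fpr:
        # The correction is undefined here; fall back to the raw count.
        return cc
    p = (cc - fpr) / (tpr - fpr)
    # Clip to the valid range of a proportion.
    return min(1.0, max(0.0, p))


# Hypothetical sample: 1000 tweets, 200 of them truly positive.
# An assumed classifier with TPR = 0.8 and FPR = 0.1 therefore flags
# 160 of the 200 positives and 80 of the 800 negatives.
TPR, FPR = 0.8, 0.1
true_labels = [1] * 200 + [0] * 800
predictions = [1] * 160 + [0] * 40 + [1] * 80 + [0] * 720

true_prevalence = sum(true_labels) / len(true_labels)
cc_estimate = classify_and_count(predictions)
ac_estimate = adjusted_count(predictions, TPR, FPR)

print(f"true prevalence:  {true_prevalence:.2f}")
print(f"Classify & Count: {cc_estimate:.2f}")
print(f"Adjusted Count:   {ac_estimate:.2f}")
```

In this constructed sample the classifier is 88% accurate on individual tweets, yet counting its predictions overstates the positive proportion (about 0.24 against a true 0.20), while the adjusted estimate recovers roughly 0.20. The example is built so that the correction works exactly; with estimated rates the adjusted value would only be approximate.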


Footnotes
1
All the details can be found in [8].

References
1. Barranquero, J., González, P., Díez, J., del Coz, J.J.: On the study of nearest neighbour algorithms for prevalence estimation in binary problems. Pattern Recognit. 46(2), 472–482 (2013)
2. Barranquero, J., Díez, J., del Coz, J.J.: Quantification-oriented learning based on reliable classifiers. Pattern Recognit. 48(2), 591–604 (2015)
3. Beijbom, O., Hoffman, J., Yao, E., Darrell, T., Rodriguez-Ramirez, A., Gonzalez-Rivero, M., Guldberg, O.H.: Quantification in-the-wild: data-sets and baselines. In: NIPS 2015, Workshop on Transfer and Multi-Task Learning. Montreal, CA (2015)
4. Bella, A., Ferri, C., Hernández-Orallo, J., Ramírez-Quintana, M.: Quantification via probability estimators. In: Proc. of the 10th IEEE International Conference on Data Mining, pp. 737–742 (2010)
5. Esuli, A., Sebastiani, F.: Sentiment quantification. IEEE Intell. Syst. 25(4), 72–75 (2010)
6. Esuli, A., Sebastiani, F.: Optimizing text quantifiers for multivariate loss functions. ACM Trans. Knowl. Discov. Data 9(4), 27:1–27:27 (2015)
7. Fawcett, T., Flach, P.: A response to Webb and Ting’s on the application of ROC analysis to predict classification performance under varying class distributions. Mach. Learn. 58(1), 33–38 (2005)
8. Forman, G.: Quantifying counts and costs via classification. Data Mining Knowl. Discov. 17(2), 164–206 (2008)
9. Forman, G., Kirshenbaum, E., Suermondt, J.: Pragmatic text mining: minimizing human effort to quantify many issues in call logs. In: Proceedings of ACM SIGKDD’06, ACM, pp. 852–861 (2006)
10. Garcia, S., Herrera, F.: An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. J. Mach. Learn. Res. 9, 2677–2694 (2008)
11. Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford 1:12 (2009)
12. González-Castro, V., Alaiz-Rodríguez, R., Alegre, E.: Class distribution estimation based on the Hellinger distance. Inf. Sci. 218, 146–164 (2013)
13. Latinne, P., Saerens, M., Decaestecker, C.: Adjusting the outputs of a classifier to new a priori probabilities may significantly improve classification accuracy: Evidence from a multi-class problem in remote sensing. In: Proceedings of ICML’01, M. Kaufmann, pp. 298–305 (2001)
14. Milli, L., Monreale, A., Rossetti, G., Giannotti, F., Pedreschi, D., Sebastiani, F.: Quantification trees. In: IEEE International Conference on Data Mining (ICDM’13), pp. 528–536 (2013)
15. Milli, L., Monreale, A., Rossetti, G., Pedreschi, D., Giannotti, F., Sebastiani, F.: Quantification in social networks. In: IEEE International Conference on Data Science and Advanced Analytics (DSAA 2015), pp. 1–10 (2015)
16. Pérez-Gallego, P., Quevedo, J.R., del Coz, J.J.: Using ensembles for problems with characterizable changes in data distribution: a case study on quantification. Inf. Fusion 34, 87–100 (2017)
17. Rakthanmanon, T., Keogh, E., Lonardi, S., Evans, S.: MDL-based time series clustering. Knowl. Inf. Syst. 33(2), 371–399 (2012)
18. Saif, H., Fernández, M., He, Y., Alani, H.: Evaluation datasets for Twitter sentiment analysis: a survey and a new dataset, the STS-Gold. In: 1st International Workshop on Emotion and Sentiment in Social and Expressive Media: Approaches and Perspectives from AI (ESSEM 2013) (2013)
19. Tasche, D.: Exact fit of simple finite mixture models. J. Risk Financial Manag. 7(4), 150–164 (2014)
Metadata
Title
Why is quantification an interesting learning problem?
Authors
Pablo González
Jorge Díez
Nitesh Chawla
Juan José del Coz
Publication date
30.09.2016
Publisher
Springer Berlin Heidelberg
Published in
Progress in Artificial Intelligence / Issue 1/2017
Print ISSN: 2192-6352
Electronic ISSN: 2192-6360
DOI
https://doi.org/10.1007/s13748-016-0103-3
