
Progress in Artificial Intelligence


17.05.2019 | Regular Paper

Scaling up the learning-from-crowds GLAD algorithm using instance-difficulty clustering

Authors: Enrique González Rodrigo, Juan A. Aledo, Jose A. Gamez

Published in: Progress in Artificial Intelligence | Issue 3/2019


Abstract

The main goal of this article is to improve the results obtained by the GLAD algorithm on large datasets. This algorithm learns from instances labeled by multiple annotators, taking into account both the quality of the annotators and the difficulty of the instances. Despite its many advantages, this study shows that GLAD does not scale well when dealing with a large number of instances, as it estimates one parameter per instance in the dataset. Clustering is an alternative that reduces the number of parameters to be estimated, making the learning process more efficient. However, as the features of crowdsourced datasets are not usually available, classical clustering procedures cannot be applied directly. To solve this issue, we propose clustering vectors created by matrix factorization. Our analysis shows that this clustering process improves the results obtained by GLAD in terms of both accuracy and execution time, especially in large-data scenarios. We also compare this approach against other algorithms with a similar goal.
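The idea in the abstract can be illustrated with a small, hypothetical sketch: instead of one GLAD difficulty parameter per instance, instances are grouped into K clusters and share one difficulty value per cluster. In GLAD (Whitehill et al. [20]) the probability that annotator i labels instance j correctly is sigmoid(alpha_i * beta_j), with beta_j > 0 and 1/beta_j acting as the difficulty. The code below is a pure-Python illustration, not the authors' implementation (which is available in spark-crowd [16]). Assumptions: binary labels encoded as +1/-1; the factorized matrix is the sparse instance-by-annotator label matrix; plain SGD factorization stands in for the authors' matrix-factorization step; plain k-means is used for clustering; and only GLAD's E-step is shown, with alpha and beta fixed. All function names, parameters, and toy values are illustrative.

```python
import math
import random


def factorize(triples, n_inst, n_ann, k=2, epochs=200, lr=0.05, reg=0.01, seed=0):
    """Factorize the sparse instance-by-annotator label matrix with SGD.

    triples: (instance j, annotator i, label) with label in {-1, +1};
    missing annotations are simply absent. Returns one k-dim vector per instance.
    """
    rng = random.Random(seed)
    inst = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_inst)]
    ann = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_ann)]
    for _ in range(epochs):
        for j, i, label in triples:
            err = label - sum(a * b for a, b in zip(inst[j], ann[i]))
            for f in range(k):
                pj, qi = inst[j][f], ann[i][f]
                inst[j][f] += lr * (err * qi - reg * pj)
                ann[i][f] += lr * (err * pj - reg * qi)
    return inst


def kmeans(points, n_clusters, iters=50, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for every point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [
            min(range(n_clusters),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def glad_posterior(triples, alpha, beta, cluster_of, n_inst, prior=0.5):
    """GLAD E-step with one difficulty parameter per cluster.

    P(label_ij == z_j) = sigmoid(alpha[i] * beta[cluster_of[j]]), beta > 0.
    Returns P(z_j = +1 | observed labels) for every instance j.
    """
    log_pos = [math.log(prior)] * n_inst
    log_neg = [math.log(1 - prior)] * n_inst
    for j, i, label in triples:
        p_correct = 1 / (1 + math.exp(-alpha[i] * beta[cluster_of[j]]))
        log_pos[j] += math.log(p_correct if label == 1 else 1 - p_correct)
        log_neg[j] += math.log(p_correct if label == -1 else 1 - p_correct)
    return [1 / (1 + math.exp(lneg - lpos)) for lpos, lneg in zip(log_pos, log_neg)]


# Toy data: (instance, annotator, label); annotator 2 disagrees more often.
triples = [
    (0, 0, 1), (0, 1, 1), (0, 2, -1),
    (1, 0, 1), (1, 1, 1), (1, 2, 1),
    (2, 0, -1), (2, 1, -1), (2, 2, 1),
    (3, 0, -1), (3, 1, -1), (3, 2, -1),
]
vectors = factorize(triples, n_inst=4, n_ann=3)
clusters = kmeans(vectors, n_clusters=2)
alpha = [2.0, 2.0, 0.5]  # hypothetical annotator abilities (fixed, not learned here)
beta = [1.0, 1.0]        # K = 2 difficulty values instead of one per instance
post = glad_posterior(triples, alpha, beta, clusters, n_inst=4)
```

In a complete EM loop, alpha and the K beta values would be re-estimated in an M-step after each E-step, so the number of difficulty parameters drops from the number of instances to the number of clusters.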


The reader can find an implementation of this algorithm and several others in the Apache Spark package spark-crowd [16].

An Apache Spark cluster of 3 worker nodes, each with 10 cores and 30 GB of memory.

All the simulated datasets were used for these comparisons. The (aggregated) results for each data size are shown in Fig. 3.

References

1. Aydin, B.I., Yilmaz, Y.S., Li, Y., Li, Q., Gao, J., Demirbas, M.: Crowdsourcing for multiple-choice question answering. In: Twenty-Sixth IAAI Conference (2014)

2. Charte, D., Charte, F., García, S., Herrera, F.: A snapshot on nonstandard supervised learning problems: taxonomy, relationships, problem transformations and algorithm adaptations. Prog. Artif. Intell. (2018). https://doi.org/10.1007/s13748-018-00167-7

3. Chen, X., Bennett, P.N., Collins-Thompson, K., Horvitz, E.: Pairwise ranking aggregation in a crowdsourced setting. In: Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, pp. 193–202. ACM (2013)

4. Dawid, A.P., Skene, A.M.: Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl. Stat. 28, 20–28 (1979)

5. Demartini, G., Difallah, D.E., Cudré-Mauroux, P.: Zencrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In: Proceedings of the 21st International Conference on World Wide Web, pp. 469–478. ACM (2012)

6. Hernández-González, J., Inza, I., Lozano, J.A.: Weak supervision and other non-standard classification problems: a taxonomy. Pattern Recognit. Lett. 69, 49–55 (2016)

7. Ipeirotis, P.G., Provost, F., Wang, J.: Quality management on Amazon Mechanical Turk. In: Proceedings of the ACM SIGKDD Workshop on Human Computation, HCOMP ’10, pp. 64–67. ACM, New York (2010). https://doi.org/10.1145/1837885.1837906

8. Karger, D.R., Oh, S., Shah, D.: Iterative learning for reliable crowdsourcing systems. In: Advances in Neural Information Processing Systems, pp. 1953–1961 (2011)

9. Kim, H.C., Ghahramani, Z.: Bayesian classifier combination. In: Artificial Intelligence and Statistics, pp. 619–627 (2012)

10. Li, Q., Li, Y., Gao, J., Su, L., Zhao, B., Demirbas, M., Fan, W., Han, J.: A confidence-aware approach for truth discovery on long-tail data. Proc. VLDB Endow. 8(4), 425–436 (2014)

11. Li, Q., Li, Y., Gao, J., Zhao, B., Fan, W., Han, J.: Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 1187–1198. ACM (2014)

12. Liu, Q., Peng, J., Ihler, A.T.: Variational inference for crowdsourcing. In: Advances in Neural Information Processing Systems, pp. 692–700 (2012)

13. Luna-Romera, J.M., García-Gutiérrez, J., Martínez-Ballesteros, M., Riquelme Santos, J.C.: An approach to validity indices for clustering techniques in big data. Prog. Artif. Intell. 7(2), 81–94 (2018)

14. Raykar, V.C., Yu, S., Zhao, L.H., Valadez, G.H., Florin, C., Bogoni, L., Moy, L.: Learning from crowds. J. Mach. Learn. Res. 11(Apr), 1297–1322 (2010)

15. Rodrigo, E.G., Aledo, J.A., Gámez, J.A.: CGLAD: using GLAD in crowdsourced large datasets. In: Lecture Notes in Computer Science, vol. 11314 (IDEAL 2018), pp. 783–791 (2018)

16. Rodrigo, E.G., Aledo, J.A., Gamez, J.A.: spark-crowd: a spark package for learning from crowdsourced big data. J. Mach. Learn. Res. 20(19), 1–5 (2019)

17. Rodrigo, E.G., Aledo, J.A., Gámez, J.A.: Machine learning from crowds: a systematic review of its applications. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. (2019). https://doi.org/10.1002/widm.1288

18. Snow, R., O’Connor, B., Jurafsky, D., Ng, A.Y.: Cheap and fast, but is it good? Evaluating non-expert annotations for natural language tasks. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 254–263. Association for Computational Linguistics, Honolulu (2008)

19. Venanzi, M., Guiver, J., Kazai, G., Kohli, P., Shokouhi, M.: Community-based Bayesian aggregation models for crowdsourcing. In: Proceedings of the 23rd International Conference on World Wide Web, pp. 155–164. ACM (2014)

20. Whitehill, J., Wu, T.f., Bergsma, J., Movellan, J.R., Ruvolo, P.L.: Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In: Advances in Neural Information Processing Systems, pp. 2035–2043 (2009)

21. Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)

22. Zhang, J., Wu, X.: Multi-label inference for crowdsourcing. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18, pp. 2738–2747. ACM, New York (2018). https://doi.org/10.1145/3219819.3219958

23. Zhang, J., Wu, X., Sheng, V.S.: Learning from crowdsourced labeled data: a survey. Artif. Intell. Rev. 46(4), 543–576 (2016)

24. Zheng, Y., Li, G., Li, Y., Shan, C., Cheng, R.: Truth inference in crowdsourcing: is the problem solved? Proc. VLDB Endow. 10(5), 541–552 (2017)

25. Zhou, Y., Wilkinson, D., Schreiber, R., Pan, R.: Large-scale parallel collaborative filtering for the netflix prize. In: International Conference on Algorithmic Applications in Management, pp. 337–348. Springer, Berlin (2008)

Title: Scaling up the learning-from-crowds GLAD algorithm using instance-difficulty clustering
Authors: Enrique González Rodrigo, Juan A. Aledo, Jose A. Gamez
Publication date: 17.05.2019
Publisher: Springer Berlin Heidelberg
Published in: Progress in Artificial Intelligence / Issue 3/2019
Print ISSN: 2192-6352
Electronic ISSN: 2192-6360
DOI: https://doi.org/10.1007/s13748-019-00189-9
