Published in: Programming and Computer Software 6/2018

November 1, 2018

Active Learning and Crowdsourcing: A Survey of Optimization Methods for Data Labeling

Authors: R. A. Gilyazev, D. Yu. Turdakov


Abstract

High-quality annotated collections are a key element in constructing systems that use machine learning. In most cases, these collections are created through manual labeling, which is expensive and tedious for annotators. To optimize data labeling, a number of methods based on active learning and crowdsourcing have been proposed. This paper provides a survey of currently available approaches, discusses their combined use, and describes existing software systems designed to facilitate the data labeling process.
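
As a purely illustrative example of how these two ideas fit together, the short Python sketch below combines pool-based active learning via uncertainty sampling with majority-vote aggregation of simulated crowd labels. The synthetic data, the number of workers, and the 0.8 worker accuracy are assumptions made for the sketch and are not taken from the paper.

```python
# Minimal illustrative sketch (not from the paper): pool-based active learning
# with uncertainty sampling, where every queried item is labeled by several
# noisy simulated crowd workers and their votes are aggregated by majority.
# The synthetic data, worker count, and 0.8 worker accuracy are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic binary pool: two Gaussian blobs with hidden ground-truth labels.
X = np.vstack([rng.normal(-1.0, 1.0, (500, 2)), rng.normal(1.0, 1.0, (500, 2))])
y_true = np.array([0] * 500 + [1] * 500)

def crowd_label(i, workers=5, accuracy=0.8):
    """Simulate `workers` annotators, each correct with probability `accuracy`,
    and aggregate their answers by majority vote."""
    votes = [y_true[i] if rng.random() < accuracy else 1 - y_true[i]
             for _ in range(workers)]
    return int(np.round(np.mean(votes)))

# Small seed set (a few items of each class), labeled by the simulated crowd.
labeled = [0, 1, 2, 3, 4, 500, 501, 502, 503, 504]
labels = {i: crowd_label(i) for i in labeled}

for _ in range(20):  # 20 active-learning rounds, one query per round
    model = LogisticRegression().fit(X[labeled], [labels[i] for i in labeled])
    pool = [i for i in range(len(X)) if i not in labels]
    proba = model.predict_proba(X[pool])[:, 1]
    # Uncertainty sampling: query the item whose predicted probability is
    # closest to 0.5, i.e. the one the current model is least sure about.
    query = pool[int(np.argmin(np.abs(proba - 0.5)))]
    labels[query] = crowd_label(query)
    labeled.append(query)

print("accuracy of the last model on the full pool:", model.score(X, y_true))
```

In the kinds of systems the survey covers, the simulated workers would be replaced by a crowdsourcing platform and the majority vote by a more elaborate aggregation model, but the overall control loop has the same shape.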

Metadata
Title
Active Learning and Crowdsourcing: A Survey of Optimization Methods for Data Labeling
Authors
R. A. Gilyazev
D. Yu. Turdakov
Publication date
November 1, 2018
Publisher
Pleiades Publishing
Published in
Programming and Computer Software / Issue 6/2018
Print ISSN: 0361-7688
Electronic ISSN: 1608-3261
DOI
https://doi.org/10.1134/S0361768818060142
