Knowledge and Information Systems

14 May 2020 | Regular Paper

Label similarity-based weighted soft majority voting and pairing for crowdsourcing

Authors: Fangna Tao, Liangxiao Jiang, Chaoqun Li

Published in: Knowledge and Information Systems | Issue 7/2020

Abstract

Crowdsourcing services provide an efficient and relatively inexpensive way to obtain large amounts of labeled data by employing crowd workers. The labeling quality of these workers directly affects the quality of the labeled data. However, existing label aggregation strategies seldom consider that a worker's labeling quality may differ from instance to instance. In this paper, we argue that even a single worker may have different labeling qualities on different instances. Based on this premise, we propose four new strategies that assign different weights to a worker when labeling different instances. In our proposed strategies, we first use the similarity among worker labels to estimate the specific quality of the worker on each instance, and then we build a classifier to estimate the overall quality of the worker across all instances. Finally, we combine these two qualities to define the weight of the worker labeling a particular instance. Extensive experimental results show that our proposed strategies significantly outperform other existing state-of-the-art label aggregation strategies.
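The general idea of weighting each worker per instance can be illustrated with a small sketch. The abstract gives no formulas, so everything specific here is an assumption made for illustration: the instance-specific quality is taken as the share of other workers on the same instance who gave the same label, the overall quality is taken as agreement with a plain majority vote (a simple stand-in for the classifier-based estimate the abstract describes), and the two are combined by multiplication. The function name `aggregate` and the input layout are hypothetical, and this is not one of the paper's four strategies.

```python
from collections import defaultdict


def aggregate(labels):
    """Infer one label per instance from noisy worker labels.

    labels: dict mapping instance id -> dict mapping worker id -> label.
    Illustrative sketch only; the weighting scheme is an assumption.
    """
    # Plain majority vote per instance. Used only as a reference for the
    # overall quality (assumed stand-in for a classifier-based estimate).
    majority = {}
    for inst, votes in labels.items():
        counts = defaultdict(int)
        for lab in votes.values():
            counts[lab] += 1
        # Sorting first makes tie-breaking deterministic (smallest label).
        majority[inst] = max(sorted(counts), key=lambda lab: counts[lab])

    # Overall quality: fraction of a worker's labels matching the majority.
    hits, total = defaultdict(int), defaultdict(int)
    for inst, votes in labels.items():
        for worker, lab in votes.items():
            total[worker] += 1
            hits[worker] += int(lab == majority[inst])
    overall = {worker: hits[worker] / total[worker] for worker in total}

    result = {}
    for inst, votes in labels.items():
        scores = defaultdict(float)
        for worker, lab in votes.items():
            others = [l for w, l in votes.items() if w != worker]
            # Instance-specific quality: share of other workers who agree.
            if others:
                specific = sum(l == lab for l in others) / len(others)
            else:
                specific = 1.0
            # Assumed combination rule: product of the two qualities.
            scores[lab] += specific * overall[worker]
        result[inst] = max(sorted(scores), key=lambda lab: scores[lab])
    return result
```

Calling `aggregate({"x1": {"a": 1, "b": 1, "c": 0}})` returns a dict with one inferred label per instance. A worker whose label is isolated on an instance contributes little there, even if that worker is reliable overall, which is the behaviour the per-instance weighting is meant to capture.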

Title: Label similarity-based weighted soft majority voting and pairing for crowdsourcing
Authors: Fangna Tao
Liangxiao Jiang
Chaoqun Li
Publication date: 14 May 2020
Publisher: Springer London
Published in: Knowledge and Information Systems / Issue 7/2020
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-020-01475-y
