Published in: Knowledge and Information Systems, Issue 7/2020

14 May 2020 | Regular Paper

Label similarity-based weighted soft majority voting and pairing for crowdsourcing

Authors: Fangna Tao, Liangxiao Jiang, Chaoqun Li

Abstract

Crowdsourcing services provide an efficient and relatively inexpensive way to obtain large amounts of labeled data by employing crowd workers. The labeling quality of these workers directly affects the quality of the labeled data. However, existing label aggregation strategies seldom consider how a worker's quality varies from one instance to another. In this paper, we argue that a single worker may have different labeling qualities on different instances. Based on this premise, we propose four new strategies that assign each worker a different weight for each instance the worker labels. In the proposed strategies, we first use the similarity among worker labels to estimate the specific quality of a worker on each instance, and then build a classifier to estimate the overall quality of the worker across all instances. Finally, we combine these two qualities to define the weight of the worker on a particular instance. Extensive experimental results show that the proposed strategies significantly outperform existing state-of-the-art label aggregation strategies.
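
The abstract gives no formulas, so the following is only a minimal sketch of the general idea of per-instance worker weights, not the authors' method. Every function name and formula in it is an assumption: the specific quality is taken to be the share of the other workers on the same instance who gave the same label, the overall quality is approximated by agreement with plain majority voting (a stand-in for the classifier the abstract mentions), and the two are combined by multiplication. The pairing step named in the title is not covered.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Most frequent label in one instance's {worker: label} votes (ties: smallest label)."""
    counts = Counter(labels.values())
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


def specific_quality(labels, worker):
    """Assumed instance-level quality: share of the other workers who gave the same label."""
    others = [label for w, label in labels.items() if w != worker]
    if not others:
        return 1.0
    return sum(label == labels[worker] for label in others) / len(others)


def overall_quality(data, worker):
    """Stand-in for the classifier-based estimate: agreement with majority voting
    over all instances the worker labeled."""
    hits = total = 0
    for labels in data.values():
        if worker in labels:
            total += 1
            hits += labels[worker] == majority_vote(labels)
    return hits / total if total else 0.0


def aggregate(data):
    """Weighted soft vote. data maps instance -> {worker: label}.
    Returns instance -> (predicted label, normalized label distribution)."""
    workers = {w for labels in data.values() for w in labels}
    overall = {w: overall_quality(data, w) for w in workers}
    result = {}
    for instance, labels in data.items():
        scores = defaultdict(float)
        for w, label in labels.items():
            # Assumed combination rule: product of the two quality estimates.
            scores[label] += specific_quality(labels, w) * overall[w]
        if sum(scores.values()) == 0:
            # All weights vanished; fall back to unweighted vote counts.
            scores = Counter(labels.values())
        total = sum(scores.values())
        dist = {label: s / total for label, s in scores.items()}
        result[instance] = (max(dist, key=dist.get), dist)
    return result
```

Calling `aggregate({"x1": {"a": 1, "b": 1, "c": 0}})` returns, for each instance, the winning label together with the weighted label distribution, so a downstream step can use the soft distribution rather than only the hard label.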

Metadata
Title
Label similarity-based weighted soft majority voting and pairing for crowdsourcing
Authors
Fangna Tao
Liangxiao Jiang
Chaoqun Li
Publication date
14 May 2020
Publisher
Springer London
Published in
Knowledge and Information Systems / Issue 7/2020
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI
https://doi.org/10.1007/s10115-020-01475-y