Skip to main content
Top
Published in: Neural Computing and Applications 9/2018

21-02-2017 | Original Article

Learning from crowds with active learning and self-healing

Authors: Zhenyu Shu, Victor S. Sheng, Jingjing Li

Published in: Neural Computing and Applications | Issue 9/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

With the development of crowdsourcing, data acquisition for supervised learning from annotators all over the world becomes simple and economical. To improve accuracy, it is nature to obtain multiple noisy labels (i.e., a multiple label set) for each example from the crowd. Then, consensus algorithms can infer the estimated ground truth from the multiple label set for each example. The estimated ground truth is also called an integrated label, which could be a noise. That is, a dataset constructed via integrating the multiple noisy labels for each example in a crowdsourcing dataset (called an integrated dataset) still contains noises. In order to further improve the data quality of an integrated dataset, so that to improve the performance of a model learned from the integrated dataset, this paper proposes a framework that integrates active learning with the self-healing of a model together. With active learning, a limited number of examples from the integrated dataset, which are most likely noises, are selected for the oracle to correct; with the self-healing of a model, the data quality of the integrated dataset can be also improved automatically. From our experimental results on eight simulated crowdsourcing datasets with three popular consensus algorithms, we draw some conclusions as follows. (1) Our proposed framework does improve the performance of a model learned from the integrated dataset. (2) The simple active learning selection strategy based on uncertainty estimation can identify noises in the integrated dataset. (3) Self-healing is efficient and effective to improve the data quality of the integrated dataset, so that it improves the accuracy of a model learned from the integrated dataset. We further investigate our proposed framework on a real-world crowdsourcing dataset collected from Amazon Mechanical Turk, and the above conclusions are sustained.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Literature
1.
go back to reference Lai S, Xu L, Liu K et al (2015) Recurrent convolutional neural networks for text classification. AAAI, pp 2267–2273 Lai S, Xu L, Liu K et al (2015) Recurrent convolutional neural networks for text classification. AAAI, pp 2267–2273
2.
go back to reference Tang K, Paluri M, Fei-Fei L et al (2015) Improving image classification with location context. In: Proceedings of the IEEE international conference on computer vision, pp 1008–1016 Tang K, Paluri M, Fei-Fei L et al (2015) Improving image classification with location context. In: Proceedings of the IEEE international conference on computer vision, pp 1008–1016
3.
go back to reference Wen X, Shao L, Xue Y, Fang W (2015) A rapid learning algorithm for vehicle classification. Inf Sci 295:395–406CrossRef Wen X, Shao L, Xue Y, Fang W (2015) A rapid learning algorithm for vehicle classification. Inf Sci 295:395–406CrossRef
4.
go back to reference Li J, Li X, Yang B, Sun X (2015) Segmentation-based image copy-move forgery detection scheme. IEEE Trans Inf Forensics Secur 10(3):507–518CrossRef Li J, Li X, Yang B, Sun X (2015) Segmentation-based image copy-move forgery detection scheme. IEEE Trans Inf Forensics Secur 10(3):507–518CrossRef
5.
go back to reference Xia Z, Wang X, Sun X, Liu Q, Xiong N (2016) Steganalysis of LSB matching using differences between nonadjacent pixels. Multimedia Tools Appl 75(4):1947–1962CrossRef Xia Z, Wang X, Sun X, Liu Q, Xiong N (2016) Steganalysis of LSB matching using differences between nonadjacent pixels. Multimedia Tools Appl 75(4):1947–1962CrossRef
6.
go back to reference Chen B, Shu H, Coatrieux G, Chen G, Sun X, Coatrieux JL (2015) Color image analysis by quaternion-type moments. J Math Imaging Vis 51(1):124–144MathSciNetCrossRef Chen B, Shu H, Coatrieux G, Chen G, Sun X, Coatrieux JL (2015) Color image analysis by quaternion-type moments. J Math Imaging Vis 51(1):124–144MathSciNetCrossRef
7.
go back to reference Zheng Y, Jeon B, Xu D et al (2015) Image segmentation by generalized hierarchical fuzzy C-means algorithm. J Intell Fuzzy Syst 28(2):961–973 Zheng Y, Jeon B, Xu D et al (2015) Image segmentation by generalized hierarchical fuzzy C-means algorithm. J Intell Fuzzy Syst 28(2):961–973
8.
go back to reference Zhou Z, Wang Y, Wu QM et al (2017) Effective and efficient global context verification for image copy detection. IEEE Trans Inf Forensics Secur 12(1):48–63CrossRef Zhou Z, Wang Y, Wu QM et al (2017) Effective and efficient global context verification for image copy detection. IEEE Trans Inf Forensics Secur 12(1):48–63CrossRef
9.
go back to reference Xia Z, Wang X, Zhang L et al (2016) A privacy-preserving and copy-deterrence content-based image retrieval scheme in cloud computing. IEEE Trans Inf Forensics Secur 11(11):2594–2608CrossRef Xia Z, Wang X, Zhang L et al (2016) A privacy-preserving and copy-deterrence content-based image retrieval scheme in cloud computing. IEEE Trans Inf Forensics Secur 11(11):2594–2608CrossRef
10.
go back to reference Fu Z, Wu X, Guan C et al (2016) Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement. IEEE Trans Inf Forensics Secur 11(12): 2706–2716CrossRef Fu Z, Wu X, Guan C et al (2016) Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement. IEEE Trans Inf Forensics Secur 11(12): 2706–2716CrossRef
11.
go back to reference Li J, Li X, Yang B, Sun X (2015) Segmentation-based image copy--move forgery detection scheme. IEEE Trans Inf Forensics Secur 10(3):507–518CrossRef Li J, Li X, Yang B, Sun X (2015) Segmentation-based image copy--move forgery detection scheme. IEEE Trans Inf Forensics Secur 10(3):507–518CrossRef
12.
go back to reference Xia Z, Wang X, Sun X, Wang B (2014) Steganalysis of least significant bit matching using multi-order differences. Secur Commun Netw 7(8):1283–1291CrossRef Xia Z, Wang X, Sun X, Wang B (2014) Steganalysis of least significant bit matching using multi-order differences. Secur Commun Netw 7(8):1283–1291CrossRef
13.
go back to reference Wu J, Pan S, Zhu X et al (2016) Positive and unlabeled multi-graph learning. IEEE Trans Cybern Wu J, Pan S, Zhu X et al (2016) Positive and unlabeled multi-graph learning. IEEE Trans Cybern
14.
go back to reference Wu J, Pan S, Zhu X et al (2015) Boosting for multi-graph classification. IEEE Trans Cybern 45:416–429CrossRef Wu J, Pan S, Zhu X et al (2015) Boosting for multi-graph classification. IEEE Trans Cybern 45:416–429CrossRef
15.
go back to reference Wu J, Zhu X, Zhang C et al (2014) Bag constrained structure pattern mining for multi-graph classification. IEEE Trans Knowl Data Eng 26:2382–2396CrossRef Wu J, Zhu X, Zhang C et al (2014) Bag constrained structure pattern mining for multi-graph classification. IEEE Trans Knowl Data Eng 26:2382–2396CrossRef
16.
go back to reference Xintong G, Hongzhi W, Song Y et al (2014) Brief survey of crowdsourcing for data mining. Expert Syst Appl 41:7987–7994CrossRef Xintong G, Hongzhi W, Song Y et al (2014) Brief survey of crowdsourcing for data mining. Expert Syst Appl 41:7987–7994CrossRef
17.
go back to reference Sheng VS, Provost F, Ipeirotis PG (2008) Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 614–622 Sheng VS, Provost F, Ipeirotis PG (2008) Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 614–622
18.
go back to reference Ipeirotis PG, Provost F, Sheng VS et al (2008) Repeated labeling using multiple noisy labelers. Data Min Knowl Disc 28:402–441MathSciNetCrossRef Ipeirotis PG, Provost F, Sheng VS et al (2008) Repeated labeling using multiple noisy labelers. Data Min Knowl Disc 28:402–441MathSciNetCrossRef
19.
go back to reference Penrose LS (1946) The elementary statistics of majority voting. J R Stat Soc 109:53–57CrossRef Penrose LS (1946) The elementary statistics of majority voting. J R Stat Soc 109:53–57CrossRef
20.
go back to reference Raykar VC, Yu S, Zhao LH et al (2010) Learning from crowds. J Mach Learn Res 11:1297–1322MathSciNet Raykar VC, Yu S, Zhao LH et al (2010) Learning from crowds. J Mach Learn Res 11:1297–1322MathSciNet
21.
go back to reference Demartini G, Difallah D E, Cudré-Mauroux P (2012) ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In: Proceedings of the 21st international conference on World Wide Web. ACM, pp 469–478 Demartini G, Difallah D E, Cudré-Mauroux P (2012) ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In: Proceedings of the 21st international conference on World Wide Web. ACM, pp 469–478
23.
go back to reference Settles, Burr (2010) Active learning literature survey. University of Wisconsin, Madison 52:55–66 Settles, Burr (2010) Active learning literature survey. University of Wisconsin, Madison 52:55–66
24.
go back to reference Lewis, David D, Catlett Jason (1994) Heterogeneous uncertainty sampling for supervised learning. In: Proceedings of the eleventh international conference on machine learning pp 48–156 Lewis, David D, Catlett Jason (1994) Heterogeneous uncertainty sampling for supervised learning. In: Proceedings of the eleventh international conference on machine learning pp 48–156
25.
go back to reference Blake C, Merz CJ (1998) UCI repository of machine learning databases Blake C, Merz CJ (1998) UCI repository of machine learning databases
26.
go back to reference Wu J, Pan S, Zhu X et al (2016) SODE: self-adaptive one-dependence estimators for classification. Pattern Recogn 51:358–377CrossRef Wu J, Pan S, Zhu X et al (2016) SODE: self-adaptive one-dependence estimators for classification. Pattern Recogn 51:358–377CrossRef
27.
go back to reference Wu J, Pan S, Zhu X et al (2015) Self-adaptive attribute weighting for Naive Bayes classification. Expert Syst Appl 42:1487–1502CrossRef Wu J, Pan S, Zhu X et al (2015) Self-adaptive attribute weighting for Naive Bayes classification. Expert Syst Appl 42:1487–1502CrossRef
28.
go back to reference Jiang L, Li C, Wang S, Zhang L (2016) Deep feature weighting for naive Bayes and its application to text classification. Eng Appl Artif Intell 52:26–39CrossRef Jiang L, Li C, Wang S, Zhang L (2016) Deep feature weighting for naive Bayes and its application to text classification. Eng Appl Artif Intell 52:26–39CrossRef
29.
go back to reference Rahman Mahbubur et al (2015) Smartphone-based hierarchical crowdsourcing for weed identification. Comput Electron Agric 113:14–23CrossRef Rahman Mahbubur et al (2015) Smartphone-based hierarchical crowdsourcing for weed identification. Comput Electron Agric 113:14–23CrossRef
30.
go back to reference Parry C, Beckjord E, Moser RP et al (2015) It takes a (virtual) village: crowdsourcing measurement consensus to advance survivorship care planning. Transl Behav Med 5:53–59CrossRef Parry C, Beckjord E, Moser RP et al (2015) It takes a (virtual) village: crowdsourcing measurement consensus to advance survivorship care planning. Transl Behav Med 5:53–59CrossRef
31.
go back to reference Crescenzi V, Merialdo P, Qiu D (2014) Crowdsourcing large scale wrapper inference. Distributed and Parallel Databases, pp 1–28CrossRef Crescenzi V, Merialdo P, Qiu D (2014) Crowdsourcing large scale wrapper inference. Distributed and Parallel Databases, pp 1–28CrossRef
32.
go back to reference Byun TMA, Halpin PF, Szeredi D (2015) Online crowdsourcing for efficient rating of speech: a validation study. J Commun Disord 53:70–83CrossRef Byun TMA, Halpin PF, Szeredi D (2015) Online crowdsourcing for efficient rating of speech: a validation study. J Commun Disord 53:70–83CrossRef
33.
go back to reference Li C, Sheng VS, Jiang L et al (2016) Noise filtering to improve data and model quality for crowdsourcing. Knowl Based Syst 107:96–103CrossRef Li C, Sheng VS, Jiang L et al (2016) Noise filtering to improve data and model quality for crowdsourcing. Knowl Based Syst 107:96–103CrossRef
34.
go back to reference Peer E, Vosgerau J, Acquisti A (2014) Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behav Res Methods 46:1023–1031CrossRef Peer E, Vosgerau J, Acquisti A (2014) Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behav Res Methods 46:1023–1031CrossRef
35.
go back to reference Raykar VC, Yu S (2011) An entropic score to rank annotators for crowdsourced labeling tasks. In: IEEE third national conference on computer vision, pattern recognition, image processing and graphics (NCVPRIPG) Raykar VC, Yu S (2011) An entropic score to rank annotators for crowdsourced labeling tasks. In: IEEE third national conference on computer vision, pattern recognition, image processing and graphics (NCVPRIPG)
36.
go back to reference Tarasov A, Delany SJ, Namee BMac (2014) Dynamic estimation of worker reliability in crowdsourcing for regression tasks: making it work. Expert Syst Appl 41:6190–6210CrossRef Tarasov A, Delany SJ, Namee BMac (2014) Dynamic estimation of worker reliability in crowdsourcing for regression tasks: making it work. Expert Syst Appl 41:6190–6210CrossRef
37.
go back to reference Hu Q et al (2014) Learning from crowds under experts’ supervision. Advances in knowledge discovery and data mining, pp 200–211 Hu Q et al (2014) Learning from crowds under experts’ supervision. Advances in knowledge discovery and data mining, pp 200–211
38.
go back to reference Seung HS, Opper M, Sompolinsky H (1992) Query by committee. In: Proceedings of the fifth annual workshop on computational learning theory, ACM pp 287–294 Seung HS, Opper M, Sompolinsky H (1992) Query by committee. In: Proceedings of the fifth annual workshop on computational learning theory, ACM pp 287–294
39.
go back to reference Brinker K (2003) Incorporating diversity in active learning with support vector machines. ICML 3:59–66 Brinker K (2003) Incorporating diversity in active learning with support vector machines. ICML 3:59–66
40.
go back to reference Settles B, Craven M (2008) An analysis of active learning strategies for sequence labeling tasks. In: Proceedings of the conference on empirical methods in natural language processing, association for computational linguistics, pp 1070–1079 Settles B, Craven M (2008) An analysis of active learning strategies for sequence labeling tasks. In: Proceedings of the conference on empirical methods in natural language processing, association for computational linguistics, pp 1070–1079
41.
go back to reference Holub A, Perona P, Burl MC (2008) Entropy-based active learning for object recognition. In: IEEE computer society conference computer vision and pattern recognition workshops, 2008. CVPRW’08, pp 1–8 Holub A, Perona P, Burl MC (2008) Entropy-based active learning for object recognition. In: IEEE computer society conference computer vision and pattern recognition workshops, 2008. CVPRW’08, pp 1–8
42.
go back to reference Zhao L, Sukthankar G, Sukthankar R (2011) Incremental relabeling for active learning with noisy crowdsourced annotations. In: IEEE international conference privacy, security, risk and trust (PASSAT) and 2011 IEEE third international conference on social computing (SocialCom), pp 728–733 Zhao L, Sukthankar G, Sukthankar R (2011) Incremental relabeling for active learning with noisy crowdsourced annotations. In: IEEE international conference privacy, security, risk and trust (PASSAT) and 2011 IEEE third international conference on social computing (SocialCom), pp 728–733
43.
go back to reference Costa J et al (2011) On using crowdsourcing and active learning to improve classification performance. In: IEEE international 11th conference intelligent systems design and applications (ISDA), pp 469–474 Costa J et al (2011) On using crowdsourcing and active learning to improve classification performance. In: IEEE international 11th conference intelligent systems design and applications (ISDA), pp 469–474
44.
go back to reference Zhang J, Wu X, Sheng VS (2015) Active learning with imbalanced multiple noisy labeling. IEEE Trans Cybern 45:1081–1093 Zhang J, Wu X, Sheng VS (2015) Active learning with imbalanced multiple noisy labeling. IEEE Trans Cybern 45:1081–1093
46.
go back to reference Shu Z, Sheng VS, Zhang Y, et al (2015) Integrating active learning with supervision for crowdsourcing generalization. In: IEEE 14th international conference on machine learning and applications (ICMLA), pp 232–237 Shu Z, Sheng VS, Zhang Y, et al (2015) Integrating active learning with supervision for crowdsourcing generalization. In: IEEE 14th international conference on machine learning and applications (ICMLA), pp 232–237
48.
go back to reference Jiang L, Zhang H, Cai Z (2009) A novel bayes model: hidden naive bayes. IEEE Trans Knowl Data Eng 21:1361–1371CrossRef Jiang L, Zhang H, Cai Z (2009) A novel bayes model: hidden naive bayes. IEEE Trans Knowl Data Eng 21:1361–1371CrossRef
49.
go back to reference Jiang L, Cai Z, Wang D, Zhang H (2012) Improving tree augmented naive bayes for class probability estimation. Knowl-Based Syst 26:239–245CrossRef Jiang L, Cai Z, Wang D, Zhang H (2012) Improving tree augmented naive bayes for class probability estimation. Knowl-Based Syst 26:239–245CrossRef
50.
go back to reference Qiu C, Jiang L, Li C (2015) Not always simple classification: learning super parent for class probability estimation. Expert Syst Appl 42:5433–5440CrossRef Qiu C, Jiang L, Li C (2015) Not always simple classification: learning super parent for class probability estimation. Expert Syst Appl 42:5433–5440CrossRef
51.
go back to reference Gu B, Sheng VS, Wang Z, Ho D, Osman S, Li S (2015) Incremental learning for v-support vector regression. Neural Netw 67:140–150CrossRef Gu B, Sheng VS, Wang Z, Ho D, Osman S, Li S (2015) Incremental learning for v-support vector regression. Neural Netw 67:140–150CrossRef
52.
go back to reference Gu B, Sheng VS, Li S (2015) Bi-parameter space partition for cost-sensitive SVM. In: Proceedings of the 24th international conference on artificial intelligence. AAAI Press, pp 3532–3539 Gu B, Sheng VS, Li S (2015) Bi-parameter space partition for cost-sensitive SVM. In: Proceedings of the 24th international conference on artificial intelligence. AAAI Press, pp 3532–3539
53.
go back to reference Gu B, Sheng VS, Tay KY, Romano W, Li S (2015) Incremental support vector learning for ordinal regression. IEEE Trans Neural Netw Learn Syst 26(7):1403–1416MathSciNetCrossRef Gu B, Sheng VS, Tay KY, Romano W, Li S (2015) Incremental support vector learning for ordinal regression. IEEE Trans Neural Netw Learn Syst 26(7):1403–1416MathSciNetCrossRef
Metadata
Title
Learning from crowds with active learning and self-healing
Authors
Zhenyu Shu
Victor S. Sheng
Jingjing Li
Publication date
21-02-2017
Publisher
Springer London
Published in
Neural Computing and Applications / Issue 9/2018
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI
https://doi.org/10.1007/s00521-017-2878-y

Other articles of this Issue 9/2018

Neural Computing and Applications 9/2018 Go to the issue

Premium Partner