Top

Neural Computing and Applications

Published in:

21-02-2017 | Original Article

Learning from crowds with active learning and self-healing

Authors: Zhenyu Shu, Victor S. Sheng, Jingjing Li

Published in: Neural Computing and Applications | Issue 9/2018

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

With the development of crowdsourcing, data acquisition for supervised learning from annotators all over the world becomes simple and economical. To improve accuracy, it is nature to obtain multiple noisy labels (i.e., a multiple label set) for each example from the crowd. Then, consensus algorithms can infer the estimated ground truth from the multiple label set for each example. The estimated ground truth is also called an integrated label, which could be a noise. That is, a dataset constructed via integrating the multiple noisy labels for each example in a crowdsourcing dataset (called an integrated dataset) still contains noises. In order to further improve the data quality of an integrated dataset, so that to improve the performance of a model learned from the integrated dataset, this paper proposes a framework that integrates active learning with the self-healing of a model together. With active learning, a limited number of examples from the integrated dataset, which are most likely noises, are selected for the oracle to correct; with the self-healing of a model, the data quality of the integrated dataset can be also improved automatically. From our experimental results on eight simulated crowdsourcing datasets with three popular consensus algorithms, we draw some conclusions as follows. (1) Our proposed framework does improve the performance of a model learned from the integrated dataset. (2) The simple active learning selection strategy based on uncertainty estimation can identify noises in the integrated dataset. (3) Self-healing is efficient and effective to improve the data quality of the integrated dataset, so that it improves the accuracy of a model learned from the integrated dataset. We further investigate our proposed framework on a real-world crowdsourcing dataset collected from Amazon Mechanical Turk, and the above conclusions are sustained.

previous article Performance evaluation of framework of VoIP/SIP server under virtualization environment along with the most common security threats

next article Dual problem of sorptive barrier design with a multiobjective approach

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Lai S, Xu L, Liu K et al (2015) Recurrent convolutional neural networks for text classification. AAAI, pp 2267–2273

Tang K, Paluri M, Fei-Fei L et al (2015) Improving image classification with location context. In: Proceedings of the IEEE international conference on computer vision, pp 1008–1016

Wen X, Shao L, Xue Y, Fang W (2015) A rapid learning algorithm for vehicle classification. Inf Sci 295:395–406CrossRef

Li J, Li X, Yang B, Sun X (2015) Segmentation-based image copy-move forgery detection scheme. IEEE Trans Inf Forensics Secur 10(3):507–518CrossRef

Xia Z, Wang X, Sun X, Liu Q, Xiong N (2016) Steganalysis of LSB matching using differences between nonadjacent pixels. Multimedia Tools Appl 75(4):1947–1962CrossRef

Chen B, Shu H, Coatrieux G, Chen G, Sun X, Coatrieux JL (2015) Color image analysis by quaternion-type moments. J Math Imaging Vis 51(1):124–144MathSciNetCrossRef

Zheng Y, Jeon B, Xu D et al (2015) Image segmentation by generalized hierarchical fuzzy C-means algorithm. J Intell Fuzzy Syst 28(2):961–973

Zhou Z, Wang Y, Wu QM et al (2017) Effective and efficient global context verification for image copy detection. IEEE Trans Inf Forensics Secur 12(1):48–63CrossRef

Xia Z, Wang X, Zhang L et al (2016) A privacy-preserving and copy-deterrence content-based image retrieval scheme in cloud computing. IEEE Trans Inf Forensics Secur 11(11):2594–2608CrossRef

10.

Fu Z, Wu X, Guan C et al (2016) Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement. IEEE Trans Inf Forensics Secur 11(12): 2706–2716CrossRef

11.

Li J, Li X, Yang B, Sun X (2015) Segmentation-based image copy--move forgery detection scheme. IEEE Trans Inf Forensics Secur 10(3):507–518CrossRef

12.

Xia Z, Wang X, Sun X, Wang B (2014) Steganalysis of least significant bit matching using multi-order differences. Secur Commun Netw 7(8):1283–1291CrossRef

13.

Wu J, Pan S, Zhu X et al (2016) Positive and unlabeled multi-graph learning. IEEE Trans Cybern

14.

Wu J, Pan S, Zhu X et al (2015) Boosting for multi-graph classification. IEEE Trans Cybern 45:416–429CrossRef

15.

Wu J, Zhu X, Zhang C et al (2014) Bag constrained structure pattern mining for multi-graph classification. IEEE Trans Knowl Data Eng 26:2382–2396CrossRef

16.

Xintong G, Hongzhi W, Song Y et al (2014) Brief survey of crowdsourcing for data mining. Expert Syst Appl 41:7987–7994CrossRef

17.

Sheng VS, Provost F, Ipeirotis PG (2008) Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 614–622

18.

Ipeirotis PG, Provost F, Sheng VS et al (2008) Repeated labeling using multiple noisy labelers. Data Min Knowl Disc 28:402–441MathSciNetCrossRef

19.

Penrose LS (1946) The elementary statistics of majority voting. J R Stat Soc 109:53–57CrossRef

20.

Raykar VC, Yu S, Zhao LH et al (2010) Learning from crowds. J Mach Learn Res 11:1297–1322MathSciNet

21.

Demartini G, Difallah D E, Cudré-Mauroux P (2012) ZenCrowd: leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking. In: Proceedings of the 21st international conference on World Wide Web. ACM, pp 469–478

22.

Liu Q, Steyvers M, Fisher JW et al (2003) On reliable crowdsourcing and the use of ground truth information. The Advancement of Artificial Intelligence. http://www.ics.uci.edu/~ihler/papers/hcomp13.pdf

23.

Settles, Burr (2010) Active learning literature survey. University of Wisconsin, Madison 52:55–66

24.

Lewis, David D, Catlett Jason (1994) Heterogeneous uncertainty sampling for supervised learning. In: Proceedings of the eleventh international conference on machine learning pp 48–156

25.

Blake C, Merz CJ (1998) UCI repository of machine learning databases

26.

Wu J, Pan S, Zhu X et al (2016) SODE: self-adaptive one-dependence estimators for classification. Pattern Recogn 51:358–377CrossRef

27.

Wu J, Pan S, Zhu X et al (2015) Self-adaptive attribute weighting for Naive Bayes classification. Expert Syst Appl 42:1487–1502CrossRef

28.

Jiang L, Li C, Wang S, Zhang L (2016) Deep feature weighting for naive Bayes and its application to text classification. Eng Appl Artif Intell 52:26–39CrossRef

29.

Rahman Mahbubur et al (2015) Smartphone-based hierarchical crowdsourcing for weed identification. Comput Electron Agric 113:14–23CrossRef

30.

Parry C, Beckjord E, Moser RP et al (2015) It takes a (virtual) village: crowdsourcing measurement consensus to advance survivorship care planning. Transl Behav Med 5:53–59CrossRef

31.

Crescenzi V, Merialdo P, Qiu D (2014) Crowdsourcing large scale wrapper inference. Distributed and Parallel Databases, pp 1–28CrossRef

32.

Byun TMA, Halpin PF, Szeredi D (2015) Online crowdsourcing for efficient rating of speech: a validation study. J Commun Disord 53:70–83CrossRef

33.

Li C, Sheng VS, Jiang L et al (2016) Noise filtering to improve data and model quality for crowdsourcing. Knowl Based Syst 107:96–103CrossRef

34.

Peer E, Vosgerau J, Acquisti A (2014) Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behav Res Methods 46:1023–1031CrossRef

35.

Raykar VC, Yu S (2011) An entropic score to rank annotators for crowdsourced labeling tasks. In: IEEE third national conference on computer vision, pattern recognition, image processing and graphics (NCVPRIPG)

36.

Tarasov A, Delany SJ, Namee BMac (2014) Dynamic estimation of worker reliability in crowdsourcing for regression tasks: making it work. Expert Syst Appl 41:6190–6210CrossRef

37.

Hu Q et al (2014) Learning from crowds under experts’ supervision. Advances in knowledge discovery and data mining, pp 200–211

38.

Seung HS, Opper M, Sompolinsky H (1992) Query by committee. In: Proceedings of the fifth annual workshop on computational learning theory, ACM pp 287–294

39.

Brinker K (2003) Incorporating diversity in active learning with support vector machines. ICML 3:59–66

40.

Settles B, Craven M (2008) An analysis of active learning strategies for sequence labeling tasks. In: Proceedings of the conference on empirical methods in natural language processing, association for computational linguistics, pp 1070–1079

41.

Holub A, Perona P, Burl MC (2008) Entropy-based active learning for object recognition. In: IEEE computer society conference computer vision and pattern recognition workshops, 2008. CVPRW’08, pp 1–8

42.

Zhao L, Sukthankar G, Sukthankar R (2011) Incremental relabeling for active learning with noisy crowdsourced annotations. In: IEEE international conference privacy, security, risk and trust (PASSAT) and 2011 IEEE third international conference on social computing (SocialCom), pp 728–733

43.

Costa J et al (2011) On using crowdsourcing and active learning to improve classification performance. In: IEEE international 11th conference intelligent systems design and applications (ISDA), pp 469–474

44.

Zhang J, Wu X, Sheng VS (2015) Active learning with imbalanced multiple noisy labeling. IEEE Trans Cybern 45:1081–1093

45.

Breiman Leo (2001) Random forests. Mach Learn 45:5–32CrossRef

46.

Shu Z, Sheng VS, Zhang Y, et al (2015) Integrating active learning with supervision for crowdsourcing generalization. In: IEEE 14th international conference on machine learning and applications (ICMLA), pp 232–237

47.

Jiang L (2011) Learning random forests for ranking. Front Comput Sci China 5:79–86MathSciNetCrossRef

48.

Jiang L, Zhang H, Cai Z (2009) A novel bayes model: hidden naive bayes. IEEE Trans Knowl Data Eng 21:1361–1371CrossRef

49.

Jiang L, Cai Z, Wang D, Zhang H (2012) Improving tree augmented naive bayes for class probability estimation. Knowl-Based Syst 26:239–245CrossRef

50.

Qiu C, Jiang L, Li C (2015) Not always simple classification: learning super parent for class probability estimation. Expert Syst Appl 42:5433–5440CrossRef

51.

Gu B, Sheng VS, Wang Z, Ho D, Osman S, Li S (2015) Incremental learning for v-support vector regression. Neural Netw 67:140–150CrossRef

52.

Gu B, Sheng VS, Li S (2015) Bi-parameter space partition for cost-sensitive SVM. In: Proceedings of the 24th international conference on artificial intelligence. AAAI Press, pp 3532–3539

53.

Gu B, Sheng VS, Tay KY, Romano W, Li S (2015) Incremental support vector learning for ordinal regression. IEEE Trans Neural Netw Learn Syst 26(7):1403–1416MathSciNetCrossRef

54.

Gu B, Sun X, Sheng VS (2016) Structural minimax probability machine. IEEE Trans Neural Netw Learn Syst. doi:10.1109/TNNLS.2016.2544779 MathSciNetCrossRef

55.

Gu B, Sheng VS (2016) A robust regularization path algorithm for ν-support vector classification. IEEE Trans Neural Netw Learn Syst. doi:10.1109/TNNLS.2016.2527796 CrossRef

Title: Learning from crowds with active learning and self-healing
Authors: Zhenyu Shu
Victor S. Sheng
Jingjing Li
Publication date: 21-02-2017
Publisher: Springer London
Published in: Neural Computing and Applications / Issue 9/2018
Print ISSN: 0941-0643
Electronic ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-017-2878-y

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Other articles of this Issue 9/2018

Multi-objective evolutionary algorithm using problem-specific genetic operators for community detection in networks

A hybrid algorithm using a genetic algorithm and multiagent reinforcement learning heuristic to solve the traveling salesman problem

Clustering of fungal hexosaminidase enzymes based on free alignment method using MLP neural network

Non-coaxial rotating flow of viscous fluid with heat and mass transfer

A neuro-inspired visual tracking method based on programmable system-on-chip platform

Discrete cuckoo search algorithms for two-sided robotic assembly line balancing problem

Premium Partner