Published in: Knowledge and Information Systems 2/2024

09-09-2023 | Regular Paper

Multilabel classification using crowdsourcing under budget constraints

Authors: Himanshu Suyal, Avtar Singh

Abstract

Multilabel classification has excelled in several distinct fields over the past few decades but still has significant limitations. One critical concern is the scarcity of labelled instances; labelling data also costs time and money. Crowdsourcing eases the problem of label availability, yet it has drawbacks of its own, notably uneven label quality and budget limitations. This paper introduces a multilabel reverse auction framework to address the shortage of crowd workers. Each crowd worker must state a cost and a confidence for each task in a specific domain. Furthermore, two methods for systematic budget selection are presented to address insufficient domain coverage within the budget limit: Greedy bid selection and Multi cover bid selection. Both approaches choose the least expensive crowd workers while considering worker expertise and domain coverage. Crowd versions of binary relevance and multilabel k-nearest neighbours are also introduced to support label aggregation and to reduce the impact of low-quality workers while considering the domain. An experimental study on seven multilabel datasets with diverse crowds shows the effectiveness of the approach. It delivers more than 16% improvement over the baseline of random selection with majority voting. The proposed method is compared against five benchmark algorithms and gives promising results when data and workers are scarce.
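To make the idea of selecting bids under a budget concrete, here is a minimal Python sketch. It is not the authors' Greedy bid selection algorithm, whose scoring rule the abstract does not give. It assumes each bid carries a cost and a self-reported confidence for one domain, and it ranks affordable bids first by whether they cover a new domain and then by confidence per unit cost. The names `Bid` and `greedy_bid_selection`, the scoring rule, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    """One worker's offer for one domain (hypothetical structure)."""
    worker: str
    domain: str
    cost: float        # price asked by the worker, must be > 0
    confidence: float  # self-reported confidence in [0, 1]


def greedy_bid_selection(bids, budget):
    """Greedily pick affordable bids until the budget is exhausted.

    Assumed scoring rule (not taken from the paper): prefer bids that
    cover a domain not yet covered, then higher confidence per unit cost.
    Returns (selected bids, covered domains, leftover budget).
    """
    if any(b.cost <= 0 for b in bids):
        raise ValueError("every bid needs a positive cost")

    selected, covered = [], set()
    remaining = budget
    candidates = list(bids)

    while candidates:
        affordable = [b for b in candidates if b.cost <= remaining]
        if not affordable:
            break
        best = max(
            affordable,
            key=lambda b: (b.domain not in covered, b.confidence / b.cost),
        )
        selected.append(best)
        covered.add(best.domain)
        remaining -= best.cost
        candidates.remove(best)

    return selected, covered, remaining


# Made-up bids for illustration only.
bids = [
    Bid("w1", "sports", 2.0, 0.90),
    Bid("w2", "sports", 1.0, 0.60),
    Bid("w3", "music", 3.0, 0.80),
    Bid("w4", "music", 1.5, 0.50),
    Bid("w5", "news", 4.0, 0.95),
]

selected, covered, remaining = greedy_bid_selection(bids, budget=5.0)
print([b.worker for b in selected])  # ['w2', 'w4', 'w1']
print(sorted(covered), remaining)    # ['music', 'sports'] 0.5
```

With a budget of 5.0 the sketch covers two of the three domains and leaves "news" uncovered, because the only bid for it costs more than what remains after the cheaper picks. This is the kind of coverage gap the abstract says the selection methods are meant to address.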

Metadata

Title: Multilabel classification using crowdsourcing under budget constraints
Authors: Himanshu Suyal, Avtar Singh
Publication date: 09-09-2023
Publisher: Springer London
Published in: Knowledge and Information Systems, Issue 2/2024
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-023-01973-9
