
2018 | OriginalPaper | Book chapter

Effective Solution for Labeling Candidates with a Proper Ration for Efficient Crowdsourcing

Authors: Zhao Chen, Peng Cheng, Chen Zhang, Lei Chen

Published in: Database Systems for Advanced Applications

Publisher: Springer International Publishing


Abstract

One of the core problems of crowdsourcing research is how to reduce cost, in other words, how to get better results with a limited budget. To save budget, most researchers concentrate on the internal steps of crowdsourcing, whereas in this work we focus on the pre-processing stage: how to select the input presented to the crowd. A straightforward application of this work is to help budget-limited machine learning researchers obtain better-balanced training data from crowd labeling. Specifically, we formulate the prior-information-based input manipulation procedure as the Candidate Selection Problem (CSP) and propose an end-squeezing algorithm for it. Our results show that a considerable cost reduction can be achieved by manipulating the input to the crowd with the help of some additional prior information. We verify the effectiveness and efficiency of these algorithms through extensive experiments.

Metadata
Title
Effective Solution for Labeling Candidates with a Proper Ration for Efficient Crowdsourcing
Authors
Zhao Chen
Peng Cheng
Chen Zhang
Lei Chen
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-91458-9_23