2018 | Original Paper | Book Chapter

Finding High-Quality Unstructured Submissions in General Crowdsourcing Tasks

Authors: Shanshan Lyu, Wentao Ouyang, Huawei Shen, Xueqi Cheng

Published in: Information Retrieval

Publisher: Springer International Publishing


Abstract

The quality of crowdsourced work varies drastically, from superior to inferior. As a consequence, automatically finding high-quality crowdsourced work is a problem of great importance. A variety of aggregation methods have been proposed for multiple-choice tasks, such as item labeling, where structured claims can be aggregated. However, they do not apply to more general tasks, such as article writing and brand design, whose unstructured submissions cannot be aggregated. Recent work tackles this problem by asking another set of crowd workers to review and grade each submission, essentially transforming unstructured submissions into structured ratings that can be aggregated. Such an approach, however, incurs unnecessary monetary cost and delay. In this paper, we address the problem by exploiting task requesters’ historical feedback and directly modeling submission quality, without the need for additional crowdsourced ratings. We first propose three sets of features that characterize submission quality from different perspectives: the submissions themselves, the workers who make the submissions, and the interactions between task requesters and workers. We then propose two quality models, one judging submission quality independently and the other comparatively. Both models not only incorporate these features but also take worker-specific factors into account. Experimental results on three large-scale data sets demonstrate that our models outperform general-purpose learning-to-rank methods such as Logistic Regression, RankBoost, and ListNet at finding high-quality crowdsourced submissions.
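The abstract only sketches the two quality models at a high level; their exact formulation is given in the paper itself. As a rough, hypothetical illustration of the two flavors, the Python sketch below implements a pointwise ("independent") scorer with a worker-specific offset trained on accept/reject feedback, and a pairwise ("comparative") scorer in the Bradley-Terry/RankNet style. The feature layout, the per-worker offset b_k, and the training signal are assumptions made for illustration, not the authors' actual models.

```python
# Hypothetical sketch of the two quality-model flavors named in the
# abstract; the paper's actual formulation may differ substantially.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class IndependentQualityModel:
    """Pointwise model: judges each submission's quality on its own.

    score(x, k) = w . x + b_k, where x concatenates the three feature
    sets (submission, worker, requester-worker interaction) and b_k is
    a worker-specific factor (an assumed parameterization).
    """

    def __init__(self, n_features, n_workers, lr=0.1):
        self.w = np.zeros(n_features)
        self.b = np.zeros(n_workers)  # per-worker offsets
        self.lr = lr

    def score(self, x, worker):
        return float(x @ self.w + self.b[worker])

    def fit_step(self, x, worker, label):
        # One SGD step on the logistic loss against the requester's
        # historical accept/reject feedback (label in {0, 1}).
        g = sigmoid(self.score(x, worker)) - label
        self.w -= self.lr * g * x
        self.b[worker] -= self.lr * g


class ComparativeQualityModel(IndependentQualityModel):
    """Pairwise model: judges which of two submissions is better,
    P(i preferred over j) = sigmoid(score_i - score_j),
    in the Bradley-Terry / RankNet style.
    """

    def fit_pair(self, x_win, worker_win, x_lose, worker_lose):
        # One SGD step pushing the accepted submission's score above
        # the rejected submission's score.
        g = sigmoid(self.score(x_win, worker_win)
                    - self.score(x_lose, worker_lose)) - 1.0
        self.w -= self.lr * g * (x_win - x_lose)
        self.b[worker_win] -= self.lr * g
        self.b[worker_lose] += self.lr * g
```

Under these assumptions, the comparative variant would be trained on (accepted, rejected) submission pairs from the same task, so that a requester's historical feedback serves directly as the preference signal.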


Metadata
Title
Finding High-Quality Unstructured Submissions in General Crowdsourcing Tasks
Authors
Shanshan Lyu
Wentao Ouyang
Huawei Shen
Xueqi Cheng
Copyright Year
2018
DOI
https://doi.org/10.1007/978-3-030-01012-6_16
