2017 | OriginalPaper | Chapter

Crowdsourced Entity Alignment: A Decision Theory Based Approach

Authors: Yan Zhuang, Guoliang Li, Jianhua Feng

Published in: Web Information Systems Engineering – WISE 2017

Publisher: Springer International Publishing

Abstract

Crowdsourcing is a computation paradigm that uses the wisdom of the crowd to solve problems that are difficult for computers (e.g., image annotation and entity alignment). In crowdsourced entity alignment tasks, there are usually large numbers of candidate pairs to be verified by crowd workers, and each pair is assigned to multiple workers to achieve high quality. This raises two fundamental problems: (1) question selection: which questions are the most beneficial to crowdsource, and (2) question assignment: which workers should be assigned to answer a selected question? In this paper, we address these two problems with decision theory. First, we define the problems under two budget constraints: the first takes the marginal gain into account, and the second focuses on a limited budget. Then, we formulate the decision-making problems under the different budget constraints and build an influence diagram to perform result inference. We propose two efficient algorithms to address these two problems. Finally, we conduct extensive experiments on both synthetic and real data to validate the efficiency and effectiveness of our proposed algorithms.
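
To make the question-selection idea concrete, the following is a minimal sketch of a decision-theoretic selection rule. It is not the authors' algorithm: the abstract does not give their formulation, so everything here is an illustrative assumption, including the names `Candidate`, `expected_gain`, and `select_questions`, the single shared worker `accuracy`, the per-question `cost`, and the greedy gain-per-cost ordering. The gain of a question is taken to be the expected increase in the probability of deciding a pair correctly after one worker answer; `min_gain` stands in for a marginal-gain stopping rule and `budget` for a limited-budget constraint.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    pair_id: str
    prior: float  # assumed current belief that the pair is a match
    cost: float   # assumed price of one worker answer (must be > 0)


def expected_gain(prior: float, accuracy: float) -> float:
    """Expected increase in decision accuracy from one worker answer.

    Assumes a worker answers correctly with probability `accuracy`,
    independent of the pair.
    """
    # Without asking, we pick the more likely label.
    before = max(prior, 1.0 - prior)
    # After a "match" answer and after a "non-match" answer, we again pick
    # the more likely label; each term is already weighted by the
    # probability of seeing that answer.
    after_yes = max(prior * accuracy, (1.0 - prior) * (1.0 - accuracy))
    after_no = max(prior * (1.0 - accuracy), (1.0 - prior) * accuracy)
    return after_yes + after_no - before


def select_questions(candidates, accuracy, budget, min_gain=0.01):
    """Greedily pick questions by gain per unit cost.

    Skips questions whose gain is at most `min_gain` (marginal-gain rule)
    and questions that no longer fit in `budget` (limited-budget rule).
    """
    scored = [(expected_gain(c.prior, accuracy), c) for c in candidates]
    scored.sort(key=lambda gc: gc[0] / gc[1].cost, reverse=True)
    chosen, spent = [], 0.0
    for gain, cand in scored:
        if gain <= min_gain or spent + cand.cost > budget:
            continue
        chosen.append(cand.pair_id)
        spent += cand.cost
    return chosen


if __name__ == "__main__":
    pairs = [
        Candidate("a", prior=0.5, cost=1.0),
        Candidate("b", prior=0.6, cost=1.0),
        Candidate("c", prior=0.95, cost=1.0),
    ]
    print(select_questions(pairs, accuracy=0.8, budget=2.0))
```

Under these assumptions, a pair whose prior is already more extreme than the worker accuracy yields essentially no gain, because a single answer cannot change the decision, so the sketch spends the budget on the most uncertain pairs first.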

Metadata
Title: Crowdsourced Entity Alignment: A Decision Theory Based Approach
Authors: Yan Zhuang, Guoliang Li, Jianhua Feng
Copyright Year: 2017
DOI: https://doi.org/10.1007/978-3-319-68786-5_2
