2021 | Original Paper | Book Chapter

Graph-Boosted Active Learning for Multi-source Entity Resolution

Authors: Anna Primpeli, Christian Bizer

Published in: The Semantic Web – ISWC 2021

Publisher: Springer International Publishing

Abstract

Supervised entity resolution methods rely on labeled record pairs for learning matching patterns between two or more data sources. Active learning minimizes the labeling effort by selecting informative pairs for labeling. The existing active learning methods for entity resolution all target two-source matching scenarios and ignore signals that only exist in multi-source settings, such as the Web of Data. In this paper, we propose ALMSER, a graph-boosted active learning method for multi-source entity resolution. To the best of our knowledge, ALMSER is the first active learning-based entity resolution method that is specifically tailored to the multi-source setting. ALMSER exploits the rich correspondence graph that exists in multi-source settings for selecting informative record pairs. In addition, the correspondence graph is used to derive complementary training data. We evaluate our method using five multi-source matching tasks with different profiling characteristics. The experimental evaluation shows that leveraging graph signals improves F1 score on all tasks over active learning methods that use margin-based and committee-based query strategies.
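
To illustrate how a correspondence graph can yield complementary training data, the following minimal sketch derives additional match pairs through transitivity: if record A matches B and B matches C, then A and C can be treated as a match without asking the annotator. This is an illustrative assumption about one kind of graph signal, not the ALMSER algorithm itself; the function name `transitive_matches` and the record identifiers are hypothetical.

```python
from itertools import combinations


def transitive_matches(labeled_matches):
    """Return match pairs implied by transitivity but not yet labeled.

    labeled_matches: iterable of (record_a, record_b) pairs that an
    annotator confirmed as matches (hypothetical input format).
    """
    labeled_matches = list(labeled_matches)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two records of every confirmed match into one cluster.
    for a, b in labeled_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Group records by the cluster they ended up in.
    clusters = {}
    for record in list(parent):
        clusters.setdefault(find(record), []).append(record)

    # Every unlabeled pair inside a cluster is a derived match.
    known = {frozenset(pair) for pair in labeled_matches}
    derived = set()
    for members in clusters.values():
        for a, b in combinations(sorted(members), 2):
            if frozenset((a, b)) not in known:
                derived.add((a, b))
    return derived


if __name__ == "__main__":
    labeled = [("A:1", "B:7"), ("B:7", "C:3"), ("D:2", "A:9")]
    print(transitive_matches(labeled))
```

In this example the pair `("A:1", "C:3")` is derived from the two labeled matches that share `B:7`. In practice the labeled or predicted edges may be noisy, so derived pairs would need some form of validation before being used as training data.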
Metadata
Title
Graph-Boosted Active Learning for Multi-source Entity Resolution
Authors
Anna Primpeli
Christian Bizer
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-88361-4_11
