
21.11.2023 | Special Issue Paper

Alfa: active learning for graph neural network-based semantic schema alignment

Authors: Venkata Vamsikrishna Meduri, Abdul Quamar, Chuan Lei, Xiao Qin, Berthold Reinwald

Published in: The VLDB Journal

Abstract

Semantic schema alignment aims to match elements across a pair of schemas based on their semantic representation. It is a key primitive for data integration that facilitates the creation of a common data fabric across heterogeneous data sources. Deep learning approaches such as graph representation learning have shown promise for effective alignment of semantically rich schemas, often captured as ontologies. Most of these approaches are supervised and require large amounts of labeled training data, which is expensive in terms of cost and manual labor. Active learning (AL) techniques can alleviate this issue by intelligently choosing the data to be labeled utilizing a human-in-the-loop approach, while minimizing the amount of labeled training data required. However, existing active learning techniques are limited in their ability to utilize the rich semantic information from underlying schemas. Therefore, they cannot drive effective and efficient sample selection for human labeling that is necessary to scale to larger datasets. In this paper, we propose Alfa, an active learning framework to overcome these limitations. Alfa exploits the schema element properties as well as the relationships between schema elements (structure) to drive a novel ontology-aware sample selection and label propagation algorithm for training highly accurate alignment models. We propose semantic blocking to scale to larger datasets without compromising model quality. Our experimental results across three real-world datasets show that (1) Alfa leads to a substantial reduction (27–82%) in the cost of human labeling, (2) semantic blocking reduces label skew up to 40× without adversely affecting model quality and scales AL to large datasets, and (3) sample selection achieves comparable schema matching quality (90% F1-score) to models trained on the entire set of available training data. We also show that Alfa outperforms the state-of-the-art ontology alignment system, BERTMap, in terms of (1) 10× shorter time per AL iteration and (2) requiring half of the AL iterations to achieve the highest convergent F1-score.
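
To make the human-in-the-loop idea concrete, the following is a minimal sketch of generic pool-based active learning with uncertainty sampling. It is not Alfa's ontology-aware sample selection. The similarity scores, the threshold "model", the `oracle` callback standing in for the human labeler, and the function name `active_learning_loop` are all assumptions introduced for illustration.

```python
from typing import Callable, Dict, Tuple

Pair = Tuple[str, str]


def active_learning_loop(
    scores: Dict[Pair, float],
    oracle: Callable[[Pair], bool],
    budget: int,
) -> Tuple[float, Dict[Pair, bool]]:
    """Generic uncertainty sampling over candidate schema-element pairs.

    scores: hypothetical similarity score in [0, 1] per candidate pair.
    oracle: stands in for the human labeler (True means "match").
    budget: maximum number of label requests.
    Returns the learned decision threshold and the labels collected.
    """
    labeled: Dict[Pair, bool] = {}
    threshold = 0.5  # initial guess for the match / non-match boundary
    for _ in range(budget):
        unlabeled = [p for p in scores if p not in labeled]
        if not unlabeled:
            break
        # Query the pair the current model is least sure about:
        # the one whose score lies closest to the threshold.
        query = min(unlabeled, key=lambda p: abs(scores[p] - threshold))
        labeled[query] = oracle(query)
        # "Retrain": put the threshold midway between the highest-scoring
        # labeled non-match and the lowest-scoring labeled match.
        matches = [scores[p] for p, y in labeled.items() if y]
        non_matches = [scores[p] for p, y in labeled.items() if not y]
        upper = min(matches) if matches else 1.0
        lower = max(non_matches) if non_matches else 0.0
        threshold = (lower + upper) / 2
    return threshold, labeled
```

Because each query targets the pair nearest the current boundary, the loop behaves like a binary search when the scores separate matches from non-matches, so only a fraction of the pool needs a human label.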

Footnotes
1. Note that variants of blocking that are applied in each AL iteration exist in the entity matching literature [29].
 
2. The scikit-learn library uses Euclidean distance by default for K-means clustering; replacing it with other metrics does not make a significant difference in clustering quality.
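
As an illustration of what K-means with Euclidean distance does, here is a small pure-Python sketch of Lloyd's algorithm. It is not the scikit-learn implementation. The function names and the caller-supplied initial centroids are assumptions made to keep the example deterministic.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean (L2) distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(
    points: List[Point],
    initial_centroids: List[Point],
    max_iter: int = 100,
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm; returns cluster index per point and final centroids."""
    centers = [list(c) for c in initial_centroids]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(len(centers)), key=lambda k: euclidean(p, centers[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing, so the run has converged
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centers[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centers
```

Swapping `euclidean` for another distance function changes only the assignment step, although the mean-based update is tailored to squared Euclidean error.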
 
3. Semantically similar means that the constituent nodes belong to the same clusters as the nodes in the labeled pair. Ranking by node similarity enforces the constraint that the constituent node similarity should be larger or smaller than the node similarity of the labeled pair, depending on whether a matching or a non-matching label is being propagated.
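
One possible reading of this footnote can be sketched as follows. This is an illustrative interpretation, not the paper's label propagation algorithm. The names `propagate_label`, `cluster_of` and `similarity` are hypothetical, and treating the bounds as inclusive (`>=` and `<=`) is an assumption.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def propagate_label(
    labeled_pair: Pair,
    is_match: bool,
    candidates: List[Pair],
    cluster_of: Dict[str, int],
    similarity: Dict[Pair, float],
) -> List[Pair]:
    """Return the candidate pairs that may inherit the label, in ranked order.

    cluster_of: hypothetical map from schema element to its cluster id.
    similarity: hypothetical node similarity for every pair involved.
    """
    a, b = labeled_pair
    reference = similarity[labeled_pair]
    eligible: List[Pair] = []
    for c, d in candidates:
        # Both constituent nodes must fall in the same clusters as the
        # corresponding nodes of the labeled pair.
        if cluster_of[c] != cluster_of[a] or cluster_of[d] != cluster_of[b]:
            continue
        s = similarity[(c, d)]
        # A match spreads only to pairs at least as similar as the labeled
        # pair; a non-match only to pairs at most as similar (assumed inclusive).
        if (is_match and s >= reference) or (not is_match and s <= reference):
            eligible.append((c, d))
    # Rank: most similar first for matches, least similar first for non-matches.
    eligible.sort(key=lambda p: similarity[p], reverse=is_match)
    return eligible
```

The cluster check plays the role of "semantically similar" in the footnote, and the similarity comparison keeps a propagated label from crossing to pairs that look less like a match (or less like a non-match) than the pair the human actually labeled.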
 
References
5. Bento, A., Zouaq, A., Gagnon, M.: Ontology matching using convolutional neural networks. In: Proceedings of The 12th Language Resources and Evaluation Conference (LREC), pp. 5648–5653. European Language Resources Association (2020)
6. Berrendorf, M., Faerman, E., Tresp, V.: Active learning for entity alignment. In: Hiemstra, D., Moens, M., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (Eds.) Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28–April 1, 2021, Proceedings, Part I, Lecture Notes in Computer Science, vol. 12656, pp. 48–62. Springer (2021). https://doi.org/10.1007/978-3-030-72113-8_4
7. Beygelzimer, A., Dasgupta, S., Langford, J.: Importance weighted active learning. In: Danyluk, A.P., Bottou, L., Littman, M.L. (Eds.) Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14–18, 2009, ACM International Conference Proceeding Series, vol. 382, pp. 49–56. ACM (2009). https://doi.org/10.1145/1553374.1553381
10. Cer, D., Yang, Y., Kong, S., Hua, N., Limtiaco, N., John, R.S., Constant, N., Guajardo-Cespedes, M., Yuan, S., Tar, C., Sung, Y., Strope, B., Kurzweil, R.: Universal sentence encoder. CoRR arXiv:1803.11175 (2018)
13. Chen, J., Jiménez-Ruiz, E., Horrocks, I., Antonyrajah, D., Hadian, A., Lee, J.: Augmenting ontology alignment by semantic embedding and distant supervision. In: Verborgh, R., Hose, K., Paulheim, H., Champin, P., Maleshkova, M., Corcho, Ó., Ristoski, P., Alam, M. (Eds.) The Semantic Web - 18th International Conference, ESWC 2021, Virtual Event, June 6–10, 2021, Proceedings, Lecture Notes in Computer Science, vol. 12731, pp. 392–408. Springer (2021). https://doi.org/10.1007/978-3-030-77385-4_23
15. Cheng, A., Zhou, C., Yang, H., Wu, J., Li, L., Tan, J., Guo, L.: Deep active learning for anchor user prediction. In: Kraus, S. (Ed.) Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10–16, 2019, pp. 2151–2157. ijcai.org (2019). https://doi.org/10.24963/ijcai.2019/298
16. Cohn, D., Atlas, L., Ladner, R.: Improving generalization with active learning. Mach. Learn. 66, 201–221 (1994)
17. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis (2019). https://doi.org/10.18653/v1/N19-1423
18. Faria, D., Pesquita, C., Santos, E., Palmonari, M., Cruz, I.F., Couto, F.M.: The agreementmakerlight ontology matching system. In: Meersman, R., Panetto, H., Dillon, T.S., Eder, J., Bellahsene, Z., Ritter, N., Leenheer, P.D., Dou, D. (Eds.) On the Move to Meaningful Internet Systems: OTM 2013 Conferences - Confederated International Conferences: CoopIS, DOA-Trusted Cloud, and ODBASE 2013, Graz, Austria, September 9–13, 2013. Proceedings, Lecture Notes in Computer Science, vol. 8185, pp. 527–541. Springer (2013). https://doi.org/10.1007/978-3-642-41030-7_38
20. Freund, Y., Seung, H., Shamir, E., Tishby, N.: Selective sampling using the query by committee algorithm. Mach. Learn. 28(2–3), 133–168 (1997)
21. Gal, A., Roitman, H., Sagi, T.: From diversity-based prediction to better ontology & schema matching. In: Bourdeau, J., Hendler, J., Nkambou, R., Horrocks, I., Zhao, B.Y. (Eds.) Proceedings of the 25th International Conference on World Wide Web, WWW 2016, Montreal, Canada, April 11–15, 2016, pp. 1145–1155. ACM (2016). https://doi.org/10.1145/2872427.2882999
23. Gao, L., Yang, H., Zhou, C., Wu, J., Pan, S., Hu, Y.: Active discriminative network representation learning. In: IJCAI International Joint Conference on Artificial Intelligence (2018)
25. Hao, J., Lei, C., Efthymiou, V., Quamar, A., Özcan, F., Sun, Y., Wang, W.: MEDTO: medical data to ontology matching using hybrid graph neural networks. In: Zhu, F., Ooi, B.C., Miao, C. (Eds.) KDD’21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, August 14–18, 2021, pp. 2946–2954. ACM (2021). https://doi.org/10.1145/3447548.3467138
27. He, Y., Chen, J., Antonyrajah, D., Horrocks, I.: Bertmap: a BERT-based ontology alignment system. In: Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelveth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022 Virtual Event, February 22–March 1, 2022, pp. 5684–5691. AAAI Press (2022). https://ojs.aaai.org/index.php/AAAI/article/view/20510
28. Hernández, M.A., Miller, R.J., Haas, L.M.: Clio: a semi-automatic tool for schema mapping. In: Mehrotra, S., Sellis, T.K. (Eds.) Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, Santa Barbara, CA, USA, May 21–24, 2001, p. 607. ACM (2001). https://doi.org/10.1145/375663.375767
30. Jiménez-Ruiz, E., Grau, B.C.: Logmap: logic-based and scalable ontology matching. In: Aroyo, L., Welty, C., Alani, H., Taylor, J., Bernstein, A., Kagal, L., Noy, N.F., Blomqvist, E. (Eds.) The Semantic Web - ISWC 2011 - 10th International Semantic Web Conference, Bonn, Germany, October 23–27, 2011, Proceedings, Part I, Lecture Notes in Computer Science, vol. 7031, pp. 273–288. Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-25073-6_18
31. Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with GPUs. IEEE Trans. Big Data 7(3), 535–547 (2019)
32. Jurisch, M., Igler, B.: Graph-convolution-based classification for ontology alignment change prediction. In: Proceedings of the Workshop on Deep Learning for Knowledge Graphs (DL4KG2019) Co-located with (ESWC 2019), vol. 2377, pp. 11–20. CEUR-WS.org (2019)
33. Kasai, J., Qian, K., Gurajada, S., Li, Y., Popa, L.: Low-resource deep entity resolution with transfer and active learning. In: Korhonen, A., Traum, D.R., Màrquez, L. (Eds.) Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28–August 2, 2019, Volume 1: Long Papers, pp. 5851–5861. Association for Computational Linguistics (2019). https://doi.org/10.18653/v1/p19-1586
34. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings. OpenReview.net (2017). https://openreview.net/forum?id=SJU4ayYgl
35. Konda, P., Das, S.C., Gory, P.S., Doan, A., Ardalan, A., Ballard, J.R., Li, H., Panahi, F., Zhang, H., Naughton, J.F., Prasad, S., Krishnan, G., Deep, R., Raghavendra, V.: Magellan: Toward building entity matching management systems. PVLDB 9(12), 1197–1208 (2016). https://doi.org/10.14778/2994509.2994535
37. MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Cam, L.M.L., Neyman, J. (Eds.) Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press (1967)
39. Meduri, V.V., Popa, L., Sen, P., Sarwat, M.: A comprehensive benchmark framework for active learning methods in entity matching. In: Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, Online Conference [Portland, OR, USA], June 14–19, 2020, pp. 1133–1147 (2020). https://doi.org/10.1145/3318464.3380597
40. Mozafari, B., Sarkar, P., Franklin, M., Jordan, M., Madden, S.: Scaling up crowd-sourcing to very large datasets: a case for active learning. PVLDB 8(2), 125–136 (2014)
41. Mudgal, S., Li, H., Rekatsinas, T., Doan, A., Park, Y., Krishnan, G., Deep, R., Arcaute, E., Raghavendra, V.: Deep learning for entity matching: a design space exploration. In: Proceedings of the 2018 International Conference on Management of Data, SIGMOD’18, pp. 19–34. ACM, New York (2018). https://doi.org/10.1145/3183713.3196926
47. Ostapuk, N., Yang, J., Cudré-Mauroux, P.: Activelink: deep active learning for link prediction in knowledge graphs. In: Liu, L., White, R.W., Mantrach, A., Silvestri, F., McAuley, J.J., Baeza-Yates, R., Zia, L. (Eds.) The World Wide Web Conference, WWW 2019, San Francisco, CA, USA, May 13–17, 2019, pp. 1398–1408. ACM (2019). https://doi.org/10.1145/3308558.3313620
50. Qian, K., Popa, L., Sen, P.: Active learning for large-scale entity resolution. In: Lim, E., Winslett, M., Sanderson, M., Fu, A.W., Sun, J., Culpepper, J.S., Lo, E., Ho, J.C., Donato, D., Agrawal, R., Zheng, Y., Castillo, C., Sun, A., Tseng, V.S., Li, C. (Eds.) Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, CIKM 2017, Singapore, November 6–10, 2017, pp. 1379–1388. ACM (2017). https://doi.org/10.1145/3132847.3132949
52. Qian, K., Raman, P.C., Li, Y., Popa, L.: Learning structured representations of entity names using active learning and weak supervision. In: Webber, B., Cohn, T., He, Y., Liu, Y. (Eds.) Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16–20, 2020, pp. 6376–6383. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.emnlp-main.517
53. Qin, X., Sheikh, N., Reinwald, B., Wu, L.: Relation-aware graph attention model with adaptive self-adversarial training. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2–9, 2021, pp. 9368–9376. AAAI Press (2021). https://ojs.aaai.org/index.php/AAAI/article/view/17129
56. Rousseeuw, P.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20(1), 53–65 (1987)
57. Roy, N., McCallum, A.: Toward optimal active learning through sampling estimation of error reduction. In: Brodley, C.E., Danyluk, A.P. (Eds.) Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), Williams College, Williamstown, MA, USA, June 28–July 1, 2001, pp. 441–448. Morgan Kaufmann (2001)
60. Sarawagi, S., Bhamidipaty, A.: Interactive deduplication using active learning. In: KDD, pp. 269–278 (2002)
61. Satopaa, V., Albrecht, J.R., Irwin, D.E., Raghavan, B.: Finding a “kneedle” in a haystack: detecting knee points in system behavior. In: ICDCS Workshops, pp. 166–171 (2011)
63. Seung, H., Opper, M., Sompolinsky, H.: Query by committee. In: Workshop on COLT, pp. 287–294 (1992)
66. ten Cate, B., Kolaitis, P.G., Qian, K., Tan, W.: Active learning of GAV schema mappings. In: den Bussche, J.V., Arenas, M. (Eds.) Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Houston, TX, USA, June 10–15, 2018, pp. 355–368. ACM (2018). https://doi.org/10.1145/3196959.3196974
69. Wang, J., Kraska, T., Franklin, M.J., Feng, J.: CrowdER: crowdsourcing entity resolution. PVLDB 5(11), 1483–1494 (2012)
71. Wang, Z., Cruz, I.F.: Agreementmakerdeep results for OAEI 2021. In: Shvaiko, P., Euzenat, J., Jiménez-Ruiz, E., Hassanzadeh, O., Trojahn, C. (Eds.) Proceedings of the 16th International Workshop on Ontology Matching co-located with the 20th International Semantic Web Conference (ISWC 2021), Virtual conference, October 25, 2021, CEUR Workshop Proceedings, vol. 3063, pp. 124–130. CEUR-WS.org (2021). http://ceur-ws.org/Vol-3063/oaei21_paper3.pdf
72. Wu, R., Chaba, S., Sawlani, S., Chu, X., Thirumuruganathan, S.: Zeroer: entity resolution using zero labeled examples. In: Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, Online Conference [Portland, OR, USA], June 14–19, 2020, pp. 1149–1164 (2020). https://doi.org/10.1145/3318464.3389743
73. Wu, Y., Xu, Y., Singh, A., Yang, Y., Dubrawski, A.: Active learning for graph neural networks via node feature propagation. arXiv preprint arXiv:1910.07567 (2019)
74. Yan, Y., Liu, L., Ban, Y., Jing, B., Tong, H.: Dynamic knowledge graph alignment. In: Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2–9, 2021, pp. 4564–4572. AAAI Press (2021). https://ojs.aaai.org/index.php/AAAI/article/view/16585
76. Zhang, W., Shen, Y., Li, Y., Chen, L., Yang, Z., Cui, B.: ALG: fast and accurate active learning framework for graph convolutional networks. In: Li, G., Li, Z., Idreos, S., Srivastava, D. (Eds.) SIGMOD’21: International Conference on Management of Data, Virtual Event, China, June 20–25, 2021, pp. 2366–2374. ACM (2021). https://doi.org/10.1145/3448016.3457325
77. Zhang, W., Wei, H., Sisman, B., Dong, X.L., Faloutsos, C., Page, D.: Autoblock: a hands-off blocking framework for entity matching. In: Caverlee, J., Hu, X.B., Lalmas, M., Wang, W. (Eds.) WSDM’20: The Thirteenth ACM International Conference on Web Search and Data Mining, Houston, TX, USA, February 3–7, 2020, pp. 744–752. ACM (2020). https://doi.org/10.1145/3336191.3371813
Metadata
Title
Alfa: active learning for graph neural network-based semantic schema alignment
Authors
Venkata Vamsikrishna Meduri
Abdul Quamar
Chuan Lei
Xiao Qin
Berthold Reinwald
Publication date
21.11.2023
Publisher
Springer Berlin Heidelberg
Published in
The VLDB Journal
Print ISSN: 1066-8888
Electronic ISSN: 0949-877X
DOI
https://doi.org/10.1007/s00778-023-00822-z