
2025 | OriginalPaper | Chapter

PreAdapter: Pre-training Language Models on Knowledge Graphs

Authors: Janna Omeliyanenko, Andreas Hotho, Daniel Schlör

Published in: The Semantic Web – ISWC 2024

Publisher: Springer Nature Switzerland


Abstract

Pre-trained language models have demonstrated state-of-the-art performance in various downstream tasks such as summarization, sentiment classification, and question answering. Leveraging vast amounts of textual data during training, these models inherently hold a certain amount of factual knowledge, which is particularly beneficial for knowledge-driven tasks such as question answering. However, the knowledge implicitly contained within the language models is not complete. Consequently, many studies incorporate additional knowledge from Semantic Web resources such as knowledge graphs, which provide an explicit representation of knowledge in the form of triples.
Seamless integration of this knowledge into language models remains an active research area. Direct pre-training of language models on knowledge graphs followed by fine-tuning on downstream tasks has proven ineffective, primarily due to the catastrophic forgetting effect. Many approaches therefore fuse language models with graph embedding models to enrich them with information from knowledge graphs, and show improvements on downstream tasks over solutions without knowledge graph integration. However, these methods often incur additional computational overhead, for instance for training the graph embedding models.
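As a rough, hypothetical illustration of what pre-training a language model directly on a knowledge graph can involve (not taken from the chapter), the sketch below verbalizes triples into plain sentences that a language model could then consume with its ordinary training objective; the triples and the verbalization template are invented for illustration.

```python
# Minimal sketch (not the paper's implementation): turning knowledge-graph
# triples into plain-text sentences so a language model can be pre-trained
# on them with its ordinary language-modeling objective. The triples and
# the verbalization template are illustrative assumptions.

triples = [
    ("armor", "MadeOf", "metal"),
    ("metal", "CapableOf", "conducting electricity"),
]

def verbalize(head: str, relation: str, tail: str) -> str:
    """Render a (head, relation, tail) triple as a sentence."""
    # Split a CamelCase relation name into lower-case words, e.g. "MadeOf" -> "made of".
    words = "".join(" " + c.lower() if c.isupper() else c for c in relation).strip()
    return f"{head} {words} {tail}."

sentences = [verbalize(*t) for t in triples]
print(sentences)
# ['armor made of metal.', 'metal capable of conducting electricity.']
```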
In our work, we propose a novel adapter-based method for integrating knowledge graphs into language models through pre-training. The approach effectively mitigates the catastrophic forgetting that would otherwise degrade both the original language modeling capabilities and access to pre-trained knowledge, ensuring that both the language model's original capabilities and the integrated Semantic Web knowledge remain available during fine-tuning on downstream tasks. Experimental results on multiple-choice question answering tasks demonstrate performance improvements over baseline models without knowledge graph integration as well as over other pre-training-based knowledge integration methods.
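The chapter itself details the PreAdapter architecture. For orientation only, the snippet below sketches a generic Houlsby-style bottleneck adapter of the kind commonly inserted into transformer layers; the layer sizes are arbitrary assumptions and the code is not the authors' implementation.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter: down-project, nonlinearity, up-project,
    plus a residual connection. Only these small layers would be trained,
    while the surrounding transformer weights stay frozen, which is how
    adapter-based approaches sidestep catastrophic forgetting."""

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return hidden_states + self.up(self.act(self.down(hidden_states)))

# Toy usage: a batch of 2 sequences, 16 tokens each, 768-dimensional states.
x = torch.randn(2, 16, 768)
adapter = BottleneckAdapter()
print(adapter(x).shape)  # torch.Size([2, 16, 768])
```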


Footnotes
1. Our code is publicly available at: https://professor-x.de/code-preadapter.
 
Metadata
Title
PreAdapter: Pre-training Language Models on Knowledge Graphs
Authors
Janna Omeliyanenko
Andreas Hotho
Daniel Schlör
Copyright Year
2025
DOI
https://doi.org/10.1007/978-3-031-77850-6_12
