2025 | OriginalPaper | Chapter

Deconfounded Causality-Aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs

Authors : Ruoyu Wang, Xiaoxuan Li, Lina Yao

Published in: Web Information Systems Engineering – WISE 2024

Publisher: Springer Nature Singapore

Abstract

Large Language Models (LLMs) have demonstrated remarkable efficiency in tackling various tasks based on human instructions, but studies reveal that they often struggle with tasks requiring reasoning, such as math or physics. This limitation raises questions about whether LLMs truly comprehend embedded knowledge or merely learn to replicate the token distribution without a true understanding of the content. In this paper, we delve into this problem and aim to enhance the reasoning capabilities of LLMs. First, we investigate if the model has genuine reasoning capabilities by visualizing the text generation process at the attention and representation level. Then, we formulate the reasoning process of LLMs into a causal framework, which provides a formal explanation of the problems observed in the visualization. Finally, building upon this causal framework, we propose Deconfounded Causal Adaptation (DCA), a novel parameter-efficient fine-tuning (PEFT) method to enhance the model’s reasoning capabilities by encouraging the model to extract the general problem-solving skills and apply these skills to different questions. Experiments show that our method outperforms the baseline consistently across multiple benchmarks, and with only 1.2M tunable parameters, we achieve better or comparable results to other fine-tuning methods. This demonstrates the effectiveness and efficiency of our method in improving the overall accuracy and reliability of LLMs.
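The chapter's DCA implementation is not reproduced on this page, so the sketch below only illustrates the general mechanics of parameter-efficient fine-tuning that the abstract builds on: a LoRA-style low-rank adapter that freezes the pretrained weights and trains a small number of extra parameters. It is not the authors' DCA method, and all class names, ranks, and layer sizes here are illustrative assumptions.

```python
# Minimal sketch of a LoRA-style parameter-efficient adapter (Hu et al., arXiv:2106.09685).
# The pretrained projection W is frozen; only the low-rank update B @ A is trained.
# This is NOT the chapter's Deconfounded Causal Adaptation (DCA); it only shows
# how a PEFT module keeps the tunable-parameter count small.
import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    def __init__(self, base_linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():          # freeze the pretrained weights
            p.requires_grad = False
        in_f, out_f = base_linear.in_features, base_linear.out_features
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)  # trainable down-projection
        self.B = nn.Parameter(torch.zeros(out_f, rank))        # trainable up-projection, init 0
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen base projection plus the small, trainable low-rank correction
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

# Example: wrapping a single 4096x4096 projection adds only 2 * 8 * 4096 = 65,536
# trainable parameters, while the base weights stay frozen.
layer = LowRankAdapter(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 65536
```

Keeping only A and B trainable is what makes the tunable-parameter budget small; the roughly 1.2M tunable parameters cited in the abstract refer to the authors' DCA modules, not to this sketch.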

Metadata
Title
Deconfounded Causality-Aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs
Authors
Ruoyu Wang
Xiaoxuan Li
Lina Yao
Copyright Year
2025
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-96-0573-6_12
