
2025 | Original Paper | Book Chapter

Deconfounded Causality-Aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs

Authors: Ruoyu Wang, Xiaoxuan Li, Lina Yao

Published in: Web Information Systems Engineering – WISE 2024

Publisher: Springer Nature Singapore


Abstract

Large Language Models (LLMs) have demonstrated remarkable proficiency in tackling a wide range of tasks based on human instructions, but studies reveal that they often struggle with tasks requiring reasoning, such as math or physics. This limitation raises the question of whether LLMs truly comprehend embedded knowledge or merely learn to replicate the token distribution without genuinely understanding the content. In this paper, we examine this problem and aim to enhance the reasoning capabilities of LLMs. First, we investigate whether the model has genuine reasoning capabilities by visualizing the text generation process at the attention and representation level. Then, we formulate the reasoning process of LLMs into a causal framework, which provides a formal explanation of the problems observed in the visualization. Finally, building upon this causal framework, we propose Deconfounded Causal Adaptation (DCA), a novel parameter-efficient fine-tuning (PEFT) method that enhances the model’s reasoning capabilities by encouraging it to extract general problem-solving skills and apply them to different questions. Experiments show that our method outperforms the baseline consistently across multiple benchmarks, and with only 1.2M tunable parameters we achieve better or comparable results to other fine-tuning methods, demonstrating the effectiveness and efficiency of our approach in improving the overall accuracy and reliability of LLMs.
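
The chapter page does not include code. To make the "1.2M tunable parameters" figure concrete, the following is a minimal sketch of a generic low-rank adapter-style PEFT module attached to a frozen linear layer. It is not the authors' DCA method; the rank, layer dimensions, and number of wrapped layers are assumed values, chosen only to illustrate how a trainable-parameter budget of roughly a million can be reached while the pretrained backbone stays frozen.

```python
# Illustrative sketch only: a generic low-rank adapter on a frozen linear layer,
# NOT the DCA implementation from the paper. Dimensions, rank, and layer count
# below are hypothetical.
import torch
import torch.nn as nn


class LowRankAdapter(nn.Module):
    """Adds a trainable low-rank update (x @ A @ B) on top of a frozen linear layer."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # keep the pretrained weights frozen
        self.A = nn.Parameter(torch.empty(base.in_features, rank))
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        nn.init.normal_(self.A, std=0.02)    # B starts at zero, so the update starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A @ self.B) * self.scale


def count_trainable(model: nn.Module) -> int:
    # Only the adapter matrices A and B contribute; the frozen base layers do not.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


if __name__ == "__main__":
    # Hypothetical budget check: wrapping 32 projections of size 4096x4096 at rank 2
    # yields 32 * 2 * (4096 + 4096) = 524,288 trainable parameters; raising the rank
    # or the number of wrapped layers moves the budget toward the ~1.2M quoted above.
    adapters = nn.ModuleList(
        LowRankAdapter(nn.Linear(4096, 4096), rank=2) for _ in range(32)
    )
    print(f"trainable parameters: {count_trainable(adapters):,}")
```

The design point this illustrates is the one the abstract relies on: because only the small adapter matrices are updated, the tunable-parameter count is governed by the rank and the number of adapted layers rather than by the size of the backbone.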

Metadata
Title
Deconfounded Causality-Aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs
Authors
Ruoyu Wang
Xiaoxuan Li
Lina Yao
Copyright year
2025
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-96-0573-6_12