
2025 | Original Paper | Book Chapter

Native vs Non-native Language Prompting: A Comparative Analysis

Authors: Mohamed Bayan Kmainasi, Rakif Khan, Ali Ezzat Shahroor, Boushra Bendou, Maram Hasanain, Firoj Alam

Published in: Web Information Systems Engineering – WISE 2024

Publisher: Springer Nature Singapore


Abstract

Large language models (LLMs) have shown remarkable abilities across many fields, including standard Natural Language Processing (NLP) tasks. Prompts, consisting of natural-language instructions, play a key role in eliciting knowledge from LLMs. Most open and closed source LLMs are trained on available labeled and unlabeled resources: digital content such as text, images, audio, and video. As a result, these models have better knowledge of high-resource languages but struggle with low-resource ones. Since prompts are crucial to understanding model capabilities, the language used for prompting remains an important research question. Although there has been significant research in this area, it is still limited, and medium- to low-resource languages remain underexplored. In this study, we investigate different prompting strategies (native vs. non-native) on 11 NLP tasks associated with 11 Arabic datasets (8.7K data points). In total, we conducted 198 experiments involving 3 open and closed LLMs (including an Arabic-centric model) and 3 prompting strategies. Our findings suggest that, on average, the non-native prompt performs best, followed by mixed and native prompts. All prompts will be made available to the community through the LLMeBench (https://llmebench.qcri.org/) framework.
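To make the three strategies concrete, the sketch below builds example prompts for a hypothetical Arabic sentiment-classification task. The instruction wording, label set, and the `build_prompt` helper are illustrative assumptions, not the paper's actual prompts; the pattern (native = fully Arabic instruction, non-native = fully English instruction, mixed = English instruction over the Arabic input) follows the strategy names used in the abstract.

```python
# Illustrative sketch of native vs. non-native vs. mixed prompting for an
# Arabic input. All instruction strings and the function itself are
# hypothetical examples, not the prompts released with the paper.

ARABIC_INSTRUCTION = "صنّف التغريدة التالية إلى 'إيجابي' أو 'سلبي':"
# English gloss: "Classify the following tweet as 'positive' or 'negative':"
ENGLISH_INSTRUCTION = "Classify the following Arabic tweet as 'positive' or 'negative':"

def build_prompt(strategy: str, arabic_text: str) -> str:
    """Return a prompt for the given strategy: 'native', 'non-native', or 'mixed'."""
    if strategy == "native":
        # Instruction and input are both in Arabic.
        return f"{ARABIC_INSTRUCTION}\n{arabic_text}"
    if strategy == "non-native":
        # Instruction in English; the Arabic input is embedded as-is.
        return f"{ENGLISH_INSTRUCTION}\n{arabic_text}"
    if strategy == "mixed":
        # English instruction, but the model is asked to answer in Arabic.
        return f"{ENGLISH_INSTRUCTION} Answer in Arabic.\n{arabic_text}"
    raise ValueError(f"unknown strategy: {strategy}")

tweet = "الخدمة كانت ممتازة"  # "The service was excellent"
for strategy in ("native", "non-native", "mixed"):
    print(f"--- {strategy} ---")
    print(build_prompt(strategy, tweet))
```

Each prompt would then be sent to the model under evaluation; benchmarking frameworks such as LLMeBench organize exactly this kind of per-task, per-strategy prompt construction.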


Footnotes
1
Note that we use the term ‘native’ to refer to the language of the user input. In our case, Arabic is the native language of the data tested.
 
Metadata
Title
Native vs Non-native Language Prompting: A Comparative Analysis
Authors
Mohamed Bayan Kmainasi
Rakif Khan
Ali Ezzat Shahroor
Boushra Bendou
Maram Hasanain
Firoj Alam
Copyright Year
2025
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-96-0576-7_30