
2023 | OriginalPaper | Chapter

On Membership Inference Attacks to Generative Language Models Across Language Domains

Authors : Myung Gyo Oh, Leo Hyun Park, Jaeuk Kim, Jaewoo Park, Taekyoung Kwon

Published in: Information Security Applications

Publisher: Springer Nature Switzerland


Abstract

The confidentiality of training data has become a significant security concern for neural language models. Recent studies have shown that memorized training data can be extracted by feeding well-chosen prompts to generative language models. While these attacks have achieved remarkable success against English-based Transformer architectures, it is unclear whether they remain effective in other language domains. This paper studies the effectiveness of such attacks against Korean models and explores attack improvements that may benefit future defense studies.
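The extraction pipeline this line of work builds on (sample candidates from the model, then rank them by a membership score) can be sketched as follows. This is a minimal illustration of the perplexity-to-zlib-entropy ranking used in prior extraction work (Carlini et al.); the perplexity values below are hypothetical stand-ins for scores that would come from the target model.

```python
import zlib

def zlib_entropy(text: str) -> int:
    # Compressed byte length approximates the information content of the text.
    return len(zlib.compress(text.encode("utf-8")))

def rank_candidates(samples, ppl):
    # Low perplexity relative to zlib entropy flags likely-memorized text:
    # the model is unusually confident on a string that carries little
    # compressible structure of its own.
    return sorted(samples, key=lambda s: ppl[s] / zlib_entropy(s))

samples = ["aaaa aaaa aaaa", "a rare unique sentence"]
ppl = {"aaaa aaaa aaaa": 1.5, "a rare unique sentence": 40.0}
ranked = rank_candidates(samples, ppl)  # the repeated string ranks first
```

In practice `ppl` would be computed by scoring each candidate with the target model, and the top-ranked candidates are the ones checked for membership in the training set.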
The contribution of this study is two-fold. First, we perform a membership inference attack against a state-of-the-art Korean GPT model. We recovered approximate training data with 20% to 90% precision in the top 100 samples, confirming that the attack technique proposed for the original GPT remains valid across language domains. Second, in this process, we observed that the existing attack method can hardly detect redundancy among the selected sentences. Since information appearing in only a few documents is more likely to be meaningful, increasing the uniqueness of the selected sentences should improve the attack's effectiveness. We therefore propose a deduplication strategy that replaces the traditional word-level similarity metric with one at the BPE token level. As a result, we identify 6% to 22% of the selected samples as having been underestimated by the existing metric.
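The deduplication idea is to swap word-level similarity for a subword-level one. A minimal sketch, using Jaccard similarity and character bigrams as a stand-in for real BPE tokens (actual code would tokenize with the target model's BPE tokenizer instead):

```python
def jaccard(a, b) -> float:
    # Set-overlap similarity between two token sequences.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def word_tokens(text: str):
    # Traditional word-level units: whitespace-delimited words (eojeol in Korean).
    return text.split()

def subword_tokens(text: str):
    # Character-bigram stand-in for BPE subword units.
    return [text[i:i + 2] for i in range(len(text) - 1)]

a, b = "서울에서 만나요", "서울에서는 만나요"
word_sim = jaccard(word_tokens(a), word_tokens(b))           # 1/3
subword_sim = jaccard(subword_tokens(a), subword_tokens(b))  # 2/3
```

Because a single attached particle changes the entire whitespace-delimited word in Korean, word-level similarity misses near-duplicates that subword-level similarity catches; thresholding on the subword score lets such pairs be deduplicated.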


Footnotes
1
Sentences duplicated two, three, and five times appear in 6, 1, and 4 cases, respectively.
 
Metadata
Title
On Membership Inference Attacks to Generative Language Models Across Language Domains
Authors
Myung Gyo Oh
Leo Hyun Park
Jaeuk Kim
Jaewoo Park
Taekyoung Kwon
Copyright Year
2023
DOI
https://doi.org/10.1007/978-3-031-25659-2_11
