2024 | Original Paper | Book Chapter

MIC: An Effective Defense Against Word-Level Textual Backdoor Attacks

Authors: Shufan Yang, Qianmu Li, Zhichao Lian, Pengchuan Wang, Jun Hou

Published in: Neural Information Processing

Publisher: Springer Nature Singapore


Abstract

Backdoor attacks, which manipulate model output, have garnered significant attention from researchers. However, some existing word-level backdoor attacks on NLP models are difficult to defend against effectively because of their concealment and diversity. These covert attacks exploit pairs of words that look identical to the naked eye but are mapped to different word vectors by the NLP model, thereby bypassing existing defenses. To address this issue, we propose incorporating triplet metric learning into the standard training phase of NLP models to defend against existing word-level backdoor attacks. Specifically, metric learning is used to minimize the distance between the vectors of visually similar words while maximizing the distance between them and the vectors of all other words. Additionally, since metric learning may reduce a model's sensitivity to semantic changes caused by subtle perturbations, we add contrastive learning after the model's standard training. Experimental results demonstrate that our method performs well against the two most stealthy existing word-level backdoor attacks.
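The metric-learning objective described above can be sketched as a standard triplet margin loss over word embeddings: pull an anchor word's vector toward that of a visually similar word (the positive) and push it away from the vector of any other word (the negative). The sketch below is illustrative only; the embedding values are hypothetical and the paper's actual training setup may differ.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: L = max(0, d(a, p) - d(a, n) + margin).

    Minimizing L pulls the anchor toward the positive (a look-alike
    word's vector) and pushes it at least `margin` farther from the
    negative (an unrelated word's vector)."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 3-d embeddings, purely for illustration.
anchor   = np.array([1.0, 0.0, 0.0])  # vector of a clean word
positive = np.array([0.9, 0.1, 0.0])  # its visually similar variant
negative = np.array([0.0, 1.0, 0.0])  # an unrelated word

loss = triplet_loss(anchor, positive, negative)
```

When the positive already lies well inside the margin relative to the negative, the loss is zero and the triplet contributes no gradient; swapping the positive and negative roles yields a violated triplet with a positive loss, which is what drives the embeddings apart during training.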


Metadata
Title
MIC: An Effective Defense Against Word-Level Textual Backdoor Attacks
Authors
Shufan Yang
Qianmu Li
Zhichao Lian
Pengchuan Wang
Jun Hou
Copyright Year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-99-8076-5_1