
2024 | OriginalPaper | Chapter

DQAC: Detoxifying Query Auto-completion with Adapters

Authors: Aishwarya Maheswaran, Kaushal Kumar Maurya, Manish Gupta, Maunendra Sankar Desarkar

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer Nature Singapore


Abstract

Recent Query Auto-completion (QAC) systems leverage natural language generation and pre-trained language models (PLMs) to achieve remarkable performance. However, these systems also suffer from biased and toxic completions. Prior work addresses language detoxification in PLMs with controllable text generation (CTG) techniques, either by training on non-toxic data or by intervening at decoding time. Because QAC completions are typically short, these training- and decoding-based CTG methods do not transfer directly. To address these concerns, we propose the first public QAC detoxification model, Detoxifying Query Auto-Completion (DQAC), which uses adapters within a CTG framework. DQAC operates on latent representations and incurs no additional overhead. It leverages two adapters, one for toxic and one for non-toxic cases; during inference, we fuse their representations in a controlled manner that steers query completions toward non-toxicity. We evaluate the toxicity of generated completions on two real-world datasets using two classifiers: the publicly available Detoxify and QDetoxify, a search-query-specific classifier that we develop. DQAC consistently outperforms all existing baselines and emerges as the state-of-the-art model, delivering high quality with low toxicity. We make the code publicly available\(^{1}\). (\(^{1}\) https://shorturl.at/zJ024)
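
To make the adapter-fusion idea concrete, below is a minimal PyTorch sketch. It assumes a standard bottleneck adapter in the style of Houlsby et al. (2019) and a simple linear extrapolation between the non-toxic and toxic adapters' hidden states; the class names (`BottleneckAdapter`, `FusedDetoxLayer`), the fusion rule, and the `alpha` parameter are illustrative assumptions, since the abstract does not specify DQAC's exact fusion mechanism.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Bottleneck adapter (down-project, nonlinearity, up-project, residual),
    in the style of Houlsby et al. (2019)."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen PLM representation intact.
        return h + self.up(self.act(self.down(h)))


class FusedDetoxLayer(nn.Module):
    """Hypothetical inference-time fusion of the two adapters' latent states.
    `alpha` (assumed) controls how strongly the hidden state is pushed away
    from the toxic adapter's direction."""

    def __init__(self, hidden_dim: int, alpha: float = 0.5):
        super().__init__()
        self.nontoxic = BottleneckAdapter(hidden_dim)  # adapter for non-toxic cases
        self.toxic = BottleneckAdapter(hidden_dim)     # adapter for toxic cases
        self.alpha = alpha

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        h_plus = self.nontoxic(h)
        h_minus = self.toxic(h)
        # Move the representation toward the non-toxic adapter and away from
        # the toxic one; the fusion acts on latent states, so decoding itself
        # is unchanged and no reranking or search step is added.
        return h_plus + self.alpha * (h_plus - h_minus)


# Usage: fuse the hidden states of a decoder layer for a short query prefix.
layer = FusedDetoxLayer(hidden_dim=768, alpha=0.5)
hidden = torch.randn(1, 5, 768)   # (batch, prefix tokens, hidden size)
print(layer(hidden).shape)        # torch.Size([1, 5, 768])
```

Because the fusion happens on latent representations inside the model rather than over output token distributions, it is consistent with the abstract's claim of no additional inference overhead.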


Metadata

Title: DQAC: Detoxifying Query Auto-completion with Adapters
Authors: Aishwarya Maheswaran, Kaushal Kumar Maurya, Manish Gupta, Maunendra Sankar Desarkar
Copyright Year: 2024
Publisher: Springer Nature Singapore
DOI: https://doi.org/10.1007/978-981-97-2266-2_9
