2017 | OriginalPaper | Chapter

Convolutional Bi-directional LSTM for Detecting Inappropriate Query Suggestions in Web Search

Authors: Harish Yenala, Manoj Chinnakotla, Jay Goyal

Published in: Advances in Knowledge Discovery and Data Mining

Publisher: Springer International Publishing

Abstract

A web search query is considered inappropriate if it may cause anger or annoyance to certain users, exhibits a lack of respect, rudeness, or discourtesy towards certain individuals or communities, or is capable of inflicting harm on oneself or others. A search engine should regulate its query completion suggestions by detecting and filtering such queries, as they may hurt user sentiments or lead to legal issues, thereby tarnishing the brand image. Hence, automatic detection and pruning of inappropriate queries from completions and related search suggestions is an important problem for most commercial search engines. The problem is difficult because of the unique challenges posed by search queries, such as insufficient context, natural language ambiguity, and the presence of spelling mistakes and variations.
In this paper, we propose a novel deep learning architecture, the “Convolutional Bi-Directional LSTM (C-BiLSTM)”, for automatically identifying inappropriate query suggestions. It combines the strengths of Convolutional Neural Networks (CNN) and Bi-directional LSTMs (BLSTM). Given a query, C-BiLSTM uses a convolutional layer to extract a feature representation for each query word. These representations are fed to the BLSTM layer, which captures the sequential patterns in the entire query and outputs a richer representation encoding them. The query representation thus learnt passes through a deep fully connected network, which predicts the target class. C-BiLSTM does not rely on hand-crafted features, is trained end-to-end as a single model, and effectively captures both local features and their global semantics. An evaluation on real-world search queries from a commercial search engine shows that C-BiLSTM significantly outperforms both pattern-based baselines and baselines built on other hand-crafted features. Moreover, C-BiLSTM also performs better than individual CNN, LSTM and BLSTM models trained for the same task.
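The data flow described in the abstract (per-word convolutional features, then a bi-directional LSTM, then a fully connected classifier) can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation: the class name CBiLSTMSketch, the layer sizes, the random untrained weights, the toy vocabulary, the zero padding, and the choice to concatenate the final forward and backward LSTM states are all assumptions made for illustration, and training is omitted entirely.

```python
import math
import random

def rand_matrix(rows, cols, rng):
    # Small random weights; a real model would learn these end-to-end.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def conv_features(embeddings, W, width):
    """One ReLU feature vector per word, from a zero-padded window of odd width."""
    dim = len(embeddings[0])
    pad = width // 2
    padded = [[0.0] * dim] * pad + embeddings + [[0.0] * dim] * pad
    feats = []
    for t in range(len(embeddings)):
        window = [v for vec in padded[t:t + width] for v in vec]
        feats.append([max(0.0, a) for a in matvec(W, window)])
    return feats

def lstm(inputs, W, hidden):
    """Run a single-direction LSTM over inputs and return the final hidden state."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in inputs:
        z = matvec(W, x + h + [1.0])  # the trailing 1.0 plays the role of a bias input
        i = [sigmoid(v) for v in z[0:hidden]]               # input gate
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]      # forget gate
        o = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]  # output gate
        g = [math.tanh(v) for v in z[3 * hidden:4 * hidden]]  # candidate cell values
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h

class CBiLSTMSketch:
    """Hypothetical, untrained stand-in for the conv -> BLSTM -> FC pipeline."""

    def __init__(self, vocab, emb_dim=8, n_filters=6, width=3, hidden=5, fc_dim=4, seed=0):
        rng = random.Random(seed)
        self.vocab = {w: k for k, w in enumerate(vocab)}
        self.emb = rand_matrix(len(vocab) + 1, emb_dim, rng)  # last row: unknown words
        self.width, self.hidden = width, hidden
        self.W_conv = rand_matrix(n_filters, width * emb_dim, rng)
        self.W_fwd = rand_matrix(4 * hidden, n_filters + hidden + 1, rng)
        self.W_bwd = rand_matrix(4 * hidden, n_filters + hidden + 1, rng)
        self.W_fc = rand_matrix(fc_dim, 2 * hidden, rng)
        self.W_out = rand_matrix(2, fc_dim, rng)  # two classes assumed

    def predict_proba(self, query):
        """Return two class probabilities for a non-empty query string."""
        unk = len(self.vocab)
        embs = [self.emb[self.vocab.get(w, unk)] for w in query.lower().split()]
        feats = conv_features(embs, self.W_conv, self.width)
        fwd = lstm(feats, self.W_fwd, self.hidden)        # left-to-right pass
        bwd = lstm(feats[::-1], self.W_bwd, self.hidden)  # right-to-left pass
        dense = [max(0.0, a) for a in matvec(self.W_fc, fwd + bwd)]
        return softmax(matvec(self.W_out, dense))

model = CBiLSTMSketch(vocab=["how", "to", "cook", "rice"])
print(model.predict_proba("how to cook rice"))  # probabilities from untrained weights
```

Because the weights are random, the printed probabilities carry no meaning. The sketch only shows how each stage consumes the output of the previous one, which is what allows the whole stack to be trained as a single model.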

Metadata
Title
Convolutional Bi-directional LSTM for Detecting Inappropriate Query Suggestions in Web Search
Authors
Harish Yenala
Manoj Chinnakotla
Jay Goyal
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-57454-7_1
