2021 | Original Paper | Book Chapter

Extracting Search Tasks from Query Logs Using a Recurrent Deep Clustering Architecture

Authors: Luis Lugo, Jose G. Moreno, Gilles Hubert

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing


Abstract

Users fulfill their information needs by expressing them as search queries and running those queries in available search engines. Mining search engine query logs enables the automatic extraction of search tasks by clustering related queries into groups, each representing a search task. Search task extraction is crucial for user-supporting applications such as query recommendation, query term prediction, and task-dependent results ranking. Most existing search task extraction methods use graph-based or nonparametric models, which grow as the query log size increases. Deep clustering methods offer a parametric alternative, but most deep clustering architectures fail to exploit recurrent neural networks for learning text data representations. We propose a recurrent deep clustering model for extracting search tasks from query logs. The proposed architecture leverages self-training and dual recurrent encoders to learn suitable latent representations of user queries, outperforming previous deep clustering methods. It is also a parametric approach, which makes it possible to use a fixed-size architecture for analyzing increasingly large search query logs.
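
The abstract does not spell out how the self-training step works. As a rough illustration only, the sketch below shows the self-training objective commonly used in deep embedded clustering (soft cluster assignments sharpened into a target distribution, then matched with a KL divergence). Treating this as the chapter's objective is an assumption; the function names, the random embeddings standing in for encoder outputs, and the choice of two clusters are all hypothetical.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Student's t soft assignment of each embedding to each centroid."""
    # Squared Euclidean distances, shape (n_queries, n_clusters).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets: square q and normalize by soft cluster frequency."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q) summed over all queries and clusters."""
    return float(np.sum(p * np.log(p / q)))

# Hypothetical stand-ins: in a real model, z would come from the
# recurrent encoders and the centroids would be trainable parameters.
rng = np.random.default_rng(0)
z = rng.normal(size=(6, 4))          # 6 query embeddings of dimension 4
centroids = rng.normal(size=(2, 4))  # 2 assumed search-task clusters

q = soft_assign(z, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)           # quantity a self-training step would minimize
```

In this kind of scheme the number of trainable parameters depends on the encoder and the number of centroids, not on the number of logged queries, which is one way to read the fixed-size property the abstract mentions.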

Metadata
Title
Extracting Search Tasks from Query Logs Using a Recurrent Deep Clustering Architecture
Authors
Luis Lugo
Jose G. Moreno
Gilles Hubert
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-030-72113-8_26