
2022 | OriginalPaper | Chapter

Evaluating the Use of Synthetic Queries for Pre-training a Semantic Query Tagger

Authors: Elias Bassani, Gabriella Pasi

Published in: Advances in Information Retrieval

Publisher: Springer International Publishing


Abstract

Semantic Query Labeling is the task of locating the constituent parts of a query and assigning domain-specific semantic labels to each of them. It makes explicit the relations between the query terms and the structure of the documents, while leaving the keyword-based query formulation unaltered. In this paper, we investigate pre-training a semantic query tagger on synthetic data generated by leveraging the documents' structure. By simulating a dynamic environment, we also evaluate whether the performance improvements brought by pre-training remain consistent as real-world training data becomes available. The results of our experiments suggest both that pre-training with synthetic data is useful and that its improvements remain consistent over time.
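The abstract does not specify how synthetic queries are derived from the documents' structure. As a rough illustration only, assuming hypothetical product-like records (field name to value) and BIO-style token labels, one could sample a few fields per record, concatenate their values into a keyword query, and label each token with the field it came from. The names `synthesize_query`, `build_pretraining_set`, and the example catalog are illustrative assumptions, not the authors' procedure.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical structured document: field name -> field value.
Document = Dict[str, str]


def synthesize_query(
    doc: Document, rng: random.Random, max_fields: int = 3
) -> List[Tuple[str, str]]:
    """Build one synthetic keyword query from a structured document.

    Assumption (not the paper's published procedure): sample a random subset
    of the document's non-empty fields in random order, split each value on
    whitespace, and label every token with its source field (BIO scheme).
    """
    fields = [f for f, v in doc.items() if v.strip()]
    if not fields:
        return []  # nothing to build a query from
    k = rng.randint(1, min(max_fields, len(fields)))
    chosen = rng.sample(fields, k)  # random subset, returned in random order
    tagged: List[Tuple[str, str]] = []
    for field in chosen:
        for i, token in enumerate(doc[field].lower().split()):
            prefix = "B" if i == 0 else "I"  # B = first token of a segment
            tagged.append((token, f"{prefix}-{field}"))
    return tagged


def build_pretraining_set(
    docs: List[Document], queries_per_doc: int = 2, seed: int = 0
) -> List[List[Tuple[str, str]]]:
    """Generate synthetic (token, label) sequences for pre-training a tagger."""
    rng = random.Random(seed)  # seeded for reproducibility
    return [synthesize_query(d, rng) for d in docs for _ in range(queries_per_doc)]


if __name__ == "__main__":
    catalog = [
        {"brand": "Acme", "color": "dark red", "category": "running shoes"},
        {"brand": "Globex", "color": "black", "category": "laptop bag"},
    ]
    for query in build_pretraining_set(catalog):
        print(" ".join(tok for tok, _ in query), "|", [lab for _, lab in query])
```

In this sketch the resulting (token, label) sequences could serve as pre-training data for any sequence labeler, which would then be fine-tuned on real labeled queries as they become available.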

Metadata
Title
Evaluating the Use of Synthetic Queries for Pre-training a Semantic Query Tagger
Authors
Elias Bassani
Gabriella Pasi
Copyright Year
2022
DOI
https://doi.org/10.1007/978-3-030-99739-7_5