
2020 | OriginalPaper | Chapter

Improving Arabic Microblog Retrieval with Distributed Representations

Authors: Shahad Alshalan, Raghad Alshalan, Hend Al-Khalifa, Reem Suwaileh, Tamer Elsayed

Published in: Information Retrieval Technology

Publisher: Springer International Publishing


Abstract

Query expansion (QE) using pseudo-relevance feedback (PRF) is one of the approaches shown to be effective for improving microblog retrieval. In this paper, we investigate the performance of three embedding-based methods on Arabic microblog retrieval: embedding-based QE, embedding-based PRF, and PRF combined with embedding-based reranking. Our experimental results over three variants of the EveTAR test collection show a consistent improvement of the reranking method over the traditional PRF baseline on both the MAP and P@10 evaluation measures; the improvement is statistically significant in some cases. While embedding-based QE fails to improve over traditional PRF, embedding-based PRF outperforms the baseline in several cases, with a statistically significant improvement in MAP on two variants of the test collection.
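To make the reranking idea and the two evaluation measures concrete, here is a minimal sketch. The abstract does not say how the reranking score is computed, so everything about the scoring is an assumption for illustration: each text is represented by the average of its word vectors, and the first-stage results are reordered by cosine similarity to the query. The PRF step is omitted, and the toy vocabulary, vectors, document ids, and function names are hypothetical, not taken from the paper or from EveTAR.

```python
import math

def mean_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding (None if none do)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is missing or has zero norm."""
    if u is None or v is None:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rerank(query_tokens, ranked_docs, embeddings):
    """Reorder a first-stage ranking by query-document embedding similarity.

    ranked_docs is a list of (doc_id, tokens) pairs in first-stage order.
    The sort is stable, so ties keep their first-stage order.
    """
    q = mean_vector(query_tokens, embeddings)
    scored = [(cosine(q, mean_vector(tokens, embeddings)), doc_id)
              for doc_id, tokens in ranked_docs]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: p[0], reverse=True)]

def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k results that are relevant (P@k)."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def average_precision(ranking, relevant):
    """Average precision for one query; MAP is the mean of this over all queries."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical 2-dimensional toy embeddings (assumption, for illustration only).
embeddings = {
    "flood": [1.0, 0.0], "rain": [0.9, 0.1], "storm": [0.8, 0.2],
    "match": [0.0, 1.0], "goal": [0.1, 0.9],
}
query = ["flood"]
initial = [("d1", ["match", "goal"]), ("d2", ["rain", "storm"]), ("d3", ["flood", "rain"])]
relevant = {"d2", "d3"}

initial_ids = [doc_id for doc_id, _ in initial]
reranked = rerank(query, initial, embeddings)
print(reranked)
print(average_precision(initial_ids, relevant), average_precision(reranked, relevant))
```

In this toy case the two relevant documents move above the off-topic one, so average precision rises relative to the initial order. A real system would load pretrained Arabic word vectors and would typically combine the embedding score with the first-stage retrieval score rather than replace it.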


Metadata
Title
Improving Arabic Microblog Retrieval with Distributed Representations
Authors
Shahad Alshalan
Raghad Alshalan
Hend Al-Khalifa
Reem Suwaileh
Tamer Elsayed
Copyright Year
2020
DOI
https://doi.org/10.1007/978-3-030-42835-8_16