2021 | OriginalPaper | Chapter

Using Document Embeddings for Background Linking of News Articles

Authors: Pavel Khloponin, Leila Kosseim

Published in: Natural Language Processing and Information Systems

Publisher: Springer International Publishing

Abstract

This paper describes our experiments in using document embeddings to provide background links to news articles. This work was done as part of the TREC 2020 News Track [26], whose goal is to provide a ranked list of related news articles from a large collection, given a query article. For our participation, we explored a variety of document embedding representations and proximity measures. Experiments on the 2018 and 2019 validation sets showed that GPT2 and XLNet embeddings led to higher performance. In addition, regardless of the embedding, performance was higher when mean pooling, larger models, and smaller token chunks were used. However, no embedding configuration alone matched the performance of the classic Okapi BM25 method. For our official TREC 2020 News Track submission, we therefore combined the BM25 model with an embedding method. The augmented model produced more diverse sets of related articles with a minimal decrease in performance (nDCG@5 of 0.5873 versus 0.5924 for vanilla BM25). This result is promising, as diversity is a key factor used by journalists when providing background links and contextual information to news articles [27].
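
As an illustration of the embedding pipeline summarized above, the following minimal sketch splits a tokenized article into fixed-size chunks, embeds each chunk, mean-pools the chunk vectors into a single document vector, and ranks candidate articles by cosine similarity to the query article. It is not the authors' implementation: the function names, the chunk size, the four-word vocabulary, and the `toy_embed_chunk` encoder are hypothetical stand-ins for a real model such as GPT2 or XLNet, and cosine similarity is only one of the proximity measures that could be used.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def chunk_tokens(tokens: Sequence[str], chunk_size: int) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most chunk_size tokens."""
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors component-wise."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def embed_document(
    tokens: Sequence[str],
    chunk_size: int,
    embed_chunk: Callable[[List[str]], Vector],
) -> Vector:
    """Embed each chunk separately, then mean-pool into one document vector."""
    if not tokens:
        raise ValueError("cannot embed an empty document")
    chunks = chunk_tokens(tokens, chunk_size)
    return mean_pool([embed_chunk(chunk) for chunk in chunks])


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec: Vector, doc_vecs: Dict[str, Vector], top_k: int = 5) -> List[str]:
    """Return the ids of the top_k documents most similar to the query vector."""
    scored = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:top_k]]


# Hypothetical toy encoder: counts of a tiny vocabulary, standing in for a
# neural model. A real system would call a pretrained encoder here instead.
VOCAB = ["election", "vote", "storm", "flood"]


def toy_embed_chunk(chunk: List[str]) -> Vector:
    return [float(chunk.count(word)) for word in VOCAB]


if __name__ == "__main__":
    collection = {
        "politics": "election vote vote election".split(),
        "weather": "storm flood storm".split(),
    }
    doc_vecs = {doc_id: embed_document(toks, 2, toy_embed_chunk) for doc_id, toks in collection.items()}
    query_vec = embed_document("vote election".split(), 2, toy_embed_chunk)
    print(rank_by_similarity(query_vec, doc_vecs, top_k=2))
```

Passing the encoder in as the `embed_chunk` argument keeps the chunking and pooling logic independent of the model, so different embeddings and chunk sizes can be compared without changing the ranking code.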

Literature
1. Adomavicius, G., et al.: Incorporating contextual information in recommender systems using a multidimensional approach. ACM Trans. Inf. Syst. 23(1), 103–145 (2005)
2. Cha, S.H.: Comprehensive survey on distance/similarity measures between probability density functions. Int. J. Math. Model. Meth. Appl. Sci. 1 (2007)
3. Day, N., Worley, D., Allison, T.: OSC at TREC 2020 - news track’s background linking task. In: TREC [30]
6. Essam, M., Elsayed, T.: bigIR at TREC 2019: Graph-based Analysis for News Background Linking. In: TREC [29]
7. Fabbri, A., Li, I., She, T., Li, S., Radev, D.: Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model. In: Proceedings of the ACL, pp. 1074–1084. Florence, Italy, July 2019
8. Grusky, M., Naaman, M., Artzi, Y.: Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies. In: Proceedings of NAACL/HLT, pp. 708–719. New Orleans, June 2018
9. Järvelin, K., Kekäläinen, J.: Cumulated Gain-based evaluation of IR techniques. ACM Trans. Inf. Syst. (TOIS) 20(4), 422–446 (2002)
10. Kashyapi, S., Chatterjee, S., Ramsdell, J., Dietz, L.: TREMA-UNH at TREC 2018: Complex Answer Retrieval and News Track. In: TREC [28]
11. Khloponin, P., Kosseim, L.: The CLaC System at the TREC 2019 News Track. In: TREC [29]
12. Lavrenko, V., Croft, W.B.: Relevance based language models. In: Proceedings of the 24th ACM SIGIR Conference, pp. 120–127. New York, NY (2001)
14. Lu, K., Fang, H.: Leveraging Entities in Background Document Retrieval for News Articles. In: TREC [29]
15. Lu, M., et al.: Scalable news recommendation using multi-dimensional similarity and Jaccard-Kmeans clustering. J. Syst. Softw. 95, 242–251 (2014)
16. Ma, Y., et al.: News2vec: news network embedding with subnode information. In: Proceedings of EMNLP/IJCNLP, pp. 4843–4852. Hong Kong, November 2019
17. MacAvaney, S., Yates, A., Cohan, A., Goharian, N.: CEDR. In: Proceedings of the 42nd International ACM SIGIR Conference, July 2019
18. Naseri, S., Foley, J., Allan, J.: UMass at TREC 2018: CAR, Common Core and News Tracks. In: TREC [28]
19. Okura, S., et al.: Embedding-Based News Recommendation for Millions of Users. In: Proceedings of the 23rd ACM SIGKDD Conference, pp. 1933–1942. New York (2017)
20. Qu, J., Wang, Y.: UNC SILS at TREC 2019 news track. In: TREC [29]
23. Reimers, N., Gurevych, I.: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In: Proceedings of EMNLP/IJCNLP, pp. 3982–3992. Hong Kong, November 2019
24. Soboroff, I., Huang, S., Harman, D.: 2018 news track overview. In: TREC [28]
25. Soboroff, I., Huang, S., Harman, D.: 2019 news track overview. In: TREC [29]
26. Soboroff, I., Huang, S., Harman, D.: 2020 news track overview. In: TREC [30]
31. Yang, P., Lin, J.: Anserini at TREC 2018: CENTRE, Common Core, and News Tracks. In: TREC [28]
32. Yang, Z., et al.: XLNet: Generalized Autoregressive Pretraining for Language Understanding. arXiv 1906.08237 (2020)
33. Zhang, J., Zhao, Y., Saleh, M., Liu, P.J.: PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. arXiv 1912.08777 (2019)

Metadata
Title: Using Document Embeddings for Background Linking of News Articles
Authors: Pavel Khloponin, Leila Kosseim
Copyright Year: 2021
DOI: https://doi.org/10.1007/978-3-030-80599-9_28
