2023 | OriginalPaper | Chapter

8. NewsDeps: Visualizing the Origin of Information in News Articles

Authors: Felix Hamborg, Philipp Meschenmoser, Moritz Schubotz, Philipp Scharpf, Bela Gipp

Published in: Truth and Fake in the Post-Factual Digital Age

Publisher: Springer Fachmedien Wiesbaden

Abstract

In scientific publications, citations allow readers to assess the authenticity of the presented information and verify it in the original context. News articles, however, for various reasons do not contain citations and only rarely refer readers to further sources. As a result, readers often cannot assess the authenticity of the presented information as its origin is unclear. In times of “fake news,” echo chambers, and centralization of media ownership, the lack of transparency regarding origin, trustworthiness, and authenticity has become a pressing societal issue. We present NewsDeps, the first approach that analyzes and visualizes where information in news articles stems from. NewsDeps employs methods from natural language processing and plagiarism detection to measure article similarity. We devise a temporal-force-directed graph that places articles as nodes chronologically. The graph connects articles by edges varying in width depending on the articles’ similarity. We demonstrate our approach in a case study with two real-world scenarios. We find that NewsDeps increases efficiency and transparency in news consumption by revealing which previously published articles are the primary sources of each given article.
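The pipeline outlined in the abstract (measure pairwise article similarity, order articles chronologically, and connect them with edges whose width reflects similarity) can be illustrated with a minimal sketch. This is not the NewsDeps implementation: the abstract does not name the similarity measure, so plain term-frequency cosine similarity stands in for it here, and the function names, the `min_similarity` threshold, the `max_width` scaling, and the three toy articles are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter
from datetime import datetime


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity of two term-frequency Counters, in [0, 1]."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_edges(articles, min_similarity=0.3, max_width=10.0):
    """Link each article to the earlier articles it resembles.

    articles: list of (article_id, published, text) tuples.
    Returns (earlier_id, later_id, similarity, width) tuples, where
    width grows linearly with similarity (an assumed scaling).
    """
    # Chronological order: only earlier articles can be sources.
    ordered = sorted(articles, key=lambda art: art[1])
    vectors = {art[0]: Counter(tokenize(art[2])) for art in ordered}
    edges = []
    for i, (later_id, _, _) in enumerate(ordered):
        for earlier_id, _, _ in ordered[:i]:
            sim = cosine_similarity(vectors[earlier_id], vectors[later_id])
            if sim >= min_similarity:
                edges.append((earlier_id, later_id, sim, sim * max_width))
    return edges


# Invented toy data, not taken from the chapter's case study.
articles = [
    ("agency", datetime(2017, 1, 1, 8),
     "The minister announced a new tax plan on Monday"),
    ("outlet_a", datetime(2017, 1, 1, 10),
     "The minister announced a new tax plan on Monday, critics say"),
    ("outlet_b", datetime(2017, 1, 1, 12),
     "Local team wins the championship after a dramatic final"),
]
edges = build_edges(articles)
for earlier, later, sim, width in edges:
    print(f"{earlier} -> {later}: similarity={sim:.2f}, width={width:.1f}")
```

In this toy run only the agency report and the outlet that reuses its wording are connected; the unrelated sports article stays isolated. A real system would feed such edges into a graph layout that pins each node's horizontal position to its publication time.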

Footnotes
1
Of course, academic publishing implements additional means of increasing authenticity, most notably the peer-review process.

2
News-please currently accesses the raw archive provided by the Common Crawl project. To speed up the search process, we plan to preprocess the archive and import the extracted articles into a database. A first step toward this goal has been implemented in the POLUSA dataset (Gebhard and Hamborg 2020).

3
In the social sciences, this phenomenon is typically studied as media bias by source selection.
 
Literature
Agirre E, Banea C, Cer D et al (2016) SemEval-2016 Task 1: semantic textual similarity, monolingual and cross-lingual evaluation. In: Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016), pp 497–511
Alzahrani S, Salim N (2010) Fuzzy semantic-based string similarity for extrinsic plagiarism detection: lab report for PAN at CLEF 2010. In: CEUR workshop proceedings, pp 1–8
Baker CF, Fillmore CJ, Lowe JB (1998) The Berkeley FrameNet Project. In: Proceedings of the 36th annual meeting on Association for Computational Linguistics, Stroudsburg, pp 86–90
Bao J, Lyon C, Lane PCR et al (2007) Comparing Different Text Similarity Methods. Tech report. University of Hertfordshire
Christian D, Froke P, Jacobsen S, Minthorn D (2014) The Associated Press stylebook and briefing on media law. The Associated Press
Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep Bidirectional transformers for language understanding. arXiv Prepr arXiv:181004805
Elhadi M, Al-Tobi A (2009) Duplicate detection in documents and webpages using improved longest common subsequence and documents syntactical structures. In: 2009 4th international conference on computer sciences and convergence information technology. IEEE, Seoul
Ferrero J, Agnes F, Besacier L, Schwab D (2017) Using Word Embedding for Cross-Language Plagiarism Detection. In: Proceedings of the 15th conference of the European chapter of the association for computational linguistics, pp 415–421
Gebhard L, Hamborg F (2020) The POLUSA dataset: 0.9M political news articles balanced by time and outlet popularity. In: Proceedings of the ACM/IEEE joint conference on digital libraries (JCDL), Virtual Event, pp 1–2
Hamborg F (2020) Media bias, the social sciences, and NLP: automating frame analyses to identify bias by word choice and labeling. In: Proceedings of the 58th annual meeting of the association for computational linguistics: student research workshop. Association for Computational Linguistics, Stroudsburg, pp 79–87
Hamborg F, Meuschke N, Breitinger C, Gipp B (2017) News-please: a Generic News Crawler and Extractor. In: Proceedings of the 15th international symposium of information science. Verlag Werner Hülsbusch, pp 218–223
Hamborg F, Lachnit S, Schubotz M et al (2018a) Giveme5W: main event retrieval from news articles by extraction of the five journalistic W questions. In: Proceedings of the iConference 2018. Sheffield
Hamborg F, Breitinger C, Gipp B (2019a) Giveme5W1H: a universal system for extracting main events from news articles. In: Proceedings of the 13th ACM conference on recommender systems, 7th international workshop on news recommendation and analytics (INRA 2019). Copenhagen
Kent CK, Salim N (2010) Web based cross language plagiarism detection. In: 2010 second international conference on Computational Intelligence, Modelling and Simulation (CIMSiM), pp 199–204
Kienreich W, Granitzer M, Sabol V, Klieber W (2006) Plagiarism Detection in Large Sets of Press Agency News Articles. In: 17th International Workshop on Database and Expert Systems Applications 2006. DEXA ’06
Kim JW, Candan KS, Tatemura J (2009) Efficient overlap and content reuse detection in blogs and online news articles. In: Proceedings of the 18th international conference on World wide web, pp 81–90
Moreau E, Jayapal A, Lynch G, Vogel C (2015) Author verification: basic stacked generalization applied to predictions from a set of heterogeneous learners. In: CEUR Workshop Proceedings
Osman AH, Salim N, Binwahlan MS et al (2012a) Plagiarism detection scheme based on Semantic Role Labeling. In: International conference on information retrieval & knowledge management (CAMP). IEEE, Kuala Lumpur, pp 30–33
Rychalska B, Pakulska K, Chodorowska K et al (2016) Samsung Poland NLP Team at SemEval-2016 Task 1: necessity for diversity; combining recursive autoencoders, WordNet and ensemble methods to measure semantic similarity. In: Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016), pp 602–608
Ryu C-K, Kim H-J, Cho H-G (2009) A detecting and tracing algorithm for unauthorized internet-news plagiarism using spatio-temporal document evolution model. In: Proceedings of the 2009 ACM symposium on applied computing, pp 863–868
Scheufele DA (2000) Agenda-setting, priming, and framing revisited: another look at cognitive effects of political communication. Mass Commun Soc 3:297–316
Schuler KK (2005) VerbNet: a broad-coverage, comprehensive verb lexicon. University of Pennsylvania
Sharma S, Kumar R, Bhadana P, Gupta S (2013) News event extraction using 5W1H approach & its analysis. Int J Sci Eng Res 4:2064–2068
Thompson V, Bowerman C (2017) Detecting cross-lingual plagiarism using simulated word embeddings. CoRR abs/1712.1
Tsatsaronis G, Varlamis I, Giannakoulopoulos A, Kanellopoulos N (2010) Identifying free text plagiarism based on semantic similarity. In: Proceedings of the 4th International Plagiarism Conference. Citeseer, Newcastle upon Tyne
Uzuner O, Katz B (2005) Capturing expression using linguistic information. In: Proceedings of the 20th national conference on Artificial intelligence. AAAI Press, Pittsburgh, pp 1124–1129
Vu HH, Villaneau J, Saïd F, Marteau PF (2014) Sentence similarity by combining explicit semantic analysis and overlapping n-grams. In: Sojka P et al (eds) Text, speech and dialogue. Lecture notes in computer science, vol 8655. Springer, pp 201–208
Weber-Wulff D (2010) Test cases for plagiarism detection software. In: Proceedings of the 4th international plagiarism conference
Yang Z, Dai Z, Yang Y et al (2019) XLNet: generalized autoregressive pretraining for language understanding. Adv Neural Inf Proces Syst 23:5753–5763
Metadata
Title: NewsDeps: Visualizing the Origin of Information in News Articles
Authors: Felix Hamborg, Philipp Meschenmoser, Moritz Schubotz, Philipp Scharpf, Bela Gipp
Copyright Year: 2023
DOI: https://doi.org/10.1007/978-3-658-40406-2_8