
2021 | OriginalPaper | Book chapter

8. NewsDeps: Visualizing the Origin of Information in News Articles

Written by: Felix Hamborg, Philipp Meschenmoser, Moritz Schubotz, Philipp Scharpf, Bela Gipp

Published in: Wahrheit und Fake im postfaktisch-digitalen Zeitalter

Publisher: Springer Fachmedien Wiesbaden


Abstract

In scientific publications, citations allow readers to assess the authenticity of the presented information and verify it in the original context. News articles, however, for various reasons do not contain citations and only rarely refer readers to further sources. As a result, readers often cannot assess the authenticity of the presented information as its origin is unclear. In times of “fake news,” echo chambers, and centralization of media ownership, the lack of transparency regarding origin, trustworthiness, and authenticity has become a pressing societal issue. We present NewsDeps, the first approach that analyzes and visualizes where information in news articles stems from. NewsDeps employs methods from natural language processing and plagiarism detection to measure article similarity. We devise a temporal-force-directed graph that places articles as nodes chronologically. The graph connects articles by edges varying in width depending on the articles’ similarity. We demonstrate our approach in a case study with two real-world scenarios. We find that NewsDeps increases efficiency and transparency in news consumption by revealing which previously published articles are the primary sources of each given article.
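The graph described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity measure (Jaccard overlap of word trigrams), the `Article` record, the `threshold` and `max_width` parameters, and the linear width mapping are all assumptions chosen for illustration. The sketch sorts articles chronologically, links each earlier article to each later one whose similarity passes the threshold, and derives an edge width from the similarity score.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class Article:
    # Hypothetical record type; the field names are illustrative only.
    article_id: str
    published: datetime
    text: str


def word_ngrams(text, n=3):
    """Return the set of lowercase word n-grams (shingles) of a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_edges(articles, threshold=0.1, max_width=10.0):
    """Link earlier articles to later, sufficiently similar ones.

    Returns (earlier_id, later_id, similarity, width) tuples. The width
    grows linearly from 1.0 to max_width with similarity (an assumed
    mapping, not taken from the chapter).
    """
    shingles = {a.article_id: word_ngrams(a.text) for a in articles}
    ordered = sorted(articles, key=lambda a: a.published)
    edges = []
    # combinations() on the sorted list yields each pair with the
    # earlier article first, so edges always point forward in time.
    for earlier, later in combinations(ordered, 2):
        sim = jaccard(shingles[earlier.article_id], shingles[later.article_id])
        if sim >= threshold:
            width = 1.0 + (max_width - 1.0) * sim
            edges.append((earlier.article_id, later.article_id, sim, width))
    return edges
```

In a visualization, the publication time would fix each node's position on the time axis, while a force-directed layout would arrange nodes along the remaining axis; that layout step is left out here.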


Footnotes
1. Of course, academic publishing implements additional means to increase authenticity, most notably the peer-review process.
2. News-please currently accesses the raw archive provided by the Common Crawl project. To speed up the search process, we are planning to preprocess the archive and import the extracted articles into a database. A first step towards this goal has been implemented in the POLUSA dataset (Gebhard and Hamborg 2020).
3. In the social sciences, this phenomenon is typically studied as media bias by source selection.
 
References
Agirre E, Banea C, Cer D et al (2016) SemEval-2016 task 1: semantic textual similarity, monolingual and cross-lingual evaluation. In: Proceedings of the 10th international workshop on Semantic Evaluation (SemEval-2016), pp 497–511
Alzahrani S, Salim N (2010) Fuzzy semantic-based string similarity for extrinsic plagiarism detection: lab report for PAN at CLEF 2010. In: CEUR Workshop Proceedings, pp 1–8
Baker CF, Fillmore CJ, Lowe JB (1998) The Berkeley FrameNet project. In: Proceedings of the 36th annual meeting on Association for Computational Linguistics. Stroudsburg, PA, USA, pp 86–90
Bao J, Lyon C, Lane PCR et al (2007) Comparing different text similarity methods. Tech report, University of Hertfordshire
Christian D, Froke P, Jacobsen S, Minthorn D (2014) The Associated Press stylebook and briefing on media law. The Associated Press
Devlin J, Chang M-W, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Elhadi M, Al-Tobi A (2009) Duplicate detection in documents and webpages using improved longest common subsequence and documents syntactical structures. In: 2009 4th international conference on Computer Sciences and Convergence Information Technology. IEEE, Seoul, South Korea
Ferrero J, Agnes F, Besacier L, Schwab D (2017) Using word embedding for cross-language plagiarism detection. In: Proceedings of the 15th conference of the European Chapter of the Association for Computational Linguistics, pp 415–421
Gebhard L, Hamborg F (2020) The POLUSA dataset: 0.9M political news articles balanced by time and outlet popularity. In: Proceedings of the ACM/IEEE joint conference on Digital Libraries (JCDL). Virtual Event, CN, pp 1–2
Gipp B (2014) Citation-based plagiarism detection. Springer Vieweg, Wiesbaden
Hamborg F (2020) Media bias, the social sciences, and NLP: automating frame analyses to identify bias by word choice and labeling. In: Proceedings of the 58th annual meeting of the Association for Computational Linguistics: Student Research Workshop. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 79–87
Hamborg F, Meuschke N, Breitinger C, Gipp B (2017) News-please: a generic news crawler and extractor. In: Proceedings of the 15th international symposium of Information Science. Verlag Werner Hülsbusch, pp 218–223
Hamborg F, Lachnit S, Schubotz M et al (2018a) Giveme5W: main event retrieval from news articles by extraction of the five journalistic W questions. In: Proceedings of the iConference 2018. Sheffield, UK
Hamborg F, Breitinger C, Gipp B (2019a) Giveme5W1H: a universal system for extracting main events from news articles. In: Proceedings of the 13th ACM conference on Recommender Systems, 7th International Workshop on News Recommendation and Analytics (INRA 2019). Copenhagen, Denmark
Kent CK, Salim N (2010) Web based cross language plagiarism detection. In: 2010 second international conference on Computational Intelligence, Modelling and Simulation (CIMSiM), pp 199–204
Kienreich W, Granitzer M, Sabol V, Klieber W (2006) Plagiarism detection in large sets of press agency news articles. In: 17th international workshop on Database and Expert Systems Applications 2006. DEXA ’06
Kim JW, Candan KS, Tatemura J (2009) Efficient overlap and content reuse detection in blogs and online news articles. In: Proceedings of the 18th international conference on World wide web, pp 81–90
Moreau E, Jayapal A, Lynch G, Vogel C (2015) Author verification: basic stacked generalization applied to predictions from a set of heterogeneous learners. In: CEUR Workshop Proceedings
Osman AH, Salim N, Binwahlan MS et al (2012a) Plagiarism detection scheme based on semantic role labeling. In: International conference on Information Retrieval & Knowledge Management (CAMP). IEEE, Kuala Lumpur, Malaysia, pp 30–33
Rychalska B, Pakulska K, Chodorowska K et al (2016) Samsung Poland NLP team at SemEval-2016 task 1: necessity for diversity; combining recursive autoencoders, WordNet and ensemble methods to measure semantic similarity. In: Proceedings of the 10th international workshop on Semantic Evaluation (SemEval-2016), pp 602–608
Ryu C-K, Kim H-J, Cho H-G (2009) A detecting and tracing algorithm for unauthorized internet-news plagiarism using spatio-temporal document evolution model. In: Proceedings of the 2009 ACM symposium on Applied Computing, pp 863–868
Scheufele DA (2000) Agenda-setting, priming, and framing revisited: another look at cognitive effects of political communication. Mass Commun Soc 3:297–316
Schuler KK (2005) VerbNet: a broad-coverage, comprehensive verb Lexicon. University of Pennsylvania
Sharma S, Kumar R, Bhadana P, Gupta S (2013) News event extraction using 5W1H approach & its analysis. Int J Sci Eng Res 4:2064–2068
Thompson V, Bowerman C (2017) Detecting cross-lingual plagiarism using simulated word embeddings. CoRR abs/1712.1
Tsatsaronis G, Varlamis I, Giannakoulopoulos A, Kanellopoulos N (2010) Identifying free text plagiarism based on semantic similarity. In: Proceedings of the 4th international plagiarism conference. Citeseer, Newcastle upon Tyne, UK
Uzuner O, Katz B (2005) Capturing expression using linguistic information. In: Proceedings of the 20th national conference on Artificial intelligence. AAAI Press, Pittsburgh, Pennsylvania, USA, pp 1124–1129
Vu HH, Villaneau J, Saïd F, Marteau PF (2014) Sentence similarity by combining explicit semantic analysis and overlapping n-grams. In: Sojka P et al (eds) Text, speech and dialogue, Lecture notes in Computer Science, Vol. 8655. Springer, pp 201–208
Weber-Wulff D (2010) Test cases for plagiarism detection software. In: Proceedings of the 4th International Plagiarism Conference
Yang Z, Dai Z, Yang Y et al (2019) XLNet: generalized autoregressive pretraining for language understanding. Adv Neural Inf Process Syst 23:5753–5763
Metadata
Title
NewsDeps: Visualizing the Origin of Information in News Articles
Written by
Felix Hamborg
Philipp Meschenmoser
Moritz Schubotz
Philipp Scharpf
Bela Gipp
Copyright year
2021
DOI
https://doi.org/10.1007/978-3-658-32957-0_8
