
International Journal on Digital Libraries

Published: 28.10.2018

Assessing plausibility of scientific claims to support high-quality content in digital collections

Authors: José María González Pinto, Wolf-Tilo Balke

Published in: International Journal on Digital Libraries | Issue 1/2020


Abstract

This paper formalizes and extends a novel approach to supporting high-quality content in digital libraries. Building on the concept of plausibility used in the cognitive sciences, we aim to judge the plausibility of new scientific papers in light of prior knowledge. In particular, our work proposes a novel assessment of scientific papers to qualitatively support the work of reviewers. To do this, our approach focuses on the key aspect of scientific papers: claims. Claims are sentences found in empirical scientific papers that state statistical associations between entities and correspond to the core contributions of the papers. Claims of this type appear, for instance, in medicine, chemistry, and biology, where the consumption of a drug, a substance, or a product causes an effect on some other type of entity, such as a disease or another drug or substance. To operationalize the notion of plausibility, we promote claims to first-class citizens of scientific digital libraries and exploit state-of-the-art neural embedding representations of text and topic models. As a proof of concept of the potential usefulness of this notion of plausibility, we report extensive experiments on scientific papers from the PubMed digital library.


PubMed comprises more than 28 million citations for biomedical literature from MEDLINE, life science journals, and online books.

More information about UMLS is available at https://www.nlm.nih.gov/research/umls/.


Title: Assessing plausibility of scientific claims to support high-quality content in digital collections
Authors: José María González Pinto, Wolf-Tilo Balke
Publication date: 28.10.2018
Publisher: Springer Berlin Heidelberg
Published in: International Journal on Digital Libraries / Issue 1/2020
Print ISSN: 1432-5012
Electronic ISSN: 1432-1300
DOI: https://doi.org/10.1007/s00799-018-0256-8
