2018 | OriginalPaper | Book chapter

A Deep Learning Approach for Scientific Paper Semantic Ranking

Authors: Francesco Gargiulo, Stefano Silvestri, Mariarosaria Fontanella, Mario Ciampi, Giuseppe De Pietro

Published in: Intelligent Interactive Multimedia Systems and Services 2017

Publisher: Springer International Publishing

Abstract

In this paper we propose a novel Deep Learning approach to building a similarity-based search tool on top of Word Embeddings (WEs), taking the medical literature as a case study. Using the compositional properties of WEs, we define a methodology that aggregates the information from each word into a single vector representing the abstract of each PubMed article. This representation captures the semantic content of the papers and, consequently, makes it possible to evaluate and rank the similarity among them. Preliminary results were obtained by analysing a subset of the whole PubMed collection. The correctness of the results was verified by human domain experts, showing that the methodology is promising.
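The abstract does not spell out how the word vectors are aggregated or which similarity measure is used. The following is only a minimal sketch of the general idea, assuming that a document vector is the plain average of its word vectors and that similarity is measured with the cosine. The embedding values, the function names `document_vector`, `cosine` and `rank`, and the two toy documents are all hypothetical and are not taken from the paper.

```python
import math

# Toy 3-dimensional word embeddings. Hypothetical values for illustration only;
# a real system would load vectors trained on a large corpus.
EMBEDDINGS = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.8, 0.2, 0.0],
    "attack": [0.6, 0.3, 0.1],
    "skin": [0.0, 0.9, 0.1],
    "rash": [0.1, 0.8, 0.2],
}


def document_vector(text, embeddings):
    """Average the vectors of the known words in `text`.

    Averaging is an assumed aggregation, not necessarily the paper's.
    Returns None when no word of the text has an embedding.
    """
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, documents, embeddings):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    q = document_vector(query, embeddings)
    scored = []
    for doc_id, text in documents.items():
        d = document_vector(text, embeddings)
        if q is not None and d is not None:
            scored.append((doc_id, cosine(q, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-ins for two article abstracts.
DOCUMENTS = {
    "doc_a": "cardiac attack",
    "doc_b": "skin rash",
}

ranking = rank("heart attack", DOCUMENTS, EMBEDDINGS)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Averaging discards word order and weights every word equally, so it should be read as a baseline for the kind of aggregation the abstract describes rather than as the authors' method.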

Metadata
Title
A Deep Learning Approach for Scientific Paper Semantic Ranking
Authors
Francesco Gargiulo
Stefano Silvestri
Mariarosaria Fontanella
Mario Ciampi
Giuseppe De Pietro
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-319-59480-4_47
