DomESA: a novel approach for extending domain-oriented lexical relatedness calculations with domain-specific semantics

Rybiński, Maciej; Aldana Montes, José Francisco

doi:10.1007/s10844-017-0442-y

Published: 13 January 2017

Volume 49, pages 315–331, (2017)

Journal of Intelligent Information Systems

Abstract

Being able to correctly model semantic relatedness between texts, and consequently between the concepts represented by these texts, has become an important part of many intelligent information retrieval and knowledge processing systems. The need for such systems is especially evident within the biomedical domain, where the sheer volume of scientific publishing contributes to information overload. In this paper we present a novel method to approximate semantic relatedness in domain-focused settings. The approach is an extension of the well-known ESA (Explicit Semantic Analysis) method. Our extension successfully leverages the semantics of a domain-specific document corpus. We evaluate the proposed method on a set of reference datasets that are the de facto standard for the task of approximating biomedical semantic relatedness. The method is compared with other state-of-the-art methods, as well as with the baselines established with the original ESA method. The results of the experiments suggest that the proposed method combines the semantics of general and domain-specific corpora to provide significant improvements over the original method.
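
For readers unfamiliar with the baseline, the following is a minimal sketch of the original ESA idea only, not of the DomESA extension, whose details are not given on this page. A text is mapped to a vector of weights over concepts (ESA typically uses Wikipedia articles), and relatedness is the cosine between two such vectors. The toy `CONCEPTS` corpus, the whitespace tokenizer, the smoothed IDF and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy concept corpus; ESA typically uses Wikipedia articles here.
CONCEPTS = {
    "Cardiology": "heart attack artery blood pressure heart",
    "Oncology": "tumor cancer chemotherapy cell",
    "Astronomy": "star planet orbit telescope",
}

def tokenize(text):
    """Naive lowercase whitespace tokenizer (an assumption, not from the paper)."""
    return text.lower().split()

def build_index(concepts):
    """Build an inverted index: term -> {concept name: TF-IDF weight}."""
    n = len(concepts)
    tfs = {name: Counter(tokenize(doc)) for name, doc in concepts.items()}
    df = Counter(term for tf in tfs.values() for term in tf)
    index = {}
    for name, tf in tfs.items():
        for term, count in tf.items():
            idf = math.log(n / df[term]) + 1.0  # smoothed IDF, illustrative choice
            index.setdefault(term, {})[name] = count * idf
    return index

def concept_vector(text, index):
    """Sum the concept weights of every term in the text."""
    vec = Counter()
    for term in tokenize(text):
        for concept, weight in index.get(term, {}).items():
            vec[concept] += weight
    return vec

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[c] * v[c] for c in u if c in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def relatedness(a, b, index):
    """ESA-style relatedness: cosine between the two concept vectors."""
    return cosine(concept_vector(a, index), concept_vector(b, index))

INDEX = build_index(CONCEPTS)
```

With this toy corpus, `relatedness("heart attack", "blood pressure", INDEX)` is high because both phrases activate only the Cardiology concept, while `relatedness("heart", "planet", INDEX)` is zero because the two words share no concept.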

Notes

http://lucene.apache.org/core/
We evaluated the algorithm with values of k between 1 and 15, and the method seems to work well within this range. In the evaluation presented here we discuss results only for k = 1 and k = 10, for illustrative purposes.
See https://seriousstats.wordpress.com/2012/02/05/comparing-correlations/ for discussion and code.
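
As a rough illustration of what comparing correlations involves, the sketch below builds a confidence interval for the difference between two correlation coefficients from the Fisher z-transform intervals of each coefficient. It is an assumption-laden sketch, not the code from the linked post and not necessarily the procedure used in the paper: it covers only the simpler case of two independent samples, the function names are hypothetical, and the Fisher interval is only approximate for rank correlations.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal

def fisher_ci(r, n, z_crit=Z_95):
    """Approximate CI for a correlation r from n pairs via Fisher's z-transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def diff_ci_independent(r1, n1, r2, n2, z_crit=Z_95):
    """CI for r1 - r2, assuming the two correlations come from independent samples.

    Combines the distances from each estimate to its own interval limits.
    """
    l1, u1 = fisher_ci(r1, n1, z_crit)
    l2, u2 = fisher_ci(r2, n2, z_crit)
    diff = r1 - r2
    lower = diff - math.sqrt((r1 - l1) ** 2 + (u2 - r2) ** 2)
    upper = diff + math.sqrt((u1 - r1) ** 2 + (r2 - l2) ** 2)
    return lower, upper
```

If the resulting interval excludes zero, the two correlations differ at the chosen level. Two methods scored against the same reference dataset produce dependent correlations, which require an additional term for the correlation between the methods; that case is not covered by this sketch.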

Acknowledgments

We would like to thank the anonymous referees for their invaluable contributions towards improving the manuscript.

The work presented in this paper was partially supported by grants TIN2014-58304-R (Ministerio de Ciencia e Innovación), P11-TIC-7529 and P12-TIC-1519 (Plan Andaluz de Investigación, Desarrollo e Innovación).

Author information

Authors and Affiliations

Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Malaga, Spain
Maciej Rybiński & José Francisco Aldana Montes

Corresponding author

Correspondence to Maciej Rybiński.

About this article

Cite this article

Rybiński, M., Aldana Montes, J.F. DomESA: a novel approach for extending domain-oriented lexical relatedness calculations with domain-specific semantics. J Intell Inf Syst 49, 315–331 (2017). https://doi.org/10.1007/s10844-017-0442-y

Received: 03 December 2016
Revised: 01 January 2017
Accepted: 04 January 2017
Published: 13 January 2017
Issue Date: December 2017
DOI: https://doi.org/10.1007/s10844-017-0442-y
