DomESA: a novel approach for extending domain-oriented lexical relatedness calculations with domain-specific semantics

Journal of Intelligent Information Systems

Abstract

Correctly modelling semantic relatedness between texts, and consequently between the concepts these texts represent, has become an important part of many intelligent information retrieval and knowledge processing systems. The need for such systems is especially evident in the biomedical domain, where the sheer volume of scientific publishing contributes to information overload. In this paper we present a novel method for approximating semantic relatedness in domain-focused settings. The approach is an extension of the well-known Explicit Semantic Analysis (ESA) method. Our extension leverages the semantics of a domain-specific document corpus. We evaluate the proposed method on a set of datasets that are the de facto reference standard for the task of approximating biomedical semantic relatedness. The method is compared with other state-of-the-art methods, as well as with baselines established using the original ESA method. The results of the experiments suggest that the proposed method combines the semantics of a general corpus and a domain-specific corpus to provide significant improvements over the original method.
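For readers unfamiliar with the baseline, the following is a minimal sketch of the original ESA idea only (Gabrilovich and Markovitch 2007): each text is mapped to a vector of TF-IDF weights over a set of concept documents, and relatedness is the cosine of two such vectors. It does not implement the DomESA extension, whose details are not part of this preview. The function names and the three toy concept documents are hypothetical and serve purely as illustration; a real setup would use a large corpus such as Wikipedia.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy "concept" documents standing in for a real corpus.
TOY_CONCEPTS = {
    "Diabetes": "insulin glucose blood sugar pancreas",
    "Hypertension": "blood pressure artery heart",
    "Football": "ball goal team match",
}


def build_term_concept_index(concepts):
    """Map each term to a dict {concept name: TF-IDF weight}."""
    n_docs = len(concepts)
    term_counts = {
        name: Counter(text.lower().split()) for name, text in concepts.items()
    }
    # Document frequency: number of concept documents containing each term.
    doc_freq = Counter(term for counts in term_counts.values() for term in counts)
    index = defaultdict(dict)
    for name, counts in term_counts.items():
        for term, tf in counts.items():
            idf = math.log(n_docs / doc_freq[term])
            if idf > 0:  # terms present in every document carry no signal
                index[term][name] = tf * idf
    return index


def concept_vector(text, index):
    """Sum the concept vectors of all terms in the text."""
    vec = defaultdict(float)
    for term in text.lower().split():
        for name, weight in index.get(term, {}).items():
            vec[name] += weight
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def esa_relatedness(text_a, text_b, index):
    """ESA-style relatedness: cosine of the two concept vectors."""
    return cosine(concept_vector(text_a, index), concept_vector(text_b, index))


index = build_term_concept_index(TOY_CONCEPTS)
print(esa_relatedness("insulin", "blood", index))
print(esa_relatedness("insulin", "goal", index))
```

In this toy setting, terms that occur in the same concept documents receive a positive score, while terms with no shared concept score zero.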

Notes

  1. http://lucene.apache.org/core/

  2. We evaluated the algorithm with values of k between 1 and 15, and the method seems to work well within this range. In the evaluation presented here we discuss only the results for k = 1 and k = 10, for illustrative purposes.

  3. See https://seriousstats.wordpress.com/2012/02/05/comparing-correlations/ for discussion and code.

Acknowledgments

We would like to thank the anonymous referees for their invaluable contributions towards improving the manuscript.

The work presented in this paper was partially supported by grants TIN2014-58304-R (Ministerio de Ciencia e Innovación), P11-TIC-7529 and P12-TIC-1519 (Plan Andaluz de Investigación, Desarrollo e Innovación).

Corresponding author

Correspondence to Maciej Rybiński.

Cite this article

Rybiński, M., Aldana Montes, J.F. DomESA: a novel approach for extending domain-oriented lexical relatedness calculations with domain-specific semantics. J Intell Inf Syst 49, 315–331 (2017). https://doi.org/10.1007/s10844-017-0442-y
