2021 | OriginalPaper | Chapter

Constructing VeSNet: Mapping LOD Thesauri onto Princeton WordNet and Polish WordNet

Authors: Arkadiusz Janz, Grzegorz Kostkowski, Marek Maziarz

Published in: Advances in Computational Collective Intelligence

Publisher: Springer International Publishing


Abstract

Lexical resources are crucial in many modern applications of Natural Language Processing and Artificial Intelligence. We present VeSNet, a network of lexical resources created by merging the Polish-English WordNet (PEWN) with several large electronic thesauri from the Linked Open Data cloud (DBpedia, Wikipedia, GeoWordNet, Agrovoc, Eurovoc, Gemet and MeSH). We describe how the resource was built, outline its basic properties, and evaluate its quality. The resulting lexical network combines broad coverage with high precision: nearly 1.3M new exactMatch links were created, including 85K to PEWN, with an estimated precision of 94%.
Footnotes
8
Thus, we did not distinguish between them. Equating eM (exactMatch) and cM (closeMatch) can be justified by the fact that “skos:exactMatch, defined as a transitive subproperty of skos:closeMatch, was intended to express a degree of similarity close enough to justify (...) propagation” [2].
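The propagation mentioned in the quotation can be illustrated with a minimal sketch: if match links are treated as symmetric and transitive, concepts fall into clusters of mutually equivalent entries. The function name and the concept identifiers below are hypothetical and are not taken from the VeSNet code; the union-find approach is one possible way to compute such clusters, not necessarily the one the authors used.

```python
from collections import defaultdict


def exact_match_clusters(links):
    """Group concept IDs into clusters connected by match links.

    Links are treated as symmetric and transitive, so the clusters are
    the connected components of the link graph (computed with union-find).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for node in parent:
        clusters[find(node)].add(node)
    # Sort for a deterministic result.
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical identifiers, for illustration only.
links = [
    ("agrovoc:c_1", "eurovoc:100"),
    ("eurovoc:100", "gemet:42"),
    ("mesh:D01", "dbpedia:Foo"),
]
clusters = exact_match_clusters(links)
```

Here the first two links chain three concepts into one cluster even though `agrovoc:c_1` and `gemet:42` are never linked directly, which is the effect transitivity has on propagation.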
 
9
That is, not only the shared ones.
 
10
Including also other thesauri.
 
11
Calculated under the normality assumption, using Student's t-distribution for an unknown standard deviation, with \(n=5\) observations (i.e. lexical resources).
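As a rough illustration of this kind of interval, the sketch below computes a two-sided confidence interval for a mean from five observations. The 95% confidence level, the function name and the sample values are assumptions made for the example; the footnote does not state the level used, and the numbers are not results from the paper.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with n - 1 = 4 degrees of
# freedom (the 95% level is an assumption made for this example).
T_CRIT_95_DF4 = 2.776


def t_confidence_interval(values, t_crit=T_CRIT_95_DF4):
    """Return (low, high) bounds of a t-based confidence interval for the mean.

    The default critical value is only valid for exactly five observations.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical per-resource scores, for illustration only.
scores = [0.90, 0.92, 0.94, 0.96, 0.98]
low, high = t_confidence_interval(scores)
```

With so few observations the t critical value is noticeably larger than the normal value of about 1.96, so the interval is wider than a normal approximation would suggest.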
 
12
The eM ratio is an important measure, since it shows how ‘compatible’ a thesaurus is with wordnets. Lower eM ratios indicate more specific terms in a thesaurus, which suggests how difficult it may be to find a proper wordnet equivalent for a thesaurus concept. On the other hand, analysing the correlation between recall and the number of labelling languages leads to identical conclusions, suggesting that this could also be an important factor.
 
13
The data and our code are available at https://github.com/CLARIN-PL/vesnet.
 
Literature
1.
Bai, X., Ramos, M.R., Fiske, S.T.: As diversity increases, people paradoxically perceive social groups as more similar. Proc. Nat. Acad. Sci. 117(23), 12741–12749 (2020)
2.
Baker, T., Bechhofer, S., Isaac, A., Miles, A., Schreiber, G., Summers, E.: Key choices in the design of simple knowledge organization system (SKOS). Web Seman. Sci. Serv. Agents World Wide Web 20, 35–49 (2013)
4.
Bauer, F., Kaltenböck, M.: Linked Open Data: The essentials: A Quick Start Guide for Decision Makers. Edition mono/monochrom, Vienna, Austria (2011)
6.
Bond, F., Foster, R.: Linking and extending an open multilingual wordnet. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1352–1362 (2013)
7.
Bond, F., Paik, K.: A survey of wordnets and their licenses. Small, vol. 8, no. 4 (2012)
8.
Buchan, N.R., Grimalda, G., Wilson, R., Brewer, M., Fatas, E., Foddy, M.: Globalization and human cooperation. Proc. Nat. Acad. Sci. 106(11), 4138–4142 (2009)
9.
Calzolari, N., Soria, C.: Preparing the field for an open and distributed resource infrastructure: The role of the FLaReNet network. LREC2010 (2010)
10.
Caracciolo, C., et al.: The AGROVOC linked dataset. Seman. Web 4(3), 341–348 (2013)
11.
Cieri, C., et al.: A road map for interoperable language resource metadata (2010)
12.
Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37–46 (1960)
13.
De Melo, G., Weikum, G.: Towards a universal wordnet by learning from combined evidence. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 513–522 (2009)
14.
Fellbaum, C., Miller, G. (eds.): WordNet: An Electronic Lexical Database. The MIT Press, Cambridge (1998)
15.
Hripcsak, G., Heitjan, D.F.: Measuring agreement in medical informatics reliability studies. J. Biomed. Inform. 35(2), 99–110 (2002)
16.
Hripcsak, G., Rothschild, A.S.: Agreement, the f-measure, and reliability in information retrieval. J. Am. Med. Inform. Assoc. 12(3), 296–298 (2005)
17.
Krippendorff, K.: Content Analysis: An Introduction to its Methodology. Sage Publications, New York (2018)
18.
Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics, pp. 159–174 (1977)
19.
Maziarz, M., Piasecki, M.: Towards mapping thesauri onto plWordNet. In: Proceedings of Global Wordnet Conference GWC-2018, pp. 45–53 (2018)
20.
Maziarz, M., Piasecki, M., Rudnicka, E., Szpakowicz, S., Kędzia, P.: plWordNet 3.0 - a comprehensive lexical-semantic resource. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2259–2268 (2016)
21.
McCrae, J.P., Buitelaar, P.: Linking datasets using semantic textual similarity. Cybern. Inform. Technol. 18(1), 109–123 (2018)
22.
McCrae, J.P., Cillessen, D.: Towards a linking between WordNet and Wikidata. In: Proceedings of the 11th Global Wordnet Conference, pp. 252–257. Global Wordnet Association, University of South Africa (UNISA), January 2021
23.
Miles, A., Matthews, B., Wilson, M., Brickley, D.: SKOS core: simple knowledge organisation for the Web. In: International Conference on Dublin Core and Metadata Applications, pp. 3–10 (2005)
24.
Morshed, A., Caracciolo, C., Johannsen, G., Keizer, J.: Thesaurus alignment for Linked Data publishing. In: Proceedings of the International Conference on Dublin Core and Metadata Applications 2011, pp. 37–46. Dublin Core Metadata Initiative (2011)
25.
Reidsma, D., Carletta, J.: Reliability measurement without limits. Comput. Linguist. 34(3), 319–326 (2008)
26.
Rudnicka, E., Witkowski, W., Piasecki, M.: A (non)-perfect match: mapping plWordNet onto princetonwordnet. In: Proceedings of the 11th Global Wordnet Conference, pp. 137–146 (2021)
27.
Tracey, J., Strassel, S.: Basic language resources for 31 languages (plus English): the LORELEI representative and incident language packs. In: Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL), pp. 277–284 (2020)
28.
Watson, P., Petrie, A.: Method agreement analysis: a review of correct methodology. Theriogenology 73(9), 1167–1179 (2010)
Metadata
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-88113-9_49
