COVER: a linguistic resource combining common sense and lexicographic information

Mensa, Enrico; Radicioni, Daniele P.; Lieto, Antonio

doi:10.1007/s10579-018-9417-z

Original Paper
Published: 21 June 2018

Volume 52, pages 921–948, (2018)

Language Resources and Evaluation

Enrico Mensa¹,
Daniele P. Radicioni¹ &
Antonio Lieto¹

Abstract

Lexical resources are fundamental to tackling many tasks that are central to present and prospective research in Text Mining, Information Retrieval, and connected areas of Natural Language Processing. In this article we introduce COVER, a novel lexical resource, along with COVERAGE, the algorithm devised to build it. To describe concepts, COVER proposes a compact vectorial representation that combines the lexicographic precision of BabelNet with the rich common-sense knowledge of ConceptNet. We propose COVER as a reliable and mature resource that has been employed in tasks as diverse as conceptual categorization, keyword extraction, and conceptual similarity. The experimental assessment is performed on the last task: we report and discuss the obtained results and point out future improvements. We conclude that COVER can be directly exploited to build applications, and can also be coupled with existing resources.

Notes

“When people communicate with each other, they rely on shared background knowledge to understand each other: knowledge about the way objects relate to each other in the world, people’s goals in their daily lives, the emotional content of events or situations. This ‘taken for granted’ information is what we call common sense—obvious things people normally know and usually leave unstated” (Cambria et al. 2010, p. 15).
The representational limitation of this ontological resource has also led to the development of hybrid knowledge representation systems, such as Dual-PECCS (Lieto et al. 2017a), which adopts OpenCyc to encode taxonomic information and delegates the task of representing common-sense knowledge to different integrated frameworks.
http://commoncrawl.org.
Of course, not all information available in ConceptNet can be directly mapped onto BSIs (e.g., the compound word “Something you find inside” has no counterpart in BabelNet/NASARI).
InstanceOf, RelatedTo, IsA, AtLocation, dbpedia/genre, Synonym, DerivedFrom, Causes, UsedFor, MotivatedByGoal, HasSubevent, Antonym, CapableOf, Desires, CausesDesire, PartOf, HasProperty, HasPrerequisite, MadeOf, CompoundDerivedFrom, HasFirstSubevent, dbpedia/field, dbpedia/knownFor, dbpedia/influencedBy, dbpedia/influenced, DefinedAs, HasA, MemberOf, ReceivesAction, SimilarTo, dbpedia/influenced, SymbolOf, HasContext, NotDesires, ObstructedBy, HasLastSubevent, NotUsedFor, NotCapableOf, DesireOf, NotHasProperty, CreatedBy, Attribute, Entails, LocationOfAction, LocatedNear.
http://corpus.byu.edu/full-text/.
The parameter \(\beta\) has been set to 2 to build the released resource.
Presently set to 0.6.
The parameters \(\alpha\) and \(\beta\) were set to 0.8 and 0.2, respectively, for the experimentation.
Publicly available at the URL http://www.seas.upenn.edu/~hansens/conceptSim/.
Namely, the 34 domains available in BabelDomains, http://lcl.uniroma1.it/babeldomains/.

References

Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Paşca, M., & Soroa, A. (2009). A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of NAACL, NAACL ’09 (pp. 19–27). Association for Computational Linguistics.
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. (2007). DBpedia: A nucleus for a web of open data. In The semantic web (pp. 722–735).
Baker, C. F., Fillmore, C. J., & Lowe, J. B. (1998). The Berkeley framenet project. In Proceedings of the 17th international conference on computational linguistics (Vol. 1, pp. 86–90). Association for Computational Linguistics.
Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don’t count, predict! a systematic comparison of context-counting vs. context-predicting semantic vectors. In ACL (Vol. 1, pp. 238–247).
Bosco, C., Patti, V., & Bolioli, A. (2013). Developing corpora for sentiment analysis: The case of irony and Senti-TUT. IEEE Intelligent Systems, 28(2), 55–63.
Budanitsky, A., & Hirst, G. (2006). Evaluating wordnet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1), 13–47.
Camacho-Collados, J., Pilehvar, M. T., Collier, N., & Navigli, R. (2017). Semeval-2017 task 2: Multilingual and cross-lingual semantic word similarity. In Proceedings of the 11th international workshop on semantic evaluation (SemEval 2017), Vancouver, Canada.
Camacho-Collados, J., Pilehvar, M. T., & Navigli, R. (2015). A unified multilingual semantic representation of concepts. In Proceedings of ACL, Beijing, China.
Camacho-Collados, J., Pilehvar, M. T., & Navigli, R. (2015). NASARI: A novel approach to a semantically-aware representation of items. In Proceedings of NAACL (pp. 567–577).
Camacho-Collados, J., Pilehvar, M. T., & Navigli, R. (2016). NASARI: Integrating explicit knowledge and corpus statistics for a multilingual representation of concepts and entities. Artificial Intelligence, 240, 36–64.
Cambria, E., Schuller, B., Liu, B., Wang, H., & Havasi, C. (2013). Knowledge-based approaches to concept-level sentiment analysis. IEEE Intelligent Systems, 28(2), 12–14.
Cambria, E., Speer, R., Havasi, C., & Hussain, A. (2010). Senticnet: A publicly available semantic resource for opinion mining. In AAAI fall symposium: Commonsense knowledge (Vol. 10).
Ciaramita, M., & Johnson, M. (2003). Supersense tagging of unknown nouns in wordnet. In Proceedings of the 2003 conference on empirical methods in natural language processing (pp. 168–175). Association for Computational Linguistics.
Colla, D., Mensa, E., & Radicioni, D. P. (2017). Semantic measures for keywords extraction. In AI*IA 2017: Advances in artificial intelligence. Lecture notes for artificial intelligence. Springer.
Colla, D., Mensa, E., Radicioni, D. P., & Lieto, A. (2018). Tell me why: Computational explanation of conceptual similarity judgments. In Proceedings of the 17th international conference on information processing and management of uncertainty in knowledge-based systems (IPMU), special session on advances on explainable artificial intelligence, communications in computer and information science (CCIS). Springer, Cham.
Denecke, K. (2008). Using sentiwordnet for multilingual sentiment analysis. In IEEE 24th international conference on data engineering workshop, 2008. ICDEW 2008 (pp. 507–512). IEEE.
Derrac, J., & Schockaert, S. (2015). Inducing semantic relations from conceptual spaces: A data-driven approach to plausible reasoning. Artificial Intelligence, 228, 66–94.
Devitt, A., & Ahmad, K. (2013). Is there a language of sentiment? An analysis of lexical resources for sentiment analysis. Language Resources and Evaluation, 47(2), 475–511.
Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., & Smith, N. A. (2014). Retrofitting word vectors to semantic lexicons. arXiv preprint arXiv:1411.4166.
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., & Ruppin, E. (2001). Placing search in context: The concept revisited. In Proceedings of the 10th international conference on world wide web (pp. 406–414). ACM.
Francopoulo, G., Bel, N., George, M., Calzolari, N., Monachini, M., Pet, M., et al. (2009). Multilingual resources for NLP in the lexical markup framework (LMF). Language Resources and Evaluation, 43(1), 57–70.
Ganitkevitch, J., Van Durme, B., & Callison-Burch, C. (2013). PPDB: The paraphrase database. In Proceedings of NAACL-HLT (pp. 758–764).
Gärdenfors, P. (2014). The geometry of meaning: Semantics based on conceptual spaces. Cambridge: MIT Press.
Gînscă, A.-L., Boroş, E., Iftene, A., Trandabăţ, D., Toader, M., Corîci, M., Perez, C.-A., & Cristea, D. (2011). Sentimatrix: Multilingual sentiment analysis service. In Proceedings of the 2nd workshop on computational approaches to subjectivity and sentiment analysis (pp. 189–195). Association for Computational Linguistics.
Harabagiu, S., & Moldovan, D. (2003). Question answering. In The Oxford handbook of computational linguistics. Oxford University Press.
Harris, Z. S. (1954). Distributional structure. Word, 10(2–3), 146–162.
Havasi, C., Speer, R., & Alonso, J. (2007). ConceptNet: A lexical resource for common sense knowledge. In Recent advances in natural language processing V: Selected papers from RANLP (Vol. 309, p. 269).
Hovy, E. (2003). Text summarization. In The Oxford handbook of computational linguistics (2nd edn.). Oxford University Press.
Jean-Louis, L., Zouaq, A., Gagnon, M., & Ensan, F. (2014). An assessment of online semantic annotators for the keyword extraction task. In Pacific Rim international conference on artificial intelligence (pp. 548–560). Springer.
Jiang, J. J., & Conrath, D. W. (1997). Semantic similarity based on corpus statistics and lexical taxonomy. arXiv preprint cmp-lg/9709008.
Jimenez, S., Becerra, C., Gelbukh, A., Bátiz, A. J. D., & Mendizábal, A. (2013). Softcardinality-core: Improving text overlap with distributional measures for semantic textual similarity. In Proceedings of *SEM 2013 (Vol. 1, pp. 194–201).
Langley, P. (2012). The cognitive systems paradigm. Advances in Cognitive Systems, 1, 3–13.
Leacock, C., Miller, G. A., & Chodorow, M. (1998). Using corpus statistics and WordNet relations for sense identification. Computational Linguistics, 24(1), 147–165.
Lenat, D. B., Prakash, M., & Shepherd, M. (1985). CYC: Using common sense knowledge to overcome brittleness and knowledge acquisition bottlenecks. AI Magazine, 6(4), 65.
Levin, B. (1993). English verb classes and alternations: A preliminary investigation. Chicago: University of Chicago Press.
Lieto, A., Minieri, A., Piana, A., Radicioni, D. P., & Frixione, M. (2014). A dual process architecture for ontology-based systems. In 6th international conference on knowledge engineering and ontology development, KEOD 2014 (pp. 48–55). INSTICC Press.
Lieto, A., Lebiere, C., & Oltramari, A. (2018). The knowledge level in cognitive architectures: Current limitations and possible developments. Cognitive Systems Research, 48, 39–55.
Lieto, A., Mensa, E., & Radicioni, D. P. (2016). A resource-driven approach for anchoring linguistic resources to conceptual spaces. In Proceedings of the XVth international conference of the Italian association for artificial intelligence, Genova, Italy, November 29–December 1, 2016, volume 10037 of lecture notes in artificial intelligence (pp. 435–449). Springer.
Lieto, A., Mensa, E., & Radicioni, D. P. (2016). Taming sense sparsity: A common-sense approach. In Proceedings of third Italian conference on computational linguistics (CLiC-it 2016) and fifth evaluation campaign of natural language processing and speech tools for Italian.
Lieto, A., Minieri, A., Piana, A., & Radicioni, D. P. (2015). A knowledge-based system for prototypical reasoning. Connection Science, 27(2), 137–152.
Lieto, A., & Radicioni, D. P. (2016). From human to artificial cognition and back: New perspectives on cognitively inspired AI systems. Cognitive Systems Research, 39, 1–3.
Lieto, A., Radicioni, D. P., & Rho, V. (2015). A common-sense conceptual categorization system integrating heterogeneous proxytypes and the dual process of reasoning. In Proceedings of the international joint conference on artificial intelligence (IJCAI) (pp. 875–881), Buenos Aires, July 2015. AAAI Press.
Lieto, A., Radicioni, D. P., & Rho, V. (2017a). Dual PECCS: A cognitive system for conceptual representation and categorization. Journal of Experimental and Theoretical Artificial Intelligence, 29(2), 433–452.
Lieto, A., Radicioni, D. P., Rho, V., & Mensa, E. (2017b). Towards a unifying framework for conceptual represention and reasoning in cognitive systems. Intelligenza Artificiale, 11(2), 139–153.
Liu, H., & Singh, P. (2004). Conceptnet: A practical commonsense reasoning tool-kit. BT Technology Journal, 22(4), 211–226.
Marujo, L., Ribeiro, R., de Matos, D. M., Neto, J. P., Gershman, A., & Carbonell, J. (2012). Key phrase extraction of lightly filtered broadcast news. In Proceedings of 15th international conference on text, speech and dialogue (TSD 2012). Springer.
McCrae, J., Aguado-de Cea, G., Buitelaar, P., Cimiano, P., Declerck, T., Gómez-Pérez, A., et al. (2012). Interchanging lexical resources on the semantic web. Language Resources and Evaluation, 46(4), 701–719.
Mensa, E., Radicioni, D. P., & Lieto, A. (2017). MeRaLi at Semeval-2017 task 2 subtask 1: A cognitively inspired approach. In Proceedings of the international workshop on semantic evaluation (SemEval 2017). Association for Computational Linguistics.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. CoRR abs/1301.3781.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111–3119).
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.
Miller, G. A., & Charles, W. G. (1991). Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1), 1–28.
Miller, G. A., & Fellbaum, C. (2007). Wordnet then and now. Language Resources and Evaluation, 41(2), 209–214.
Mimno, D. M., Wallach, H. M., Talley, E. M., Leenders, M., & McCallum, A. (2011). Optimizing semantic coherence in topic models. In EMNLP (pp. 262–272). ACL.
Minsky, M. (2000). Commonsense-based interfaces. Communications of the ACM, 43(8), 66–73.
Moro, A., Cecconi, F., & Navigli, R. (2014). Multilingual word sense disambiguation and entity linking for everybody. In Proceedings of the 2014 international conference on posters and demonstrations track (Vol. 1272, pp. 25–28). CEUR-WS.org.
Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing Surveys (CSUR), 41(2), 10.
Navigli, R., & Ponzetto, S. P. (2010). BabelNet: Building a very large multilingual semantic network. In Proceedings of the 48th annual meeting of the association for computational linguistics (pp. 216–225). Association for Computational Linguistics.
Navigli, R., & Ponzetto, S. P. (2012). BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193, 217–250.
Newman, D., Noh, Y., Talley, E., Karimi, S., & Baldwin, T. (2010). Evaluating topic models for digital libraries. In The ACM/IEEE joint conference on digital libraries (JCDL2010), Gold Coast, Australia. ACM.
Palmer, M., Babko-Malaya, O., & Dang, H. T. (2004). Different sense granularities for different applications. In Proceedings of workshop on scalable natural language understanding.
Pedersen, T., Banerjee, S., & Patwardhan, S. (2005). Maximizing semantic relatedness to perform word sense disambiguation. University of Minnesota supercomputing institute research report UMSI, 25, 2005.
Pedersen, T., Patwardhan, S., & Michelizzi, J. (2004). Wordnet:: Similarity: Measuring the relatedness of concepts. In Demonstration papers at HLT-NAACL 2004 (pp. 38–41). Association for Computational Linguistics.
Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In EMNLP (Vol. 14, pp. 1532–1543).
Pilehvar, M. T., & Navigli, R. (2015). From senses to texts: An all-in-one graph-based approach for measuring semantic similarity. Artificial Intelligence, 228, 95–128.
Resnik, P. (1995). Using information content to evaluate semantic similarity in a taxonomy. arXiv preprint cmp-lg/9511007.
Resnik, P. (1998). Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research, 11(1), 95–130.
Richardson, R., Smeaton, A. F., & Murphy, J. (1994). Using wordnet as a knowledge base for measuring semantic similarity between words. In Proceedings of AICS conference (pp. 1–15).
Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104(3), 192–233.
Rubenstein, H., & Goodenough, J. B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10), 627–633.
Schwartz, H. A., & Gomez, F. (2008). Acquiring knowledge from the web to be used as selectors for noun sense disambiguation. In Proceedings of the twelfth conference on computational natural language learning (pp. 105–112). ACL.
Schwartz, H. A., & Gomez, F. (2011). Evaluating semantic metrics on tasks of concept similarity. In Proceedings of the international Florida artificial intelligence research society conference (FLAIRS) (p. 324).
Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1), 1–47.
Speer, R., & Chin, J. (2016). An ensemble method to produce high-quality word embeddings. arXiv preprint arXiv:1604.01692.
Speer, R., Chin, J., & Havasi, C. (2017). Conceptnet 5.5: An open multilingual graph of general knowledge. In AAAI (pp. 4444–4451).
Speer, R., & Havasi, C. (2012). Representing general relational Knowledge in ConceptNet 5. In LREC (pp. 3679–3686).
Speer, R., & Lowry-Duda, J. (2017). Conceptnet at semeval-2017 task 2: Extending word embeddings with multilingual relational knowledge. CoRR abs/1704.03560.
Turney, P. D. (2006). Similarity of semantic relations. Computational Linguistics, 32(3), 379–416.
Tversky, A. (1977). Features of similarity. Psychological Review, 84(4), 327.
Vossen, P., & Fellbaum, C. (2009). Multilingual framenets in computational lexicography: Methods and applications, chapter Universals and idiosyncrasies in multilingual WordNets. Trends in linguistics/Studies and monographs: Studies and monographs. Mouton de Gruyter.
Wu, Z., & Palmer, M. (1994). Verbs semantics and lexical selection. In Proceedings of the 32nd annual meeting on association for computational linguistics (pp. 133–138). ACL.
Yampolskiy, R. (2013). Turing test as a defining feature of AI-completeness. In Artificial intelligence, evolutionary computing and metaheuristics (pp. 3–17).
Yarlett, D., & Ramscar, M. (2008). Language learning through similarity-based generalization. Unpublished Ph.D. thesis, Stanford University.

Author information

Authors and Affiliations

Computer Science Department, University of Turin, Turin, Italy
Enrico Mensa, Daniele P. Radicioni & Antonio Lieto

Corresponding author

Correspondence to Daniele P. Radicioni.

About this article

Cite this article

Mensa, E., Radicioni, D.P. & Lieto, A. COVER: a linguistic resource combining common sense and lexicographic information. Lang Resources & Evaluation 52, 921–948 (2018). https://doi.org/10.1007/s10579-018-9417-z

Published: 21 June 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s10579-018-9417-z