Published in: Knowledge and Information Systems 8/2020

20-03-2020 | Regular Paper

Lexifield: a system for the automatic building of lexicons by semantic expansion of short word lists

Authors: Suzanne Mpouli, Michel Beigbeder, Christine Largeron


Abstract

We present Lexifield, a fully automatic language-independent system for building domain-specific lexicons from a short list of terms defining the domain. Lexifield relies on a pre-trained word embedding model, a definition dictionary and a dictionary of synonyms. To evaluate this system, four lexicons have been generated: one lexicon in French for the topic “son” (“sound”) and three lexicons in English for the topics “sound”, “taste” and “odour”. Compared with other word embedding-based systems and with Sensicon, a state-of-the-art sensorial lexicon, our system achieves better precision and recall on reference lists extracted from manually created resources such as Roget’s Thesaurus.
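
The evaluation mentioned in the abstract scores a generated lexicon against a reference list using precision and recall. The following is a minimal sketch of that standard computation, not the authors' evaluation code. The function name `precision_recall` and the toy word lists are hypothetical and are not taken from the paper.

```python
def precision_recall(generated, reference):
    """Return (precision, recall) of a generated lexicon against a reference list.

    Hypothetical helper for illustration; words are compared case-insensitively.
    """
    generated_set = {word.lower() for word in generated}
    reference_set = {word.lower() for word in reference}

    # Words that appear in both the generated lexicon and the reference list.
    true_positives = len(generated_set & reference_set)

    # Precision: share of generated words that are in the reference list.
    precision = true_positives / len(generated_set) if generated_set else 0.0
    # Recall: share of reference words that the generated lexicon recovered.
    recall = true_positives / len(reference_set) if reference_set else 0.0
    return precision, recall


# Toy data, assumed for illustration only (not from the paper's lexicons).
generated = ["noise", "echo", "whisper", "table"]
reference = ["noise", "echo", "whisper", "hum", "buzz"]

precision, recall = precision_recall(generated, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.75 recall=0.60"
```

In this toy case three of the four generated words are in the reference list (precision 0.75), and three of the five reference words are recovered (recall 0.60).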


Literature
1. Al-Shalabi R, Kanaan G (2004) Constructing an automatic lexicon for Arabic language. Int J Comput Inf Sci 2(2):114–128
2. Amsler RA (1981) A taxonomy for English nouns and verbs. In: Proceedings of the 19th annual meeting, Association for Computational Linguistics, pp 133–138
3. Azad HK, Deepak A (2019) Query expansion techniques for information retrieval: a survey. Inf Process Manag 56(5):1698–1735
4. Baker CF, Fillmore CJ, Lowe JB (1998) The Berkeley FrameNet project. In: Proceedings of the 17th international conference on computational linguistics, vol 1, Association for Computational Linguistics, pp 86–90
5. Bojanowski P, Grave E, Joulin A, Mikolov T (2017) Enriching word vectors with subword information. Trans Assoc Comput Linguist 5:135–146
6. Bouma G (2009) Normalized (pointwise) mutual information in collocation extraction. In: Proceedings of GSCL, pp 31–40
7. Calzolari N (1984) Detecting patterns in a lexical data base. In: Proceedings of the 10th international conference on computational linguistics, COLING ’84, Association for Computational Linguistics, Stroudsburg, PA, USA, pp 170–173. https://doi.org/10.3115/980431.980527
8. Chodorow MS, Byrd RJ, Heidorn GE (1985) Extracting semantic hierarchies from a large on-line dictionary. In: Proceedings of the 23rd annual meeting, Association for Computational Linguistics, pp 299–304
9. Church KW, Hanks P (1990) Word association norms, mutual information, and lexicography. Comput Linguist 16(1):22–29
10. Copestake A (1990) An approach to building the hierarchical element of a lexical knowledge base from a machine readable dictionary. In: First international workshop on inheritance in NLP
11. Dubois J, Dubois-Charlier F (2010) La combinatoire lexico-syntaxique dans le dictionnaire électronique des mots. les termes du domaine de la musique à titre d’illustration. Langages 179–180(3):31–56
12. Dubois J, Dubois-Charlier F (1997) Les Verbes français. Larousse, Paris
13. Fang H (2008) A re-examination of query expansion using lexical resources. In: Proceedings of ACL-08: HLT, pp 139–147
14. Fast E, Chen B, Bernstein MS (2016) Empath: understanding topic signals in large-scale text. In: Proceedings of the 2016 CHI conference on human factors in computing systems, ACM, pp 4647–4657
15. Fellbaum C (1998) WordNet: an electronic lexical database. Bradford Books, Cambridge
16. Globerson A, Chechik G, Pereira F, Tishby N (2007) Euclidean embedding of co-occurrence data. J Mach Learn Res 8:2265–2295
17. Jakubíček M, Kilgarriff A, Kovář V, Rychlỳ P, Suchomel V (2013) The TenTen corpus family. In: 7th International corpus linguistics conference, CL, pp 125–127
18. Kotov A, Zhai C (2012) Tapping into knowledge base for concept feedback: leveraging ConceptNet to improve search results for difficult queries. In: Proceedings of the fifth ACM international conference on Web search and data mining, ACM, pp 403–412
19. Kuzi S, Shtok A, Kurland O (2016) Query expansion using word embeddings. In: Proceedings of the 25th ACM international on conference on information and knowledge management, ACM, pp 1929–1932
20. Lavelli A, Sebastiani F, Zanoli R (2004) Distributional term representations: an experimental comparison. In: Proceedings of the thirteenth ACM international conference on information and knowledge management, pp 615–624
21. Levy O, Goldberg Y (2014) Neural word embedding as implicit matrix factorization. In: Proceedings of the 27th international conference on neural information processing systems, vol 2, NIPS’14, pp 2177–2185
22. Liu S, Liu F, Yu C, Meng W (2004) An effective approach to document retrieval via utilizing WordNet and recognizing phrases. In: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, ACM, pp 266–272
23. Manguin JL (2004) Transitivité partielle de la synonymie: application aux dictionnaires de synonymes. Corela. Cognition, représentation, langage
25. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119
26. Mitchell J, Lapata M (2010) Composition in distributional models of semantics. Cognit Sci 34(8):1388–1429
27. Park D, Kim S, Lee J, Choo J, Diakopoulos N, Elmqvist N (2018) ConceptVector: text visual analytics via interactive lexicon building using word embedding. IEEE Trans Vis Comput Gr 24(1):361–370
28. Pennebaker JW, Francis ME, Booth RJ (2001) Linguistic inquiry and word count: LIWC 2001, vol 71. Lawrence Erlbaum Associates, Mahwah
29. Riloff E, Shepherd J (1997) A corpus-based approach for building semantic lexicons. In: Proceedings of the second conference on empirical methods in natural language processing (EMNLP-2), pp 117–124
30. Riloff E, Shepherd J (1999) A corpus-based bootstrapping algorithm for semi-automated semantic lexicon construction. Nat Lang Eng 5(2):147–156
31. Roark B, Charniak E (1998) Noun-phrase co-occurrence statistics for semiautomatic semantic lexicon construction. In: Proceedings of the 36th annual meeting of the association for computational linguistics and 17th international conference on computational linguistics, vol 2, Association for Computational Linguistics, pp 1110–1116
32. Sagot B (2005) Automatic acquisition of a Slovak lexicon from a raw corpus. In: International conference on text, speech and dialogue, Springer, pp 156–163
33. Tekiroglu SS, Özbal G, Strapparava C (2014) Sensicon: an automatically constructed sensorial lexicon. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1511–1521
34. Tonelli S, Pighin D (2009) New features for FrameNet: WordNet mapping. In: Proceedings of the thirteenth conference on computational natural language learning, Association for Computational Linguistics, pp 219–227
35. Verma N, Bhattacharyya P (2004) Automatic lexicon generation through WordNet. GWC 2004:226
36. Voorhees EM (1994) Query expansion using lexical-semantic relations. In: Proceedings of the 17th annual international ACM SIGIR conference on research and development in information retrieval, ACM Press, pp 61–69
37. Zhang J, Deng B, Li X (2009) Concept based query expansion using WordNet. In: Proceedings of the 2009 international e-conference on advanced science and technology, IEEE Computer Society, pp 52–55
38. Zhu M, Wu YFB (2014) Search by multiple examples. In: Proceedings of the 7th ACM international conference on Web search and data mining, ACM Press, pp 667–672
Metadata
Title: Lexifield: a system for the automatic building of lexicons by semantic expansion of short word lists
Authors: Suzanne Mpouli, Michel Beigbeder, Christine Largeron
Publication date: 20-03-2020
Publisher: Springer London
Published in: Knowledge and Information Systems, Issue 8/2020
Print ISSN: 0219-1377
Electronic ISSN: 0219-3116
DOI: https://doi.org/10.1007/s10115-020-01451-6
