Published in: Soft Computing 21/2018

3 April 2018 | Focus

Constructing and validating word similarity datasets by integrating methods from psychology, brain science and computational linguistics

Authors: Yu Wan, Yidong Chen, Xiaodong Shi, Changle Zhou

Abstract

Human-scored gold-standard word similarity datasets normally consist of word pairs with corresponding similarity scores. They are popular resources for evaluating word similarity models, which are essential components of many natural language processing tasks. This paper proposes a novel multidisciplinary method for constructing and validating such datasets. The method differs from previous ones in that it draws on three disciplines, namely psychology, brain science and computational linguistics, to validate the soundness of the constructed datasets. Specifically, to the best of our knowledge, this is the first time that event-related potential experiments have been incorporated to validate word similarity datasets. Using the proposed method, we constructed a Chinese gold-standard word similarity dataset of 260 word pairs and demonstrated its soundness with the interdisciplinary validation methods. Although the paper focuses only on constructing a Chinese gold-standard dataset, the proposed method is applicable to other languages.
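
As an illustration of how a gold-standard dataset of this kind is commonly used, the sketch below compares a model's similarity scores with human scores using Spearman rank correlation. The abstract does not state which evaluation measure the authors use, so the choice of Spearman correlation is an assumption based on common practice. The word pairs, the human scores, the 0 to 10 scale and the `toy_model_scores` lookup are all invented for the example and are not taken from the paper's dataset.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical gold standard: (word1, word2, mean human score on a 0-10 scale).
gold = [
    ("doctor", "physician", 9.0),
    ("river", "stream", 7.5),
    ("cup", "table", 2.0),
    ("moon", "bread", 0.5),
]

# Hypothetical model output, e.g. cosine similarities from word vectors.
toy_model_scores = {
    ("doctor", "physician"): 0.8,
    ("river", "stream"): 0.9,
    ("cup", "table"): 0.3,
    ("moon", "bread"): 0.1,
}

human = [score for _, _, score in gold]
model = [toy_model_scores[(w1, w2)] for w1, w2, _ in gold]
rho = spearman(human, model)
print(f"Spearman correlation with the gold standard: {rho:.2f}")
```

A rank-based measure is a natural fit here because human scores and model scores usually live on different scales; only the ordering of the word pairs is compared.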

Metadata
Title
Constructing and validating word similarity datasets by integrating methods from psychology, brain science and computational linguistics
Authors
Yu Wan
Yidong Chen
Xiaodong Shi
Changle Zhou
Publication date
3 April 2018
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 21/2018
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-018-3174-1
