
2012 | OriginalPaper | Chapter

Calculating a Distributional Similarity Kernel using the Nyström Extension

Authors: Markus Arndt, Ulrich Arndt

Published in: Challenges at the Interface of Data Analysis, Computer Science, and Optimization

Publisher: Springer Berlin Heidelberg


Abstract

The analysis of distributional similarities induced by word co-occurrences is an established tool for extracting semantically related words from a large text corpus. Based on the co-occurrence matrix \(C\), the basic kernel matrix \(K = CC^{T}\) reflects word–word similarities. In order to considerably improve the results, a similarity kernel matrix is expressed as \(G = U_{k}U_{k}^{T}\), where \(U_{k}\) are the first \(k\) eigenvectors of the eigendecomposition \(K = U\Sigma U^{T}\). Clearly, the bottleneck of this technique is the high computational demand for calculating the eigendecomposition. In our study we speed up the calculation of the low-rank similarity kernel by means of the Nyström extension. We address in detail the inherent challenge of the Nyström method, namely selecting appropriate kernel matrix columns in such a way that the fast approximation process yields satisfactory results. To illustrate the effectiveness of our method, we have built a thesaurus containing 32,000 entries based on 0.5 billion corpus words (nouns, verbs, adjectives and adverbs) extracted from the Project Gutenberg text collection.
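
As a rough illustration of the pipeline described in the abstract, the following Python sketch (using NumPy) approximates G from a subset of m kernel columns instead of a full eigendecomposition of K. It is not the authors' implementation. Uniform random column sampling, the orthonormalization step via an SVD, the toy data, and all names (`cooc`, `nystrom_similarity_kernel`, `m`, `k`, `seed`) are assumptions made for illustration; the chapter's own column-selection strategy is not given in the abstract.

```python
import numpy as np

def nystrom_similarity_kernel(cooc, m, k, seed=0):
    """Approximate G = U_k U_k^T for K = cooc @ cooc.T from m sampled columns.

    cooc : (n, d) co-occurrence matrix (the abstract's C)
    m    : number of kernel columns to sample (hypothetical parameter)
    k    : number of leading eigenvectors to keep
    """
    rng = np.random.default_rng(seed)
    n = cooc.shape[0]

    # Assumption: uniform sampling without replacement. The abstract only says
    # that column selection is the key challenge, not how it is solved.
    idx = rng.choice(n, size=m, replace=False)

    # Sampled columns of K, computed without ever forming the full n x n matrix.
    K_cols = cooc @ cooc[idx].T          # shape (n, m)
    W = K_cols[idx]                      # shape (m, m), the sampled block of K

    # Pseudo-inverse square root of W, dropping numerically zero eigenvalues.
    w_vals, w_vecs = np.linalg.eigh(W)
    keep = w_vals > 1e-10 * w_vals.max()
    W_inv_sqrt = (w_vecs[:, keep] / np.sqrt(w_vals[keep])) @ w_vecs[:, keep].T

    # Nystrom approximation: K is approximated by F @ F.T.
    F = K_cols @ W_inv_sqrt              # shape (n, m)

    # Left singular vectors of F are orthonormal eigenvectors of F @ F.T.
    U, _, _ = np.linalg.svd(F, full_matrices=False)
    U_k = U[:, :k]
    return U_k @ U_k.T, U_k

# Toy data standing in for a real co-occurrence matrix: 30 "words", rank 5.
rng = np.random.default_rng(42)
cooc = rng.random((30, 5))
G, U_k = nystrom_similarity_kernel(cooc, m=10, k=5)
```

In this toy setting K has rank 5 and the 10 sampled columns span its range, so the approximate G should coincide with the exact one up to rounding. On real co-occurrence data the quality depends on which columns are sampled, which is the challenge the abstract highlights. One plausible way to read off thesaurus candidates for word i would be to rank the entries of row `G[i]`, though the abstract does not specify this step.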

Metadata
Title: Calculating a Distributional Similarity Kernel using the Nyström Extension
Authors: Markus Arndt, Ulrich Arndt
Copyright Year: 2012
Publisher: Springer Berlin Heidelberg
DOI: https://doi.org/10.1007/978-3-642-24466-7_34
