2012 | OriginalPaper | Chapter

HOMALS for Dimension Reduction in Information Retrieval

Authors: Kay F. Hildebrand, Ulrich Müller-Funk

Published in: Challenges at the Interface of Data Analysis, Computer Science, and Optimization

Publisher: Springer Berlin Heidelberg

Abstract

The usual data basis for multiple correspondence analysis/homogeneity analysis consists of objects characterised by categorical attributes. Its aims are visualisation, dimension reduction and, to some extent, factor analysis using alternating least squares. As for dimension reduction, there are strong parallels to vector-based methods in Information Retrieval (IR) such as the Vector Space Model (VSM) or Latent Semantic Analysis (LSA). The latter uses singular value decomposition (SVD) and discards a number of the smallest singular values, thereby generating a lower-dimensional retrieval space. In this paper, the HOMALS technique is adapted for use in IR by categorising metric term frequencies in term-document matrices. In this context, dimension reduction is achieved by minimising the difference between the distances among objects in the dimensionally reduced space and those in the full-dimensional space. An exemplary set of documents is submitted to the process and later used for retrieval.
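To make the LSA step mentioned in the abstract concrete, the following is a minimal sketch of dimension reduction by truncated SVD using NumPy. It is not the chapter's HOMALS procedure. The toy term-document matrix, the choice k = 2 and the helper names `truncated_svd` and `document_coordinates` are assumptions made purely for illustration.

```python
import numpy as np


def truncated_svd(A, k):
    """Return the k leading singular triplets of A (terms x documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Singular values come back in descending order, so slicing keeps
    # the k largest and discards the smallest ones.
    return U[:, :k], s[:k], Vt[:k, :]


def document_coordinates(s_k, Vt_k):
    """Represent each document as a column in the k-dimensional space."""
    return np.diag(s_k) @ Vt_k


# Hypothetical term frequencies: 4 terms (rows) x 3 documents (columns).
A = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 3.0, 1.0],
    [0.0, 1.0, 0.0],
])

U_k, s_k, Vt_k = truncated_svd(A, k=2)
docs_2d = document_coordinates(s_k, Vt_k)  # shape (2, 3)
A_k = U_k @ np.diag(s_k) @ Vt_k            # rank-2 approximation of A
```

In this sketch the Frobenius norm of `A - A_k` equals the single discarded singular value, which is the sense in which dropping the smallest singular values loses the least information.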

Footnotes
1
Here, words such as "a" or "and", which cannot discriminate between documents and carry no content, are removed.
 
2
In stemming, certain endings are removed or merged in order to map words with identical stems to the same item.
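The two preprocessing steps described in these footnotes can be sketched as follows. This is an illustrative toy, not the procedure used in the chapter: the stop-word list, the suffix list and the function names are all hypothetical, and a real system would more likely use an established stemmer such as Porter's.

```python
# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "the", "of", "or"}

# Hypothetical suffix list for a crude stemmer; checked in this order.
SUFFIXES = ("ing", "es", "ed", "s")


def remove_stop_words(tokens):
    """Drop tokens that carry no content and cannot discriminate documents."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def crude_stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    t = token.lower()
    for suffix in SUFFIXES:
        if t.endswith(suffix) and len(t) - len(suffix) >= 3:
            return t[: -len(suffix)]
    return t


def preprocess(text):
    """Whitespace-tokenise, remove stop words, then stem what remains."""
    return [crude_stem(t) for t in remove_stop_words(text.split())]


terms = preprocess("a search and the searches of indexed documents")
# "search" and "searches" are mapped to the same item, "search".
```

After this step, the remaining stems would be counted per document to fill the term-document matrix.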
 
Metadata
Title
HOMALS for Dimension Reduction in Information Retrieval
Authors
Kay F. Hildebrand
Ulrich Müller-Funk
Copyright Year
2012
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-642-24466-7_36
