Abstract
A major barrier to successful retrieval from external sources (e.g., electronic databases) is the tremendous variability in the words that people use to describe objects of interest. Because different authors use different words to describe essentially the same idea, relevant objects will be missed; conversely, because the same word can refer to many different things, irrelevant objects will be retrieved. We describe a statistical method called latent semantic indexing, which models the implicit higher-order structure in the association of words and objects and improves retrieval performance by up to 30%. Additional performance improvements of 40% and 67% can be achieved through differential term weighting and iterative retrieval methods.
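A minimal sketch of the pipeline the abstract describes, assuming a raw term-by-document count matrix as input: weight the counts, take a truncated singular value decomposition (SVD), fold a query into the reduced space, and rank documents by cosine similarity. The log-entropy weighting stands in for "differential term weighting" and the Rocchio-style `feedback` step for "iterative retrieval"; both are assumptions (common choices in the LSI literature), not necessarily the exact variants measured in this paper. All function names are hypothetical.

```python
import numpy as np


def entropy_global_weights(counts):
    """Entropy global weight per term: 1 + sum_j p_ij log p_ij / log n_docs."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]  # assumes more than one document
    gf = counts.sum(axis=1, keepdims=True)  # global frequency of each term
    p = np.divide(counts, gf, out=np.zeros_like(counts), where=gf > 0)
    plogp = p * np.log(np.where(p > 0, p, 1.0))  # contributes 0 where p == 0
    return 1.0 + plogp.sum(axis=1) / np.log(n_docs)


def lsi_index(counts, k):
    """Weight the term-document matrix, then keep the top-k SVD dimensions."""
    counts = np.asarray(counts, dtype=float)
    g = entropy_global_weights(counts)
    weighted = np.log1p(counts) * g[:, None]  # local log(1 + tf) times global weight
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    U_k = U[:, :k]
    doc_vecs = weighted.T @ U_k  # equals V_k S_k: one row per document
    return U_k, doc_vecs, g


def fold_in_query(query_counts, U_k, g):
    """Weight a query like a document and project it into the rows' space."""
    q = np.log1p(np.asarray(query_counts, dtype=float)) * g
    return q @ U_k


def cosine_scores(q_vec, doc_vecs):
    """Cosine similarity between the query and every document vector."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    return doc_vecs @ q_vec / np.maximum(norms, 1e-12)


def feedback(q_vec, doc_vecs, relevant, alpha=1.0, beta=1.0):
    """Hypothetical Rocchio-style step: move the query toward relevant docs."""
    return alpha * q_vec + beta * doc_vecs[relevant].mean(axis=0)


# Toy collection (rows are terms, columns are documents d0..d3).
counts = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 2, 2],  # flower
    [0, 0, 2, 2],  # petal
])
U_k, doc_vecs, g = lsi_index(counts, k=2)
q_vec = fold_in_query([1, 0, 0, 0, 0], U_k, g)  # query: "car"
scores = cosine_scores(q_vec, doc_vecs)
refined = cosine_scores(feedback(q_vec, doc_vecs, [0]), doc_vecs)
```

In this toy collection, d1 ("automobile engine") scores close to 1 for the query "car" even though it never contains that word, because "car" and "automobile" both co-occur with "engine" and land on the same latent dimension; in the raw term space the cosine between the query and d1 would be exactly 0. The flower documents score near 0.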
Cite this article
Dumais, S.T. Improving the retrieval of information from external sources. Behavior Research Methods, Instruments, & Computers 23, 229–236 (1991). https://doi.org/10.3758/BF03203370