Improving the retrieval of information from external sources

Dumais, Susan T.

doi:10.3758/BF03203370

Session 9 Database Management
Published: June 1991

Volume 23, pages 229–236 (1991)
Behavior Research Methods, Instruments, & Computers

Abstract

A major barrier to successful retrieval from external sources (e.g., electronic databases) is the tremendous variability in the words that people use to describe objects of interest. The fact that different authors use different words to describe essentially the same idea means that relevant objects will be missed; conversely, the fact that the same word can be used to refer to many different things means that irrelevant objects will be retrieved. We describe a statistical method called latent semantic indexing, which models the implicit higher order structure in the association of words and objects and improves retrieval performance by up to 30%. Additional large performance improvements of 40% and 67% can be achieved through the use of differential term weighting and iterative retrieval methods.
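
As a rough illustration of the idea behind latent semantic indexing, the sketch below factors a small term-by-document matrix with a truncated singular value decomposition and compares a query with documents in the reduced space. It is a minimal sketch under stated assumptions, not the article's implementation: the toy collection, the choice of k = 2, the function names, and the use of NumPy are all hypothetical, and the query is folded in as q^T U_k S_k^{-1}, one common convention.

```python
import numpy as np

def build_lsi(term_doc, k):
    """Factor a term-by-document count matrix and keep the k largest factors."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Rows of u[:, :k] are term vectors; rows of vt[:k].T are document vectors.
    return u[:, :k], s[:k], vt[:k].T

def fold_in_query(query_counts, u_k, s_k):
    """Map a query's term counts into the reduced space: q^T U_k S_k^{-1}."""
    return (query_counts @ u_k) / s_k

def cosine_scores(query_vec, doc_vecs):
    """Cosine similarity between the query vector and every document vector."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    return (doc_vecs @ query_vec) / np.maximum(norms, 1e-12)

# Hypothetical toy collection (not from the article): rows are terms,
# columns are documents, entries are word counts.
terms = ["car", "automobile", "engine", "flower", "petal"]
term_doc = np.array([
    [1, 0, 0],  # car
    [0, 1, 0],  # automobile
    [1, 1, 0],  # engine
    [0, 0, 1],  # flower
    [0, 0, 1],  # petal
], dtype=float)

query = np.array([1, 0, 0, 0, 0], dtype=float)  # the single word "car"

# Literal word matching: a document is a hit only if it contains a query word.
literal_hits = (query @ term_doc) > 0

# LSI matching: compare query and documents in the k-dimensional space.
u_k, s_k, doc_vecs = build_lsi(term_doc, k=2)
scores = cosine_scores(fold_in_query(query, u_k, s_k), doc_vecs)

for doc_id, (hit, score) in enumerate(zip(literal_hits, scores)):
    print(f"doc {doc_id}: literal match={bool(hit)}, LSI cosine={score:.2f}")
```

In this toy case the document that says "automobile" shares no word with the query "car", so literal matching misses it, yet it lands next to the "car" document in the reduced space because both co-occur with "engine". The unrelated document about flowers stays far from the query.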

Author information

Authors and Affiliations

Bellcore, 445 South St., Room 2L-371, Morristown, NJ 07962-1910
Susan T. Dumais

About this article

Cite this article

Dumais, S.T. Improving the retrieval of information from external sources. Behavior Research Methods, Instruments, & Computers 23, 229–236 (1991). https://doi.org/10.3758/BF03203370

Issue Date: June 1991
DOI: https://doi.org/10.3758/BF03203370
