Abstract
We present an analysis of word senses that offers fresh insight into the impact of word ambiguity on retrieval effectiveness, with potentially broader implications for other processes of information retrieval. Using a methodology of forming artificially ambiguous words, known as pseudowords, and drawing on other researchers' work, the analysis illustrates that the distribution of the frequency of occurrence of a word's senses plays a strong role in ambiguity's impact on effectiveness. Further investigation shows that this analysis may also apply to other retrieval processes, such as cross-language information retrieval, query expansion, retrieval of OCR'ed texts, and stemming. The analysis appears to provide a means of explaining, at least in part, the reasons for these processes' impact (or lack of it) on effectiveness.
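The pseudoword idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the "/" joining convention, and the toy text are assumptions made for illustration. The sketch merges a set of distinct words into one artificial ambiguous term and then measures how skewed the resulting "sense" frequencies are, here as the share of occurrences taken by the most frequent member.

```python
from collections import Counter


def make_pseudoword(tokens, members):
    """Replace every occurrence of any member word with a single pseudoword.

    Returns the rewritten token list, the pseudoword string, and the
    per-member occurrence counts (the pseudoword's "sense" frequencies).
    The "/"-joined name is an illustrative convention, not a requirement.
    """
    pseudoword = "/".join(members)
    member_set = set(members)
    counts = Counter(t for t in tokens if t in member_set)
    replaced = [pseudoword if t in member_set else t for t in tokens]
    return replaced, pseudoword, counts


def dominant_sense_share(counts):
    """Fraction of occurrences accounted for by the most frequent sense."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0


# Toy example (hypothetical text): "bank" occurs three times, "kangaroo" once.
tokens = ("the bank raised rates while the kangaroo slept "
          "near the bank and the bank closed").split()
replaced, pseudoword, counts = make_pseudoword(tokens, ["bank", "kangaroo"])
share = dominant_sense_share(counts)
```

In this toy text one member accounts for three of the four occurrences, so the pseudoword's sense distribution is skewed: most of the time the ambiguous term still means its dominant member.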