Published in: Discover Computing 6/2006

01-12-2006

Swedish full text retrieval: Effectiveness of different combinations of indexing strategies with query terms

Authors: Per Ahlgren, Jaana Kekäläinen


Abstract

This paper treats Swedish full text retrieval and studies the problem of morphological variation of query terms in the document database. Using the Swedish CLEF 2003 test collection, it examines how combinations of indexing strategies with query terms affect retrieval effectiveness. Four of the seven tested combinations involved indexing strategies that used normalization, a form of conflation. All four employed compound splitting, both during indexing and at query phase. SWETWOL, a morphological analyzer for the Swedish language, was used for normalization and compound splitting. A fifth combination used stemming, while a sixth attempted to group related terms by right-hand truncation of query terms. The truncation was performed by a search expert. These six combinations were compared to each other and to a baseline combination, where no attempt was made to counteract the problem of morphological variation of query terms in the document database. The truncation combination, the four combinations based on normalization, and the stemming combination all outperformed the baseline. Truncation had the best performance. The main conclusion of the paper is that truncation, normalization and stemming enhanced retrieval effectiveness in comparison to the baseline. Further, normalization and stemming were not far below truncation.
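
To make the difference between the baseline, right-hand truncation and stemming concrete, the following is a minimal Python sketch on a handful of Swedish word forms. It is an illustration under stated assumptions, not the method used in the paper: the suffix list, the function names and the tiny vocabulary are all hypothetical, and the toy stemmer is neither SWETWOL nor the Snowball stemmer.

```python
# Toy illustration only: not SWETWOL, not the Snowball stemmer, and not the
# retrieval system used in the paper. All names and data are hypothetical.

# Hypothetical suffix list, checked in this order (longest first).
SUFFIXES = ("arna", "ar", "en")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def match_exact(term: str, vocabulary: list[str]) -> list[str]:
    """Baseline: only the identical string matches."""
    return [w for w in vocabulary if w == term]


def match_truncated(prefix: str, vocabulary: list[str]) -> list[str]:
    """Right-hand truncation: every word starting with the prefix matches."""
    return [w for w in vocabulary if w.startswith(prefix)]


def match_stemmed(term: str, vocabulary: list[str]) -> list[str]:
    """Stemming: words match when their toy stems are equal."""
    target = toy_stem(term)
    return [w for w in vocabulary if toy_stem(w) == target]


# "bil" = car, with inflected forms, one compound, and one unrelated word.
VOCABULARY = ["bil", "bilar", "bilen", "bilarna", "bilindustri", "bild"]

print(match_exact("bil", VOCABULARY))      # baseline
print(match_truncated("bil", VOCABULARY))  # query term "bil" truncated on the right
print(match_stemmed("bilar", VOCABULARY))  # toy stemming
```

In this toy vocabulary the baseline finds only the exact form, the toy stemmer groups the inflected forms of "bil" but misses the compound "bilindustri", and truncation reaches the compound while also picking up the unrelated word "bild". The compound case hints at why the normalization combinations in the paper were paired with compound splitting, though the sketch does not attempt to reproduce that step.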

Metadata
Title: Swedish full text retrieval: Effectiveness of different combinations of indexing strategies with query terms
Authors: Per Ahlgren, Jaana Kekäläinen
Publication date: 01-12-2006
Publisher: Springer Netherlands
Published in: Discover Computing, Issue 6/2006
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-006-9009-1
