Published in: Discover Computing 3/2009

01-06-2009

Current research issues and trends in non-English Web searching

Authors: Fotis Lazarinis, Jesús Vilares, John Tait, Efthimis N. Efthimiadis

Abstract

With the growing number of non-English-language Web searchers, the efficient handling of non-English Web documents and user queries is becoming a major issue for search engines. The main aim of this review paper is to make researchers aware of the existing problems in monolingual non-English Web retrieval by providing an overview of open issues. A significant number of papers are reviewed, and the research issues investigated in these studies are categorized in order to identify the research questions they raise and the solutions they propose. Further research is proposed at the end of each section.

Footnotes
1
Over the years, their activity has also extended to languages outside their main sphere, as in the case of Arabic in TREC or Persian in CLEF. They have also broadened their scope to more specialized tasks, such as speech retrieval and geographical retrieval, and to other information processing tasks, such as question answering.
 
Literature
go back to reference Ahlgren, P., & Kekäläinen, J. (2006). Swedish full text retrieval: Effectiveness of different combinations of indexing strategies with query terms. Information Retrieval, 9(6), 681–697. doi:10.1007/s10791-006-9009-1.CrossRef Ahlgren, P., & Kekäläinen, J. (2006). Swedish full text retrieval: Effectiveness of different combinations of indexing strategies with query terms. Information Retrieval, 9(6), 681–697. doi:10.​1007/​s10791-006-9009-1.CrossRef
go back to reference Aho, A. V., Sethi, R., & Ullman, J. D. (1986). Compilers: Principles, techniques and tools. Addison-Wesley. Aho, A. V., Sethi, R., & Ullman, J. D. (1986). Compilers: Principles, techniques and tools. Addison-Wesley.
go back to reference Alemayehu, N., & Willett, P. (2003). The effectiveness of stemming for information retrieval in Amharic. Program: Electronic Library and Information Systems, 37(4), 254–259.CrossRef Alemayehu, N., & Willett, P. (2003). The effectiveness of stemming for information retrieval in Amharic. Program: Electronic Library and Information Systems, 37(4), 254–259.CrossRef
go back to reference Amaral, C., Laurent, D., Martins, A., Mendes, A., & Pinto, C. (2004). Design & implementation of a semantic search engine for Portuguese. In Proceedings of the fourth conference on language resources and evaluation. Amaral, C., Laurent, D., Martins, A., Mendes, A., & Pinto, C. (2004). Design & implementation of a semantic search engine for Portuguese. In Proceedings of the fourth conference on language resources and evaluation.
go back to reference Arampatzis, A., van der Weide, T. P., van Bommel, P., & Koster, C. H. A. (2000). Linguistically motivated information retrieval. In Encyclopedia of library and information science (Vol. 69, pp. 201–222). Marcel Dekker. Arampatzis, A., van der Weide, T. P., van Bommel, P., & Koster, C. H. A. (2000). Linguistically motivated information retrieval. In Encyclopedia of library and information science (Vol. 69, pp. 201–222). Marcel Dekker.
go back to reference Artemenko, O., Mandl, T., Shramko, M., & Womser-Hacker, C. (2006). Evaluation of a language identification system for mono- and multilingual text documents. In Proceedings of the 2006 ACM symposium on applied computing (pp. 859–860). ACM. doi:10.1145/1141277.1141473. Artemenko, O., Mandl, T., Shramko, M., & Womser-Hacker, C. (2006). Evaluation of a language identification system for mono- and multilingual text documents. In Proceedings of the 2006 ACM symposium on applied computing (pp. 859–860). ACM. doi:10.​1145/​1141277.​1141473.
go back to reference Asker, L., Argaw, A., Gambäck, B., Asfeha, S. E., & Habte, L. N. (2009, this issue). Classifying amharic webnews, information retrieval Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. Reading, MA: Addison Wesley, ACM Press. Asker, L., Argaw, A., Gambäck, B., Asfeha, S. E., & Habte, L. N. (2009, this issue). Classifying amharic webnews, information retrieval Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. Reading, MA: Addison Wesley, ACM Press.
go back to reference Baeza-Yates, R., Dupret, G., & Velasco, J. (2007). A study of mobile search queries in japan. In E. Amitay, C. G. Murray, & J. Teevan (Eds.), Query log analysis: Social and technological challenges. A workshop at the 16th international World Wide Web conference (WWW 2007). Baeza-Yates, R., Dupret, G., & Velasco, J. (2007). A study of mobile search queries in japan. In E. Amitay, C. G. Murray, & J. Teevan (Eds.), Query log analysis: Social and technological challenges. A workshop at the 16th international World Wide Web conference (WWW 2007).
go back to reference Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. Reading, MA: Addison Wesley, ACM Press. Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. Reading, MA: Addison Wesley, ACM Press.
go back to reference Barcala, F. M., Vilares, J., Alonso, M. A., Graña, J., & Vilares, M. (2002). Tokenization and proper noun recognition for information retrieval. In Proceedings of thirteen international workshop on database and expert systems applications (pp. 246–250). Barcala, F. M., Vilares, J., Alonso, M. A., Graña, J., & Vilares, M. (2002). Tokenization and proper noun recognition for information retrieval. In Proceedings of thirteen international workshop on database and expert systems applications (pp. 246–250).
go back to reference Berendt, B., & Kralisch, A. (2009, this issue). A user-centric approach to identifying best deployment strategies for language tools: The impact of content and access language on Web user behaviour and attitudes. Information Retrieval. Berendt, B., & Kralisch, A. (2009, this issue). A user-centric approach to identifying best deployment strategies for language tools: The impact of content and access language on Web user behaviour and attitudes. Information Retrieval.
go back to reference Bitirim, Y., Tonta, Y., & Sever, H. (2002). Information retrieval effectiveness of Turkish search engines. In Advances in information systems, Vol. 2457 of lecture notes in computer science (pp. 93–103). Bitirim, Y., Tonta, Y., & Sever, H. (2002). Information retrieval effectiveness of Turkish search engines. In Advances in information systems, Vol. 2457 of lecture notes in computer science (pp. 93–103).
go back to reference Blanco, R., & Barreiro, A. (2007). Static pruning of terms in inverted files. In Advances in information retrieval, Vol. 4425 of lecture notes in computer science (pp. 64–75). Blanco, R., & Barreiro, A. (2007). Static pruning of terms in inverted files. In Advances in information retrieval, Vol. 4425 of lecture notes in computer science (pp. 64–75).
go back to reference Blanco, R, & Lioma, C. (2009, this issue). Mixed monolingual homepage finding in 35 languages: The role of language script and search domain. Information Retrieval. Blanco, R, & Lioma, C. (2009, this issue). Mixed monolingual homepage finding in 35 languages: The role of language script and search domain. Information Retrieval.
go back to reference Brill, E., Kacmarcik, G., & Brockett, C. (2001). Automatically harvesting Katakana-English term pairs from search engine query log. In Proceedings of natural language processing pacific rim symposium (pp. 393–399). Brill, E., Kacmarcik, G., & Brockett, C. (2001). Automatically harvesting Katakana-English term pairs from search engine query log. In Proceedings of natural language processing pacific rim symposium (pp. 393–399).
go back to reference Cavnar, W. B., & Trenkle, J. M. (1994). N-Gram-based text categorization. In 3rd annual symposium on document analysis and information retrieval (pp. 161–176). Las Vegas, Nevada, USA. Cavnar, W. B., & Trenkle, J. M. (1994). N-Gram-based text categorization. In 3rd annual symposium on document analysis and information retrieval (pp. 161–176). Las Vegas, Nevada, USA.
go back to reference Chau, M., Fang, X., & Yang, C. (2007). Web searching in Chinese: A study of a search engine in Hong Kong. Journal of the American Society for Information Science American Society for Information Science, 58(7), 1044–1054. doi:10.1002/asi.20592.CrossRef Chau, M., Fang, X., & Yang, C. (2007). Web searching in Chinese: A study of a search engine in Hong Kong. Journal of the American Society for Information Science American Society for Information Science, 58(7), 1044–1054. doi:10.​1002/​asi.​20592.CrossRef
go back to reference Chen, A., & Gey, F. (2002). Building an Arabic stemmer for information retrieval. In TREC 2002 (pp. 631–639). Gaithersburg: NIST. Chen, A., & Gey, F. (2002). Building an Arabic stemmer for information retrieval. In TREC 2002 (pp. 631–639). Gaithersburg: NIST.
go back to reference Chen, K., & Liu, S. (1992). Word identification for Mandarin Chinese sentences. In Proceedings of the 14th conference on computational linguistics (pp. 101–107). doi:10.3115/992066.992085. Chen, K., & Liu, S. (1992). Word identification for Mandarin Chinese sentences. In Proceedings of the 14th conference on computational linguistics (pp. 101–107). doi:10.​3115/​992066.​992085.
go back to reference Chorós, K. (2005). Testing the effectiveness of retrieval to queries using polish words with diacritics. In AWIC 2005, Vol. 3528 of lecture notes in artificial intelligence (pp. 101–106). Chorós, K. (2005). Testing the effectiveness of retrieval to queries using polish words with diacritics. In AWIC 2005, Vol. 3528 of lecture notes in artificial intelligence (pp. 101–106).
go back to reference Darwish, K., & Oard, D. (2007). Adapting morphology for arabic information retrieval, In Adapting morphology for arabic information retrieval (pp. 245–262). Springer. 978-1-4020-6045-8. Darwish, K., & Oard, D. (2007). Adapting morphology for arabic information retrieval, In Adapting morphology for arabic information retrieval (pp. 245–262). Springer. 978-1-4020-6045-8.
go back to reference Demirci, R., Kismir, V., & Bitirim, Y. (2007). An evaluation of popular search engines on finding turkish documents. In 2nd IEEE international conference on Internet and Web applications and services (ICIW’07). doi:10.1109/ICIW.2007.15. Demirci, R., Kismir, V., & Bitirim, Y. (2007). An evaluation of popular search engines on finding turkish documents. In 2nd IEEE international conference on Internet and Web applications and services (ICIW’07). doi:10.​1109/​ICIW.​2007.​15.
go back to reference De Vries, A. P. (2001). A poor man’s approach to CLEF. In Cross-language information retrieval and evaluation, Vol. 2069 of lecture notes in computer science (pp. 149–155). De Vries, A. P. (2001). A poor man’s approach to CLEF. In Cross-language information retrieval and evaluation, Vol. 2069 of lecture notes in computer science (pp. 149–155).
go back to reference Di Nunzio, G. M., Ferro, N., Melucci, M., & Orio, N. (2004). Experiments to evaluate probabilistic models for automatic stemmer generation and query word translation. In Comparative evaluation of multilingual information access systems, Vol. 3237 of lecture notes in computer science (pp. 220–235). Di Nunzio, G. M., Ferro, N., Melucci, M., & Orio, N. (2004). Experiments to evaluate probabilistic models for automatic stemmer generation and query word translation. In Comparative evaluation of multilingual information access systems, Vol. 3237 of lecture notes in computer science (pp. 220–235).
go back to reference Dunning, T. (1994). Statistical identification of language. Technical Report MCCS, 94-273. New Mexico: New Mexico State University. Dunning, T. (1994). Statistical identification of language. Technical Report MCCS, 94-273. New Mexico: New Mexico State University.
go back to reference Efthimiadis, E. N. (2008). How do Greeks search the web? A query log analysis study. In Proceeding of the 2nd ACM workshop on improving non english web searching, Napa Valley, California, USA, October 30–30, 2008. iNEWS ’08 (pp. 81–84). New York, NY: ACM. doi:10.1145/1460027.1460041. Efthimiadis, E. N. (2008). How do Greeks search the web? A query log analysis study. In Proceeding of the 2nd ACM workshop on improving non english web searching, Napa Valley, California, USA, October 30–30, 2008. iNEWS ’08 (pp. 81–84). New York, NY: ACM. doi:10.​1145/​1460027.​1460041.
go back to reference Efthimiadis, E. N., Malevris, N., Kousaridas, A., Lepeniotou, A., & Loutas, N. (2008). An evaluation of how search engines respond to greek language queries. In Proceedings of the 41st annual Hawaii international conference on system sciences (HICSS 2008). doi:10.1109/HICSS.2008.52. Efthimiadis, E. N., Malevris, N., Kousaridas, A., Lepeniotou, A., & Loutas, N. (2008). An evaluation of how search engines respond to greek language queries. In Proceedings of the 41st annual Hawaii international conference on system sciences (HICSS 2008). doi:10.​1109/​HICSS.​2008.​52.
go back to reference Efthimiadis, E. N., Malevris, N., Kousaridas, A., Lepeniotou, A., & Loutas, N. (2009, this issue). Non-English Web search: An evaluation of indexing and searching the Greek Web. Information Retrieval. Efthimiadis, E. N., Malevris, N., Kousaridas, A., Lepeniotou, A., & Loutas, N. (2009, this issue). Non-English Web search: An evaluation of indexing and searching the Greek Web. Information Retrieval.
go back to reference Eguchi, K., & Croft, B. (2009, this issue). Query structuring and expansion with two-stage term dependence for Japanese Web retrieval. Information Retrieval. Eguchi, K., & Croft, B. (2009, this issue). Query structuring and expansion with two-stage term dependence for Japanese Web retrieval. Information Retrieval.
go back to reference Ekmekçioglu, Ç., & Willett, P. (2000). Effectiveness of stemming for Turkish text retrieval. Program, 34(2), 195–200. Ekmekçioglu, Ç., & Willett, P. (2000). Effectiveness of stemming for Turkish text retrieval. Program, 34(2), 195–200.
go back to reference Figuerola, C. G., Gómez, R., Zazo-Rodríguez, A. F., & Alonso-Berrocal, J. L. (2001). Stemming in Spanish: A first approach to its impact on information retrieval. In Working notes for the CLEF 2001 workshop. Figuerola, C. G., Gómez, R., Zazo-Rodríguez, A. F., & Alonso-Berrocal, J. L. (2001). Stemming in Spanish: A first approach to its impact on information retrieval. In Working notes for the CLEF 2001 workshop.
go back to reference Frakes, W., & Baeza-Yates, R. (1992). Information retrieval: Data structures and algorithms. Prentice Hall. Frakes, W., & Baeza-Yates, R. (1992). Information retrieval: Data structures and algorithms. Prentice Hall.
go back to reference Goldsmith, J., & Reutter, T. (1999). Automatic collection and analysis of German compounds. In F. Busa, I. Mani, & P. Saint-Dizier (Eds.), The computational treatment of nominals: Proceedings of the workshop COLING-ACL‘’98 (pp. 61–69), Montreal. Goldsmith, J., & Reutter, T. (1999). Automatic collection and analysis of German compounds. In F. Busa, I. Mani, & P. Saint-Dizier (Eds.), The computational treatment of nominals: Proceedings of the workshop COLING-ACL‘’98 (pp. 61–69), Montreal.
go back to reference Gonzalez, M., de Lima, V. L. S., & de Lima, J. V. (2005). Binary lexical relations for text representation in information retrieval. In Natural language processing and information systems, Vol. 3513 of lecture notes in computer science (pp. 21–31). Gonzalez, M., de Lima, V. L. S., & de Lima, J. V. (2005). Binary lexical relations for text representation in information retrieval. In Natural language processing and information systems, Vol. 3513 of lecture notes in computer science (pp. 21–31).
go back to reference Graña, J., Barcala, F. M., & Vilares, J. (2002). Formal methods of tokenization for part-of-speech tagging. In Computational linguistics and intelligent text processing, Vol. 2276 of l ecture notes in computer science (pp. 240–249). Graña, J., Barcala, F. M., & Vilares, J. (2002). Formal methods of tokenization for part-of-speech tagging. In Computational linguistics and intelligent text processing, Vol. 2276 of l ecture notes in computer science (pp. 240–249).
go back to reference Graña, J., Chappelier, J. C., & Vilares, M. (2001). Integrating external dictionaries into stochastic part-of-speech taggers. In Proceedings of EuroConference recent advances in natural language processing (RANLP 2001) (pp. 122–128). Graña, J., Chappelier, J. C., & Vilares, M. (2001). Integrating external dictionaries into stochastic part-of-speech taggers. In Proceedings of EuroConference recent advances in natural language processing (RANLP 2001) (pp. 122–128).
go back to reference Grefenstette, G. (1995). Comparing two language identification schemes. In 3rd international conference on the statistical analysis of textual data (JADT’95) (pp. 263–268), Rome Grefenstette, G. (1995). Comparing two language identification schemes. In 3rd international conference on the statistical analysis of textual data (JADT’95) (pp. 263–268), Rome
go back to reference Guzman, R., Montes-y-Gómez, M., Rosso, P., & Villaseñor-Pineda, L. (2009, this issue). Using the Spanish Web for self-training text classification tasks. Information Retrieval. Guzman, R., Montes-y-Gómez, M., Rosso, P., & Villaseñor-Pineda, L. (2009, this issue). Using the Spanish Web for self-training text classification tasks. Information Retrieval.
go back to reference Hammo, B. H. (2009, this issue). Towards enhancing retrieval effectiveness of search engines for diacritisized Arabic documents. Information Retrieval. Hammo, B. H. (2009, this issue). Towards enhancing retrieval effectiveness of search engines for diacritisized Arabic documents. Information Retrieval.
go back to reference Jurafsky, D., & Martin, J. H. (2000). Speech and language processing. An introduction to natural language processing, computational linguistics and speech recognition. Prentice Hall. Jurafsky, D., & Martin, J. H. (2000). Speech and language processing. An introduction to natural language processing, computational linguistics and speech recognition. Prentice Hall.
go back to reference Kalamboukis, T. Z. (1995). Suffix stripping with modern Greek. Program, 29(4), 313–321. Kalamboukis, T. Z. (1995). Suffix stripping with modern Greek. Program, 29(4), 313–321.
go back to reference Lazarinis, F. (2007b). Engineering and utilizing a stopword list in Greek Web retrieval. Journal of the American Society for Information Science and Technology, 58(11), 1645–1652. doi:10.1002/asi.20648.CrossRef Lazarinis, F. (2007b). Engineering and utilizing a stopword list in Greek Web retrieval. Journal of the American Society for Information Science and Technology, 58(11), 1645–1652. doi:10.​1002/​asi.​20648.CrossRef
go back to reference Lazarinis, F. (2008b). Retrieving non-Latin information in a Latin Web: The case of Greek. In Y.-F. B. Wu & M. Song (Eds.), Handbook of research on text and Web mining Ttchnologies (pp. 530–545). IDEA Publishing. Lazarinis, F. (2008b). Retrieving non-Latin information in a Latin Web: The case of Greek. In Y.-F. B. Wu & M. Song (Eds.), Handbook of research on text and Web mining Ttchnologies (pp. 530–545). IDEA Publishing.
go back to reference Lazarinis, F. (2008c). Towards a model for evaluating web retrieval systems in non English queries. In C. Calero, M. A. Moraga, & M. Piattini (Eds.), Handbook of research on Web information systems quality (pp. 510–527). USA: Idea Group Inc. Lazarinis, F. (2008c). Towards a model for evaluating web retrieval systems in non English queries. In C. Calero, M. A. Moraga, & M. Piattini (Eds.), Handbook of research on Web information systems quality (pp. 510–527). USA: Idea Group Inc.
go back to reference Lazarinis, F., & Efthimiadis, E. N. (2008). Measuring search engine quality in image queries in 10 non-English languages: An exploratory study. In Proceeding of the 2nd ACM workshop on improving non English Web searching, Napa Valley, California, USA, October 30–30, 2008. iNEWS ’08 (pp. 9–92). New York, NY: ACM. doi:10.1145/1460027.1460043. Lazarinis, F., & Efthimiadis, E. N. (2008). Measuring search engine quality in image queries in 10 non-English languages: An exploratory study. In Proceeding of the 2nd ACM workshop on improving non English Web searching, Napa Valley, California, USA, October 30–30, 2008. iNEWS ’08 (pp. 9–92). New York, NY: ACM. doi:10.​1145/​1460027.​1460043.
go back to reference Lazarinis, F., Efthimiadis, E. N., Vilares, J., & Tait, J. (2008). Improving non-English Web searching (iNEWS08). In Proceedings of ACM-CIKM workshop. Lazarinis, F., Efthimiadis, E. N., Vilares, J., & Tait, J. (2008). Improving non-English Web searching (iNEWS08). In Proceedings of ACM-CIKM workshop.
go back to reference Leturia, I., Gurrutxaga, A., Areta, N., Alegria, I., & Ezeiza, A. (2007). EusBila, a search service designed for the agglutinative nature of Basque, In F. Lazarinis, J. Vilares, & J. Tait (Eds.), Improving non-English Web searching (iNEWS07). SIGIR07 workshop (pp. 47–54). Leturia, I., Gurrutxaga, A., Areta, N., Alegria, I., & Ezeiza, A. (2007). EusBila, a search service designed for the agglutinative nature of Basque, In F. Lazarinis, J. Vilares, & J. Tait (Eds.), Improving non-English Web searching (iNEWS07). SIGIR07 workshop (pp. 47–54).
go back to reference Lewandowski, D. (2006). Query types and search topics of German Web search engine users. Information Services & Use, 26(4), 261–270. Lewandowski, D. (2006). Query types and search topics of German Web search engine users. Information Services & Use, 26(4), 261–270.
go back to reference Lo, R. T. W., He, B., & Ounis, I. (2005). Automatically building a stopword list for an information retrieval system. In Proceedings of 5th Dutch-Belgian information retrieval workshop (DIR’05). Lo, R. T. W., He, B., & Ounis, I. (2005). Automatically building a stopword list for an information retrieval system. In Proceedings of 5th Dutch-Belgian information retrieval workshop (DIR’05).
go back to reference Long, H., Lv, B., Zhao, T., & Liu, Y. (2007). Evaluate and compare Chinese internet search engines based on users’ experience. In Proceedings of IEEE wireless communications, networking and mobile computing conference (WiCom 2007) (pp. 6134–6137). doi:10.1109/WICOM.2007.1504. Long, H., Lv, B., Zhao, T., & Liu, Y. (2007). Evaluate and compare Chinese internet search engines based on users’ experience. In Proceedings of IEEE wireless communications, networking and mobile computing conference (WiCom 2007) (pp. 6134–6137). doi:10.​1109/​WICOM.​2007.​1504.
go back to reference Macdonald, C., Lioma, C., & Ounis, I. (2007). Terrier takes on the non-English Web, In F. Lazarinis, J. Vilares, & J. Tait (Eds.), Improving non-English Web searching (iNEWS07). ACM SIGIR07 Workshop (pp. 21–28). Macdonald, C., Lioma, C., & Ounis, I. (2007). Terrier takes on the non-English Web, In F. Lazarinis, J. Vilares, & J. Tait (Eds.), Improving non-English Web searching (iNEWS07). ACM SIGIR07 Workshop (pp. 21–28).
go back to reference Machill, M., Neuberger, C., Schweiger, W., & Wirth, W. (2004). Navigating the Internet: A Study of German-language search engines. European Journal of Communication, 19(3), 321–347. doi:10.1177/0267323104045258.CrossRef Machill, M., Neuberger, C., Schweiger, W., & Wirth, W. (2004). Navigating the Internet: A Study of German-language search engines. European Journal of Communication, 19(3), 321–347. doi:10.​1177/​0267323104045258​.CrossRef
go back to reference Makrehchi, M., & Kamel, M. S. (2008). Automatic extraction of domain-specific stopwords from labeled documents. In Advances in information retrieval, Vol. 4956 of l ecture notes in computer science (pp. 222–233). Makrehchi, M., & Kamel, M. S. (2008). Automatic extraction of domain-specific stopwords from labeled documents. In Advances in information retrieval, Vol. 4956 of l ecture notes in computer science (pp. 222–233).
go back to reference Mandl, T., & de la Cruz, T. (2009). International differences in web page evaluation guidelines, International Journal of Intercultural Information Management (to appear). Mandl, T., & de la Cruz, T. (2009). International differences in web page evaluation guidelines, International Journal of Intercultural Information Management (to appear).
go back to reference Martins, B., & Silva, M. J. (2005). Language identification in web pages. In SAC ’05: Proceedings of the 2005 ACM symposium on applied computing (pp. 764–768), New York, NY: ACM Press. Martins, B., & Silva, M. J. (2005). Language identification in web pages. In SAC ’05: Proceedings of the 2005 ACM symposium on applied computing (pp. 764–768), New York, NY: ACM Press.
go back to reference Monz, C., & de Rijke, M. (2002). Shallow morphological analysis in monolingual retrieval for Dutch, German, and Italian. In Accessing multilingual information repositories, Vol. 2406 of lecture notes in computer science (pp. 262–277). Monz, C., & de Rijke, M. (2002). Shallow morphological analysis in monolingual retrieval for Dutch, German, and Italian. In Accessing multilingual information repositories, Vol. 2406 of lecture notes in computer science (pp. 262–277).
go back to reference Moreau, F., Claveau, V., & Sébillot, P. (2007). Automatic morphological query expansion using analogy-based machine learning. In Advances in information retrieval, Vol. 4425 of l ecture notes in computer science (pp. 222–233). Moreau, F., Claveau, V., & Sébillot, P. (2007). Automatic morphological query expansion using analogy-based machine learning. In Advances in information retrieval, Vol. 4425 of l ecture notes in computer science (pp. 222–233).
go back to reference Otero, J., Vilares, J., & Vilares, M. (2008). Corrupted queries in Spanish text retrieval: Error correction vs. n-grams. In Workshop proceedings of the ACM 17th conference on information and knowledge management (CIKM 2008): 2nd ACM workshop on improving non-English Web searching (iNEWS’08) (pp. 39–46). ACM. doi:10.1145/1460027.1460034. Otero, J., Vilares, J., & Vilares, M. (2008). Corrupted queries in Spanish text retrieval: Error correction vs. n-grams. In Workshop proceedings of the ACM 17th conference on information and knowledge management (CIKM 2008): 2nd ACM workshop on improving non-English Web searching (iNEWS’08) (pp. 39–46). ACM. doi:10.​1145/​1460027.​1460034.
go back to reference Ounis, I., Amati, G., Plachouras, V., He, B., Macdonald, C., & Lioma, C. (2006). Terrier: A high performance and scalable information retrieval platform. In Proceedings of OSIR 2006. Ounis, I., Amati, G., Plachouras, V., He, B., Macdonald, C., & Lioma, C. (2006). Terrier: A high performance and scalable information retrieval platform. In Proceedings of OSIR 2006.
go back to reference Palmer, D. D. (2000). Tokenisation and sentence segmentation, chapter 2. In R. Dale, H. Moisi, & H. Somers (Eds.), Handbook of natural language processing. Marcel Dekker. Palmer, D. D. (2000). Tokenisation and sentence segmentation, chapter 2. In R. Dale, H. Moisi, & H. Somers (Eds.), Handbook of natural language processing. Marcel Dekker.
go back to reference Peng, F., Ahmed, N., Li, X., & Lu, Y. (2007). Context sensitive stemming for web search. Proceedings of the 30th ACM SIGIR conference (pp. 639–646). Peng, F., Ahmed, N., Li, X., & Lu, Y. (2007). Context sensitive stemming for web search. Proceedings of the 30th ACM SIGIR conference (pp. 639–646).
go back to reference Peters, C., Gey, F. C., Gonzalo, J., Muller, H., Jones, G. J. F., Kluck, M., et al. (2006). Accessing multilingual information repositories, Vol. 4022 of lecture notes in computer science. Spinger-Verlag. Peters, C., Gey, F. C., Gonzalo, J., Muller, H., Jones, G. J. F., Kluck, M., et al. (2006). Accessing multilingual information repositories, Vol. 4022 of lecture notes in computer science. Spinger-Verlag.
go back to reference Pingali, P., Jagarlamudi, J., & Varma, V. (2006). WebKhoj: Indian language IR from multiple character encodings. Proceedings of the 15th international conference on World Wide Web (pp. 801–809). Pingali, P., Jagarlamudi, J., & Varma, V. (2006). WebKhoj: Indian language IR from multiple character encodings. Proceedings of the 15th international conference on World Wide Web (pp. 801–809).
go back to reference Piskorski, J. Wieloch, K., & Sydow, M. (2009, this issue). On knowledge-poor methods for person name matching and lemmatization for highly inflectional languages. Information Retrieval. Piskorski, J. Wieloch, K., & Sydow, M. (2009, this issue). On knowledge-poor methods for person name matching and lemmatization for highly inflectional languages. Information Retrieval.
go back to reference Pohlmann, R., & Kraaij, W. (1997). The effect of syntactic phrase indexing on retrieval performance for Dutch texts. In Proceedings of RIAO, ’97 (pp. 176–187). Pohlmann, R., & Kraaij, W. (1997). The effect of syntactic phrase indexing on retrieval performance for Dutch texts. In Proceedings of RIAO, ’97 (pp. 176–187).
go back to reference Porter, M. (1980). An algorithm for Suffix Stripping. Program, 14(3), 130–137. Porter, M. (1980). An algorithm for Suffix Stripping. Program, 14(3), 130–137.
go back to reference Schinke, R., Greengrass, M., Robertson, A. M., & Willett, P. (1996). A stemming algorithm for Latin text databases. Journal of Documentation, 52(2), 172–187.CrossRef Schinke, R., Greengrass, M., Robertson, A. M., & Willett, P. (1996). A stemming algorithm for Latin text databases. Journal of Documentation, 52(2), 172–187.CrossRef
go back to reference Sigurbjörnsson, B., Kamps, J., & de Rijke, M. (2006). EuroGOV: Engineering a multilingual Web corpus. In Accessing multilingual information repositories, Vol. 4022 of lecture notes in computer science (pp. 825–836). Sigurbjörnsson, B., Kamps, J., & de Rijke, M. (2006). EuroGOV: Engineering a multilingual Web corpus. In Accessing multilingual information repositories, Vol. 4022 of lecture notes in computer science (pp. 825–836).
go back to reference Spink, A., Wolfram, D., Jansen, B. J., & Saracevic, T. (2001). Searching the web: The public and their queries. Journal of the American Society for Information Science American Society for Information Science, 52(3), 226–234.CrossRef Spink, A., Wolfram, D., Jansen, B. J., & Saracevic, T. (2001). Searching the web: The public and their queries. Journal of the American Society for Information Science American Society for Information Science, 52(3), 226–234.CrossRef
Sroka, M. (2000). Web search engines for Polish information retrieval: Questions of search capabilities and retrieval performance. The International Information & Library Review, 32(2), 87–98. doi:10.1006/iilr.2000.0128.
Tomlinson, S. (2006a). Bulgarian and Hungarian experiments with Hummingbird SearchServer at CLEF 2005. In Accessing multilingual information repositories, Vol. 4022 of Lecture Notes in Computer Science (pp. 194–203).
Tomlinson, S. (2006b). Danish and Greek Web search experiments with Hummingbird SearchServer at CLEF 2005. In Accessing multilingual information repositories, Vol. 4022 of Lecture Notes in Computer Science (pp. 846–855).
Tongchim, S., Sornlertlamvanich, V., & Isahara, H. (2007). Improving search performance: A lesson learned from evaluating search engines using Thai queries. IEICE Transactions on Information and Systems, E90-D(10), 1557–1564. doi:10.1093/ietisy/e90-d.10.1557.
Tzoukermann, E., Klavans, J., & Jacquemin, C. (1997). Effective use of natural language processing techniques for automatic conflation of multi-word terms: The role of derivational morphology, part of speech tagging, and shallow parsing. In Proceedings of the 20th ACM SIGIR Conference (SIGIR’97).
Vilares, J., Alonso, M. A., Ribadas, F. J., & Vilares, M. (2003). COLE experiments at CLEF 2002 Spanish monolingual track. In Advances in cross-language information retrieval, Vol. 2785 of Lecture Notes in Computer Science (pp. 265–278).
Vilares, J., Cabrero, D., & Alonso, M. A. (2001). Applying productive derivational morphology to term indexing of Spanish texts. In Computational linguistics and intelligent text processing, Vol. 2004 of Lecture Notes in Computer Science (pp. 336–348).
Zou, F., Wang, F. L., Deng, X., & Han, S. (2006). Automatic identification of Chinese stop words. Research on Computing Science: Special issue on Advances in Natural Language Processing, 18, 151–162.
Metadata
Title: Current research issues and trends in non-English Web searching
Authors: Fotis Lazarinis, Jesús Vilares, John Tait, Efthimis N. Efthimiadis
Publication date: 01-06-2009
Publisher: Springer Netherlands
Published in: Discover Computing | Issue 3/2009
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI: https://doi.org/10.1007/s10791-009-9093-0
