Published in: Discover Computing 5/2006

01.11.2006

Spelling correction in the PubMed search engine


Abstract

Users of Internet search engines often enter queries in which one or more search terms are misspelled. Several web search engines suggest corrections for misspelled words, but to our knowledge the methods they use are proprietary and unpublished. Here we describe the methodology we developed to perform spelling correction for the PubMed search engine. Our approach is based on the noisy channel model for spelling correction and uses statistics harvested from user logs to estimate the probabilities of the different types of edits that lead to misspellings. We discuss the problems specific to correcting search engine queries and outline our solutions.
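The noisy channel model takes a typed term t and chooses the candidate correction c that maximizes P(c) · P(t | c). Here P(c) is a prior on the intended word, and P(t | c) is the probability that a user who meant c typed t instead. The Python sketch below shows this generic scoring rule only. It is not the PubMed implementation.

It makes three assumptions:
- P(c) comes from a hypothetical lexicon of term counts.
- Candidates are limited to words one insertion, deletion, substitution, or adjacent transposition away from the typed term.
- The edit-type probabilities (`EDIT_PROBS`, `P_NO_ERROR`) are placeholders. The approach described above would estimate them from user-log statistics.

```python
import math
from typing import Dict, Iterator, Tuple

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

# Hypothetical channel-model parameters: probability that an intended word is
# typed with one error of the given type. The approach described in the
# abstract estimates such statistics from user logs; these are placeholders.
EDIT_PROBS: Dict[str, float] = {
    "insertion": 0.010,      # user typed an extra character
    "deletion": 0.015,       # user omitted a character
    "substitution": 0.010,   # user typed the wrong character
    "transposition": 0.005,  # user swapped two adjacent characters
}
P_NO_ERROR = 0.95  # hypothetical probability that a word is typed correctly


def candidates(typed: str) -> Iterator[Tuple[str, str]]:
    """Yield (intended_word, error_type) pairs one edit away from `typed`.

    error_type names the mistake that would turn the intended word into `typed`.
    """
    for i in range(len(typed) + 1):
        left, right = typed[:i], typed[i:]
        if right:
            # Dropping a typed character undoes an insertion error.
            yield left + right[1:], "insertion"
            # Replacing a typed character undoes a substitution error.
            for ch in ALPHABET:
                if ch != right[0]:
                    yield left + ch + right[1:], "substitution"
        if len(right) > 1 and right[0] != right[1]:
            # Swapping two adjacent characters undoes a transposition error.
            yield left + right[1] + right[0] + right[2:], "transposition"
        for ch in ALPHABET:
            # Adding a character undoes a deletion error.
            yield left + ch + right, "deletion"


def correct(typed: str, counts: Dict[str, int]) -> str:
    """Return argmax_c P(c) * P(typed | c) over the lexicon, scored in log space.

    Falls back to `typed` when neither it nor any one-edit candidate is known.
    """
    total = sum(counts.values())
    best_word, best_score = typed, -math.inf
    if typed in counts:
        # The typed word itself is a candidate, with the "no error" channel prob.
        best_score = math.log(counts[typed] / total) + math.log(P_NO_ERROR)
    for cand, error in candidates(typed):
        if cand in counts:
            score = math.log(counts[cand] / total) + math.log(EDIT_PROBS[error])
            if score > best_score:
                best_word, best_score = cand, score
    return best_word
```

For example, with counts = {"cancer": 500, "cancel": 5, "gene": 300, "genes": 200}:
- correct("cancr", counts) returns "cancer".
- correct("gnee", counts) returns "gene".

Scoring in log space avoids underflow when many small probabilities are combined. A realistic error model would also condition on the specific characters involved, not only on the edit type.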


Metadata
Title
Spelling correction in the PubMed search engine
Publication date
01.11.2006
Published in
Discover Computing / Issue 5/2006
Print ISSN: 2948-2984
Electronic ISSN: 2948-2992
DOI
https://doi.org/10.1007/s10791-006-9002-8
