2023 | OriginalPaper | Chapter

3. How Search Engines Capture and Process Content from the Web

Author: Dirk Lewandowski

Published in: Understanding Search Engines

Publisher: Springer International Publishing

Abstract

This chapter describes the technical basis of search engines: how the documents available on the Web are brought into the search engine and made searchable, and how a search query is matched to the documents in the database. It details the workings of the crawler, the indexer, and the searcher.
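The three components named in the abstract can be illustrated with a deliberately tiny Python sketch. It is not the method described in the chapter. It assumes a hypothetical in-memory Web (the `WEB` dictionary and the `*.example` URLs are invented) in place of real HTTP fetching, a breadth-first crawl order, and a plain Boolean AND query over an inverted index. These are common textbook simplifications; production search engines add politeness rules, duplicate detection, ranking, and much more.

```python
import re
from collections import deque

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP; this dict stands in for that.
WEB = {
    "a.example": ("Search engines crawl the Web", ["b.example", "c.example"]),
    "b.example": ("The indexer builds an inverted index", ["a.example", "c.example"]),
    "c.example": ("The searcher matches a query against the index", ["d.example"]),
    "d.example": ("This page links to a page that does not exist", ["missing.example"]),
    "e.example": ("No page links here so the crawler never finds it", []),
}


def crawl(seeds, max_pages=100):
    """Crawler: breadth-first traversal from seed URLs, following links."""
    frontier = deque(seeds)       # URLs waiting to be fetched
    seen = set(seeds)             # URLs already queued, to avoid revisiting
    repository = {}               # fetched pages: URL -> text
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        if url not in WEB:        # broken link: nothing to fetch
            continue
        text, links = WEB[url]
        repository[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(repository):
    """Indexer: build an inverted index mapping each term to its documents."""
    index = {}
    for url, text in repository.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(url)
    return index


def search(index, query):
    """Searcher: return the documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


if __name__ == "__main__":
    repo = crawl(["a.example"])
    print(sorted(repo))  # e.example is absent: no crawled page links to it
    index = build_index(repo)
    print(sorted(search(index, "the index")))  # ['b.example', 'c.example']
```

The sketch makes one point from the abstract concrete: a page enters the database only if the crawler can reach it through links, so `e.example` stays invisible to the searcher even though it "exists" on this toy Web.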

Metadata
Title: How Search Engines Capture and Process Content from the Web
Author: Dirk Lewandowski
Copyright Year: 2023
DOI: https://doi.org/10.1007/978-3-031-22789-9_3