
2018 | OriginalPaper | Chapter

3. Using Beautiful Soup

Author: Gábor László Hajba

Published in: Website Scraping with Python

Publisher: Apress


Abstract

In this chapter, you will learn how to use Beautiful Soup, a lightweight Python library, to extract and navigate HTML content easily and forget overly complex regular expressions and text parsing.
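To give a flavour of what extracting and navigating HTML with Beautiful Soup can look like, here is a minimal sketch. The HTML snippet, the `products` id, and the variable names are made up for illustration and are not taken from the chapter; the sketch assumes the `beautifulsoup4` package is installed and uses Python's built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, invented for this illustration only.
html = """
<html><head><title>Example shop</title></head>
<body>
  <ul id="products">
    <li class="product"><a href="/p/1">Tea</a></li>
    <li class="product"><a href="/p/2">Coffee</a></li>
  </ul>
</body></html>
"""

# Parse the document with the parser from the standard library.
soup = BeautifulSoup(html, "html.parser")

# Extract: the text of the <title> tag.
title = soup.title.string

# Extract: link text and target of every <a> tag, without regular expressions.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

# Navigate: go from the first product item up to its parent list.
first_item = soup.find("li", class_="product")
parent_id = first_item.parent["id"]

print(title)
print(links)
print(parent_id)
```

The same lookups written with regular expressions would have to cope with attribute order, whitespace, and nesting by hand, which is the kind of complexity the abstract refers to.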


Footnotes
1
Unless you are lucky. Once I encountered a site where all the links to the remaining pages were there in the HTML code but had been hidden with some JS-magic.
 
2
OOP: object-oriented programming
 
3
For example, the Builder or Factory patterns, or a constructor with all arguments.
 
5
I have to admit, every time I write CSV files I use spamwriter as my variable’s name. I guess this gives me a global understanding of what’s happening.
 
9
Object-relational mapping
 
10
I have worked since 2007 with ORM tools, and I like the idea, but some queries can become quite complex.
 
12
Hard cache: Get all information from the cache, and if there are attempts to gather anything from the Internet, refuse them. This makes scraping a bit more consistent between runs.
 
13
For more information, visit: https://blake2.net/
 
14
Alternatively, to be more consistent, you can create a downloader, which hides the cache from the users of your code.
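
The hard-cache idea from footnote 12 can be sketched roughly as follows. This is not the chapter's implementation: the `PageCache` class, the `CacheMiss` exception, the in-memory dictionary, and the use of a BLAKE2 digest of the URL as the cache key are all assumptions made for this illustration (the footnotes mention BLAKE2 but do not say here how it is used).

```python
import hashlib


class CacheMiss(Exception):
    """Raised when a page is not cached and hard mode forbids downloading."""


class PageCache:
    """Hypothetical cache, keeps pages in memory only."""

    def __init__(self, hard=False):
        self.hard = hard
        self._pages = {}

    @staticmethod
    def key_for(url):
        # Assumption: a short BLAKE2b digest of the URL serves as the key.
        return hashlib.blake2b(url.encode("utf-8"), digest_size=16).hexdigest()

    def get(self, url, download):
        key = self.key_for(url)
        if key in self._pages:
            return self._pages[key]
        if self.hard:
            # Hard cache: refuse any attempt to go to the Internet.
            raise CacheMiss(url)
        page = download(url)
        self._pages[key] = page
        return page


def fake_download(url):
    # Stand-in for a real HTTP request, so the sketch runs offline.
    return "<html>" + url + "</html>"


cache = PageCache()
cache.get("http://example.com/a", fake_download)  # downloaded and stored
cache.hard = True
cache.get("http://example.com/a", fake_download)  # served from the cache
```

With `hard` set to `True`, asking for a URL that was never cached raises `CacheMiss` instead of downloading, which is what keeps results the same between runs.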
 
Metadata
Title
Using Beautiful Soup
Author
Gábor László Hajba
Copyright Year
2018
Publisher
Apress
DOI
https://doi.org/10.1007/978-1-4842-3925-4_3
