Skip to main content
Top

2006 | OriginalPaper | Chapter

Hybrid Approach to Extracting Information from Web-Tables

Authors : Sung-won Jung, Mi-young Kang, Hyuk-chul Kwon

Published in: Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead

Publisher: Springer Berlin Heidelberg

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

This study concerns the extracting of information from tables in HTML documents. In our previous work, as a prerequisite for information extraction from tables in HTML, algorithms for separating meaningful tables and decorative tables were constructed, because only meaningful tables can be used to extract information and a preponderant proportion of decorative tables in training harms the learning result. In order to extract information, this study separated the head from the body in meaningful tables by extending the head extraction algorithm that was constructed in our previous work, using a machine learning algorithm, C4.5, and set up heuristics for table-schema extraction from meaningful tables by analyzing their head(s). In addition, table information in triples was extracted by determining the relation between the data and the extracted table schema. We obtained 71.2% accuracy in extracting table-schemata and information from the meaningful tables.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Metadata
Title
Hybrid Approach to Extracting Information from Web-Tables
Authors
Sung-won Jung
Mi-young Kang
Hyuk-chul Kwon
Copyright Year
2006
Publisher
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/11940098_11

Premium Partner