Skip to main content

2002 | OriginalPaper | Buchkapitel

Word and Sentence Extraction Using Irregular Pyramid

verfasst von : Poh Kok Loo, Chew Lim Tan

Erschienen in: Document Analysis Systems V

Verlag: Springer Berlin Heidelberg

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

This paper presents the result of our continued work on a further enhancement to our previous proposed algorithm. Moving beyond the extraction of word groups and based on the same irregular pyramid structure the new proposed algorithm groups the extracted words into sentences. The uniqueness of the algorithm is in its ability to process text of a wide variation in terms of size, font, orientation and layout on the same document image. No assumption is made on any specified document type. The algorithm is based on the irregular pyramid structure with the application of four fundamental concepts. The first is the inclusion of background information. The second is the concept of closeness where text information within a group is close to each other, in terms of spatial distance, as compared to other text areas. The third is the “majority win” strategy that is more suitable under the greatly varying environment than a constant threshold value. The final concept is the uniformity and continuity among words belonging to the same sentence.

Metadaten
Titel
Word and Sentence Extraction Using Irregular Pyramid
verfasst von
Poh Kok Loo
Chew Lim Tan
Copyright-Jahr
2002
Verlag
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/3-540-45869-7_36

Premium Partner