Skip to main content

2006 | OriginalPaper | Buchkapitel

Enabling Search over Large Collections of Telugu Document Images – An Automatic Annotation Based Approach

verfasst von : K. Pramod Sankar, C. V. Jawahar

Erschienen in: Computer Vision, Graphics and Image Processing

Verlag: Springer Berlin Heidelberg

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

For the first time, search is enabled over a massive collection of 21 Million word images from digitized document images. This work advances the state-of-the-art on multiple fronts: i)

Indian language

document images are made searchable by textual queries, ii)

interactive

content-level access is provided to document

images

for search and retrieval, iii) a novel

recognition-free

approach, that does not require an OCR, is adapted and validated iv) a suite of image processing and pattern classification algorithms are proposed to efficiently

automate

the process and v) the scalability of the solution is demonstrated over a

large collection

of 500 digitised books consisting of 75,000 pages.

Character recognition based approaches yield poor results for developing search engines for Indian language document images, due to the complexity of the script and the poor quality of the documents. Recognition free approaches, based on word-spotting, are not directly scalable to large collections, due to the computational complexity of matching images in the feature space. For example, if it requires 1 mSec to match two images, the retrieval of documents to a single query, from a large collection like ours, would require close to a day’s time. In this paper we propose a novel automatic annotation based approach to provide textual description of document images. With a one time, offline computational effort, we are able to build a text-based retrieval system, over annotated images. This system has an interactive response time of about 0.01 second. However, we pay the price in the form of massive offline computation, which is performed on a cluster of 35 computers, for about a month. Our procedure is highly automatic, requiring minimal human intervention.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Metadaten
Titel
Enabling Search over Large Collections of Telugu Document Images – An Automatic Annotation Based Approach
verfasst von
K. Pramod Sankar
C. V. Jawahar
Copyright-Jahr
2006
Verlag
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/11949619_75

Premium Partner