In the field of Optical Character Recognition (OCR), improving recognition accuracy has been studied extensively over the past decades. In this paper, in contrast to previously published model-based correction methods, a knowledge base was applied to an OCR correction system from the perspective of linked knowledge. A pipelined method integrating selectivity-aware pre-filtering with text-level and image-level comparison was explored to identify the best candidate with better efficiency and accuracy. For more reliable comparison of companies, weighting coefficients derived from Wikipedia were applied to reflect differences in importance. Moreover, the traditional Levenshtein distance was generalized to an image-based Levenshtein measure to better distinguish strings whose text-level similarity is nearly the same. The experimental results demonstrated that the proposed system performed more effectively than the baseline.
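To illustrate the idea of generalizing the Levenshtein distance with image information, the following is a minimal sketch and not the paper's actual measure. It assumes that the substitution cost between two characters can be taken as the fraction of differing pixels in their glyph bitmaps. The names `GLYPHS`, `pixel_cost`, and `weighted_levenshtein` and the toy 3x3 bitmaps are hypothetical; a real system would compare rendered or scanned character images.

```python
from typing import Callable, Dict, Tuple

# Hypothetical toy 3x3 binary glyphs, flattened row by row.
# These are illustrative only and do not come from the paper.
GLYPHS: Dict[str, Tuple[int, ...]] = {
    "O": (1, 1, 1, 1, 0, 1, 1, 1, 1),
    "Q": (1, 1, 1, 1, 0, 1, 1, 1, 0),
    "X": (1, 0, 1, 0, 1, 0, 1, 0, 1),
}


def unit_cost(a: str, b: str) -> float:
    """Classic Levenshtein substitution cost: 0 if equal, else 1."""
    return 0.0 if a == b else 1.0


def pixel_cost(a: str, b: str) -> float:
    """Assumed image-based cost: fraction of differing glyph pixels.

    Falls back to the unit cost when a glyph is not available.
    """
    if a == b:
        return 0.0
    if a in GLYPHS and b in GLYPHS:
        ga, gb = GLYPHS[a], GLYPHS[b]
        return sum(p != q for p, q in zip(ga, gb)) / len(ga)
    return 1.0


def weighted_levenshtein(
    s: str, t: str, sub_cost: Callable[[str, str], float] = unit_cost
) -> float:
    """Edit distance with unit insert/delete cost and pluggable substitution cost."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, cs in enumerate(s, start=1):
        cur = [float(i)]
        for j, ct in enumerate(t, start=1):
            cur.append(
                min(
                    prev[j] + 1.0,                    # delete cs
                    cur[j - 1] + 1.0,                 # insert ct
                    prev[j - 1] + sub_cost(cs, ct),   # substitute cs -> ct
                )
            )
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # Both candidates are one plain edit away from "BOX" ...
    print(weighted_levenshtein("BOX", "BQX"), weighted_levenshtein("BOX", "BXX"))
    # ... but the image-based cost ranks the visually closer one first.
    print(
        weighted_levenshtein("BOX", "BQX", pixel_cost),
        weighted_levenshtein("BOX", "BXX", pixel_cost),
    )
```

With the unit cost, the two candidates tie; with the assumed pixel-based cost, the candidate that differs by a visually similar character receives the smaller distance, which is the kind of tie-breaking the abstract describes.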
- Web Knowledge Base Improved OCR Correction for Chinese Business Cards