ABSTRACT
Information leakage through paper documents has become serious enough that manufacturers of multifunction printers now provide log capture devices that record a document image whenever a printer is used. They also offer text search engines based on OCR text extraction from the captured images. Because OCR accuracy is limited, this paper proposes a system that increases the accuracy of text searches over logs acquired by multifunction printers: each page of a paper document is numbered, and those ID numbers are linked to the text data extracted from the original digital files. Experimental results show that this raises the accuracy rate of text searches from 52.4% to 98.0%.
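The linking idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the ID is placed on the page or read back from a captured image, so the sketch simply assumes the ID is recoverable as an integer. The names `PageLogIndex`, `register_print_job`, `search`, and `search_captured_log` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PageLogIndex:
    """Hypothetical index linking printed-page IDs to text from the digital file."""
    text_by_page_id: dict[int, str] = field(default_factory=dict)
    next_id: int = 1

    def register_print_job(self, page_texts: list[str]) -> list[int]:
        """Assign a fresh ID to each printed page and store its digital text."""
        ids = []
        for text in page_texts:
            page_id = self.next_id
            self.next_id += 1
            self.text_by_page_id[page_id] = text
            ids.append(page_id)
        return ids

    def search(self, query: str) -> list[int]:
        """Return the IDs of pages whose linked text contains the query."""
        needle = query.casefold()
        return sorted(
            page_id
            for page_id, text in self.text_by_page_id.items()
            if needle in text.casefold()
        )


def search_captured_log(index: PageLogIndex, captured_page_ids: list[int], query: str) -> list[int]:
    """Search a captured log, given the page IDs assumed to be read from its images.

    The search runs on the linked digital text, not on OCR output,
    and the result keeps the order in which pages appear in the log.
    """
    hits = set(index.search(query))
    return [page_id for page_id in captured_page_ids if page_id in hits]
```

Because the query is matched against text taken from the original file, OCR errors in the captured image would not affect the result in this sketch; only the step that reads the ID back from the image would need to be reliable.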
Index Terms
- High-accuracy text search of hardcopy logs