Published in: International Journal of Automation and Computing 5/2014

01-10-2014 | Regular paper

Robust Text Detection in Natural Scenes Using Text Geometry and Visual Appearance

Authors: Sheng-Ye Yan, Xin-Xing Xu, Qing-Shan Liu

Published in: Machine Intelligence Research | Issue 5/2014


Abstract

This paper proposes a new two-phase approach to robust text detection that integrates visual appearance with geometric reasoning rules. In the first phase, geometric rules are used to achieve a high recall rate. Specifically, a robust stroke width transform (RSWT) feature is proposed to better recover the stroke width by additionally considering the crossing of two strokes and the continuity of the letter border. In the second phase, a classification scheme based on visual appearance features rejects false alarms while preserving the recall rate. To learn a better classifier from multiple visual appearance features, a novel classification method called double soft multiple kernel learning (DS-MKL) is proposed. DS-MKL is motivated by a novel kernel margin perspective on multiple kernel learning and can effectively suppress the influence of noisy base kernels. Comprehensive experiments on the benchmark ICDAR2005 competition dataset demonstrate the effectiveness of the proposed two-phase text detection approach, which outperforms state-of-the-art approaches by up to 4.4% in terms of F-measure.
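For readers unfamiliar with the metric, the F-measure quoted above combines precision and recall into a single score. The following is a minimal sketch, assuming the common weighted harmonic mean with equal weighting (alpha = 0.5). The function name, the alpha parameter, and the sample precision/recall values are illustrative assumptions, not figures or code from the paper.

```python
def f_measure(precision: float, recall: float, alpha: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    alpha = 0.5 weights both terms equally; this weighting is an
    assumption made for illustration, not a setting taken from the paper.
    """
    # A detector with zero precision or zero recall gets a score of zero.
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)


if __name__ == "__main__":
    # Hypothetical precision/recall pairs, chosen only to show how the
    # score reacts when recall is preserved and false alarms are reduced.
    before_filtering = f_measure(precision=0.55, recall=0.70)
    after_filtering = f_measure(precision=0.70, recall=0.68)
    print(f"before: {before_filtering:.3f}, after: {after_filtering:.3f}")
```

Because the harmonic mean is dominated by the smaller of the two terms, a pipeline that first maximizes recall and then removes false alarms can raise the score only if the second phase improves precision without giving back much recall.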


Metadata
Title
Robust Text Detection in Natural Scenes Using Text Geometry and Visual Appearance
Authors
Sheng-Ye Yan
Xin-Xing Xu
Qing-Shan Liu
Publication date
01-10-2014
Publisher
Springer-Verlag
Published in
Machine Intelligence Research / Issue 5/2014
Print ISSN: 2731-538X
Electronic ISSN: 2731-5398
DOI
https://doi.org/10.1007/s11633-014-0833-2
