Published in: Machine Vision and Applications 5-6/2017

09-05-2017 | Original Paper

Learning typographic style: from discrimination to synthesis

Author: Shumeet Baluja

Abstract

Typography is a ubiquitous art form that affects our understanding, perception and trust in what we read. Thousands of different font-faces have been created with enormous variations in the characters. In this paper, we learn the style of a font by analyzing a small subset of only four letters. From these four letters, we learn two tasks. The first is a discrimination task: given the four letters and a new candidate letter, does the new letter belong to the same font? Second, given the four basis letters, can we generate all of the other letters with the same characteristics as those in the basis set? We use deep neural networks to address both tasks, quantitatively and qualitatively measure the results in a variety of novel manners, and present a thorough investigation of the weaknesses and strengths of the approach. All of the experiments are conducted with publicly available font sets.
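As a purely illustrative framing of the discrimination task described above (the paper's actual network architecture, input resolution and training procedure are not reproduced here), the four basis letters and the candidate letter can be stacked into one multi-channel input that a model maps to a same-font probability. All sizes below, and the random linear "model", are placeholders that only show the input/output contract:

```python
import numpy as np

GLYPH = 32  # pixels per side (assumed; not the paper's actual resolution)

def make_input(basis_glyphs, candidate):
    """Stack the four basis letters and the candidate into a 5-channel input."""
    assert len(basis_glyphs) == 4
    return np.stack(basis_glyphs + [candidate], axis=0)  # shape (5, GLYPH, GLYPH)

rng = np.random.default_rng(0)
basis = [rng.random((GLYPH, GLYPH)) for _ in range(4)]   # four known letters
candidate = rng.random((GLYPH, GLYPH))                   # letter to be judged

x = make_input(basis, candidate)
# A real discriminator would be a trained deep network; a random linear probe
# stands in here solely to show that the output is a score in (0, 1).
w = rng.standard_normal(x.size) * 0.01
score = 1.0 / (1.0 + np.exp(-(w @ x.ravel())))  # P(candidate shares the font)
print(x.shape, 0.0 < score < 1.0)
```

The synthesis task uses the same four-letter basis as input but produces the remaining letter bitmaps as output instead of a scalar score.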

Footnotes
1
Wang et al. have studied retrieving fonts found in photographs [55]. Extraction from photographs is not addressed in this paper; however, the underlying font-retrieval task is examined in the experimental section.
 
2
The subjective evaluation was conducted by an independent user-experience researcher (UER), a volunteer not affiliated with this project. The UER was given a paper copy of the input letters, the generated letters and the actual letter, and was asked to evaluate the ‘R’ along the three dimensions listed above. Additionally, as a control, the UER was given examples (not shown here) that included real ‘R’s in order to minimize bias. The UER was not paid for this experiment.
 
3
Other versions of multi-task learning can incorporate different error metrics or a larger diversity of tasks. In this case, the multiple tasks are closely related, yet they still provide the benefit of task transfer.
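The multi-task arrangement this footnote refers to can be sketched as one shared representation feeding several closely related per-letter output heads. The dimensions, the choice of target letters and the random weights below are illustrative placeholders, not the paper's architecture:

```python
import numpy as np

# Hypothetical shared-trunk multi-task network: every task (target letter)
# reads from the same learned representation, so training signal is shared.
rng = np.random.default_rng(1)
D_IN, D_SHARED, D_OUT = 64, 16, 64    # assumed sizes, for illustration only
W_shared = rng.standard_normal((D_SHARED, D_IN)) * 0.1
heads = {letter: rng.standard_normal((D_OUT, D_SHARED)) * 0.1
         for letter in "BCR"}         # a few example target letters

def forward(x):
    h = np.maximum(0.0, W_shared @ x)                      # shared trunk (ReLU)
    return {letter: W @ h for letter, W in heads.items()}  # one output per task

outputs = forward(rng.standard_normal(D_IN))
print(sorted(outputs), outputs["R"].shape)
```

Because all heads backpropagate through `W_shared` during training, each task regularizes the representation the others use, which is the task-transfer benefit the footnote mentions.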
 
4
For completeness, we also analyzed the ‘R’s generated by the one-letter-at-a-time networks. They had similar performance (when measured with D) to the ‘R’ row shown in Table 3, with 6% higher SSE.
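The sum-of-squared-errors (SSE) comparison in this footnote can be computed as below. The image size and pixel range are assumptions; the paper's exact preprocessing is not shown here:

```python
import numpy as np

def sse(generated, target):
    """Sum of squared per-pixel differences between two glyph bitmaps."""
    diff = generated.astype(float) - target.astype(float)
    return float(np.sum(diff * diff))

# Toy example: a blank 32x32 glyph vs. a uniform gray one.
a = np.zeros((32, 32))
b = np.full((32, 32), 0.1)
print(round(sse(a, b), 6))  # 32*32 pixels, each contributing 0.1**2 -> 10.24
```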
 
5
This line of inquiry was sparked by discussions with Zhangyang Wang.
 
6
The substantial work of segmenting, cleaning, centering and pre-processing fonts from photographs is beyond the scope of this paper; we address only the retrieval portion of the task, assuming the target font can be cleaned and segmented to yield grayscale input images such as those used in this study. For a review of character segmentation, please see [10].
 
Literature
2.
Aucouturier, J.J., Pachet, F.: Representing musical genre: a state of the art. J. New Music Res. 32(1), 83–93 (2003)
3.
Bengio, Y.: Deep learning of representations for unsupervised and transfer learning. Unsupervised Transf. Learn. Chall. Mach. Learn. 7, 19 (2012)
5.
Beymer, D., Russell, D., Orton, P.: An eye tracking study of how font size and type influence online reading. In: Proceedings of the 22nd British HCI Group Annual Conference on People and Computers: Culture, Creativity, Interaction, vol. 2, pp. 15–18. British Computer Society (2008)
8.
Campbell, N.D., Kautz, J.: Learning a manifold of fonts. ACM Trans. Graph. (TOG) 33(4), 91 (2014)
10.
Casey, R.G., Lecolinet, E.: A survey of methods and strategies in character segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 18(7), 690–706 (1996)
11.
Ciresan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3642–3649. IEEE (2012)
12.
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
14.
Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Computer Vision–ECCV, pp. 184–199. Springer (2014)
15.
Dosovitskiy, A., Brox, T.: Generating images with perceptual similarity metrics based on deep networks. arXiv e-prints arXiv:1602.02644 (2016)
16.
Eck, D., Schmidhuber, J.: A first look at music composition using LSTM recurrent neural networks. Istituto Dalle Molle Di Studi Sull Intelligenza Artificiale 103 (2002)
17.
Feng, J.C., Tse, C., Qiu, Y.: Wavelet-transform-based strategy for generating new Chinese fonts. In: Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS '03), vol. 4. IEEE (2003)
18.
Frome, A., Corrado, G.S., Shlens, J., Bengio, S., Dean, J., Mikolov, T., et al.: DeViSE: a deep visual-semantic embedding model. In: Advances in Neural Information Processing Systems, pp. 2121–2129 (2013)
20.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
23.
Gregor, K., Danihelka, I., Graves, A., Wierstra, D.: DRAW: a recurrent neural network for image generation. CoRR arXiv:1502.04623 (2015)
24.
Gygli, M., Grabner, H., Riemenschneider, H., Nater, F., Gool, L.: The interestingness of images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1633–1640 (2013)
26.
Hassan, T., Hu, C., Hersch, R.D.: Next generation typeface representations: revisiting parametric fonts. In: Proceedings of the 10th ACM Symposium on Document Engineering, pp. 181–184. ACM (2010)
27.
Hu, C., Hersch, R.D.: Parameterizable fonts based on shape components. IEEE Comput. Graph. Appl. 21(3), 70–85 (2001)
28.
Huang, J.T., Li, J., Yu, D., Deng, L., Gong, Y.: Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7304–7308. IEEE (2013)
30.
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
31.
Im, J.D., Kim, C.D., Jiang, H., Memisevic, R.: Generating images with recurrent adversarial networks. arXiv e-prints arXiv:1602.05110 (2016)
32.
Joachims, T.: Making large scale SVM learning practical. Technical report, Universität Dortmund (1999)
34.
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732 (2014)
35.
Khosla, A., Das Sarma, A., Hamid, R.: What makes an image popular? In: Proceedings of the 23rd International Conference on World Wide Web, pp. 867–876. ACM (2014)
36.
Kwok, K.W., Wong, S.M., Lo, K.W., Yam, Y.: Genetic algorithm-based brush stroke generation for replication of Chinese calligraphic character. In: IEEE Congress on Evolutionary Computation (CEC 2006), pp. 1057–1064. IEEE (2006)
37.
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
38.
Li, Y.M., Yeh, Y.S.: Increasing trust in mobile commerce through design aesthetics. Comput. Hum. Behav. 26(4), 673–684 (2010)
39.
Lun, Z., Kalogerakis, E., Sheffer, A.: Elements of style: learning perceptual shape style similarity. ACM Trans. Graph. (TOG) 34(4), 84 (2015)
40.
Miyazaki, T., Tsuchiya, T., Sugaya, Y., Omachi, S., Iwamura, M., Uchida, S., Kise, K.: Automatic generation of typographic font from a small font subset. CoRR arXiv:1701.05703 (2017)
42.
Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717–1724 (2014)
43.
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. In: Proceedings of the 33rd International Conference on Machine Learning, vol. 3 (2016)
44.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.S., Berg, A.C., Li, F.: ImageNet large scale visual recognition challenge. CoRR arXiv:1409.0575 (2014)
45.
Schapire, R.E., Freund, Y., Bartlett, P., Lee, W.S.: Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Stat. 26(5), 1651–1686 (1998)
46.
Suveeranont, R., Igarashi, T.: Example-based automatic font generation. In: International Symposium on Smart Graphics, pp. 127–138. Springer (2010)
47.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
48.
Tenenbaum, J.B., Freeman, W.T.: Separating style and content with bilinear models. Neural Comput. 12(6), 1247–1283 (2000)
49.
Tschichold, J.: Treasury of alphabets and lettering: a source book of the best letter forms of past and present for sign painters, graphic artists, commercial artists, typographers, printers, sculptors, architects, and schools of art and design. A Norton Professional Book, Norton. https://books.google.com/books?id=qQB7lqSrpnoC (1995)
50.
Upchurch, P., Snavely, N., Bala, K.: From A to Z: supervised transfer of style and content using deep neural network generators. arXiv e-prints arXiv:1603.02003 (2016)
51.
Van Santen, J.P., Sproat, R., Olive, J., Hirschberg, J.: Progress in Speech Synthesis. Springer, Berlin (2013)
52.
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y.: Learning fine-grained image similarity with deep ranking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1386–139 (2014)
54.
Wang, Y., Wang, H., Pan, C., Fang, L.: Style preserving Chinese character synthesis based on hierarchical representation of character. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), pp. 1097–1100. IEEE (2008)
55.
Wang, Z., Yang, J., Jin, H., Shechtman, E., Agarwala, A., Brandt, J., Huang, T.S.: DeepFont: identify your font from an image. In: Proceedings of the 23rd ACM International Conference on Multimedia, pp. 451–459. ACM (2015)
56.
Willats, J., Durand, F.: Defining pictorial style: lessons from linguistics and computer graphics. Axiomathes 15(3), 319–351 (2005)
57.
Xu, L., Ren, J.S., Liu, C., Jia, J.: Deep convolutional neural network for image deconvolution. In: Advances in Neural Information Processing Systems, pp. 1790–1798 (2014)
59.
Zeiler, M.D., Ranzato, M., Monga, R., Mao, M., Yang, K., Le, Q.V., Nguyen, P., Senior, A., Vanhoucke, V., Dean, J., et al.: On rectified linear units for speech processing. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3517–3521. IEEE (2013)
60.
Zong, A., Zhu, Y.: StrokeBank: automating personalized Chinese handwriting generation. In: AAAI, pp. 3024–3030 (2014)
Metadata
Title
Learning typographic style: from discrimination to synthesis
Author
Shumeet Baluja
Publication date
09-05-2017
Publisher
Springer Berlin Heidelberg
Published in
Machine Vision and Applications / Issue 5-6/2017
Print ISSN: 0932-8092
Electronic ISSN: 1432-1769
DOI
https://doi.org/10.1007/s00138-017-0842-6
