Published in: Machine Vision and Applications 5-6/2017

09-05-2017 | Original Paper

Learning typographic style: from discrimination to synthesis

Author: Shumeet Baluja

Abstract

Typography is a ubiquitous art form that affects our understanding, perception and trust in what we read. Thousands of different font-faces have been created with enormous variations in the characters. In this paper, we learn the style of a font by analyzing a small subset of only four letters. From these four letters, we learn two tasks. The first is a discrimination task: given the four letters and a new candidate letter, does the new letter belong to the same font? Second, given the four basis letters, can we generate all of the other letters with the same characteristics as those in the basis set? We use deep neural networks to address both tasks, quantitatively and qualitatively measure the results in a variety of novel manners, and present a thorough investigation of the weaknesses and strengths of the approach. All of the experiments are conducted with publicly available font sets.
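As a purely illustrative framing of the discrimination task described above (the paper's actual network architecture, input resolution and training procedure are not reproduced here), the four basis letters and the candidate letter can be stacked into one multi-channel input that a model maps to a same-font probability. All sizes below, and the random linear "model", are placeholders that only show the input/output contract:

```python
import numpy as np

GLYPH = 32  # pixels per side (assumed; not the paper's actual resolution)

def make_input(basis_glyphs, candidate):
    """Stack the four basis letters and the candidate into a 5-channel input."""
    assert len(basis_glyphs) == 4
    return np.stack(basis_glyphs + [candidate], axis=0)  # shape (5, GLYPH, GLYPH)

rng = np.random.default_rng(0)
basis = [rng.random((GLYPH, GLYPH)) for _ in range(4)]   # four known letters
candidate = rng.random((GLYPH, GLYPH))                   # letter to be judged

x = make_input(basis, candidate)
# A real discriminator would be a trained deep network; a random linear probe
# stands in here solely to show that the output is a score in (0, 1).
w = rng.standard_normal(x.size) * 0.01
score = 1.0 / (1.0 + np.exp(-(w @ x.ravel())))  # P(candidate shares the font)
print(x.shape, 0.0 < score < 1.0)
```

The synthesis task uses the same four-letter basis as input but produces the remaining letter bitmaps as output instead of a scalar score.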

Footnotes
1
Wang et al. have studied retrieving fonts found in photographs [55]. Extraction from photographs is not addressed in this paper; however, the underlying font-retrieval task is examined in the experimental section.
 
2
The subjective evaluation was conducted by an independent user-experience researcher (UER), a volunteer not affiliated with this project. The UER was given a paper copy of the input letters, the generated letters and the actual letter, and was asked to evaluate the ‘R’ along the three dimensions listed above. Additionally, as a control, the UER was given examples (not shown here) that included real ‘R’s in order to minimize bias. The UER was not paid for this experiment.
 
3
Other versions of multi-task learning can incorporate different error metrics or a larger diversity of tasks. In this case, the multiple tasks are closely related, yet they still provide the benefit of task transfer.
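The multi-task arrangement this footnote refers to can be sketched as one shared representation feeding several closely related per-letter output heads. The dimensions, the choice of target letters and the random weights below are illustrative placeholders, not the paper's architecture:

```python
import numpy as np

# Hypothetical shared-trunk multi-task network: every task (target letter)
# reads from the same learned representation, so training signal is shared.
rng = np.random.default_rng(1)
D_IN, D_SHARED, D_OUT = 64, 16, 64    # assumed sizes, for illustration only
W_shared = rng.standard_normal((D_SHARED, D_IN)) * 0.1
heads = {letter: rng.standard_normal((D_OUT, D_SHARED)) * 0.1
         for letter in "BCR"}         # a few example target letters

def forward(x):
    h = np.maximum(0.0, W_shared @ x)                      # shared trunk (ReLU)
    return {letter: W @ h for letter, W in heads.items()}  # one output per task

outputs = forward(rng.standard_normal(D_IN))
print(sorted(outputs), outputs["R"].shape)
```

Because all heads backpropagate through `W_shared` during training, each task regularizes the representation the others use, which is the task-transfer benefit the footnote mentions.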
 
4
For completeness, we also analyzed the ‘R’s generated by the one-letter-at-a-time networks. They had similar performance (when measured with D) to the ‘R’ row shown in Table 3, with 6% higher SSE.
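The sum-of-squared-errors (SSE) comparison in this footnote can be computed as below. The image size and pixel range are assumptions; the paper's exact preprocessing is not shown here:

```python
import numpy as np

def sse(generated, target):
    """Sum of squared per-pixel differences between two glyph bitmaps."""
    diff = generated.astype(float) - target.astype(float)
    return float(np.sum(diff * diff))

# Toy example: a blank 32x32 glyph vs. a uniform gray one.
a = np.zeros((32, 32))
b = np.full((32, 32), 0.1)
print(round(sse(a, b), 6))  # 32*32 pixels, each contributing 0.1**2 -> 10.24
```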
 
5
This line of inquiry was sparked by discussions with Zhangyang Wang.
 
6
The substantial work of segmenting, cleaning, centering and pre-processing fonts from photographs is beyond the scope of this paper; we address only the retrieval portion of the task, assuming the target font can be cleaned and segmented to yield grayscale input images such as those used in this study. For a review of character segmentation, please see [10].
 
Literature
2.
Aucouturier, J.J., Pachet, F.: Representing musical genre: a state of the art. J. New Music Res. 32(1), 83–93 (2003)
3.
Bengio, Y.: Deep learning of representations for unsupervised and transfer learning. Unsupervised Transf. Learn. Chall. Mach. Learn. 7, 19 (2012)
5.
Beymer, D., Russell, D., Orton, P.: An eye tracking study of how font size and type influence online reading. In: Proceedings of the 22nd British HCI Group Annual Conference on People and Computers: Culture, Creativity, Interaction, vol. 2, pp. 15–18. British Computer Society (2008)
8.
Campbell, N.D., Kautz, J.: Learning a manifold of fonts. ACM Trans. Graph. (TOG) 33(4), 91 (2014)
10.
Casey, R.G., Lecolinet, E.: A survey of methods and strategies in character segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 18(7), 690–706 (1996)
11.
Ciresan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3642–3649. IEEE (2012)
12.
Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
14.
Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: Computer Vision–ECCV, pp. 184–199. Springer (2014)
15.
Dosovitskiy, A., Brox, T.: Generating images with perceptual similarity metrics based on deep networks. arXiv e-prints arXiv:1602.02644 (2016)
16.
Eck, D., Schmidhuber, J.: A first look at music composition using LSTM recurrent neural networks. Istituto Dalle Molle Di Studi Sull Intelligenza Artificiale 103 (2002)
17.
Feng, J.C., Tse, C., Qiu, Y.: Wavelet-transform-based strategy for generating new Chinese fonts. In: Proceedings of the 2003 International Symposium on Circuits and Systems (ISCAS '03), vol. 4. IEEE (2003)
18.
Frome, A., Corrado, G.S., Shlens, J., Bengio, S., Dean, J., Mikolov, T., et al.: DeViSE: a deep visual-semantic embedding model. In: Advances in Neural Information Processing Systems, pp. 2121–2129 (2013)
20.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)
23.
Gregor, K., Danihelka, I., Graves, A., Wierstra, D.: DRAW: a recurrent neural network for image generation. CoRR arXiv:1502.04623 (2015)
24.
Gygli, M., Grabner, H., Riemenschneider, H., Nater, F., Gool, L.: The interestingness of images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1633–1640 (2013)
26.
Hassan, T., Hu, C., Hersch, R.D.: Next generation typeface representations: revisiting parametric fonts. In: Proceedings of the 10th ACM Symposium on Document Engineering, pp. 181–184. ACM (2010)
27.
Hu, C., Hersch, R.D.: Parameterizable fonts based on shape components. IEEE Comput. Graph. Appl. 21(3), 70–85 (2001)
28.
Huang, J.T., Li, J., Yu, D., Deng, L., Gong, Y.: Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7304–7308. IEEE (2013)
30.
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
31.
Im, J.D., Kim, C.D., Jiang, H., Memisevic, R.: Generating images with recurrent adversarial networks. arXiv e-prints arXiv:1602.05110 (2016)
32.
Joachims, T.: Making large scale SVM learning practical. Technical report, Universität Dortmund (1999)
34.
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732 (2014)
35.
Khosla, A., Das Sarma, A., Hamid, R.: What makes an image popular? In: Proceedings of the 23rd International Conference on World Wide Web, pp. 867–876. ACM (2014)
36.
Kwok, K.W., Wong, S.M., Lo, K.W., Yam, Y.: Genetic algorithm-based brush stroke generation for replication of Chinese calligraphic character. In: IEEE Congress on Evolutionary Computation (CEC 2006), pp. 1057–1064. IEEE (2006)
37.
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
38.
Li, Y.M., Yeh, Y.S.: Increasing trust in mobile commerce through design aesthetics. Comput. Hum. Behav. 26(4), 673–684 (2010)
39.
Lun, Z., Kalogerakis, E., Sheffer, A.: Elements of style: learning perceptual shape style similarity. ACM Trans. Graph. (TOG) 34(4), 84 (2015)
40.
Miyazaki, T., Tsuchiya, T., Sugaya, Y., Omachi, S., Iwamura, M., Uchida, S., Kise, K.: Automatic generation of typographic font from a small font subset. CoRR arXiv:1701.05703 (2017)
42.
Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717–1724 (2014)
43.
Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. In: Proceedings of the 33rd International Conference on Machine Learning, vol. 3 (2016)
44.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M.S., Berg, A.C., Li, F.: ImageNet large scale visual recognition challenge. CoRR arXiv:1409.0575 (2014)
45.
Schapire, R.E., Freund, Y., Bartlett, P., Lee, W.S.: Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Stat. 26(5), 1651–1686 (1998)
46.
Suveeranont, R., Igarashi, T.: Example-based automatic font generation. In: International Symposium on Smart Graphics, pp. 127–138. Springer (2010)
47.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
48.
Tenenbaum, J.B., Freeman, W.T.: Separating style and content with bilinear models. Neural Comput. 12(6), 1247–1283 (2000)
49.
Tschichold, J.: Treasury of alphabets and lettering: a source book of the best letter forms of past and present for sign painters, graphic artists, commercial artists, typographers, printers, sculptors, architects, and schools of art and design. A Norton Professional Book, Norton. https://books.google.com/books?id=qQB7lqSrpnoC (1995)
50.
Upchurch, P., Snavely, N., Bala, K.: From A to Z: supervised transfer of style and content using deep neural network generators. arXiv e-prints arXiv:1603.02003 (2016)
51.
Van Santen, J.P., Sproat, R., Olive, J., Hirschberg, J.: Progress in Speech Synthesis. Springer, Berlin (2013)
52.
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y.: Learning fine-grained image similarity with deep ranking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1386–139 (2014)
54.
Wang, Y., Wang, H., Pan, C., Fang, L.: Style preserving Chinese character synthesis based on hierarchical representation of character. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2008), pp. 1097–1100. IEEE (2008)
55.
Wang, Z., Yang, J., Jin, H., Shechtman, E., Agarwala, A., Brandt, J., Huang, T.S.: DeepFont: identify your font from an image. In: Proceedings of the 23rd ACM International Conference on Multimedia, pp. 451–459. ACM (2015)
56.
Willats, J., Durand, F.: Defining pictorial style: lessons from linguistics and computer graphics. Axiomathes 15(3), 319–351 (2005)
57.
Xu, L., Ren, J.S., Liu, C., Jia, J.: Deep convolutional neural network for image deconvolution. In: Advances in Neural Information Processing Systems, pp. 1790–1798 (2014)
59.
Zeiler, M.D., Ranzato, M., Monga, R., Mao, M., Yang, K., Le, Q.V., Nguyen, P., Senior, A., Vanhoucke, V., Dean, J., et al.: On rectified linear units for speech processing. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3517–3521. IEEE (2013)
60.
Zong, A., Zhu, Y.: StrokeBank: automating personalized Chinese handwriting generation. In: AAAI, pp. 3024–3030 (2014)
Metadata
Title
Learning typographic style: from discrimination to synthesis
Author
Shumeet Baluja
Publication date
09-05-2017
Publisher
Springer Berlin Heidelberg
Published in
Machine Vision and Applications / Issue 5-6/2017
Print ISSN: 0932-8092
Electronic ISSN: 1432-1769
DOI
https://doi.org/10.1007/s00138-017-0842-6
