nach oben

Neural Computing and Applications

Erschienen in:

07.06.2018 | Original Article

Sitcom-star-based clothing retrieval for video advertising: a deep learning framework

verfasst von: Haijun Zhang, Yuzhu Ji, Wang Huang, Linlin Liu

Erschienen in: Neural Computing and Applications | Ausgabe 11/2019

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

This paper presents a novel learning-based framework for video content-based advertising, DeepLink, which aims at linking Sitcom-stars and online shops with clothing retrieval by using state-of-the-art deep convolutional neural networks (CNNs). Specifically, several deep CNN models are adopted for composing multiple sub-modules in DeepLink, including human-body detection, human pose selection, face verification, clothing detection and retrieval from advertisements (ads) pool that is constructed by clothing images crawled from real-world online shops. For clothing detection and retrieval from ad-images, we firstly transfer the state-of-the-art deep CNN models to our data domain, and then train corresponding models based on our constructed large-scale clothes datasets. Extensive experimental results demonstrate the feasibility and efficacy of our proposed clothing-based video advertising system.

Vorheriger Artikel Sparse regularized discriminative canonical correlation analysis for multi-view semi-supervised learning

Nächster Artikel Topological structure regularized nonnegative matrix factorization for image clustering

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Andriluka M, Pishchulin L, Gehler P, Schiele B (2014) 2d human pose estimation: new benchmark and state of the art analysis. In: 2014 IEEE conference on computer vision and pattern recognition. IEEE, pp 3686–3693

Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2016) Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv preprint arXiv:1606.00915

Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: IEEE conference on computer vision and pattern recognition, 2009. CVPR 2009. IEEE, pp 248–255

Erhan D, Szegedy C, Toshev A, Anguelov D (2014) Scalable object detection using deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2147–2154

Erin Liong V, Lu J, Wang G, Moulin P, Zhou J (2015) Deep hashing for compact binary codes learning. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2475–2483

Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88(2):303–338CrossRef

Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2010) Object detection with discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–1645CrossRef

Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448

Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587

10.

Hadi Kiapour M, Han X, Lazebnik S, Berg AC, Berg TL (2015) Where to buy it: matching street clothing photos in online shops. In: Proceedings of the IEEE international conference on computer vision, pp 3343–3351

11.

He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. In: European conference on computer vision. Springer, pp 346–361

12.

He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

13.

Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507MathSciNetCrossRef

14.

Hou S, Zhou S, Chen L, Feng Y, Awudu K (2016) Multi-label learning with label relevance in advertising video. Neurocomputing 171:932–948CrossRef

15.

Huang GB, Ramesh M, Berg T, Learned-Miller E (2007) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst

16.

Huang J, Feris RS, Chen Q, Yan S (2015) Cross-domain image retrieval with a dual attribute-aware ranking network. In: Proceedings of the IEEE international conference on computer vision, pp 1062–1070

17.

Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167

18.

Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia. ACM, pp 675–678

19.

Karpathy A, Fei-Fei L (2015) Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3128–3137

20.

Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

21.

Larsson G, Maire M, Shakhnarovich G (2016) Fractalnet: ultra-deep neural networks without residuals. arXiv preprint arXiv:1605.07648

22.

LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444CrossRef

23.

LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324CrossRef

24.

Li Y, Wan KW, Yan X, Xu C (2005) Real time advertisement insertion in baseball video based on advertisement effect. In: Proceedings of the 13th annual ACM international conference on Multimedia. ACM, pp 343–346

25.

Lin K, Yang HF, Hsiao JH, Chen CS (2015) Deep learning of binary hash codes for fast image retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 27–35

26.

Lin M, Chen Q, Yan S (2013) Network in network. arXiv preprint arXiv:1312.4400

27.

Liu W, Anguelov D, Erhan D, Szegedy C, Reed S (2015) Ssd: single shot multibox detector. arXiv preprint arXiv:1512.02325

28.

Liu X, Kan M, Wu W, Shan S, Chen X (2017) Viplfacenet: an open source deep face recognition sdk. Front Comput Sci. https://doi.org/10.1007/s11704-016-6076-3 CrossRef

29.

Liu Z, Luo P, Qiu S, Wang X, Tang X (2016) Deepfashion: powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1096–1104

30.

Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

31.

López-Nores M, Blanco-Fernández Y, Pazos-Arias JJ (2013) Cloud-based personalization of new advertising and e-commerce models for video consumption. Comput J 56(5):573–592CrossRef

32.

Mei T, Hua XS, Li S (2009) Videosense: a contextual in-video advertising system. IEEE Trans Circuits Syst Video Technol 19(12):1866–1879CrossRef

33.

Murala S, Maheshwari RP, Balasubramanian R (2012) Local tetra patterns: a new feature descriptor for content-based image retrieval. IEEE Trans Image Process 21(5):2874–2886MathSciNetCrossRef

34.

Noh H, Hong S, Han B (2015) Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE international conference on computer vision, pp 1520–1528

35.

Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987CrossRef

36.

Redmon J, Divvala S, Girshick R, Farhadi A (2015) You only look once: unified, real-time object detection. arXiv preprint arXiv:1506.02640

37.

Redondo RPD, Vilas AF, Arias JJP, Cabrer MR, Solla AG, Duque JG (2012) Bringing content awareness to web-based idtv advertising. IEEE Trans Syst Man Cybern C (Appl Rev) 42(3):324–333CrossRef

38.

Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99

39.

Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–1496CrossRef

40.

Schroff F, Kalenichenko D, Philbin J (2015) Facenet: A unified embedding for face recognition and clustering. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 815–823

41.

Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y (2013) Overfeat: integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229

42.

Shotton J, Sharp T, Kipman A, Fitzgibbon A, Finocchio M, Blake A, Cook M, Moore R (2013) Real-time human pose recognition in parts from single depth images. Commun ACM 56(1):116–124CrossRef

43.

Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

44.

Sun Y, Chen Y, Wang X, Tang X (2014) Deep learning face representation by joint identification-verification. In: Advances in neural information processing systems, pp 1988–1996

45.

Sun Y, Liang D, Wang X, Tang X (2015) Deepid3: Face recognition with very deep neural networks. arXiv preprint arXiv:1502.00873

46.

Sun Y, Wang X, Tang X (2013) Deep convolutional network cascade for facial point detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3476–3483

47.

Sun Y, Wang X, Tang X (2014) Deep learning face representation from predicting 10,000 classes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1891–1898

48.

Sun Y, Wang X, Tang X (2015) Deeply learned face representations are sparse, selective, and robust. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2892–2900

49.

Szegedy C, Ioffe S, Vanhoucke V (2016) Inception-v4, inception-resnet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261

50.

Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

51.

Szegedy C, Reed S, Erhan D, Anguelov D (2014) Scalable, high-quality object detection. arXiv preprint arXiv:1412.1441

52.

Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2015) Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567

53.

Tan XY, Triggs B (2010) Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans Image Process 19(6):1635–1650MathSciNetCrossRef

54.

Toshev A, Szegedy C (2014) Deeppose: human pose estimation via deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1653–1660

55.

Uijlings JR, van de Sande KE, Gevers T, Smeulders AW (2013) Selective search for object recognition. Int J Comput Vision 104(2):154–171CrossRef

56.

Wang J, Wang B, Duan LY, Tian Q, Lu H (2014) Interactive ads recommendation with contextual search on product topic space. Multimed Tools Appl 70(2):799–820CrossRef

57.

Wolf L, Hassner T, Maoz I (2011) Face recognition in unconstrained videos with matched background similarity. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 529–534

58.

Xia R, Pan Y, Lai H, Liu C, Yan S (2014) Supervised hashing for image retrieval via image representation learning. In: AAAI, vol 1, p 2

59.

Xie S, Tu Z (2015) Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision, pp 1395–1403

60.

Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhutdinov R, Zemel RS, Bengio Y (2015) Show, attend and tell: neural image caption generation with visual attention. 2(3):5. arXiv preprint arXiv:1502.03044 (2015)

61.

Yadati K, Katti H, Kankanhalli M (2014) Cavva: computational affective video-in-video advertising. IEEE Trans Multimed 16(1):15–23CrossRef

62.

Yi D, Lei Z, Liao S, Li SZ (2014) Learning face representation from scratch. arXiv preprint arXiv:1411.7923

63.

Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, pp 818–833

64.

Zhang H, Cao X, Ho JK, Chow TW (2017) Object-level video advertising: an optimization framework. IEEE Trans Industr Inf 13(2):520–531CrossRef

Titel: Sitcom-star-based clothing retrieval for video advertising: a deep learning framework
verfasst von: Haijun Zhang
Yuzhu Ji
Wang Huang
Linlin Liu
Publikationsdatum: 07.06.2018
Verlag: Springer London
Erschienen in: Neural Computing and Applications / Ausgabe 11/2019
Print ISSN: 0941-0643
Elektronische ISSN: 1433-3058
DOI: https://doi.org/10.1007/s00521-018-3579-x

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Springer Professional "Wirtschaft+Technik"

Weitere Artikel der Ausgabe 11/2019

A local cores-based hierarchical clustering algorithm for data sets with complex structures

Robust subspace learning-based low-rank representation for manifold clustering

Some new dual hesitant fuzzy linguistic operators based on Archimedean t-norm and t-conorm

Robust feature extraction and uncertainty estimation based on attractor dynamics in cyclic deep denoising autoencoders

Artificial intelligence design charts for predicting friction capacity of driven pile in clay

Prediction of the vertical force during FSW of AZ31 magnesium alloy sheets using an artificial neural network-based model

Premium Partner