Abstract
Multi-task learning plays an important role in multi-attribute face prediction. Most existing work mines the information shared between attributes by sharing all convolutional layers. However, the low-level and high-level features of multiple face attributes should not be treated equally, because high-level features are more biased toward the specific content of each category. In this article, we propose a novel multi-attribute tensor correlation neural network (MTCN) for face attribute prediction. MTCN shares features across all attributes in the low-level layers and then separates them into attribute-specific features in the high-level layers. To better exploit the correlations among high-level attribute features, each sub-network draws useful information from the other sub-networks to enhance its own representation. A tensor canonical correlation analysis method then seeks the correlations among the highest-level attribute features, which further enhances the original information of each attribute. After that, these features are mapped into a highly correlated space through the correlation matrix. Finally, extensive experiments on the CelebA and LFWA datasets show that MTCN achieves the best performance among recent multi-attribute recognition algorithms under the same settings.
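The pipeline described in the abstract (shared low-level layers, attribute-specific high-level branches, then a correlation analysis that maps branch features into a correlated space) can be illustrated with a small NumPy sketch. This is not the authors' implementation: dense `tanh` layers stand in for convolutional layers, ordinary two-view linear CCA stands in for tensor canonical correlation analysis, and every name and size here (`init_params`, `forward`, `cca`, the layer widths, the `reg` term) is a hypothetical choice made for illustration only.

```python
import numpy as np


def init_params(rng, d_in, d_shared, d_branch, n_attr):
    """Random weights: one shared low-level layer, one high-level branch per attribute."""
    shared = rng.normal(size=(d_in, d_shared)) / np.sqrt(d_in)
    branches = [
        rng.normal(size=(d_shared, d_branch)) / np.sqrt(d_shared)
        for _ in range(n_attr)
    ]
    return {"shared": shared, "branches": branches}


def forward(inputs, params):
    """Return one feature matrix per attribute branch."""
    # Low-level features: computed once and shared by all attributes.
    shared = np.tanh(inputs @ params["shared"])
    # High-level features: each attribute has its own branch on top.
    return [np.tanh(shared @ w) for w in params["branches"]]


def inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v / np.sqrt(w)) @ v.T


def cca(x, y, k, reg=1e-4):
    """Two-view linear CCA (a simplification, not tensor CCA).

    x: (n, dx), y: (n, dy). Returns projection matrices for each view and
    the top-k canonical correlations in descending order.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    # Regularised within-view covariances and the cross-view covariance.
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    kx, ky = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the correlations.
    u, s, vt = np.linalg.svd(kx @ cxy @ ky)
    return kx @ u[:, :k], ky @ vt[:k].T, s[:k]


# Toy run: random vectors stand in for face images, two attributes.
rng = np.random.default_rng(0)
params = init_params(rng, d_in=32, d_shared=16, d_branch=8, n_attr=2)
inputs = rng.normal(size=(500, 32))
feat_a, feat_b = forward(inputs, params)

w_a, w_b, corr = cca(feat_a, feat_b, k=3)
# Map each branch's centred features into the shared correlated space.
z_a = (feat_a - feat_a.mean(axis=0)) @ w_a
z_b = (feat_b - feat_b.mean(axis=0)) @ w_b
print("canonical correlations:", corr)
```

The small `reg` term keeps the covariance matrices invertible when branch features are nearly collinear. A tensor CCA would generalise the pairwise cross-covariance used here to a higher-order covariance tensor over more than two attribute branches.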
Index Terms
- A Novel Multi-task Tensor Correlation Neural Network for Facial Attribute Prediction