Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning | IEEE Conference Publication | IEEE Xplore