Image co-segmentation is popular because it can dispense with supervisory data by exploiting the information shared across multiple images. In this paper, we address a more challenging variant, scene image co-segmentation, which jointly segments multiple images captured from the same scene into regions corresponding to their respective classes. We first propose a novel representation, the Visual Relation Network (VRN), to organize multiple segments, and then search for meaningful segments in every image by voting on the network. A scalable topic-level random walk is used to solve the voting problem. Experiments on the benchmark MSRC-v2 dataset and on the more difficult LabelMe and SUN datasets show that the method outperforms state-of-the-art methods.
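
To make the idea of voting on a network by random walk concrete, the following is a minimal sketch. It is not the topic-level random walk of this paper: it is a plain random walk with restart over a small, hand-written segment affinity matrix. The function name `random_walk_votes`, the `restart` parameter, and the toy affinities are all assumptions introduced for illustration only.

```python
def random_walk_votes(affinity, restart=0.15, iters=200, tol=1e-12):
    """Score graph nodes by a random walk with restart (illustrative only).

    affinity: square list of lists; affinity[i][j] >= 0 is a hypothetical
    similarity between candidate segments i and j.
    Returns one score per segment; the scores sum to 1.
    """
    n = len(affinity)
    # Row-normalize affinities into transition probabilities.
    trans = []
    for row in affinity:
        total = sum(row)
        if total > 0:
            trans.append([w / total for w in row])
        else:
            # An isolated node jumps uniformly to any node.
            trans.append([1.0 / n] * n)

    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [
            restart / n
            + (1.0 - restart) * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


# Toy example: segments 0-2 strongly support each other, segment 3 is weakly linked.
toy_affinity = [
    [0.0, 1.0, 1.0, 0.1],
    [1.0, 0.0, 1.0, 0.1],
    [1.0, 1.0, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
]
votes = random_walk_votes(toy_affinity)
```

In this toy setting the weakly connected segment receives the fewest votes, which is the intuition behind selecting meaningful segments by voting; how the network is built and how the topic-level walk scales are specified in the paper itself, not here.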