Appearance model is one of the most important components for online visual tracking. An effective appearance model needs to strike the right balance between being adaptive, to account for appearance change, and being conservative, to re-track the object after it loses tracking (
., due to occlusion). Most conventional appearance models focus on one aspect out of the two, and hence are not able to achieve the right balance. In this paper, we approach this problem by a max-margin learning framework collaborating a descriptive component and a discriminative component. Particularly, the two components are for different purposes and with different lifespans. One forms a robust object model, and the other tries to distinguish the object from the current background. Taking advantages of their complementary roles, the components improve each other and collaboratively contribute to a shared score function. Besides, for realtime implementation, we also propose a series of optimization and sample-management strategies. Experiments over 30 challenging videos demonstrate the effectiveness and robustness of the proposed tracker. Our method generally outperforms the existing state-of-the-art methods.