Object tracking has made remarkable progress in the past few years. However, the most advanced trackers are becoming increasingly expensive to run, which limits their deployment on mobile devices with limited resources. In addition, currently popular trackers perform similarity learning through feature correlation between multiple branches. Some of these cross-correlation methods lose a large amount of face information, while others introduce a large amount of unfavorable background information. Motivated by these issues, this paper aims to reduce the number of model parameters and to strengthen feature extraction. Heterogeneous convolution is introduced into the backbone network to reduce the number of convolution kernel parameters. A search box mechanism is added to dynamically adjust the receptive field of the network and to generate more feature maps with cheap operations. Furthermore, we integrate the split-attention mechanism into the backbone network to standardize the arrangement of the heterogeneous convolutions. To evaluate the model, we conducted experiments on the challenging VTB datasets and on datasets captured in real-world conditions, which contain 82,351 facial features. Experimental results show that our method achieves a distance precision (DP) of 93.5% and an overlap success precision (OP) of 67.5%, which is comparable with state-of-the-art object tracking methods while reducing the number of parameters by about one-third. In addition, the feature maps of each convolution module are examined, and an interpretation of the lightweight convolution is given.
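
The parameter saving offered by heterogeneous convolution can be illustrated with a small calculation. The sketch below assumes the commonly used HetConv formulation, in which each filter applies k×k kernels to 1/P of its input channels and 1×1 kernels to the remaining channels. The part value P, the layer sizes, and the function names are illustrative assumptions and are not taken from this paper; bias terms are ignored.

```python
def standard_conv_params(in_ch: int, out_ch: int, k: int = 3) -> int:
    """Weights in a standard convolution layer: every filter uses k x k kernels on all input channels."""
    return out_ch * in_ch * k * k


def hetconv_params(in_ch: int, out_ch: int, part: int, k: int = 3) -> int:
    """Weights in a heterogeneous convolution layer (assumed HetConv-style formulation).

    Each filter uses k x k kernels on in_ch / part channels
    and 1 x 1 kernels on the remaining channels.
    """
    if part < 1 or in_ch % part != 0:
        raise ValueError("part must be a positive divisor of in_ch")
    large = in_ch // part   # channels convolved with k x k kernels
    small = in_ch - large   # channels convolved with 1 x 1 kernels
    return out_ch * (large * k * k + small)


if __name__ == "__main__":
    # Hypothetical layer: 256 input channels, 256 filters, 3 x 3 kernels.
    std = standard_conv_params(256, 256)
    for part in (1, 2, 4, 8):
        het = hetconv_params(256, 256, part)
        print(f"P={part}: {het} weights ({het / std:.2%} of standard)")
```

With P = 1 the count equals that of a standard layer, and it shrinks as P grows. These are single-layer figures for a hypothetical configuration and do not correspond to the overall parameter reduction reported for the full tracker.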