The just-in-time estimation of farmland traits such as biomass yield can aid considerably in the optimisation of agricultural processes. Data in domains such as precision farming are, however, notoriously expensive to collect, and deep-learning-driven modelling approaches must maximise performance while also acknowledging this reality. In this paper we present a study in which a platform was deployed to collect data from a heterogeneous collection of sensor types, including visual, NIR, and LiDAR sources, to estimate key pastureland traits. In addition to introducing the study itself, we address two key research questions. The first concerns the trade-off between multi-modal modelling and a more basic image-driven methodology, while the second investigates patch-size variability in the image-processing backbone. This second question relates to the fact that individual images of vegetation, and in particular grassland, are texturally rich but can be uniform, enabling subdivision into patches. However, there may be a trade-off between patch size and the number of patches generated. Our modelling used a number of CNN architectural variations built on top of Inception-ResNet-v2, MobileNet, and shallower custom networks. Using minimum Mean Absolute Percentage Error (MAPE) on the validation set as our metric, we demonstrate a strongest performance of 28.23% MAPE with a hybrid model. A deeper dive into our analysis showed that working with fewer but larger patches of data performs as well as or better than many smaller patches for true deep models, hence consuming fewer resources during training.
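The patch subdivision mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the non-overlapping tiling strategy, and the example image size are assumptions chosen only to make the patch-size/patch-count trade-off concrete (halving the patch side quadruples the number of patches).

```python
import numpy as np

def extract_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Subdivide an H x W x C image into non-overlapping square patches.

    Illustrative only: halving `patch_size` quadruples the patch count,
    which is the trade-off discussed in the abstract.
    """
    h, w = image.shape[:2]
    # Crop to a multiple of patch_size so the grid tiles exactly.
    h_crop, w_crop = h - h % patch_size, w - w % patch_size
    img = image[:h_crop, :w_crop]
    rows, cols = h_crop // patch_size, w_crop // patch_size
    # Reshape into a grid of patches, then flatten the grid dimensions.
    patches = (
        img.reshape(rows, patch_size, cols, patch_size, -1)
           .transpose(0, 2, 1, 3, 4)
           .reshape(rows * cols, patch_size, patch_size, -1)
    )
    return patches

# A 256x256 RGB image yields 64 patches at 32 px or 16 patches at 64 px.
img = np.zeros((256, 256, 3), dtype=np.uint8)
small = extract_patches(img, 32)  # shape (64, 32, 32, 3)
large = extract_patches(img, 64)  # shape (16, 64, 64, 3)
```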