Data-driven estimation of building energy consumption with multi-source heterogeneous data
Introduction
Buildings consume energy at every stage of a life cycle that lasts at least 50 years, and they account for approximately 40% of total world energy usage and 30% of global greenhouse gas emissions [1]. For instance, in the United States (U.S.), the residential sector accounts for 21% of total energy use, followed by the commercial sector at 19% [2]. In major U.S. cities, buildings are even responsible for around 75% of the city's energy use [3]. In the European Union (EU), buildings are regarded as the biggest contributor to energy use, taking up 40% of EU primary energy [4]. The energy consumed by non-residential buildings accounts for 25% of that of the total European building stock, which grew continually by 2.5% every year between 2000 and 2008 [5]. In China, buildings account for 25% and 6% of the national and global energy consumption, respectively [6], [7]. Moreover, the growth rate of building energy consumption is expected to keep rising given China's strong development momentum. With population growth and accelerating urbanization, more and more buildings will be constructed and operated, driving an upward trend in energy consumption, which is predicted to rise by 45% from 2002 to 2025 [8]. However, excessive building energy consumption harms the environment through air pollution, the greenhouse effect, the urban heat island effect, and other impacts, which can in turn damage human health and socio-economic development. Experts highlight that one of the efficient pathways to long-term and cost-effective energy conservation and emissions reduction is to decrease energy consumption as much as possible [9]. Since buildings are among the major energy-consuming sectors in the world and a source of energy inefficiency, they are a promising target with the greatest potential for reaching the common goal of sustainable development [10].
A good understanding of the building energy profile plays a significant role in decision making for energy management and conservation [11]. For one thing, it provides insights into energy behavior in buildings, which serve as evidence for building owners to put forward reasonable operation plans and design more energy-efficient buildings [12]. For another, it makes it possible to detect outliers in energy consumption and thus perceive risk ahead of time [13]. Accordingly, building owners can take prompt and reasonable actions to mitigate energy losses. It is estimated that buildings designed on the basis of a proper energy forecast could reduce their system consumption by at least 10%, which would eventually slow resource depletion and minimize environmental impacts [14]. Therefore, special attention should be paid to improving the accuracy of building energy consumption prediction, which contributes to evaluating and improving building energy performance.
Precisely evaluating building energy performance is not a straightforward task, since energy use is determined by the interaction of various physical, environmental, and social factors with attendant uncertainties. Building energy consumption assessment began with the engineering approach (also named the physical modeling approach), which relies largely on physical principles to quantify the thermal dynamics and energy behavior at the whole-building level. This approach is typically performed in building energy simulation software, such as EnergyPlus, Ecotect, DOE-2, BLAST, and others [15]. However, physical models depend heavily on detailed building and environmental parameters as inputs and cannot achieve accurate prediction when such complete inputs are absent. Another representative family, the statistical approaches (also called the empirical approaches), was then developed to explore empirical data about heating/cooling load, lighting, and electricity usage [8]. Nevertheless, the accuracy of these methods depends on sufficient high-quality data collected from the buildings, which is not always easy to obtain. The traditional regression model based on the linear assumption is mostly used here to simplify the problem and determine the correlation between energy consumption and related factors, but it fails to reveal the nonlinear dependency found in real cases. Besides, implementing the engineering or statistical approaches normally requires a wealth of accumulated knowledge and expertise, making these existing methods hard to popularize in practice. Recent years have witnessed the rapid development of artificial intelligence (AI) techniques, which can automatically and rapidly discover hidden knowledge from the intricate correlations behind vast amounts of data [16], [17]. In this regard, more and more attention has been focused on advanced and intelligent data-driven predictions based on machine learning methods/algorithms, owing to their flexibility and accuracy [15]. Among the various machine learning methods, a major concern is selecting the proper one to create a more robust and reliable predictive model.
In particular, ensemble models, such as decision-tree ensembles and gradient boosting, are among the popular and powerful machine learning algorithms. They comprise several weak learners and can achieve better prediction performance than a single model. Gradient boosting, especially, combines its component models into a stronger learner in a gradual, additive, and sequential manner, which has proven useful for exploring data with noise, heterogeneous features, and correlation across many domains [18]. As a variant of the gradient boosting algorithm, the CatBoost model proposed in 2018 by the Yandex company is relatively new [19], and it outperforms others in its flexibility in tackling categorical features, state-of-the-art predictive results, lower chance of overfitting, and easy implementation. Since the CatBoost algorithm is an emerging method, its applications in practical tasks are still rare. For instance, Huang et al. [20] applied the CatBoost method to estimate the reference evapotranspiration in a humid region, where it generated more satisfactory accuracy more efficiently than the random forest (RF) and support vector machine (SVM). Kang et al. [21] proposed a CatBoost-based framework for accurately predicting social media popularity in two stages, namely feature representation and CatBoost regression training. Zhang et al. [22] integrated the bidirectional long short-term memory neural network with the CatBoost algorithm in order to select proper features and forecast electricity prices with small errors. Notably, buildings can now also be information-intensive and provide big data sources for energy analysis. On the one hand, the CatBoost model is suitable for learning the available sets of features in different data types (i.e., integer, float, string, boolean) from buildings to address the nonlinear and complicated prediction problem. On the other hand, the CatBoost model has an advantage in providing a better understanding of the relationship between energy consumption and its relevant variables, which is unachievable using the common black-box models. Given this adaptability of the CatBoost model, there are reasons to believe that predictive results with high reliability and explainability can be obtained, supporting data-driven decision making that delivers better energy-efficiency strategies.
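The gradual, additive, and sequential construction of gradient boosting can be illustrated with a minimal sketch. The example below is not the CatBoost algorithm itself: it assumes a squared-error loss, a single numeric feature, and decision stumps as weak learners, and every function name and the toy data are hypothetical choices made for illustration.

```python
# Minimal gradient boosting for squared-error regression on one feature.
# Illustrative only: function names and data are hypothetical, and CatBoost
# adds further machinery that is not reproduced here.

def fit_stump(x, residuals):
    """Pick the threshold on x that best splits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # last value would leave the right side empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_gradient_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then add one stump per round fitted to the residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the plain residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, xi)
                       for p, xi in zip(predictions, x)]
    return base, stumps, learning_rate


def predict(model, xi):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


def mse(model, x, y):
    return sum((predict(model, xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)


# Toy nonlinear data: a single stump is a weak learner, many stumps are stronger.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [xi ** 2 for xi in x]
weak = fit_gradient_boosting(x, y, n_rounds=1)
strong = fit_gradient_boosting(x, y, n_rounds=100)
print(mse(weak, x, y), mse(strong, x, y))
```

Each round fits a new stump to what the current ensemble still gets wrong, so the training error of the 100-round model is lower than that of the single-round model.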
In this research, we adopt the newly developed machine learning algorithm termed the CatBoost model to estimate building energy consumption, which could be its first application to this topic. Apart from building a sound CatBoost model through parameter tuning to predict building energy and quantify the impacts of multi-source heterogeneous data, our method can also detect outliers in energy performance and raise risk alarms. As a case study, the CatBoost model is applied to an energy benchmarking dataset from Seattle to provide deep insight into it, which holds promise for building energy management and evaluation. Three research questions remain to be addressed: (1) how to train the CatBoost model with optimal parameters to precisely estimate the energy use of the whole building portfolio and calculate the feature importance; (2) how to analyze the predictive results thoroughly in order not only to discover opportunities for energy conservation but also to detect abnormal energy consumption profiles; (3) how to discuss energy performance for buildings at different energy-efficiency levels. Overall, this research contributes to infusing more computational intelligence into building energy evaluation and management, which ultimately helps building owners and designers optimize energy utilization and minimize carbon emissions for the development of green buildings.
The remainder of this paper is organized as follows. Section 2 reviews current AI-based building energy consumption predictions. Section 3 elaborates on an overall framework of the CatBoost-based prediction method with three major steps. Section 4 implements the proposed method in a real-world case study to verify its feasibility. Section 5 confirms both the generalization ability of the CatBoost model across different datasets and its better regression ability compared with two other popular ensemble models. Section 6 summarizes the conclusions and future works.
Section snippets
Literature review
Nowadays, more buildings are equipped with advanced metering systems and various smart sensors, bringing a surge in the amount of energy data, which makes it convenient to train relevant computing models and judge their performance. Considerable attempts have been made to deploy various AI methods to explore these collected data in depth for an intelligent prediction and understanding of building energy consumption, resulting in rapid and reliable decision making in energy saving.
Methodology
The schematic outline of the CatBoost-based estimation method for building energy consumption is illustrated in Fig. 1. As shown, preprocessed data is fed into the CatBoost model, whose parameters are fine-tuned by 5-fold cross-validation. Based upon the result analysis from energy prediction, outlier detection, and feature importance measurement, some valuable suggestions and strategies can be presented to further improve the building energy performance.
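The parameter tuning by 5-fold cross-validation mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tuning procedure of this study: a simple nearest-neighbor averager stands in for the CatBoost model so that the example runs with the standard library only, the data are synthetic, and the candidate values, the seed, and all function names are hypothetical.

```python
import random


def k_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, valid_idx) pairs; validation folds partition the data."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        valid = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, valid


def knn_predict(train_x, train_y, query, k):
    """Stand-in model (an assumption): mean target of the k nearest points."""
    nearest = sorted(range(len(train_x)),
                     key=lambda j: abs(train_x[j] - query))[:k]
    return sum(train_y[j] for j in nearest) / len(nearest)


def cross_validated_rmse(x, y, k, n_folds=5):
    """Average the validation RMSE over the folds for one parameter value."""
    fold_rmse = []
    for train, valid in k_fold_splits(len(x), n_folds):
        train_x = [x[j] for j in train]
        train_y = [y[j] for j in train]
        squared_errors = [(knn_predict(train_x, train_y, x[j], k) - y[j]) ** 2
                          for j in valid]
        fold_rmse.append((sum(squared_errors) / len(squared_errors)) ** 0.5)
    return sum(fold_rmse) / len(fold_rmse)


# Synthetic nonlinear data with noise (hypothetical, for illustration only).
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(100)]
y = [xi ** 2 + rng.gauss(0, 2) for xi in x]

# Evaluate each candidate on the same folds and keep the lowest average RMSE.
candidates = [1, 3, 5, 15, 40]
scores = {k: cross_validated_rmse(x, y, k) for k in candidates}
best_k = min(scores, key=scores.get)
print(best_k, scores[best_k])
```

Because every candidate is scored on data held out from its own training folds, the selected value reflects generalization rather than fit to the training set. With a real gradient boosting library, the stand-in model would be replaced by that library's regressor and the candidates by its own parameters.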
Data preparation
As a case study, Seattle's building energy performance data collected in 2015 and 2016 by Seattle's Energy Benchmarking Program (SMC 22.920) serves as the main dataset. The study area is Seattle, with a land area of 215 km², where buildings are responsible for around 33% of the city’s major emissions [51]. The available data comprises 6716 records in total, which track the annual energy usage of non-residential and multifamily properties (buildings) with an area of more than 20,000 square feet (sf). Since the raw
Discussions
Since the Energy Star program grades a building’s performance with a score from 1 to 100, this research also explores building energy consumption at different energy-efficiency levels. Four CatBoost models are trained and optimized on four separate datasets, one extracted from each of four levels. Moreover, the effectiveness of the novel CatBoost algorithm is validated by comparison with two popular machine learning models. Discussions are summarized as follows.
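Splitting the data into four level-specific datasets can be sketched as below. The level boundaries are an assumption made purely for illustration (four equal-width bands of the 1–100 score), since the actual thresholds are not given in this excerpt; the field name `energy_star_score`, the function names, and the sample records are likewise hypothetical.

```python
def score_level(score):
    """Map an integer 1-100 score to a level 1-4 (assumed equal-width bands)."""
    if not 1 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # Assumed bands: 1-25 -> 1, 26-50 -> 2, 51-75 -> 3, 76-100 -> 4.
    return (score - 1) // 25 + 1


def split_by_level(records):
    """Group records into four datasets, one per assumed efficiency level."""
    groups = {level: [] for level in range(1, 5)}
    for record in records:
        groups[score_level(record["energy_star_score"])].append(record)
    return groups


# Hypothetical records; a separate model would then be trained on each group.
records = [
    {"building": "A", "energy_star_score": 12},
    {"building": "B", "energy_star_score": 50},
    {"building": "C", "energy_star_score": 51},
    {"building": "D", "energy_star_score": 100},
]
groups = split_by_level(records)
for level, members in groups.items():
    print(level, [m["building"] for m in members])
```

Keeping the grouping separate from model training makes it easy to swap in different boundaries without touching the rest of the pipeline.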
Conclusions and future works
Since energy conservation and emission reduction are receiving more and more attention, it is necessary to estimate building energy consumption precisely for better evaluation and optimization of energy performance. To realize the goal of sustainable development, a novel ensemble model named CatBoost, proposed in 2018 and still unexplored in the topic of building energy consumption evaluation, is applied in a case study to nonlinearly predict the Site EUIWN
CRediT authorship contribution statement
Yue Pan: Writing - original draft, Methodology, Visualization, Investigation, Validation, Formal analysis. Limao Zhang: Conceptualization, Supervision, Methodology, Writing - review & editing, Funding acquisition.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
The Ministry of Education Tier 1 Grant, Singapore (No. M4011971.030) and the Start-Up Grant at Nanyang Technological University, Singapore (No. M4082160.030) are acknowledged for their financial support of this research.
References (58)
et al. Building energy: a review on consumptions, policies, rating schemes and standards. Energy Procedia (2019)
et al. Towards nearly zero-energy buildings: the state-of-art of national regulations in Europe. Energy (2013)
et al. Energy consumption and efficiency technology measures in European non-residential buildings. Energy Build (2017)
et al. Analysis of typical public building energy consumption in northern China. Energy Build (2017)
et al. A thorough assessment of China’s standard for energy consumption of buildings. Energy Build (2017)
et al. Energy performance optimisation of building envelope retrofit through integrated orthogonal arrays with data envelopment analysis. Renew Energy (2020)
et al. Methodology to estimate building energy consumption using EnergyPlus Benchmark Models. Energy Build (2010)
et al. Valuation of energy efficient certificates in buildings. Energy Build (2018)
et al. Structural health monitoring and assessment using wavelet packet energy spectrum. Saf Sci (2019)
et al. Solutions to reduce energy consumption in the management of large buildings. Energy Build (2013)
A review of data-driven building energy consumption prediction studies. Renew Sustain Energy Rev
BIM log mining: Learning and predicting design commands. Autom Constr
BIM log mining: exploring design productivity characteristics. Autom Constr
Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions. J Hydrol
A review on applications of ANN and SVM for building electrical energy consumption forecasting. Renew Sustain Energy Rev
Applying support vector machines to predict building energy consumption in tropical region. Energy Build
Forecasting energy consumption of multi-family residential buildings using support vector regression: Investigating the impact of temporal and spatial monitoring granularity on performance accuracy. Appl Energy
A review of data-driven approaches for prediction and classification of building energy consumption. Renew Sustain Energy Rev
Energy analysis of a building using artificial neural network: a review. Energy Build
Forecasting diurnal cooling energy load for institutional buildings using Artificial Neural Networks. Energy Build
Machine learning-based thermal response time ahead energy demand prediction for building heating systems. Appl Energy
Predicting electricity consumption for commercial and residential buildings using deep recurrent neural networks. Appl Energy
Predicting fuel consumption for commercial buildings with machine learning algorithms. Energy Build
Prediction of residential building energy consumption: a neural network approach. Energy
Development of an RDP neural network for building energy consumption fault detection and diagnosis. Energy Build
Diagnostic tools of energy performance for supermarkets using Artificial Neural Network algorithms. Energy Build
Artificial intelligence-based fault detection and diagnosis methods for building energy systems: advantages, challenges and the future. Renew Sustain Energy Rev
Multi-classifier information fusion in risk analysis. Inform Fusion
A novel ensemble learning approach to support building energy use prediction. Energy Build