Elsevier

Applied Energy

Volume 268, 15 June 2020, 114965
Data-driven estimation of building energy consumption with multi-source heterogeneous data

https://doi.org/10.1016/j.apenergy.2020.114965

Highlights

  • A categorical boosting model is employed to intelligently forecast building energy consumption.

  • It raises accuracy and mitigates uncertainty in understanding building energy performance.

  • Feature importance can be measured to quantify features’ impacts on energy consumption.

  • Outlier detection can distinguish normal from abnormal energy usage to issue early warnings.

  • Results will provide references to make data-driven decisions in optimizing energy utilization.

Abstract

For better energy evaluation and management, a categorical boosting (CatBoost)-based predictive method is presented to accurately estimate building energy consumption by learning from large volumes of multi-source heterogeneous data collected from buildings. Specifically, the recently developed CatBoost model, an ensemble learning method, excels at handling categorical variables and producing reliable results. As a case study, the proposed method is validated on a multi-dimensional dataset of Seattle's building energy performance provided by the city’s government, aiming to estimate the weather-normalized site energy use intensity of buildings and characterize its non-linear relationship with 12 other potentially influential features. Results from 5-fold cross-validation demonstrate that the model predicts energy intensity precisely, with an R² of 0.897, and even outperforms popular machine learning algorithms including random forest and gradient boosting decision tree. Based on a defined threshold, the predicted values can be classified as normal or abnormal energy consumption, reaching an accuracy of 99.32% for outlier detection, which helps flag potential risks at an early stage and develop strategies to enhance energy efficiency. Moreover, results from the established model can be interpreted objectively, suggesting that features concerning physical and energy characteristics contribute more to energy estimation than environmental features. Since such results explain building energy consumption and efficiency in a data-driven manner, they can eventually serve as guidance for building owners and designers in designing and renovating buildings to achieve better energy-conserving performance.

Introduction

Buildings consume energy at every stage of a life cycle that lasts at least 50 years, accounting for approximately 40% of total world energy usage and 30% of global greenhouse gas emissions [1]. For instance, in the United States (U.S.), residential sectors account for 21% of total energy use, followed by commercial sectors at 19% [2]. Buildings in major U.S. cities are even responsible for around 75% of the energy use in the city [3]. In the European Union (EU), buildings are considered the biggest contributor to energy use, taking up 40% of the EU primary energy [4]. The energy consumed by non-residential buildings accounts for 25% of that of the total European building stock and grew continually by 2.5% every year between 2000 and 2008 [5]. In China, buildings account for 25% and 6% of the national and global energy consumption, respectively [6], [7]. Moreover, the growth rate of building energy consumption is expected to keep rising due to China's strong development momentum. With population growth and accelerating urbanization, more and more buildings will be constructed and operated, driving an upward trend in energy consumption, which is predicted to rise by 45% from 2002 to 2025 [8]. However, excessive building energy consumption has negative impacts on the environment, such as air pollution, the greenhouse effect, and the urban heat island effect, which can in turn harm human health and socio-economic development. Experts highlight that one of the efficient pathways to promote long-term and cost-effective energy conservation and emissions reduction is to decrease energy consumption as much as possible [9]. Since buildings are the major energy-consuming sectors in the world and a source of energy inefficiency, they are a promising target with the greatest potential to reach the common goal of sustainable development [10].

A good understanding of the building energy profile plays a significant role in decision making for energy management and conservation [11]. For one thing, it provides insights into energy behavior in buildings, which serve as evidence for building owners to put forward reasonable building operation plans and design more energy-efficient buildings [12]. For another, it offers valuable chances to detect outliers in energy consumption and thus perceive risk ahead of time [13]. Accordingly, building owners can take prompt and reasonable actions to mitigate failures caused by energy loss. It is estimated that buildings designed on the basis of a proper energy forecast could reduce their system consumption by at least 10%, which would eventually slow resource depletion and minimize environmental impacts [14]. Therefore, special attention should be paid to achieving higher accuracy in building energy consumption prediction, which contributes to evaluating and improving building energy performance.

Making a precise evaluation of building energy performance is not straightforward, since energy use is determined by the interaction of various physical, environmental, and social factors with attendant uncertainties. Research on building energy consumption assessment began with the engineering approach (also named the physical modeling approach), which largely relies on physical principles to quantify the thermal dynamics and energy behavior at the whole-building level. This approach is typically performed in building energy simulation software, such as EnergyPlus, Ecotect, DOE-2, BLAST, and others [15]. However, physical models depend heavily on detailed building and environmental parameters as inputs and cannot achieve accurate prediction when such complete inputs are absent. Another representative family, the statistical approaches (also called the empirical approaches), was then developed to explore empirical data about heating/cooling load, lighting, and electricity usage [8]. Nevertheless, the accuracy of this method depends on sufficient high-quality data collected from the buildings, which is not always easily obtained. A traditional regression model based on the linear assumption is mostly used in this method to simplify the problem and determine the correlation between energy consumption and related factors, but it fails to reveal the nonlinear dependency in real cases. Besides, implementing the engineering or statistical approaches normally requires a wealth of accumulated knowledge and expertise, making these existing methods hard to popularize in practice. Recent years have witnessed the rapid development of artificial intelligence (AI) techniques, which can automatically and rapidly discover hidden knowledge from intricate correlations behind vast amounts of data [16], [17]. In this regard, more and more attention has been focused on advanced and intelligent data-driven predictions based on machine learning methods/algorithms due to their flexibility and accuracy [15]. Among the various machine learning methods, a major concern is selecting the proper one to create a more robust and reliable predictive model.

In particular, ensemble models, such as random forest, gradient boosting, and others, are among the popular and powerful machine learning algorithms. They comprise several weak learners and can achieve better prediction performance than a single model. Gradient boosting, especially, combines the component models into a stronger learner in a gradual, additive, and sequential manner, which has proven useful for exploring data with noise, heterogeneous features, and correlation across many domains [18]. As a variant of the gradient boosting algorithm, the CatBoost model proposed in 2018 by the Yandex company is relatively new [19]; it outperforms others in its flexibility in tackling categorical features, state-of-the-art predictive results, lower chance of overfitting, and easy implementation. Since the CatBoost algorithm is an emerging method, its applications in practical tasks are still rare. For instance, Huang et al. [20] applied the CatBoost method to estimate the reference evapotranspiration in a humid region, which achieved more satisfactory accuracy more efficiently than the random forest (RF) and support vector machine (SVM). Kang et al. [21] proposed a CatBoost-based framework for accurately predicting social media popularity involving two stages, namely feature representation and CatBoost regression training. Zhang et al. [22] integrated the bidirectional long short-term memory neural network with the CatBoost algorithm in order to select proper features and forecast electricity prices with small errors. Notably, buildings can now also be information-intensive and provide big data sources for energy analysis. On the one hand, the CatBoost model is suitable for learning these available sets of features in different data types (i.e., integer, float, string, boolean) from buildings to address the non-linear and complicated prediction problem. On the other hand, the CatBoost model has an advantage in providing a better understanding of the relationship between energy consumption and its relevant variables, which is unachievable using the common black-box models. Given this adaptability of the CatBoost model, there are reasons to believe that predictive results with high reliability and explainability can be obtained, supporting data-driven decision making that delivers better energy-efficiency strategies.
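The handling of categorical features mentioned above can be illustrated with a minimal sketch of ordered target statistics, the encoding idea described in the CatBoost literature. Each row's category is replaced by a smoothed mean of the targets of earlier rows only, so a row never sees its own label. The function name, the property-type values, the prior, and the smoothing weight `a` are illustrative assumptions, and the rows are assumed to be already in a random order; this is a simplified sketch, not the library's implementation.

```python
def ordered_target_statistics(categories, targets, prior, a=1.0):
    """Encode each categorical value from the targets of EARLIER rows only.

    categories: list of hashable category labels (hypothetical example values)
    targets:    list of numeric targets, same length as categories
    prior:      fallback value used when a category has no history yet
    a:          smoothing weight given to the prior (assumed, not from the paper)
    """
    running_sum = {}    # sum of targets seen so far, per category
    running_count = {}  # number of rows seen so far, per category
    encoded = []
    for cat, y in zip(categories, targets):
        s = running_sum.get(cat, 0.0)
        c = running_count.get(cat, 0)
        # Smoothed mean over the history; the current target y is not used yet.
        encoded.append((s + a * prior) / (c + a))
        # Only now is the current row added to the history.
        running_sum[cat] = s + y
        running_count[cat] = c + 1
    return encoded


# Hypothetical property types and energy intensities, for illustration only.
property_types = ["Office", "Hotel", "Office", "Office"]
energy_intensity = [50.0, 90.0, 70.0, 60.0]
print(ordered_target_statistics(property_types, energy_intensity, prior=60.0))
# -> [60.0, 60.0, 55.0, 60.0]
```

The first "Office" and the first "Hotel" rows fall back to the prior because no earlier row shares their category, which is how this scheme avoids leaking a row's own target into its feature.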

In this research, we apply the recently developed machine learning algorithm termed the CatBoost model to estimate building energy consumption, which could be the first time it has been used in this topic. Apart from building a sound CatBoost model through parameter tuning for predicting building energy and quantifying the impacts of multi-source heterogeneous data, our method can also detect outliers in energy performance to raise risk alarms. As a case study, the CatBoost model is applied to provide deep insight into an energy benchmarking dataset about Seattle, which holds promise for building energy management and evaluation. Three research questions remain to be addressed: (1) How to train the CatBoost model with optimal parameters to precisely estimate the energy use in the whole building portfolio and calculate the feature importance; (2) How to analyze the predictive results thoroughly in order to not only discover opportunities for energy conservation but also detect abnormal energy consumption profiles; (3) How to discuss energy performance for buildings under different energy-efficiency levels. Overall, this research contributes to infusing more computational intelligence into building energy evaluation and management, which in the end helps building owners and designers optimize energy utilization and minimize carbon emissions for the development of green buildings.

The remainder of this paper is organized as follows. Section 2 reviews the current AI-based building energy consumption predictions. Section 3 elaborates on an overall framework of the CatBoost-based prediction method with three major steps. Section 4 implements the proposed method in a real-world case study to verify its feasibility. Section 5 confirms not only the generalization ability of the CatBoost model on different datasets but also its better regression ability compared with two other popular ensemble models. Section 6 summarizes the conclusions and future works.

Section snippets

Literature review

Nowadays, more buildings have installed advanced metering systems and various smart sensors, bringing a surge in the amount of energy data, which makes it convenient to train relevant computing models and judge the models’ performance. Considerable attempts have been made to deploy various AI methods to explore these collected data in depth for an intelligent prediction and understanding of building energy consumption, resulting in rapid and reliable decision making in energy saving.

Methodology

The schematic outline of the CatBoost-based estimation method for building energy consumption is illustrated in Fig. 1. As shown, preprocessed data is fed into the CatBoost model, whose parameters are fine-tuned by 5-fold cross-validation. Based upon the result analysis from energy prediction, outlier detection, and feature importance measurement, valuable suggestions and strategies can be presented to further improve the building energy performance.
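The evaluation loop described above can be sketched in a few lines. The example below is a hedged illustration: it scores a model with 5-fold cross-validation using R², then labels values as normal or abnormal against a threshold and measures how often the labels from predictions agree with those from observations. A one-variable least-squares line stands in for the CatBoost model so the sketch needs only the standard library; all function names, the synthetic data, and the threshold value are assumptions and do not come from the paper.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle row indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Stand-in model (NOT CatBoost): ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Train on k-1 folds, score R^2 on the held-out fold, repeat k times."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [predict(model, xs[i]) for i in held_out]
        scores.append(r2_score([ys[i] for i in held_out], preds))
    return scores


def label_abnormal(values, threshold):
    """True marks a value above the (hypothetical) abnormality threshold."""
    return [v > threshold for v in values]


def accuracy(labels_a, labels_b):
    """Fraction of positions where two label lists agree."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


# Synthetic, noise-free data for illustration only.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
scores = cross_validate(xs, ys, fit_line, predict_line)
print(sum(scores) / len(scores))  # mean R^2 across the 5 folds
```

In a real pipeline, the `fit` and `predict` callables would wrap the tuned regressor, and the threshold would come from the study's own definition of abnormal consumption.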

Data preparation

As a case study, Seattle's building energy performance data collected in 2015 and 2016 by Seattle's Energy Benchmarking Program (SMC 22.920) serves as the main dataset. The study area is Seattle, with a land area of 215 km², where buildings are responsible for around 33% of the city’s major emissions [51]. The available data totals 6716 records, which track the annual energy usage of non-residential and multifamily properties (buildings) with an area of more than 20,000 sq ft. Since the raw

Discussions

Since the Energy Star program grades a building’s performance on a 1–100 score, this research is also interested in exploring building energy consumption under different energy-efficiency levels. Four CatBoost models are trained and optimized on four separate datasets, one extracted from each of four levels. Moreover, the effectiveness of the novel CatBoost algorithm is validated by a comparison with two popular machine learning models. Discussions are summarized as follows.
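The per-level setup described above can be sketched as a simple partition of records by score, with each subset then used to train its own model. The snippet does not state the level boundaries, so the equal-width cut points (25, 50, 75), the record layout, and the field name `energy_star_score` are hypothetical assumptions for illustration.

```python
def score_level(score, boundaries=(25, 50, 75)):
    """Map a 1-100 score to a level 1..4 using ASSUMED equal-width cut points."""
    if not 1 <= score <= 100:
        raise ValueError("score must be between 1 and 100")
    for level, upper in enumerate(boundaries, start=1):
        if score <= upper:
            return level
    return len(boundaries) + 1


def split_by_level(records, key="energy_star_score", boundaries=(25, 50, 75)):
    """Group records (dicts) into one list per level; `key` is a hypothetical field name."""
    groups = {level: [] for level in range(1, len(boundaries) + 2)}
    for record in records:
        groups[score_level(record[key], boundaries)].append(record)
    return groups


# Hypothetical records, for illustration only.
records = [
    {"id": "A", "energy_star_score": 12},
    {"id": "B", "energy_star_score": 50},
    {"id": "C", "energy_star_score": 76},
    {"id": "D", "energy_star_score": 100},
]
groups = split_by_level(records)
for level, rows in groups.items():
    print(level, [r["id"] for r in rows])
```

Each of the four groups would then be passed to its own training and tuning run, so that results can be compared across efficiency levels.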

Conclusions and future works

Since the topic of energy conservation and emission reduction is currently receiving more and more attention, it is necessary to estimate building energy consumption precisely for better evaluation and optimization of energy performance. To realize the goal of sustainable development, a novel ensemble model named CatBoost, proposed in 2018 and still unexplored in the topic of building energy consumption evaluation, is applied in a case study to nonlinearly predict the Site EUIWN

CRediT authorship contribution statement

Yue Pan: Writing - original draft, Methodology, Visualization, Investigation, Validation, Formal analysis. Limao Zhang: Conceptualization, Supervision, Methodology, Writing - review & editing, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The Ministry of Education Tier 1 Grant, Singapore (No. M4011971.030) and the Start-Up Grant at Nanyang Technological University, Singapore (No. M4082160.030) are acknowledged for their financial support of this research.

References (58)

  • K. Amasyali et al.

    A review of data-driven building energy consumption prediction studies

    Renew Sustain Energy Rev

    (2018)
  • Y. Pan et al.

    BIM log mining: Learning and predicting design commands

    Autom Constr

    (2020)
  • Y. Pan et al.

    BIM log mining: exploring design productivity characteristics

    Autom Constr

    (2020)
  • G. Huang et al.

    Evaluation of CatBoost method for prediction of reference evapotranspiration in humid regions

    J Hydrol

    (2019)
  • A. Ahmad et al.

    A review on applications of ANN and SVM for building electrical energy consumption forecasting

    Renew Sustain Energy Rev

    (2014)
  • B. Dong et al.

    Applying support vector machines to predict building energy consumption in tropical region

    Energy Build

    (2005)
  • R.K. Jain et al.

    Forecasting energy consumption of multi-family residential buildings using support vector regression: Investigating the impact of temporal and spatial monitoring granularity on performance accuracy

    Appl Energy

    (2014)
  • Y. Wei et al.

    A review of data-driven approaches for prediction and classification of building energy consumption

    Renew Sustain Energy Rev

    (2018)
  • R. Kumar et al.

    Energy analysis of a building using artificial neural network: a review

    Energy Build

    (2013)
  • C. Deb et al.

    Forecasting diurnal cooling energy load for institutional buildings using Artificial Neural Networks

    Energy Build

    (2016)
  • Y. Guo et al.

    Machine learning-based thermal response time ahead energy demand prediction for building heating systems

    Appl Energy

    (2018)
  • A. Rahman et al.

    Predicting electricity consumption for commercial and residential buildings using deep recurrent neural networks

    Appl Energy

    (2018)
  • A. Rahman et al.

    Predicting fuel consumption for commercial buildings with machine learning algorithms

    Energy Build

    (2017)
  • M.R. Biswas et al.

    Prediction of residential building energy consumption: a neural network approach

    Energy

    (2016)
  • F. Magoulès et al.

    Development of an RDP neural network for building energy consumption fault detection and diagnosis

    Energy Build

    (2013)
  • G. Mavromatidis et al.

    Diagnostic tools of energy performance for supermarkets using Artificial Neural Network algorithms

    Energy Build

    (2013)
  • Y. Zhao et al.

    Artificial intelligence-based fault detection and diagnosis methods for building energy systems: advantages, challenges and the future

    Renew Sustain Energy Rev

    (2019)
  • Y. Pan et al.

    Multi-classifier information fusion in risk analysis

    Inform Fusion

    (2020)
  • Z. Wang et al.

    A novel ensemble learning approach to support building energy use prediction

    Energy Build

    (2018)