The first policy issue deals with how energy-efficiency programs are evaluated from a public policy perspective. We review the different metrics that are used by policymakers for evaluating energy efficiency, focusing on whether the total resource cost test is the best metric for addressing the challenges ahead (such as mitigating the effects of climate change). The second policy issue examines how the practice of evaluation depends on how the results are used: for example, (a) demonstrating energy efficiency as a reliable energy resource, (b) using energy efficiency as a means for reducing carbon emissions, (c) determining shareholder incentives, (d) improving the quality of programs, etc. The third policy issue deals with the challenges associated with the consideration of national EM&V protocols—and the effect that a national cap and trade policy might have on this issue.
Evaluation metrics
Metrics here are used as measures of success or progress toward goals. Metrics for acquisition programs may include gross savings (reductions in energy use that are related to the implementation of efficiency measures); net savings (savings that would not have occurred in the absence of a program intervention); cost-effectiveness as measured by the cost of a resource compared to other energy resources (Total Resource Cost test); the cost of the resource to the implementing agency (program administrator or utility cost test); or a societal test based on all benefits and costs that can be assigned values. Market transformation programs use metrics that look at progress toward changing a market, improving market share, and the sustainability of the market changes that result from program intervention.
In the evolving world of energy and environmental policy, we should not be surprised if there are shifts in emphasis that question program goals, their metrics, and how evaluations are conducted. Historically, in many parts of the USA, the emphasis has been on (1) an efficiency paradigm, (2) programs, (3) net savings, and (4) the Total Resource Cost (TRC) test of cost-effectiveness. With the growing importance of GHG reduction, each of these may have to be reconsidered, as discussed below.
First, a paradigm based on energy efficiency (reduction in consumption relative to what it was or what it might be) does not necessarily align with the desire for an absolute reduction in GHG emissions or with policies aimed at “net-zero” consumption. Retrofitting the worst wasters of energy surely helps, but making a massive home more efficient than it might otherwise be still results in continued (if lower) GHG emissions that may remain above the level needed to reach a reduction objective. If policy makers set the goal as an absolute reduction in consumption and emissions in order to reach a specific target, then the practice of evaluation should try to measure absolute consumption. Evaluation follows the metrics.
Secondly, the emphasis on “programs” is too restrictive for policies that hope to operate in the consciousness of consumers and in the marketplace, and to create dramatic change in consumption through synergism among many stimuli. Even the concept of market transformation programs assumes that there is a directed approach that is limited in scope. Expecting all or most of the change to come from the limited implementation of programs narrows the practice of evaluation. Instead, the focus should be on the marketplace, and the evaluator needs to evaluate how the market is changing over time with respect to energy efficiency: in buildings, retail, and manufacturing, as well as in the educational fields and in the training of professionals (architects and engineers, the finance and insurance industry, etc.).
Thirdly, as noted earlier, net savings is a concept of attribution that is both very hard to measure in a complex world of confounding influences (outstripping the capability of social science to provide reliable answers) and one that policy makers with a view toward the end-point of reduced emissions may find less relevant in the future. However, the concept of examining relative attribution is always important for making sure that the ratepayer/taxpayer/societal resources are being spent prudently as well as making sure that resources are being used as effectively as possible on programs that reduce emissions. Relative attribution measures the effectiveness of one approach compared to others in order to understand the relative causes of market change from multiple cause and effect relationships. The outcomes from these studies would show that one effort may be twice as influential as another, but it would not segregate total effects into multiple pots of net impacts per approach because each approach does not impact the market incrementally. If one were to totally move away from net savings in energy policy, this would run counter to developing environmental policies that reward entities for “incremental or additional” GHG reductions—which is one of the key concepts underlying the Clean Development Mechanism in the Kyoto Protocol. However, as noted previously, it is time to revisit net savings—not only in the energy efficiency arena but also in the international discussions on climate change. If urgent and comprehensive efforts are needed, and the efforts by everyone (including free riders) need to be encouraged, then it may be necessary to live with the “extra costs” of paying people to reduce their energy use and emissions, even if they were already or may soon be influenced by other energy efficiency efforts. 
The other option is to implement “smarter” public policy by deciding strategically upfront what should and should not be funded (e.g., promoting energy efficiency measures and programs that reduce the probability of free riders) and by reducing, over time, the evaluation resources used to test for additionality in those programs.
Lastly, the concept of a TRC test, whether of programs or resources, is a regulatory paradigm that is designed to make sure that ratepayers/taxpayers are receiving the least supply cost resource, not necessarily the least total cost resource or the best choice for reasons outside of regulatory purview. There are concerns with the continued use of the TRC: some argue that a new cost-effectiveness metric should be used or that the TRC should be changed by making significant changes to the inputs, e.g., avoided cost, discount rate, value of carbon emissions, measure lifetime (Hall et al. 2008), and non-energy benefits. For example, until recently, in some jurisdictions, the benefits of the TRC were based on the avoided cost of a new generation plant: in California, that plant was the combined cycle gas turbine, while in other states it may be coal-fired generation. In other states, the avoided cost can be the cost to produce the next kWh with the equipment operating at that time. However, if climate change is the focus of national policy and we are to reduce our carbon emissions, then some suggest that a renewable energy plant should form the basis for avoided cost calculations, which would make energy efficiency more attractive, since most renewable energy supplies are more costly than carbon-based generation. Furthermore, in some states, carbon adders are used in cost-effectiveness tests to try to account for the negative impacts of carbon-based fuels (coal, oil). However, the real cost of climate change is undoubtedly larger than the costs reflected in the carbon adders.
Another concern that has more recently arisen regarding the TRC test is that it has tended to be applied in an asymmetrical fashion. That is, the customer’s direct costs for the energy efficiency measure are virtually always added in to the total cost side of the TRC equation, but the additional customer benefits beyond utility system resource savings (e.g., increased productivity, reduced maintenance costs, esthetics, etc.) are very seldom quantified and added in to the TRC calculation (in part because they are more difficult and expensive to measure). This has led some to call for the use of a “utility cost test” (aka “administrator’s cost test”), which is more comparable to the way that other utility resources are judged (i.e., costs and benefits to the utility system).
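The asymmetry can be made concrete with a small numerical sketch. All dollar figures below are hypothetical, and the cost and benefit categories are a simplified reading of the two tests (incentives are treated as a transfer within the TRC, so the full measure cost appears in their place); this is an illustration under those assumptions, not any regulator's official formula.

```python
# All figures are hypothetical present values in dollars for one program.
avoided_supply_cost = 120_000.0      # utility system resource savings
program_admin_cost = 40_000.0        # utility administration and marketing
incentive_cost = 30_000.0            # rebates the utility pays participants
measure_cost = 90_000.0              # full incremental cost of the measures
other_customer_benefits = 35_000.0   # e.g., productivity, reduced maintenance


def benefit_cost_ratio(benefits, costs):
    """Return the sum of benefits divided by the sum of costs."""
    return sum(benefits) / sum(costs)


# Utility cost test: only costs and benefits seen by the utility system.
uct = benefit_cost_ratio(
    [avoided_supply_cost],
    [program_admin_cost, incentive_cost],
)

# TRC as often applied: full measure cost counted, customer benefits omitted.
trc_asymmetric = benefit_cost_ratio(
    [avoided_supply_cost],
    [program_admin_cost, measure_cost],
)

# TRC applied symmetrically: customer benefits beyond energy are included.
trc_symmetric = benefit_cost_ratio(
    [avoided_supply_cost, other_customer_benefits],
    [program_admin_cost, measure_cost],
)

for label, value in [
    ("Utility cost test", uct),
    ("TRC, customer benefits omitted", trc_asymmetric),
    ("TRC, customer benefits included", trc_symmetric),
]:
    print(f"{label}: {value:.2f}")
```

With these assumed inputs, the same program fails the asymmetric TRC but passes both the symmetric TRC and the utility cost test, which is the pattern the critics of the asymmetric application describe.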
The purpose of discounting is to bring all costs and returns at different points in time to a net present value, so that different investment choices with different costs and returns can be compared. The discounting values typically reflect the perspectives of the key stakeholders involved in managing risk (e.g., the utility perspective versus the public agency (societal) perspective). Some argue that the current approach for discounting the value of future savings supports the analysis of short-term economic decisions but does not support the analysis of long-term decisions like climate change, where the impacts occur over a much longer time period. Very small, zero, or negative discount rates have been proposed as one solution, which would make energy efficiency more financially attractive.
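The effect of the discount rate on the value of future savings can be sketched as follows. The sketch assumes a constant annual savings stream valued at the end of each year; the function name, the savings amount, and the rates are illustrative assumptions rather than values from any jurisdiction.

```python
def present_value(annual_value, years, rate):
    """Present value of a constant annual amount received at each year end.

    Works for positive, zero, and negative discount rates (rate > -1).
    """
    return sum(annual_value / (1.0 + rate) ** t for t in range(1, years + 1))


annual_savings = 100.0   # hypothetical dollars saved per year
lifetime_years = 20      # hypothetical measure lifetime

# Compare a utility-style rate, a societal-style rate, zero, and a negative rate.
for rate in (0.07, 0.03, 0.0, -0.01):
    pv = present_value(annual_savings, lifetime_years, rate)
    print(f"discount rate {rate:+.0%}: present value = {pv:,.2f}")
```

The lower the rate, the larger the present value assigned to savings that arrive late in the measure's life, which is why the choice of rate matters most for long-lived measures.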
As noted above, some states have used carbon values in their benefit–cost tests, but the values are based on a traded value of carbon, a proxy to represent an expected trade value (e.g., within a cap-and-trade system), or a derivative of a traded or expected trade value. Although we do not know the real value of avoided carbon emissions, that value may be significantly higher than the $3 to $45 reflected in carbon markets in Europe and in the Northeast, which exclude environmental costs, once all of those costs (or, according to some analysts, the costs of achieving a sustainable environment) are included. A higher value would make energy efficiency an even more viable solution for addressing climate change.
The effective useful life (EUL) of a measure (i.e., measure lifetime) is the period of time that the measure is expected to perform its intended function in a typical installation. Some estimates of EULs are conservative and underestimate the actual lifetimes of measures. As a result, measures that are very effective and long-lived (e.g., windows, insulation, and new building envelopes) are not recognized or valued as highly as they should be: while their initial costs are fully reflected in the benefit–cost ratio, their long-term savings are truncated (e.g., counted over 20–25 years instead of 40–60 years). On the other hand, estimates for some measures may be too optimistic, not accounting for business turnover (which may cause measures to be removed prematurely) or simply relying on manufacturers’ estimates that have not been supported by field observations. Careful analysis of measure lifetimes is warranted to ensure that there is no bias overall.
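The sensitivity of a benefit–cost ratio to the assumed measure lifetime can be illustrated with a short sketch. The measure cost, annual savings, and discount rate are hypothetical, and the ratio is a deliberately simplified one (discounted savings over upfront cost) rather than a complete TRC calculation.

```python
def lifetime_benefit_cost_ratio(upfront_cost, annual_savings, rate, eul_years):
    """Discounted savings over the assumed EUL divided by the upfront cost."""
    discounted_savings = sum(
        annual_savings / (1.0 + rate) ** year
        for year in range(1, eul_years + 1)
    )
    return discounted_savings / upfront_cost


upfront_cost = 5_000.0    # hypothetical cost of an envelope measure, dollars
annual_savings = 300.0    # hypothetical value of energy saved per year, dollars
discount_rate = 0.05      # hypothetical discount rate

# Same measure, same cost; only the assumed lifetime changes.
conservative = lifetime_benefit_cost_ratio(upfront_cost, annual_savings, discount_rate, 20)
long_lived = lifetime_benefit_cost_ratio(upfront_cost, annual_savings, discount_rate, 50)

print(f"EUL of 20 years: ratio = {conservative:.2f}")
print(f"EUL of 50 years: ratio = {long_lived:.2f}")
```

Under these assumptions the measure falls below a ratio of 1 with the 20-year lifetime and rises above 1 with the 50-year lifetime, so the lifetime estimate alone can decide whether it passes.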
Non-energy benefits (or costs), such as reduced emissions (see above) and environmental benefits, productivity improvements, greater comfort and convenience, reduced debt and lower levels of arrearage, and job creation, are typically not included in benefit–cost tests. Some argue that these non-energy benefits should be included, since evaluation methods are available and, more importantly, these benefits are often valued more highly than the energy benefits in motivating end users to invest in energy efficiency or change their energy behavior (Skumatz et al. 2009).
In conclusion, as energy-related policy and investments expand in scope and intersect with energy and environmental regulation, cost-effectiveness metrics may have to be redefined, with implications for EM&V.
Evaluation practice
The practice of evaluation depends on how the results are used, as noted above: for example, (a) demonstrating energy efficiency as a reliable energy resource, (b) using energy efficiency as a means for reducing carbon emissions, (c) determining shareholder incentives, (d) improving the quality of programs, etc. The number and types of stakeholders have increased over time, making the evaluation practice more comprehensive and of greater interest to parties who wish to use the evaluation results for their own agendas. For example, if certain stakeholders either do not value energy efficiency as a resource or question the cost and/or reliability of energy efficiency as a resource, then it is incumbent upon the evaluator to accurately quantify the costs and benefits of the energy-efficiency program and to compare the value of that resource (e.g., the cost of conserved energy) with other resources (e.g., the avoided cost of a combined cycle natural gas plant, a wind generator, or some other generation source). In addition, evaluators need to periodically assess the persistence of these energy savings over time, to ensure that energy efficiency can be counted in utility procurement plans and to make sure that the carbon reductions are properly accounted for over the life of an energy efficiency measure. Similarly, if stakeholders are primarily interested in shareholder incentives, then the evaluation will focus on those measures that affect the ultimate outcome of the incentive mechanism (e.g., high impact measures), and may pay little attention to those measures (and programs) that do not result in significant energy savings.
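The resource comparison mentioned above (the cost of conserved energy against the avoided cost of a generation source) can be sketched with one common formulation: the measure cost is annualized with a capital recovery factor and divided by annual energy savings. The input values, including the avoided cost used for comparison, are hypothetical.

```python
def capital_recovery_factor(rate, years):
    """Factor that converts an upfront cost into equal annual payments."""
    if rate == 0:
        return 1.0 / years
    return rate / (1.0 - (1.0 + rate) ** -years)


def cost_of_conserved_energy(measure_cost, annual_savings_kwh, rate, years):
    """Annualized measure cost per kWh saved, in dollars per kWh."""
    annualized_cost = measure_cost * capital_recovery_factor(rate, years)
    return annualized_cost / annual_savings_kwh


# Hypothetical measure: $1,000 installed, saving 2,000 kWh per year for 10 years.
cce = cost_of_conserved_energy(
    measure_cost=1_000.0,
    annual_savings_kwh=2_000.0,
    rate=0.05,
    years=10,
)

avoided_cost_per_kwh = 0.08  # hypothetical avoided cost of a supply resource

print(f"cost of conserved energy: ${cce:.4f}/kWh")
print(f"cheaper than avoided supply: {cce < avoided_cost_per_kwh}")
```

Because the result depends on the assumed lifetime and on savings persisting over it, the periodic persistence assessments described above feed directly into this comparison.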
If stakeholders are interested in energy efficiency as a strategy for reducing carbon emissions, then the evaluator must be able to convert energy savings into avoided carbon emissions (as described in “Carbon emissions calculation”). In addition, the appropriate cost-effectiveness calculation should be used, as determined by the policy makers: e.g., a revised Total Resource Cost test (see “Evaluation metrics”). In fact, it may turn out that policymakers are interested in all energy-efficiency programs (no matter the cost effectiveness) if they believe that climate change is a problem that needs to be addressed urgently and comprehensively.
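As a rough illustration of converting energy savings into avoided emissions, the sketch below multiplies annual savings by an emission factor and sums over the measure life, with an optional persistence factor for savings that decay over time. This is a generic simplification, not the method of the “Carbon emissions calculation” section; the single constant emission factor and all input values are assumptions.

```python
def avoided_emissions_tonnes(annual_savings_kwh, emission_factor_kg_per_kwh,
                             years, persistence=1.0):
    """Total avoided CO2 in tonnes over the measure life.

    persistence is the fraction of the previous year's savings retained
    each year (1.0 means savings do not decay).
    """
    total_kg = 0.0
    for year in range(years):
        savings_this_year = annual_savings_kwh * persistence ** year
        total_kg += savings_this_year * emission_factor_kg_per_kwh
    return total_kg / 1000.0


# Hypothetical inputs: 1,000 kWh/year saved, 0.5 kg CO2 per kWh, 10-year life.
full = avoided_emissions_tonnes(1_000.0, 0.5, 10)
decaying = avoided_emissions_tonnes(1_000.0, 0.5, 10, persistence=0.9)

print(f"avoided emissions, full persistence: {full:.2f} t CO2")
print(f"avoided emissions, 90% annual persistence: {decaying:.2f} t CO2")
```

In practice the emission factor varies with the generation mix and the time of the savings, so a single constant factor should be read only as a placeholder.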
Finally, if stakeholders are primarily interested in improving the quality of the programs, then more evaluation work will need to go into process evaluations (as well as impact evaluations) to ensure that the programs are delivering the energy efficiency measures efficiently and effectively. We need more research on which consumers participate or do not participate in energy-efficiency programs and why. We need more research on behavior of the key stakeholders: how they use energy and how they make decisions on investing in energy efficiency. And we need more research on the overall market for energy efficiency products and services—how is it changing and how have programs affected the market.
National EM&V protocols
Although EM&V guidelines have been prepared by several organizations at the state, national and international levels (e.g., TecMarket Works 2004; CPUC 2006; NAPEE 2007; USDOE 2007), there are no national EM&V protocols that organizations are required to follow. Given the interest in energy efficiency resource standards and potentially a national cap and trade policy/program and the need for ensuring high-quality standards for conducting evaluations, there is renewed interest in having a national EM&V protocol. The elements of the national protocol would include: common evaluation terms and definitions; common evaluation methods; common savings values and assumptions (e.g., energy, costs, measure life, and persistence); guidelines on savings precision and accuracy; and common reporting formats.
The key strength of national evaluation protocols is that, if they are based on rigorous evaluation practices, they ground energy savings results in an assessment approach that can produce reliable and transparent savings estimates. National protocols (using a consistent set of inputs and reporting formats) can also allow savings to be compared from one state to another or from one evaluation to another. If the research is based on the same protocols, in theory, the results should be comparable, assuming that the protocols are detailed enough to prescribe the required evaluation approach. Similar approaches also reduce the risk of evaluation estimation error (increasing the credibility of energy efficiency) and reduce evaluation costs for states that wish to use these approaches. A national protocol would also minimize confusion and reduce barriers for the growing market of energy efficiency providers (transaction costs are high for providers who have to meet different state-mandated evaluation requirements and levels of rigor).
Comparability and compatibility assume that the definitions for what constitutes an achieved impact are also identical. Because many states define net energy savings differently (see “Net energy savings calculation”), a protocol that prescribes a reliable evaluation approach, but is applied to different definitions of net energy impacts, will not provide comparable results. It is not enough to prescribe an evaluation approach to achieve comparability and compatibility; the definitions on which that protocol is based must also be prescribed.
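A small sketch shows why identical evaluation methods cannot deliver comparable results when the definition of net savings differs. It assumes a common convention in which net savings equal gross savings scaled by a net-to-gross ratio; the two definitions (free riders only, or free riders plus spillover) and all rates are hypothetical examples.

```python
def net_savings(gross_mwh, free_ridership, spillover=0.0):
    """Net savings under a simple net-to-gross adjustment.

    Net-to-gross ratio = 1 - free_ridership + spillover (both as fractions
    of gross savings). Pass spillover=0.0 for a free-riders-only definition.
    """
    net_to_gross = 1.0 - free_ridership + spillover
    return gross_mwh * net_to_gross


gross = 1_000.0         # hypothetical gross savings, MWh
free_ridership = 0.20   # hypothetical share of savings from free riders
spillover = 0.10        # hypothetical spillover as a share of gross savings

# State A counts free riders only; State B also credits spillover.
state_a = net_savings(gross, free_ridership)
state_b = net_savings(gross, free_ridership, spillover)

print(f"State A (free riders only): {state_a:.0f} MWh")
print(f"State B (free riders and spillover): {state_b:.0f} MWh")
```

The same program with the same measured inputs reports different net savings in the two states, so the difference reflects the definition rather than program performance.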
In establishing a protocol, it is also important to place into that protocol a prescriptive approach for dealing with the conditions that most impact the reliability of the findings. It is important to remember that energy efficient measures do not provide savings on their own. Savings are produced only after measures are integrated within a customer decision and operational environment that provides savings to be measured. The authors of this paper have seen identical programs, implemented in similar climates, and serving similar customers result in substantially different energy impacts. If the policy makers want to understand why programs produce the savings they achieve, then the protocols must also focus on prescribing evaluation approaches that support this purpose.
There are other concerns that may prevent the adoption of a national protocol if one were to be developed and implemented. First, engaging a broad range of stakeholders is challenging: standards need to be perceived as fair by a diverse set of stakeholders, although some may see the protocols as restricting independent-minded jurisdictions from doing what they want. Others may see standard protocols as going beyond their needs, increasing evaluation costs or making progress reporting too complicated without showing a corresponding need for the evaluation or the resulting information. Moreover, it may be difficult to get consensus among all of the stakeholders on some of the key issues mentioned previously: e.g., whether to rely on gross or net energy savings and, if net energy savings are used, whether to include free riders and spillover or just free riders. Second, a national protocol may be seen as impeding innovation in evaluation practice at the state level or as inadvertently excluding evaluation practices that are valid. Third, best achievable practices in evaluation may differ from one region to another, due to resource availability (see below) or different reporting requirements. Similarly, a national protocol may be viewed as too stringent for some states and too lenient for others. Fourth, a national or international protocol may end up too general and not specific enough, for the reasons above and because savings algorithms and assumptions will vary by program design (see below). And fifth, a national protocol may increase transaction costs if entities need to respond to local reporting requirements and goals as well as national requirements.
To develop a national or international protocol, the evaluation standards must be developed objectively by third parties and must build in room for flexibility and opportunity for updates. The protocols must also ensure that local, state, or national goals and reporting needs are being addressed, and that the protocols are not over-specified and do not go beyond what is desired. Thus, an open and transparent process with opportunities for stakeholder input and participation needs to be encouraged, and the crafters of the protocol must be willing to negotiate some elements (e.g., less rigor in the beginning versus more rigor in the future). In this process, uniform support for a national protocol from a divergent group of stakeholders may be very difficult to achieve, especially if that protocol were to increase costs or reporting requirements beyond acceptable levels.
A key consideration in any move toward a widely applicable protocol is the resources available to support that protocol’s application. An evaluation protocol that is based on an evaluation budget that is 8% of the portfolio’s resources may not be applicable for evaluations of programs that have fewer evaluation resources. On the other hand, if the protocol is based on the lower end of the evaluation budget spectrum (e.g., 0.5–3%), then entities that need more reliable results must increase the evaluation rigor and move beyond the prescribed protocol (as well as be willing to pay for that reliability through an increased evaluation budget). In this case, the EM&V protocol would provide an array of evaluation categories: set a minimum level of rigor for all programs, but encourage evaluations to go beyond the minimum level of rigor if the desire and budget are available. This approach would be similar to California’s EM&V protocols, which encourage high-rigor, reliable studies but allow the Commission to move to lower-rigor approaches when higher rigor is not needed or wanted.