The difficulties associated with constraint-violation penalties that are commonly used in evolutionary algorithms include time-consuming trial runs and parameter calibration (Dridi
et al.
2008). On the other hand, penalty-free methods eliminate the need to design penalty functions and are relatively straightforward to implement without sacrificing the computational efficiency (Siew and Tanyimboh
2012). Also, penalty-free methods can maintain infeasible solutions that may have useful properties that may not be common in feasible solutions in successive generations of the optimization. Other constraint handling methods have been proposed (Deb
et al.
2002). For example, Ray
et al. (
2001) suggested three stages of nondomination ranking using different combinations of the objective and constraint functions. Constraint handling in Deb
et al. (
2002) involves a binary tournament in which feasible solutions automatically dominate infeasible solutions. We developed a penalty-free strategy that exploits all efficient solutions generated, without introducing additional measures aimed at reducing the propagation of infeasible solutions.
2.1 Details of the Optimization Model
We used the EPANET 2 hydraulic simulation model (Rossman
2000) to determine the hydraulic properties of all solutions generated in the optimization process and to ensure the solutions satisfy conservation of mass and energy. The optimization model minimizes the initial construction cost,
f
1
, the infeasibility measure,
f
2
, and the number of pipes,
f
3
, as explained below.
$$ {f}_1={\displaystyle \sum_{ij}f\left({L}_{ij},{D}_{ij}\right)} $$
(1)
$$ \begin{array}{ccc}\hfill {f}_2=l+h+\left({S}^{*}-S\right)+\left({S}_g^{*}-{S}^{*}\right):\hfill & \hfill l={\displaystyle \sum_{i=1}^N \max \left(0,{R}_i^{req}-{R}_i\right),}\hfill & \hfill h={\displaystyle \sum_i \max \left(0,{H}_i^{req}-{H}_i\right)}\hfill \end{array} $$
(2)
$$ {f}_3={\displaystyle \sum_{ij}{p}_{ij}} $$
(3)
in which
N = number of nodes; for pipe
ij,
L
ij
= length;
D
ij
= diameter;
p
ij
= 1 if pipe
ij is included in the topology and
p
ij
= 0 otherwise;
H
i
and
H
i
req
= available and required residual head at demand node
i, respectively;
R
i
and
R
i
req
= actual and required number of independent supply paths to node
i, respectively;
S = entropy;
S
* = maximum entropy; and
S
g
*
= global maximum entropy.
The function
l in Eq.
2 represents the total topological infeasibility of a candidate solution. The topological infeasibility at node
i was taken as the shortfall in the number of independent supply paths
R
i
. The required number of independent supply paths,
R
i
req
, is typically 1 and 2, respectively, for branched and fully looped configurations. The function
h in Eq.
2 represents the residual head infeasibility. If
H
i
≥
H
i
req
for all demand nodes, then the solution is hydraulically feasible. The required residual head
H
i
req
is the head at a node above which demands are satisfied in full.
H
i
req
is typically not less than a
minimum of about 7 m (OFWAT
2008).
For any feasible topology that has loops, there are multiple feasible sets of flow directions each of which has a maximum entropy value. S
* is the theoretical maximum value of entropy for a particular feasible set of flow directions while S
g
*
is the global maximum entropy value considering all permissible topologies. The global maximum entropy value S
g
*
is not known a priori; our algorithm evolves the global maximum entropy solution by assuming it corresponds to the largest entropy value it has so far identified. The infeasibility measure f
2
seeks feasible solutions that have high values of entropy (a proxy for hydraulic reliability and redundancy). Minimizing the infeasibility measure f
2
promotes the inclusion of a range of maximum entropy solutions for which, by definition, S = S
*, in the nondominated set in addition to S
g
*
.
To complete the characterization of the infeasibility function
f
2
, the entropy functions are described here briefly (Tanyimboh and Templeman
1993).
$$ S={S}_0+{\displaystyle {\sum}_{i=1}^N{P}_i{S}_i}; $$
(4)
S = entropy;
S
0
= entropy of source supplies;
S
i
= entropy of node
i;
P
i
=
T
i
/
T = fraction of the total flow through the network that reaches node
i;
T
i
= total flow that reaches node
i;
T = total demand;
$$ {S}_0=-{\displaystyle \sum_{i\in I}\frac{Q_{0i}}{T} \ln \left(\frac{Q_{0i}}{T}\right)}; $$
(5)
Q
0i
= inflow rate at source node
i;
I = the set of source supply nodes;
$$ \begin{array}{cc}\hfill {S}_i=-\frac{Q_{i0}}{T_i} \ln \left(\frac{Q_{i0}}{T_i}\right)-{\displaystyle \sum_{ij\in out\left({N}_i\right)}\frac{Q_{ij}}{T_i}} \ln \left(\frac{Q_{ij}}{T_i}\right),\hfill & \hfill\ i=1,.....,N;\hfill \end{array} $$
(6)
Q
i0
= demand at node i; Q
ij
= flow rate in pipe ij; and out(N
i
) = set of all pipe flows from node i.
For a typical node with, say, two incident pipes downstream, it can be shown that
S
i
≤ ln(3) ≈ 1.1 (Shannon
1948). Given that
P
i
=
T
i
/
T ≤ 1.0, it is expected that the value of the network entropy
S in Eq.
4 will be relatively small for the typical water distribution system. Therefore, it is expected that the contributions of the entropy terms (
S
* −
S) and (
S
g
*
−
S
*) to the infeasibility measure
f
2
in Eq.
2 will be relatively small. The objective function
f
2
may be considered an
entropy-
augmented infeasibility measure. Minimizing
f
2
aims simultaneously to satisfy residual head and topology requirements and maximize entropy. Eqs.
4‐
6 are an extension of the statistical entropy function that is a measure of uncertainty (Shannon
1948). In a probabilistic system the uncertainty is a maximum if all possible system states or outcomes are equally likely. Conversely, the uncertainty decreases as the probabilities associated with the states or outcomes become more unequal. The term [(
S
* −
S) + (
S
g
*
−
S
*)] = (
S
g
*
−
S) in the infeasibility measure
f
2
may be considered an estimate of the
unrealized entropy potential; by definition its value is zero for
S =
S
* =
S
g
*
.
2.2 Practical Topology Confirmation and Redundant Binary Codes
We developed a topology confirmation algorithm coded in C, to enable a consistent and bias-free fitness assessment of all feasible and infeasible solutions. The total number of paths
NP
i
supplying demand node
i from all sources collectively was determined with regard to the pipe flow directions obtained from EPANET 2. We used an efficient path enumeration algorithm proposed by Yassin-Kassab
et al. (
1999). If
NP
i
= 0, the node cannot be supplied. If
NP
i
= 1, the node can be supplied. If
NP
i
≥ 2, for all nodes, a path inter-dependency investigation is carried out to check whether the network is fully looped. We adopted a practical procedure that does not involve an exhaustive enumeration of all the paths supplying each node. For a pair of independent supply paths, removing a pipe from one path does not affect the other path. Therefore, the procedure entails removing all pipes one at a time and in each case observing whether all nodes can be reached. If all nodes can be supplied from one or more sources after the removal of all pipes one by one with replacement, then all nodes have at least two independent supply paths. It is worth observing that EPANET 2 sets default values of node pressures and pipe flows within parts of a network that are not connected to a source. We addressed this by assigning zero flows and pressures, respectively, to such pipes and nodes.
In order to represent the vector of decision variables in a genetic algorithm, an
n-bit binary string gives rise to 2
n
different
n-bit codes and, depending on the number of decision variables, some codes may be redundant. We assumed redundant codes represent closed pipes whose flow-carrying capacity is zero. The closed pipes are allocated pipe sizes taken from just above the upper end of the
real set of available pipe diameters. The data required to implement the procedure are the unit costs for the
fictitious or assumed diameters. As the fictitious diameters have no functional value, it is anticipated they will become extinct through evolution and natural selection. The benefits of this novel approach are that it is entirely generic and very practical; additional parameters that require special calibration are not introduced and pre-optimization trial runs are not required. The premature loss of potentially useful genes is thus avoided, and the genetic code that is transmitted in successive generations is not degraded (Herrera
et al.
1998).