Network model
There are
N sensor nodes being deployed randomly with a square field and the BS is located far away from square sensing field. The nodes observe the area continuously and transmit the observation results to the BS periodically.
s
i
represents the
i-th sensor node, and the corresponding sensor nodes set is
S = {
s1,
s2,…,
s
N
},|
S| =
N. To determine the optimal parameters for our model, we make the following assumptions:
1.
The BS is located outside the square area and far away from the observation field. Once being deployed, the BS and sensor nodes will keep stationary.
2.
The location-unaware sensor nodes are uniformly distributed in the observation area, and each of them will be allocated with a unique ID.
3.
After detecting the strength of the signal, the sensor node can estimate the approximate distance from the sender, and adjust the transmission power adaptively to save energy according to the distance.
4.
All sensor nodes are capable of data fusion. Assuming data being perfect relevant, they can be fused into a plurality of packets with equal size.
In this paper, the energy consumed during per round can be estimated based on the energy consumption of the nodes for reception or transmission in each round. To measure the energy consumption, the First Order Radio model is exploited [
7]. The energy spent for transmission of
l-bit packet from the transmitter to the receiver at a distance (
d) can be defined as:
$$ E_{Tx} = \left\{ \begin{aligned} lE_{elec} + l\varepsilon_{fs} d^{2} ,{\kern 1pt} \;{\kern 1pt} d < d_{0} \hfill \\ lE_{elec} + l\varepsilon_{mp} d^{4} ,{\kern 1pt} \;{\kern 1pt} d \ge d_{0} \hfill \\ \end{aligned} \right. $$
(1)
where
E
elec
is the dissipated energy to operate the transmitter or receiver circuitry per bit,
d is the transmission distance.
ɛ
fs
and
ɛ
mp
are the amplifier energy factors for free space and multi-path fading channel models, respectively. The cross over distance
d0 is the threshold that depends on the specific scene and the amplifier energy factors, which can be given as
\( d_{0} = \sqrt {\varepsilon_{mp} /\varepsilon_{fs} } \).
The energy being spent for receiving
l-bit data can be given as:
$$ E_{Rx} (l,d) = lE_{elec} $$
(2)
and the energy consumption for data aggregate by CH is:
$$ E_{Aggr} (l,d) = lE_{DA} $$
(3)
where
E
DA
is the energy consumption of data fusion per unit.
Optimal competition radius
In order to prolong the lifetime of the network, it is necessary to balance the energy consumption among sensor nodes [
27]. For reducing the energy consumption in the cluster, the inter-cluster communication distance should be restricted within the threshold
d0, and it will ensure to keep in touch of the sensor’s energy loss by free space model. Under the condition of the single hop mode, the CH can send data to the BS directly. If the BS is far away from the monitoring area, the CH needs to employ multi-path attenuation model to deal with the power amplification loss. It will increase the energy consumption of CH greatly. Therefore, CHs are more likely to consume energy and turn to die earlier than its member nodes, which will shorten the lifetime of the whole network badly. Therefore, it is crucial to set optimal competition radius of CHs and form distributed and uniform clusters to balance the energy consumption.
When a node broadcasts a candidate-CH message, the range is termed as the competition radius of candidate-CH. Only the nodes within the radius of the competition can receive the message from the candidate CH. During the stage of the CH selection, the spatial distribution of CHs can be restricted by setting the competition radius. In a single hop mode, the energy of the sensor node may be primarily used for sending data to the BS. Therefore, the radius of competition can be regarded as the key parameter for affecting the lifetime of the network. If the radius value is larger, it will result in a lower number of clusters, and more energy consumption due to the higher signal power with respect to transmit across a larger distance. In brief, the appropriate competitive radius will be conducive to balance the energy consumption of CHs and the overhead of intra-cluster communications.
Suppose that the monitoring area is a square with the length
M and
N sensor nodes are deployed, then the node’s density is
ρ =
N/
M2. Let
d
CH
denote the competition radius of CH,
d
toBS
is the transmission distance from the CH to BS. Some redundant nodes will be scheduled into sleep state, and the detailed mechanism will be discussed in “
Determination of redundant nodes” section. Suppose that the number of redundant nodes in the cluster is
V, the percentage of active member nodes in the cluster member node will be
σ = 1 −
V/(
n − 1). For simplicity, the length of the packet being delivered to CH from the active member node is set to
l-bits in each round.
E
toCH
indicates the energy consumption for sending
l-bits packet from the member node to its CH,
E
re
is the energy consumed by CH for receiving such packet, and
E
toBS
denotes the energy consumption for sending
l-bit packet from the CH to BS. In addition, the energy consumption of all active nodes with range of CH’s competition radius to transmit their collected data to the CH can be obtained:
$$ \begin{aligned} E_{toCH} = \sigma \int_{{_{{d_{CH} }} }}^{0} {2\pi x \times \rho \times \left( {lE_{elec} + l\varepsilon_{fs} x^{2} } \right)} {\text{d}}x \hfill \\ = \sigma l\pi \rho \times \left( {E_{elec} d_{CH}^{2} + \frac{1}{2}\varepsilon_{fs} d_{toBS}^{4} } \right) \hfill \\ \end{aligned} $$
(4)
where 2
πx ×
ρ × d
x is the number of nodes being located in the ring annular area with the length
x(0 ≤
x ≤
r
i
).
For each CH, the energy consumed for receiving monitored data from its member nodes can be estimated as:
$$ E_{re} = l \times \left( {\pi r_{i}^{2} \rho - 1} \right) \times E_{elec} $$
(5)
The energy consumed of CH for data aggregation is given as:
$$ E_{ag} = l \times \pi r_{i}^{2} \rho \times E_{DA} $$
(6)
Besides, the energy consumed by CH for data forwarding to the BS can be given as:
$$ E_{toBS} = lE_{elec} + l\varepsilon_{mp} R_{i}^{4} $$
(7)
where
R
i
is the distance from the CH to the BS.
Therefore, the total energy dissipation in a cluster can be calculated as:
$$ E_{cluster} = E_{toCH} + E_{toBS} + E_{re} + E_{ag} = l\pi \rho r_{i}^{2} \left( {\sigma + \frac{1}{2}\sigma \varepsilon_{fs} r_{i}^{2} + E_{elec} + E_{DA} } \right) + l\varepsilon_{mp} R_{i}^{4} $$
(8)
Then, the average energy consumed by a node in a cluster is:
$$ E_{avg} = \frac{{E_{cluster} }}{{\pi r_{i}^{2} \rho }} $$
(9)
By taking the derivative of
r
i
in formula (
9), we can derive the optimal competition radius
d
CH
as follows:
$$ d_{CH} = \sqrt[4]{{\frac{{2\varepsilon_{mp} }}{{\sigma \pi \rho \varepsilon_{fs} }}}}d_{toBS} $$
(10)
From the formula (
4) and (
5), it can be observed that the optimal competitive radius of each node will increases as the distance between the node and the BS. According to the optimal competition radius of each node, it can minimize local energy consumption, and come into being an uneven hierarchical structure of clustered WSNs. In the region close to the BS, the distribution density of the clusters will have relatively small size.
Selection of CHs
During the phase of CH-determination, CHs will be determined based on a linear combination of probability selection and local competition. Once the operation is complete, all nodes may be put into the status of member, candidate-CH or CH. Initially, each node is treated as common node, and will generate a certain probability for being candidate-CH in view of the distance from the BS and the residual energy. Comparing with the nodes far away from the BS, the sensors that are close to the BS will have higher probability of being candidate-CHs. Therefore, the probability function can be defined as follows:
$$ CHS(i) = \alpha \times \frac{{d_\text{{max }} - d_{toBS} (i)}}{{d_\text{{max }} - d_\text{{min }} }} + (1 - \alpha ) \times \frac{{E_{res} (i)}}{{E_{init} }} $$
(11)
where
dmax denotes the is the largest distance to the BS, and
dmin denotes the nearest distance.
α is the constant parameter.
Each node will obtain the probability of being candidate-CHs to ascertain whether it has competencies required to be a CH in current round. Moreover, the nodes being candidate-CHs will modify their own state, and calculate the optimal competitive radius according to their distance to the BS. To save energy, other nodes that fail to become candidate-CHs can turn off the module of wireless communication during the phase of CH selection.
Besides, the candidate-CHs should acquire the location information and adjacent competitors within its communication range. Each candidate-CH maintains a neighbor candidate-CH table, which includes the node’s ID, the remaining energy and status flag of adjacent CHs. After CH’s selection, CHs will broadcast the message, which includes their identity and the member nodes list, and wait for adjacent member nodes to join in. According to the received signal strength, non-CH node estimates the distance of its neighboring CHs, and choose to join in the nearest one. It can reduce the energy consumption of the member nodes for data delivery, and also make the nodes near the BS undertake more burden of data forwarding or aggregation to achieve balanced energy consumption of the entire network. Next, the member nodes will send to the CH with the join_MSG, which contains the node’s ID and distance between the CHs. By receiving the join_MSG from the non-CHs, the CH will send ACK message and update the cluster membership list simultaneously.
Determination of redundant nodes
Since the member nodes are often distributed in the adjacent region, the data collected by those sensors tend to demonstrate spatial and temporal correlations [
27]. In order to reduce unnecessary energy consumption, some sensor nodes maybe turn into the sleep state. Our work mainly focuses on the node’s sleep schedule, which is generally more suitable to solve the problems of data redundancy and transmission conflict. Moreover, by making some nodes enter the dormant state, the sleep scheduling strategy can save the energy consumption derived from node’s active state. In this section, we will discuss how to extend the WSN’s lifetime by using optimization strategy based on fuzzy clustering theory.
The main idea is based on classification by fuzzy mathematics, which can group the nodes with high data similarity into the same category [
28]. According to the scheduling mechanism, a certain mount of sensor nodes will be selected from all categories. More concretely, while the perceptual data received by CH from its member nodes is accumulated to a certain extent, the fuzzy similarity matrix can be constructed to make clustering. Next, in the premise of the data fusion accuracy as high as possible, some nodes can be chosen from all categories as redundant ones. Finally, a specific sleep scheduling mechanism can be applied in those redundant nodes to reduce the communication costs and traffic conflicts.
The mutual support degree between node s
i
and s
j
can be defined as the confidence distance, which is expressed as function Del(i, j). The smaller the value is, the closer the measurement value of the pair of sensor nodes is. Conversely, it demonstrates that the monitored data collected by those nodes differ considerably. Clustering method is based on fuzzy matrix to classify the observed objects. For different confidence levels, different classification results can be obtained, and then to form a dynamic clustering diagram.
Assuming domain S = {s1, s2,…, s
n
} denotes the set of member nodes of a cluster, and n is the number of nodes in the cluster. The monitoring time can be divided into m intervals, and x
ij
represents the data collected by the member node s
i
at time j. The original data matrix can be given as X = (x
ij
)n×m.
After standardization, the matrix
X will be transformed into a fuzzy matrix [
29]. Firstly, the shift and standard deviation transformation is implemented and the element in the normalized matrix can be given as
$$ x^{\prime}_{ij} = \frac{{x_{ij} - \bar{x}_{j} }}{{z_{j} }},\quad (i = 1,2, \ldots ,n,\;j = 1,2, \ldots ,m) $$
(12)
where
\( \bar{x}_{j} = \frac{1}{n}\;\sum\nolimits_{i = 1}^{n} {x_{ij} } ,\;z_{j} = \sqrt {\frac{1}{n}\sum\nolimits_{i = 1}^{n} {(x_{ij} - \bar{x}_{j} )^{2} } } \; \), (
j = 1, 2,…,
m).
For
\( x_{ij}^{\prime } \) ∉ [0, 1], it is necessary to make further process for the point of view of unified dimension.
$$ x^{\prime\prime}_{ij} = \frac{{x^{\prime}_{ij} - \min_{1 \le i \le n} \{ x^{\prime}_{ij} \} }}{{\max_{1 \le i \le n} \{ x^{\prime}_{ij} \} - \min_{1 \le i \le n} \{ x^{\prime}_{ij} \} }},\quad (j = 1,2, \ldots ,m) $$
(13)
Thus, the fuzzy similar matrix
\( R\, = \,\left( {x_{ij}^{\prime \prime } } \right)_{n \times m} \) can be obtained.
Taking into account that temporal correlation of the data collected by those nodes, the correlation coefficient method is employed to form the fuzzy similarity matrix.
$$ r_{ij} = \frac{{\left| {\sum\nolimits_{k = 1}^{m} {(x_{ik} - \bar{x}_{j} )(x_{jk} - \bar{x}_{j} )} } \right|}}{{\sqrt {\sum\nolimits_{k = 1}^{m} {(x_{ik} - \bar{x}_{i} )^{2} } } \sqrt {\sum\nolimits_{k = 1}^{m} {(x_{jk} - \bar{x}_{j} )^{2} } } }} $$
(14)
Next, the λ-truncation matrix R
λ
= (r
ij
(λ)) can be deduced according to the fuzzy similarity matrix. Since R
λ
is Boolean matrix, the classification of the sensor nodes depends on whether the element value of matrix R
λ
is equal to 1 or not. The specific principles are as follows: (i) If both R
λ
and R are equivalent matrixes, the nodes can be categorized directly; (ii) Otherwise, until R
λ
being converted into an equivalent Boolean matrix by certain rules, the classification method will not be implemented.
Furthermore, the fuzzy similarity matrix is used to obtain the clustering graph. By choosing λ1 = 1 and generating the equiform class [x
i
]
R
= {x
j
|r
ij
= 1} for each x
i
, the node x
j
can be attributed into the same class when the condition is satisfied. By taking λ2(λ2 < λ1) as the second maximum value, the elements pair (x
i
, x
j
) with the similarity degree λ2 can be found out from the matrix R, i.e. r
ij
= λ2. Similarly, by merging x
i
and x
j
in the equivalent classification λ1 into one class, the equivalent classification on the level λ2 can be obtained. And so forth, let λ1 > λ2 > ··· > λ
k
until S is merged into a single class, the number of k categories for clustering can be obtained.
After clustering, the sensor nodes in same cluster can be partitioned into several categories based on the similarity of monitoring data. In each category, some nodes may be selected as redundant nodes and scheduled to sleep state. It will reduce the overall energy consumption of the network significantly. Obviously, if the number of redundant nodes selected is less, the greater the amount of information is retained on the whole, that is, the data collected is more comprehensive.
Let
s
i
(v)
denote the
i-th node in category
v, and the number of nodes in category
v can be expressed as
V = |
s(v)|,
\( \sum\nolimits_{v = 1}^{k} {\left| {s^{(v)} } \right|} = \,n \). To measure the difference between the data being collect at time
m, we have
$$ Del(s_{i}^{(v)} ,s_{j}^{(v)} ) = \sqrt {\sum\limits_{u = 1}^{m} {(x_{iu}^{(v)} - x_{ju}^{(v)} )^{2} } } ,\quad (i,j = 1,2, \ldots ,V,\;\;i \ne j) $$
(15)
According to the principle of redundant nodes’ selection, it should be guaranteed to minimize the amount of information loss. Thus, the objective function for selecting redundant nodes can be given as:
$$ s_{*}^{(v)} = \arg \hbox{min} \left\{ {\sum\limits_{i = 1}^{V} {Del(s_{i}^{(v)} ,s_{j}^{(v)} )} } \right\} $$
(16)
Eventually,
\( s_{*}^{\left( v \right)} \) denotes the redundant nodes selected from the category
v.