Abstract

Due to the dynamically changing topology of Internet of Vehicles (IoV), it is a challenging issue to achieve efficient data dissemination in IoV. This paper considers strongly connected IoV with a number of heterogenous vehicular nodes to disseminate information and studies distributed replication-based data dissemination algorithms to improve the performance of data dissemination. Accordingly, two data replication algorithms, a deterministic algorithm and a distributed randomised algorithm, are proposed. In the proposed algorithms, the number of message copies spread in the network is limited and the network will be balanced after a series of average operations among the nodes. The number of communication stages needed for network balance shows the complexity of network convergence as well as network convergence speed. It is proved that the network can achieve a balanced status after a finite number of communication stages. Meanwhile, the upper and lower bounds of the time complexity are derived when the distributed randomised algorithm is applied. Detailed mathematical results show that the network can be balanced quickly in complete graph; thus highly efficient data dissemination can be guaranteed in dense IoV. Simulation results present that the proposed randomised algorithm outperforms the present schemes in terms of transmissions and dissemination delay.

1. Introduction

As a promising branch of Internet of Things (IoT), Internet of Vehicles (IoV) mainly improves traffic efficiency and assists road safety through wireless communication technologies [1]. Interconnected by means of vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications, IoV could provide data services including road safety (such as collision warning and smart traffic management), entertainment demand services (such as advertisements and online videos), and location-based services (such as interest points and path optimization). Thus IoV plays a vital role in accident warning, traffic management, and user entertainment services [2].

To enhance on-road transportation safety and efficiency, efficient data dissemination which can enable high-rate communications and rapid data dissemination, is essential for applications in IoV. Data replication can improve dissemination performance effectively, as all the vehicles involved in data dissemination help disseminate a certain quantity of message copies. Therefore, the process of information dissemination could be expedited and the dissemination delay could be reduced [3].

Characterised by decentralised control, emerging applications in IoV are confronted with problems, such as efficient cooperation among vehicles and network consensus [4]. Adapting to dynamically changing network, a type of algorithm based on distributed averaging, gossip algorithm, attracts lots of interest [5]. Through a series of communications, the participants could have the same value or reach the common state. However, gossip-based algorithms might lead to a significant waste of network resources (network capacity, bandwidth, and computing resources) by transmitting redundant information. Similarly, although dynamic data replication can accelerate data dissemination in distributed ad hoc networks, replication-based methods could also meet a variety of problems; for instance, the high density may result in longer communication delay, which causes network resources wasting and scalability issues. Towards data replication, Spyropoulos et al. [6] proposed to disseminate a limited number of replicas; however they did not consider available network capacity and bandwidth. RAPID [7] solved the problem by taking data utilities into account to determine how the replication should carry out. Additionally, traditional replication-based dissemination algorithms could lead to high communication overhead as well as congestions and sometimes even broadcast storm by passing around redundant information. Considering the mentioned problems, the quantity of data replicas spread in the area should be controlled.

To accelerate information dissemination, every vehicle could carry a number of data replicas. In this way, the computational burden can be distributed among the vehicles and the network load balancing can be achieved. Accordingly, a concept of network balance is proposed.

In this study, we mainly investigate data dissemination in dense IoV, which can be abstracted as complete graph by graph theory. In the situation of complete graph, we assume that every two vehicular nodes are within each other’s communication rage. As a tentative study, the conference paper [8] focuses on data dissemination in the context of complete graph.

Additionally, since nodes in the network can have different capabilities in terms of computation or processing due to their heterogeneity, it would be better to carry an appropriate number of data replicas according to the vehicles’ own capabilities rather than an approximately equal number of data replicas as [9]. Dissemination strategies will be adjusted according to different capabilities of nodes. However, most previous work studies homogeneous vehicles in vehicular communication. Therefore, this study considers heterogeneous vehicles such that data replication strategy should be determined by the capabilities of the vehicles.

To achieve data dissemination to a target area with reduced dissemination delay and consumed resources, a deterministic algorithm and a distributed randomised algorithm based on data replication are proposed for dense vehicular scenarios. In the proposed framework, different types of vehicles have different dissemination capabilities. Each vehicular node is allocated with a corresponding value to indicate the quantity of replicas that the node can spread. Every node selects one of its neighbours to exchange data depending on the proposed algorithms, and then the pair of nodes take proportional average operations, such that the values of the vehicular nodes could be updated. By iterating the operations among the nodes, the network can reach a balanced status; that is, the network converges to a consensus. To prove the efficiency of the algorithms, we evaluate the convergence complexity by calculating the average operations needed for network balance. Detailed theoretical analysis of convergence complexity is provided.

To summarise, the current study presents the following key contributions.

( We consider heterogeneous vehicles with different capabilities and propose a deterministic algorithm and a distributed randomised algorithm for dense scenarios in IoV, by utilising data replication to enhance data dissemination. In the algorithms, the quantity of data copies is bounded while a network balanced status can be achieved.

( Theoretical analysis is presented to illustrate the number of stages needed when the network achieves a balanced status in the deterministic algorithm. The upper and lower bounds of the distributed randomised algorithm are also derived. Simulation results show the effectiveness of the distributed randomised algorithm.

The remainder of the paper is structured as below. Section 2 introduces related data dissemination schemes in vehicular networks as well as the average consensus problem. Section 3 describes the system framework. Section 4 presents a deterministic algorithm and a distributed randomised algorithm for complete graph. Section 5 gives the upper and lower bounds of the proposed randomised algorithm. Section 6 evaluates the performance of the distributed randomised algorithm. Finally, Section 7 summarises the study and Section 8 presents the future prospect.

This section mainly introduces some information dissemination schemes in IoV. Meanwhile, related work on average consensus problem is discussed.

2.1. Data Dissemination in Vehicular Networks

As multiple data replicas can be forwarded to an area of interest, many works have studied replication-based data dissemination schemes and a variety of dissemination strategies have been developed [6, 7]. As a simple data dissemination scheme, while flooding has the merits of high dissemination speed and wide coverage, it could cause serious broadcast storm. Towards the problem, improvements have been made by Torres et al. [10] to adapt to various traffic scenarios.

In the routing mechanism developed by [11], the amount of data spread in the target area mainly depended on the distance from source to the base station within its communication range. Xing et al. [12] proposed a framework of utility maximisation problem for multimedia dissemination and obtained the closed form of the network utility. Wu et al. [13] aimed to fully utilise the available network capacity and presented a distributed data replication scheme. Shen et al. [14] designed a data dissemination framework to schedule data transmission with maximum dissemination utility and took advantage of the space-time network coding to improve the network efficiency. To minimise the dissemination delay to a desired number of receivers, Yan et al. [15] converted the problem to processor scheduling problem and proposed heuristics to solve the problem. Xiang et al. [16] quantified different classes of data preferences and designed a safety data dissemination protocol. Zhao et al. [17] incorporated link quality and diversity as the sender metric, based on which an efficient selection mechanism for bulk data dissemination was proposed. Chen et al. [18] studied the relation between content replication and RSU deployment and developed a cooperative replication scheme. Given a set of tasks to be executed in vehicular clouds, Jiang et al. [19] proposed the balanced-task-assignment (BETA) policy to minimise the probability of deadline violation. The authors in [20] focused on data dissemination in IoV with social characteristic and applied the property in the design of dissemination strategies. Fan et al. [9] considered vehicles with the same capability while Lin et al. [21] studied resource allocation in vehicular cloud computing systems with heterogeneous vehicles and proposed a semi-Markov based architecture to achieve optimal resource allocation. Ghorai et al. [22] considered that the obstacles might affect radio propagation and then proposed a forwarding node selection algorithm based on fuzzy logic. Ding et al. [23] studied the cooperation in group vehicular interactions and presented a dynamic member public goods game model and a greedy based neighbour selection scheme towards the high density vehicular networks.

2.2. Average Consensus Problem in Wireless Networks

As the average consensus problem attracts lots of interest in research areas, such as wireless networks, many researches have been done to address the problem [24]. Boyd et al. [25] studied randomised gossip algorithms. They developed a distributed subgradient method to improve the speed of gossip algorithms and designed a framework that could be applied to analyse distribute algorithms in different scenarios. The consensus studied by Fagnani et al. [26] could be achieved at some point. Different from average preserving algorithms, this consensus point might not be the same as the average by initial states. As bidirectional communication among agents was not necessary, the studied algorithms could be applied to more settings. To describe gossip periodic sequences in an undirect graph, Yu et al. [27] used transfer function of the node. Chen et al. [28] utilised probabilistic grouping in the proposed distributed random grouping algorithm to converge to the sums. Therefore, the impact of dynamically changing topology would be alleviated. Aysal et al. [29] developed a novel gossiping algorithm for deriving the average values that could simplify the process of random gossiping and described the conditions to guarantee the network convergence. To study network consensus in strongly connected networks, Wu et al. [30] presented a gossip-based algorithm and showed it could quickly reach consensus as well as reducing the consumed transmissions. Franceschelli et al. [31] studied the execution time of heterogeneous tasks in an undirected graph. They proposed randomised interaction algorithm based on gossip to let the nodes cooperatively complete the tasks to minimise the task execution time. Nedić et al. [32] studied the characteristics of weighted-averaging dynamic for network consensus.

The mentioned literature talks about the reliability and efficiency of data dissemination as well as network consensus rather than both of the problems. Also, as few of the previous works study the heterogeneity of vehicles; this study considers a scenario that a number of vehicles with different capabilities exist, according to which the replication strategy is determined. In summary, we aim to design replication-based dissemination schemes to facilitate data dissemination in heterogeneous vehicular networks while the network convergence rate is investigated.

3. System Framework

3.1. Network Architecture

The proposed network architecture is shown in Figure 1. As it is shown, a source vehicle carries a message and aims to disseminate the message to the area that is indicated by the circle. The message dissemination is completed by pure vehicle-to-vehicle communication. In the network, two vehicular nodes would update their own values after an average operation until the network consensus is reached.

Different from previous settings, this study considers heterogeneous vehicles with different capabilities such that the number of replicas assigned to each vehicle should be determined according to the vehicle’s capability. For example, there are three types of vehicles that are classified as red vehicles, yellow vehicles, and black vehicles. Assume that red vehicle could carry 100 replicas while yellow and black ones could carry 200 and 300, respectively. Assign each type of vehicles a parameter to indicate the maximum number of replicas the vehicles can spread. When two vehicles communicate, they exchange the replicas not by simply averaging their values; instead, the average operations are taken according to the vehicles’ capabilities. For example, in Figure 1, let and denote the values of the red vehicle and the yellow one, respectively. The following operation will be taken when they communicate with each other, where and denote the new values of the red vehicle and the yellow one, respectively.

3.2. Time Model

In the proposed architecture, it is allowed to let pairs of independent nodes contact and exchange information in parallel. We apply the synchronous time model [25]. As in the synchronous time model, the nodes can communicate simultaneously, while it only allows one node to communicate in each time slot in the asynchronous time model.

3.3. Assumptions and Definitions

This part presents some related assumptions and definitions for bounded number of data dissemination.

Assumption 1. The vehicular network of our concern is described as an undirected graph . Assume that every two vehicular nodes can exchange information with each other. Consider dense IoV, such as the scenario when road congestions happen or parking lots with many parked cars. According to graph theory, this type of network topology can be abstracted as complete graph. Every vehicle node owns a value to indicate the number of replicas it could spread. Let denote the value of node . Accordingly, graph becomes a weighted graph.

Assumption 2. As it is stated that the number of message replicas is limited to a value, we let parameter denote the maximum quantity of replicas. Assume there are types of vehicles in the system, e.g., . The corresponding capability of the vehicles are indicated as . If the vehicle of (value ) and the vehicle of (value ) meet, the operation should be taken based on proportion, , such that the new values should be , and . In a system with homogeneous vehicles, .

To calculate the communication stages needed to obtain network balance, the following lemmas and definitions are presented.

Definition 3. The nodes in the weighted graph are associated with corresponding nonnegative numbers. We say that an -balanced status is achieved among the nodes in the graph with the following conditions met. (i)For any node, the number of message replicas is not smaller than 1, that is, .(ii)For any pair of nodes with , , where .(iii)If and , no edge should exist between the two nodes with values and .

Lemma 4. Let , and be real numbers satisfying the condition that is equal to . Then the following results can be obtained , and if the inequality holds.

Lemma 4 is very easy to obtain and will be used to represent the change of potential.

Definition 5. Assume that denotes a list containing real numbers and denotes a list containing nonnegative integers. We present the following definitions. (i)A real average function is a mapping , such that for two numbers , if , or if .(ii)An integer average function is a mapping such that for two numbers , if , if , or if .(iii)As to list , potential of list is defined as .(iv)Assume ; we have , which could be indicating a small piece of length from the bar of length to go down by . Function presents the potential change after an average operation (see Lemma 4).(v)Assume is converted into a new list after taking average operations. Let denote all the tuples which involve in average operations and then fulfill the list transformation. Then we have to represent the sum of product.

Definition 6. A communication stage represents an average operation that happens in the connected graph. Only the nodes linked by the independent edges could exchange information and take average operations. Pairs of nodes connected by different independent edges can communicate with each other at the same time.

Through iterative average operations among the nodes, we will achieve an -balanced status. The quantity of stages needed for -balance will be analysed to reflect the complexity of network convergence.

4. Algorithm Design

We abstract the strongly connected network topology as complete graph. For data dissemination in complete graph, we propose a deterministic algorithm and analyse the stages needed for network balance in Section 4.1. Then, we present a distributed randomised algorithm in Section 4.2. Upper and lower bounds of the randomised algorithm are derived through detailed theoretical analysis in Section 5.

4.1. Deterministic Algorithm

Here, a deterministic algorithm is proposed for complete connected graph. The algorithm is described as Algorithm 1.

Input: Graph .
Output: Graph , communication stages .
(Initialisation;
(;
( nodes, values with ;
(Stage (step ( - step (1):
(sort in descending order;
(if the nodes with value no smaller than two are more than
the ones with value zero then
(let the latter ones take operations with the former ones;
(else
( take operations the other way around.
(end if
(if there is no node with value at least two or with value
zero then
(Let and take average for
;
(end if
(Enter into the next stage, ;
(End of Algorithm.

In the proposed deterministic algorithm, the nodes perform operations in a deterministic manner to achieve network balance. The first step is to initialise the input graph, assume the source node has a value , all other nodes have value zero. Following the parameter initialisation, the deterministic algorithm is executed in two ways, which is determined by the number of nodes with value at least two and nodes with value zero. The operations will be iterated until the network reaches a consensus. The flowchart of the proposed deterministic algorithm is shown in Figure 2, to give a clear description of how the operations are done in a deterministic way. In the flowchart, denotes the number of nodes with value at least two while denotes the number of nodes with value zero. Finally, we have Theorem 7 to show the consumed communication stages.

Theorem 7. For complete connected graph, an algorithm exists such that after stages of real average operations, the network can reach an -balanced status. The algorithm is shown as Algorithm 1.

Proof. It is easy to see that there are at most stages for steps ) - (. The upper bound for steps ( - ( is given below.
Let be the parameter value of node . In general, assume that . Let and take average.
Assume that after one stage, pair and has the largest average , and pair and has the least average . We assume that .
It is noted that either or . After stages, the difference of the nodes is at most , such that after stages, an -balanced status can be achieved.

4.2. Distributed Randomised Algorithm

The following part develops a distributed randomised algorithm (DRA), shown as Algorithm 2.

Input: graph ;
Output: graph , parameter .
( Initialisation;
(Let ;
(Let indicate a vehicular node;
(Let indicate the distribution task of node ;
(Let indicate the new distribution task of node ;
(repeat
(;
(Each vehicular node at sending status randomly
chooses a vehicular node within its neighbourhood, then
sends the communication request;
(Each vehicular node at receiving status selects the
node from the received request if the gap between the
two nodes is the largest;
(Take pairwise proportional average operations for the
corresponding pairs of nodes;
(Let and indicate the vehicles who take average with
each other;
(New values of node and are updated
as , and
, respectively;
(until ()
(End of Algorithm.

The flowchart of the proposed deterministic algorithm is shown in Figure 3, to give a clear description of how the operations are done in a random manner.

During the process of initialisation, related parameters as well as the settings (including the values of nodes, the maximum number of replicas, and number of vehicles of different types) in the network are set. Then, we have to determine how the replication strategy is carried out. For each node in sending status, it randomly selects a neighbour node to send the communication request. The receiving node would choose a node with the largest gap to take specific average operations according to the capabilities of vehicles. The algorithm would execute an iterative procedure until the network reaches consensus. Finally, the algorithm returns graph and the number of average stages when the network reaches the -balanced status.

To analyse the effectiveness of the randomised algorithm, we derive several important theorems based the famous Chernoff bounds [33]. The new theoretical results are shown as Theorems 8 and 9 and Corollary 10. The proof makes our entire study self-contained.

Theorem 8 (see [8]). Let be independent random - variables, where takes with probability at least for . Let , and . Then for any , .

Theorem 9 (see [8]). Let be independent random - variables, where takes with probability at most for . Let . Then for any , .

Corollary 10 (see [34]). Let be independent random - with being the sum of .
(1) If takes with probability at most for , then for any , .
(2) If takes with probability at least for , then for any , .

5. Performance Analysis

In each time step, every node becomes active with probability 1/2 independently. Consider a node that is active; let be its degree; that is, the number of its neighbours. Node selects at most one of its neighbours to contact and take the proportional average operation. Each neighbour has an equal probability to be selected, i.e., . The active nodes may receive more than one contact request and they would select one of the contact requests with the largest gap. To represent the averaging time of the randomised algorithm, we have the following theorem referring to Boyd et al. [25]. Here, the averaging time means the smallest time it takes a value to be -close to the average value in the system.

Theorem 11 (see [25]). The averaging time of the distributed randomised algorithm described above is given as where .

In the following part, an upper bound and a lower bound of the communication stages consumed for network balance are derived for the proposed randomised algorithm.

5.1. Upper Bound

Before we present the upper bound of the proposed randomised algorithm, the following concepts need to be clarified.

Definition 12. For two integers and ; the average operation generates two new integers , with the value of being equal to , and is equal to the remaining value.

Definition 13. Let denote a set of real numbers. is defined as .

Definition 14. Let , and denote a list of real numbers. Assume that is transformed into after a series of communication stages. We regard as an -shrink compared with when is within a factor of .

Lemma 15 is derived to show how the gap of a list of numbers shrinks via the specific average operations.

Lemma 15 (see [8]). Let be a function from that generates a random element in . Assume and are two subsets of satisfying , and , . Then with a probability at most we have where is a constant in . Furthermore, if for some fixed then the failure probability is at most for some fixed .

Proof. Let denote the number of elements in and let denote the number of elements in . In subset , with probability , each element sends its corresponding request to an element in . Combining with Chernoff bound, the inequality holds with a small probabilityAssume and . The probability that isCombining inequalities (6) and (10), the failure probability is at most . Thus the lemma is proved.

Lemma 16. Let be the list of all elements that will take average operations. For some fixed , with the failure probability not larger than , the following conclusions hold.(1)After stages of average operations, there is an -shrink.(2)After stages of integer average operations, there is an -shrink if is at least for some to be large enough.

Proof. Let , , and ; the median is , which is also equal to . and denote the sets of elements greater than and not larger than the median () of and , respectively. In general, assume is not larger than . Let , , , and . Three periods of communication stages will be discussed in the following part.
If , enter Period 1 below, or otherwise, enter Period 3.
Period 1. communication phases will be performed as follows.
Use to represent a set of elements , which satisfies one of the following conditions:
) ; does not participate in any average operation;
) is one of the elements generated by averaging elements and , where and belong to set .
Use to represent a set of elements a, which satisfies one of the following conditions:
) ; does not participate in any average operation;
) is one of the elements generated by averaging elements and , where and belong to set .
Assume that .
Then, select three constants with the relation , and constants . For analysis, is set to 0.05, and is set to 0.1. It is assumed that . According to Lemma 15, for constant , the failure probability of is not larger than . Choose an integer that makes hold. After stages, we have the following inequalities.Its failure probability is at most .
is represented as the set of elements that belong to in set . If , the process will execute Period 2, or otherwise, Period 3 will be executed.
Period 2. communication phases will be performed as follows.
is defined to represent the final set generated by the set from period 1 after a series of communication operations. is defined as . is used to represent the set of elements via phases in this period.
is defined to represent the list of elements with values in the range of after stages, while indicates the elements in the range of . Similarly, and represent the elements in and in after stages, respectively. As it is stated, always holds in Period 2.
The number of elements that belong to is at most , thus they can help up to elements (the value of these elements is at most ) increase to at least . The reason is that each element with a maximum value can contribute up to . For the elements with values greater than , the contribution of these elements is at most . Thus, the quantity of elements in the range of is at least . According to Lemma 15, after each phase of operation, the number of elements in set would be decreased in a fixed rate, as each element in has a greater probability to be able to do operations with the elements in set .
Let be the probability that the number of elements in that select elements in to take average is less than , and let be the probability that the number of elements in that take average with the ones in is less than . According to Theorem 9 and Lemma 15, we have and . Thus, the elements in will be reduced.
Accordingly, with a failure probability no smaller than , that is, , the number of elements in is decreased by not smaller than It can be seen that communication stages are needed to reach stage . Once , enter Period 3.
It is easy to verify the case for all integer averages when the initial gap is big enough. We just need to set up the gap such that . There is a gap from the new list to the old list .
Period 3. communication phases will be performed as follows.
The set produced from after corresponding operations is denoted as . The equalities and hold.
Assume that with a very small failure probability, the elements in in sending status would select elements from . Meanwhile, with a probability not larger than , the elements fail to do average with those in . If an element has sent the requests for times, then the probability that it would not select an element from is not larger than . The probability that an element has no more than stages to send requests is not larger than .
Thus, the probability that an element in would not do average with that in is no greater than . The probability that two elements in would do average with the same element is no greater than .
Consequently, the probability that the number of stages in Period 3 is selected to be will not be larger than .
To summarise from the above analysis, the failure probability is not greater than .

Definition 17. We treat the communication that includes stages as -successful when there is an shrink in aspect of gap. For further analysis, let parameter indicate the probability when it fails to achieve a shrink within .

The following Lemma 18 is derived in our previous work [8]; thus proof of the lemma is omitted here. Based on this lemma, we present Theorem 19.

Lemma 18 (see [8]). indicates a parameter. Partition the communication stages into groups with each containing stages, which are denoted as . Then there are independent random variables for each group such that (1) is -successful(2).(3)there are at least to be -successful.

We use to indicate the quantity of vehicles. The corresponding value of node is when the network converges.

Theorem 19. The following conclusions will be achieved after stages by applying the proposed randomised algorithm.
(1) If , then each is either or
(2) If , then each and has a difference not greater than two between every two of them
(3) If , then , and the numbers of value zero and two are both no greater than .

Proof. Applying Lemma 18, and Chernoff bound, it can be known that a gap could be achieved after stages. And apply Lemma 16 stages if the difference is bounded by a fixed integer. The following cases are presented if the gap is .(i): the quantities of vehicles with value zero and two are at least in the same condition. We bound the largest elements with a constant. In this case, the number of items of and is at least .(ii): when the number of vehicles with value greater than zero is at least , the vehicles with value three will be removed via stages. Thus, and no more than vehicles have the value zero, and no more than with two.(iii): after stages, not less than elements are at least two as to constant .When the number of vehicles with value two is not less than , the number of zero would be decreased by for constant . And after stages, all the zero items will be removed. When two vehicles are with gap of at least two encounters, their values will be updated such that the gap will not be greater than 1.Assume indicates the set of the rest of the items, and , and . is defined as the value of , and it is assumed that . Additionally, let and and . The number of that selects value not smaller than is . In this way, value should disappear after stages.

To show the efficiency of real average operations, Theorem 20 is proved.

Theorem 20. Assume denotes the quantity of vehicles. Then the randomised algorithm takes stages to enter into an -balanced status.

5.2. Lower Bound

Theorem 21. For a complete connected graph, stages are needed to reach an -balanced network status.

Proof. Referring to the proposed randomised algorithm, it may only double the number of elements via each phase. Consequently, the number of communication stages consumed is if the quantity of vehicular nodes is more than the number of replicas .

6. Simulation and Analysis

We conduct our simulations by using NS-3 simulator. Consider the complexity of performance evaluation; we use the OpenSteetmap [35] to extract an area for simulation. The size of the selected area is set to 2000 m 2000 m, the satellite map of which is shown as Figure 4(a). Meanwhile, we use SUMO [36] to transform the extracted area into a simplified road network presented as Figure 4(b). We also use SUMO to generate the movement trajectories of vehicles, which will be input in the simulator to describe the vehicles’ movement.

In the simulations, the number of vehicles is varied from 600 to 800 with a step length 50 to reflect a dense vehicular environment. The vehicle velocity is varied from 30 to 60 km/h that follows a normal distribution. The number of message copies is varied from 400 to 800 with a step length 100. The packet size is set to 512 bytes. The transmission range of vehicles is set to 300 m. As to the communication protocol, IEEE 802.11p is adopted to guarantee the reliability of information transmission. The two-ray path loss model is applied in the simulation as the model can calculate both the direct path and the ground reflection path. The signal-to-noise ratio (SNR) threshold is set to 4 dBm. Considering the impact of obstacles on wireless signal in urban environment as well as vehicular mobility, we apply the Lognormal Shadowing model that is suitable for the proposed scenarios. The list of simulation parameters is shown in Table 1.

6.1. Compared Algorithms

To evaluate the performance of the proposed replication-based randomised algorithm, we compared it with several data dissemination algorithms in vehicular networks, which are described below.(i)Constrained Capacity Replication (CCR): CCR is a distributed algorithm which can assist the vehicles to select data replication strategy autonomously according to the current network capacity.(ii)DOVE: DOVE controls the number of receivers in data dissemination and transforms the problem to the processor scheduling problem by utilising road layout and traffic information.(iii)EDDA: EDDA considers the common urban and highway scenarios. It selects all the independent pairs of nodes firstly, and then takes average operations between the corresponding nodes. The average operations will run iteratively until the network converges.(iv)Our proposed randomised algorithm: the proposed randomised algorithm considers the heterogeneous properties of vehicles while controlling the number of data replicas. Each node in sending status randomly selects one of its neighbours to send a contact request while the receiving node selects a request with the largest gap to take the proportional average operation.

6.2. Performance Metrics

We evaluate the performance of these algorithms according to the following metrics.(i)Number of communication stages: it indicates the average operations for the network to be balanced, which can represent the communication overhead of data dissemination to a certain extent. Meanwhile, it can also reflect the number of data transmissions for network balance as well as the network convergence complexity.(ii)Dissemination delay: it presents the consumed time to obtain network balance, which can be utilised to measure effectiveness of the algorithms.(iii)Packet delivery ratio: it can directly indicate how many vehicles could receive the replicas, as well as reflecting the dissemination performance.(iv)Throughput: it is used to evaluate the proposed randomised algorithm when the capabilities of vehicles are considered.

6.3. Impact of Number of Vehicular Nodes

To evaluate the performance of the proposed randomised algorithm with different network densities, we vary the number of vehicles from 600 to 800 to represent increasing network size.

The experimental performance analysis for the consumed communication stages is depicted in Figure 5. The proposed randomised algorithm outperforms other compared data dissemination schemes when the number of vehicles increases from 600 to 800, which means it needs fewer average operations to achieve network balance. CCR mainly considers network capacity to determine the replication limit while DOVE utilises road layout and traffic information to reach a desired number of vehicular receivers and minimise the dissemination delay. Our proposed randomised algorithm benefits from strong connectivity of the dense vehicular scenarios such that it can obtain network convergence with fewer communication stages than other schemes. As EDDA is developed for scenarios with normal urban density and highway, it is slightly inferior to the randomised algorithm. With the increasing number of vehicular nodes, the consumed communication stages grow for all the compared algorithms.

The variation of dissemination delay of the compared algorithms when the number of participating vehicles increases is shown in Figure 6. As is shown in the figure, when the traffic densities become higher, the dissemination delay increases for all the compared algorithms as it needs more time to fulfill data dissemination. The proposed randomised algorithm takes less time to complete data dissemination and realise network consensus compared to the other three schemes. This is because it considers the different capabilities of the vehicles in data dissemination and thus constructs a better replication strategy which could reduce data dissemination delay.

Figure 7 shows the performance of packet delivery ratio of the compared algorithms when the number of nodes involved in data dissemination varies. In terms of delivery ratio, our proposed randomised algorithm presents an improvement compared with EDDA, DOVE, and CCR. Also, the delivery ratio of all the algorithms increases when the number of nodes increases from 600 to 800. More vehicles cooperatively participate in data dissemination such that the network connectivity would be enhanced. Accordingly, the successful packet transmissions will be improved by frequent vehicle communication instead of transmission failures caused by fewer forwarding vehicular nodes.

6.4. Impact of Number of Data Replicas

By varying the number of data replicas, we evaluate the impact of number of allowed data replicas on consumed communication stages and dissemination delay of the compared algorithms.

Figure 8 shows the changing trend of our proposed randomised algorithm and other compared algorithms with increasing number of data replicas. As to communication stages, the randomised algorithm performs fewer operations than EDDA profiting from the strongly connected network property of dense networks, while DOVE and CCR both need more stages for network balance. Additionally, as there are more data replicas to be disseminated in the network, it can be seen that more average operations are required to achieve network convergence. As a result, the four compared algorithms would consume increasing communication stages when more data replicas are spreading to the dissemination area.

The change of data dissemination delay under different number of data replicas is shown in Figure 9. As observed from the figure, the dissemination delay of all the compared algorithms increases when the number of replicas increases with a step length 100. The reason mainly lies in that more message replicas would take more communication stages for the network to be balanced, which leads to higher dissemination delay. Meanwhile, data dissemination would be accelerated when the randomised algorithm is applied to the scenario as the algorithm selects pairs of nodes with the largest gap and adjusts how to do average operations according to the properties of vehicles. This is why the randomised algorithm outperforms EDDA, DOVE, and CCR.

Figure 10 depicts packet delivery ratio of the four compared algorithms varying the number of data replicas. Other than taking advantage of heterogeneous network and random average operations, the proposed randomised algorithm also benefits from better network connectivity to obtain higher delivery ratio than other algorithms. EDDA controls the number of replicas in arbitrarily connected network while CCR focuses on network capacity and DOVE wants to minimise the delay; however, they all only consider the vehicles with the same capability. The growth of delivery ratio is similar to Figure 9 that with the number of replicas changing from 400 to 800, the ratio increases for all the compared solutions. The performance comparison verifies that our proposed randomised algorithm can efficiently improve the performance of data dissemination and expedite the network convergence.

6.5. Comparing Different Versions of the Proposed Randomised Algorithm

We evaluate the communication stages and throughput of the proposed randomised algorithm in dense traffic scenario, in the case of two conditions; that is, the vehicles are homogeneous and heterogeneous. The number of vehicular nodes is varied from 600 to 800 with a step length 50. The comparison results are shown in Figures 11 and 12, respectively. In Figure 11, we can see that the number of communication stages consumed when the vehicles have the same capability is larger than the one consumed when the different capabilities of vehicles are considered. Figure 12 presents the total data sent during data dissemination. In the figure, we can see that the randomised algorithm with different capabilities achieves a 10% improvement compared with the algorithm with homogeneous vehicles. The comparison shows that the consideration of heterogeneous vehicular networks could reduce the consumed communication overhead while improving the system throughput to some extent.

7. Conclusion

To enhance data dissemination in dense vehicular scenarios, we propose a network architecture, considering heterogeneous network composed of vehicles with different capabilities. By studying data replication and network consensus properties, two replication-based algorithms including a deterministic algorithm and a distributed randomised algorithm are designed. In the proposed algorithms, the vehicles take proportional average operations depending on their own capabilities. The operations will be iterated until the network converges. Mathematical analysis is derived to evaluate the complexity of network convergence. An upper bound and a lower bound of the randomised algorithm are analysed in detail. Simulation results show that the proposed randomised algorithm can reduce data dissemination delay and improve communication overhead.

8. Prospective Directions

In this study, we consider scenarios with relative ideal communication situations, which means that link interruption and other interferences are not considered. Future work should incorporate the factors that will affect data dissemination. Also, as a future prospect, we intend to enrich data dissemination mechanisms which can be adapted to complicated scenarios in IoT. Solutions that apply infrastructures such as unmanned aerial vehicles (UAVs) in cooperative networks could also be seen as a promising prospect to enhance network performance. The effectiveness of the replication-based algorithms in more IoT application scenarios needs further verification.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Disclosure

An earlier conference version of this paper [8] has been presented in 12th International Conference on WASA.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work is supported by the National Science Foundation of China (no. 61772385, no. 61373040, and no. 61572370) and autonomous electric vehicle driving ability and safety evaluation technology and system development (SQ2018YFB010236-04).