A typical relay-based cooperative wireless network [
18,
44] operates as follows: There exists a network of a finite number of source users, a finite number of relay nodes and a common destination node. The source users transmit packets to the destination node with the cooperation of the relays. If a direct transmission of a user’s packet to the destination fails, a cooperation strategy among sources and relays is employed to specify the relays that will store the blocked packet in their buffer. Relays are responsible for the transmission of the blocked packets to the destination; for example, [
45]. In a wireless network, transmission failures occur either due to packet collisions, or due to channel fading/noise and attenuation. In both cases, the packet has to be re-transmitted in a later slot. The former case occurs when more than one node transmits simultaneously [
41], while the latter one refers to the probabilistic nature of transmissions; see, for example, [
45,
49,
50].
In this work, we consider a simple relay-based cooperative wireless network with a single source, two infinite capacity relay nodes, and a common destination with collisions under a load balancing relay scheme. To keep the model in a simple form and enhance the readability of the paper, we assume that there is no direct link between source and destination, and, thus, the source transmits its packets to the destination with the aid of the relays. The employed cooperation strategy among source and relays is queue-based. In particular, we consider the join the shortest relay queue (JSRQ) policy and study its impact on the total transmission time, i.e., the time that is needed to transmit a packet from the source to the destination, and the total expected number of buffered packets in the system.
The analysis performed in this paper has connections with two distinct research branches.
Random-access networks Interacting queues are of practical relevance as they appropriately capture the
interference in random-access networks; for example, [
17,
18,
42,
43]. In addition to their practical relevance, such queueing systems are mathematically interesting as they are oftentimes formidably challenging to analyze even in their simplest forms [
21,
41]. Their challenge mainly lies in the fact that the service rate of a queue depends on the state of the other queues, which in turn leads to a so-called network of coupled queues.
When dealing with such networks of coupled queues, the first point of focus concerns the characterization of the stable throughput region, aka the stability region. For small, simple networks the stability region can be fully determined, for example, [
17,
43,
57], while for large, general networks, only bounds on these regions are known [
12,
39,
48,
54]. For a thorough overview of several techniques used for the derivation of the stability region, the interested reader is referred to [
24]. An alternative way to derive stability conditions is to use the concept of dominant systems, under which the network of interest is (stochastically) dominated by a simpler one that is easier to analyze; for example, [
46,
48] and the references therein. Another powerful tool to investigate necessary and sufficient stability conditions for
work-conserving queueing networks is the use of fluid models; see [
11,
13,
25,
28,
51,
52].
The second point of focus in the literature concerns the computation of the joint queue length distribution. This type of performance analysis has recently regained attention due to the rapid development of real-time applications that require delay-based guarantees [
26]. However, the impact of interacting queues causes severe mathematical difficulties, and there are very few studies that deal with the investigation of the queueing delay. In [
42], by performing an appropriate truncation of the infinite Markov chain, the authors approximate the steady-state probability vector, within any desired degree of accuracy, whereas in [
55] diffusion approximations were applied. In [
17,
18,
41], the probability generating function (PGF) of the joint (relay) queue length distribution was obtained in terms of the solution of a boundary value problem. In [
13,
61], fluid models were used to investigate the delay analysis of random access networks. In [
53], bounds for the queueing delay in a random access network with
\(N>2\) nodes were also derived.
Note that, in the vast majority of the above-mentioned literature, cooperation is employed as a solution to the problem of the absence of a reliable direct link between source and destination. In our work, we additionally assume queue-aware cooperation criteria, by forwarding packets to the least loaded relay with the ultimate goal that of the optimization of the overall network performance; for example, [
38]. With that in mind, we realize that the use of queue-aware load balancing schemes improves the system’s overall performance, since their mission is to effectively divide the work among the participating nodes.
Under certain conditions, the so-called JSQ policy has several strong optimality properties: The JSQ policy minimizes the overall mean delay among the class of load balancing policies that do not have any advance knowledge of the service requirements [
20,
40,
59]. However, to the best of our knowledge, there is no work in the related literature that deals with the delay analysis of a cooperative random access network (i.e., a network of interacting queues with collisions) under the JSQ policy.
JSQ policy and methods of analysis The JSQ policy has been extensively studied in systems with two queues. The JSQ policy was introduced in [
30], in the context of two parallel (exponential) servers and a Poisson stream of arrivals. The first major contributions toward the exact analysis of the JSQ policy for two queues were made in [
23,
35], using a uniformization approach that established that the PGF of the joint equilibrium distribution of the queue lengths is meromorphic (i.e., the equilibrium distribution can be written as a linear combination—finite or infinite—of product-form terms). The generating function approach was used in combination with the theory of Riemann–Carleman–Hilbert boundary value problems to provide a compact analytical method [
15,
22], but this approach does not always provide explicit expressions for the equilibrium distribution.
Although these methods are mathematically elegant and provide interesting mathematical insights, they are not so useful in the numerical computation of performance metrics. Therefore, several other numerically oriented methods were developed. In particular, the matrix geometric method (MGM) was proposed in a series of works as a possible solution; see, for example, [
27,
29,
47]. Despite its algorithmic nature, MGM has several limitations as it is computationally challenging to compute the so-called rate matrix, especially in the case of quasi-birth-death systems with both infinite phases and levels resulting in an infinite dimension rate matrix. In [
34], the authors investigate how to compute the infinite dimension rate-matrix for a class of simple (nearest neighbor) random walks, and they demonstrate their results by applying their analysis to the case of the JSQ model.
Another method used for the performance analysis of queueing systems is the power-series algorithm (PSA). The PSA is an algorithmic, numerically oriented procedure that has been successfully used for the analysis of JSQ systems, as well as to numerically obtain the performance measures of multi-dimensional queueing models, which fit in the class of quasi birth-and-death processes. The intrinsic idea behind the PSA is the transformation of the balance equations into a recursively solvable set of equations by adding one dimension to the state space. This is achieved by expressing the equilibrium distribution as a power series in some variable based on the model parameters (usually the load). This allows the calculation of the equilibrium joint queue length distribution. The PSA was first applied by Beneš [
7] and thereafter by Hooghiemstra et al. [
32]. Important generalizations were presented in [
8‐
10,
19,
36,
58].
In a different direction, the precedence relation (PR) method, see, for example, [
60], can be used to systematically construct bounds for any performance measure of the Markov chain under consideration.
The strength of the PSA and the PR methods is that they are not restricted to two queues, but apply equally well to more than two queues. A major weak point, however, is that the theoretical foundation of the PSA is still incomplete and that PR produces no exact results, but bounds.
The CA [
3‐
5] is an alternative method, that can be used to directly solve the balance equations and it leads to an explicit solution in the form of a linear combination—finite or infinite—of product-form terms. We will further elaborate on the CA in Sect.
3.
For a comparison of some of the above-mentioned approaches used for the analysis (among others) of JSQ systems, and an exposition of their strengths and limitations, the interested reader is referred to [
1].
1.2 Contribution
Application-oriented contribution In this work, we consider a slotted-time, collision-based relay-assisted cooperative communication system with the JSQ routing protocol at the relays. Relays have random access to the medium, and they are equipped with infinite capacity buffers. Such networks have not been extensively analyzed and very little is known regarding their delay performance even for small-scale networks. This is due to the strong interdependence/interaction among queues, since the successful transmission rate of a relay depends on the state of the others. In this work, we provide insights into the characterization of the delay and the performance of the system at hand. To the best of our knowledge, this is the first work that provides analytic results regarding the delay performance of the JSQ routing protocol in networks of interacting queues.
To gain more insights into the performance of the system at hand, we compare the JSRQ scheme with the Bernoulli routing scheme, as well as with the single relay system. Our numerical results reveal the superiority of the JSRQ over the Bernoulli routing scheme when the relays are more likely to remain silent, i.e., they are “slow”. At this point, regarding the Bernoulli routing protocol, we have to mention that in contrast to the continuous time setting, its stationary analysis under the slotted time setting is a challenging task (see Appendix
B) since the two queues do not behave independently.
Fundamental contribution In this work, we extend for the first time the framework of the CA to a new class of random walks. We violate one of the main assumptions of the CA, viz., that of transitions being allowed only to neighboring states. We show, using the system at hand as a vehicle for illustration, that the CA can be extended to random walks in the quadrant that obey the following conditions:
Homogeneity:The same transitions occur according to the same rates for all interior points, and similarly for all points on the horizontal boundary, and for all points on the vertical boundary.
Forbidden steps:No transitions from interior states to the North, North–East, and East.
Bounded transitions:Only transitions to a bounded region.
Later in our analysis we refer to the above conditions as
H,
F and
B. Note that the CA was developed to satisfy a more restrictive setting than
bounded transitions, by allowing only transition to the nearest neighbors; see, for example, [
4‐
6]. We also mention the work in [
3] that deals with a random walk with a similar but simpler structure.
The framework extension achieved in this paper sets the first stepping stones toward a full extension of the CA to random walks with large steps (extending the results obtained in [
2]) and to queueing systems with (bounded) batch arrivals and/or (bounded) batch departures.
We demonstrate how to obtain the equilibrium joint queue length distribution through the CA and provide a detailed implementation algorithm. Moreover, we numerically validate our theoretical findings by employing the PSA and present an extensive numerical comparison discussing which method performs better in terms of accuracy and time complexity (computational time). Other validation methods are also discussed.