Skip to main content
Top
Published in: EURASIP Journal on Wireless Communications and Networking 1/2009

Open Access 01-12-2009 | Research Article

QoS Differentiated and Fair Packet Scheduling in Broadband Wireless Access Networks

Authors: Rong Yu, Yan Zhang, Shengli Xie

Published in: EURASIP Journal on Wireless Communications and Networking | Issue 1/2009

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

This paper studies the packet scheduling problem in Broadband Wireless Access (BWA) networks. The key difficulties of the BWA scheduling problem lie in the high variability of wireless channel capacity and the unknown model of packet arrival process. It is difficult for traditional heuristic scheduling algorithms to handle the situation and guarantee satisfying performance in BWA networks. In this paper, we introduce learning-based approach for a better solution. Specifically, we formulate the packet scheduling problem as an average cost Semi-Markov Decision Process (SMDP). Then, we solve the SMDP by using reinforcement learning. A feature-based linear approximation and the Temporal-Difference learning technique are employed to produce a near optimal solution of the corresponding SMDP problem. The proposed algorithm, called Reinforcement Learning Scheduling (RLS), has in-built capability of self-training. It is able to adaptively and timely regulate its scheduling policy according to the instantaneous network conditions. Simulation results indicate that RLS outperforms two classical scheduling algorithms and simultaneously considers: (i) effective QoS differentiation, (ii) high bandwidth utilization, and (iii) both short-term and long-term fairness.

1. Introduction

With the rapid growth of demands on wireless multimedia applications and wireless data services, it is expected that broadband wireless access (BWA) networks should be able to support high-speed, high-quality, realtime, and nonrealtime applications, such as video, image, voice, text, and data. Quality of Service (QoS) provisioning for heterogeneous traffic with diverse properties is a necessity in emerging BWA networks. Packet scheduling algorithms, which are in charge of the bandwidth allocation and multiplexing at the packet level, play key roles for resource sharing and QoS provisioning in BWA networks.
Different from traditional wired scheduling problems, packet scheduling in BWA networks will encounter two major difficulties: the high variability of wireless channel conditions and the unknown model of packet arrival process. The capacity of wireless channel usually suffers either moderate or drastic fluctuations due to the signal attenuation, inter-user interference, fading and user mobility, and so forth. As a result, the channel conditions of users may vary with time and location asynchronously at random, leading to the so-called time-dependent error and location-dependent error [1]. In BWA networks, the process of packet arrivals is generally complicated and hard to accurately model in advance, since the packet arrivals are induced by the aggregation of multiple classes of traffic with different properties, such as constant and variable rates, and continuous and burst modes.
Besides the difficulties, packet scheduling algorithms for wireless multimedia networks are required to concurrently achieve three performance objectives: QoS differentiation and guarantee, wireless bandwidth utilization maximization, and fairness provisioning. Firstly, the transmission of heterogeneous classes of traffic with diverse QoS requirements naturally demands the scheduler to provide differential services. Secondly, the scheduler should be able to arrange the packet transmissions reasonably so that the precious wireless bandwidth can be utilized in an efficient way. Thirdly, the scheduler should also make sure that every traffic flow gets its fair share of the system resources. These three objectives, in most practical situations, are tightly coupled and complementary with each other. A good scheduling algorithm should provide satisfying performance tradeoff among them.
These special difficulties and performance objectives make the scheduling problem in BWA networks considerably challenging. The packet scheduler of a BWA network should operate across different sessions (i.e., connections or traffic flows) to guarantee that all sessions receive services with qualities according to the contract established when sessions are setup. At the beginning of each round of scheduling, the scheduler should decide one of the sessions for transmission. In order to make the right decision at each round of scheduling and hence accumulatively achieve the optimal long-term performance, the scheduler should be capable to predict the possible rewards (or costs) induced by different scheduling actions. However, the high variability of wireless channel conditions and the unknown characteristics of traffic flows make this long-term prediction difficult. Meanwhile, the multiple performance objectives introduce a set of interactional factors that should be considered together for prediction.
In this paper, we observe that the problem of packet scheduling in BWA networks could be described as a semi-Markov Decision Process (SMDP). There are two major advantages to solve the scheduling problem by the methodology of SMDP. Firstly, by modelling the scheduling problem as an SMDP, the entire scheduling process is divided into a series of stages. Decisions are made stage by stage. In other words, the original complicated problem is divided into multiple relatively simple subproblems, which are much more tractable than the original one. Secondly, there exist various of advanced mathematical techniques to solve the formulated SMDP problems, especially the learning-based approach. With the aid of learning-based approach, the scheduler has underlying ability to capture the main characteristics of the system and accurately predict the long-term rewards (or costs), even though there is no full knowledge of the system model.
The remainder of this paper is organized as follows. A brief introduction of existing packet scheduling algorithms is given in Section 2. The SMDP formulation is presented in Section 3. After introducing the system model, we cast the packet scheduling problem into the framework of SMDP. The construction of cost function is explained in details. Section 4 discusses the solution of the formulated SMDP problem. The methodology of reinforcement learning [2] is employed. A feature-based linear approximating architecture is adopted to approach the optimal average cost. The Temporal Difference (TD) technique [2] is used to continuously regulate the tunable parameter variables. Section 5 reports the results of simulation experiments, where the proposed algorithm is compared with two existing algorithms. Conclusion is presented in Section 6.
In literature, the problem of packet scheduling is extensively studied in both wired and wireless environments. We broadly categorize the algorithms into three classes and present a brief introduction of them. For comprehensive overview of existing scheduling algorithms, readers are referred to [3].

2.1. Algorithms in Wired Environment

Algorithms of First-Come-First-Served (FCFS) and Round Robin (RR) are first developed for wired networks. Their original versions and improved versions (e.g., Weighted Round Robin and Deficit Round Robin [4]) are underlying schemes for wireless networks. Another well-known algorithm is Generalized Processor Sharing (GPS) [5], which is ideal and unimplementable in packet-based networks. Many other scheduling algorithms try to approximate GPS as far as possible. The merit of these wired scheduling algorithms is their simplicity and ease of implementation. But if directly used in BWA networks, these algorithms cannot achieve high bandwidth utilization and promise fairness guarantee.

2.2. Algorithms in Wireless Environment with GE-Model

The classical two-state Gilbert-Eilliot (GE) model [6, 7] is firstly used to model the wireless link variation. The channel is simply described to be either in "good" or in "bad" state in this model. A survey of this class of scheduling algorithms can be found in [1]. Work in [8] devised Idealized Weight Fair Queuing (IWFQ) algorithm. In [9], Channel-condition Independent Fair Queueing (CIF-Q) algorithm was put forward. Both algorithms of IWFQ and CIF-Q need to simulate a virtual error-free fair queueing system and try to schedule packets in the same order as the ideal reference system does. IWFQ and CIF-Q differ in the way they compensate the lagging flows. Wong et al. [10] presented Token Bank Fair Queueing (TBFQ) algorithm, which uses token pools and token bank to keep track of the service status of each flow and dynamically regulate the priorities of the flows to occupy the channel resource. All these three algorithms achieve good performance tradeoff among the three performance issues aforementioned. But they only work under the GE channel model, which is too coarse to characterize the practical variability of the wireless channel conditions.

2.3. Algorithms in Wireless Environment with Multistate Model

Some researchers considered variable-rate transmission in scheduling algorithms [1115]. The Modified Largest Weighted Delay First (M-LWDF) algorithm [11] schedules packets according to a simple but effective constructed metric. The scheme was proved analytically to be throughput optimal. But it does not explicitly guarantee any fairness. The algorithm of Jalali et al. [12] was proposed to satisfy the so-called proportional fairness; the algorithm of Liu et al. [13] tried to maximize the long-term average total throughput for a given time fraction; the algorithm of Borst and Whiting [14] was made to be Pareto-optimal. The algorithm of Park et al. [15] selects user for transmission based on the cumulative distribution function (CDF) of the user rates.
Besides the above three classes of scheduling algorithms, researchers also consider standard-compatible schemes recently. In [16], a scheduling algorithm which is practical and compatible to the IEEE 802.16 standard is proposed. It provides QoS support in terms of bandwidth and delay bounds for all types of traffic classes as defined by the standard.
By summarizing the above-mentioned scheduling algorithms, we notice that most of the algorithms are designed based on heuristic methods. Either introducing the virtual/ideal service models as reference systems [8, 9] or defining some key metrics as criteria [1114], many existing algorithms are built on the observation, judgement, intuition, and experience of the designers. Although heuristic algorithms have the merit of simplicity, their underlying drawbacks are twofold. Firstly, heuristic algorithms are generally proposed with respect to some given special outside conditions, which means that they are less of adaptivity. When the practical conditions do not match the provided ones, it is hard for heuristic algorithms to maintain their scheduling performance. Secondly, designing a heuristic scheduling algorithm is easy when the problem is relatively simple but could be rather tough when the scheduling environments and objectives become complicated. For packet scheduling problem in BWA networks, the scheduler is required to consider three main objectives simultaneously and, meanwhile, combat highly variable channels and unknown traffic models. Such a complicated scheduling problem is far out of the ability of a heuristic algorithm to handle.
To overcome the disadvantages of heuristic algorithms, we propose in this paper a learning-based algorithm for the packet scheduling problem in BWA networks. The proposed algorithm has in-built capacity of studying from outside environment and adapting system parameters according to the performance requirements. This attractive feature makes the proposed algorithm appropriate for the scheduling problem in BWA networks. More specifically, when this learning-based algorithm is applied in BWA networks, we only need to translate the scheduling objectives to the system reward (or cost) and then connect the scheduling problem with the ingredients of the solution framework. Little attention should be paid on the details of the entire scheduling process such as how to dynamically adapt the scheduler for the optimal performance. Because the scheduler has potential ability of self-training, it could automatically tuning itself towards an optimal system reward (or cost) according to the actual environment. Actually, the works in [13, 14] have realized that fixed and unadaptable scheduling schemes are incapable in complicated scheduling problem. Undetermined constants are integrated into the key metrics at the system initialization phase so that the scheduler could adjust its strategy at some extent. Nonetheless, the learning abilities of these algorithms are very limited, and they should still be looked on as heuristic algorithms.

3. Semi-Markov Decision Process Formulation

In this section, we cast the packet scheduling problem into the framework of Semi-Markov Decision Process (SMDP). As the base of our discussion, the wireless network model and channel model are firstly introduced. After that, we connect the packet scheduling problem with the ingredients of SMDP. The construction of cost function is comprehensively discussed.

3.1. System Model

3.1.1. Wireless Network Model

We consider a general shared-channel infrastructure-based BWA network, in which mobile hosts are served by a base station. The communication between a mobile host and the base station may contain more than one session (or traffic flow). Centralized scheduling of packet transmission is performed at the base station. A general scheduler model is described in Figure 1, where the usage of channel condition monitoring mechanism like LSM [17] is assumed. The method to implement LSM in actual systems can be found in [14]. In presence of LSM, the base station is supposed to have full knowledge of the channel conditions of all sessions.

3.1.2. Wireless Channel Model

The conditions of wireless links between the base station and each of the mobile hosts are independent of each other. We use the multistate Markov channel model (i.e., FSMC in [18, 19]) to describe the variation of the wireless channel condition. As shown in Figure 2, state transition only occurs between adjacent states. Since FSMC is derived by partitioning the received signal-to-noise ratio (SNR) into multiple intervals, there is a maximum feasible rate in each state with respect to the actual requirement of average Bit-Error Rate (BER).

3.2. SMDP Ingredients

In a general shared-channel wireless network, the scheduler should make scheduling decisions based on the knowledge of the system state, including the queueing delay of all packets, the channel conditions of all sessions, and the practical service rates of all sessions. Suppose that there are https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq1_HTML.gif sessions in the system, and each session has a queueing buffer that can store https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq2_HTML.gif packets at the most. (Without loss of generalization, we assume that all queues can store same number of packets. A more practical assumption may be that the buffer length of the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq3_HTML.gif th session is https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq4_HTML.gif in packets.) At time https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq5_HTML.gif , the queueing delay of the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq6_HTML.gif th packet in the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq7_HTML.gif th session is denoted by https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq8_HTML.gif . For the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq9_HTML.gif th session, let https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq10_HTML.gif denote the maximum available channel rate at time https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq11_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq12_HTML.gif the practical average service rate at time https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq13_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq14_HTML.gif the nominal service rate (negotiated when the session is setup). We now connect the packet scheduling problem with the main ingredients of the Semi-Markov Decision Process (SMDP) as follows.

3.2.1. Stage

The scheduling process is naturally divided into a series of stages by events https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq15_HTML.gif , which represent packet arrivals and departures (we say that a packet "departs" if its service completes). Stages are indexed by integers https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq16_HTML.gif Let https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq17_HTML.gif denote the time that event https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq18_HTML.gif happens. The time duration of the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq19_HTML.gif th stage is represented by https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq20_HTML.gif .

3.2.2. State

A state https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq21_HTML.gif consists of the queueing status of the whole scheduling system, including the current queueing delay of all packets, the wireless link conditions of all sessions, and the actual service rates of all sessions. In particular, the state at the beginning of stage https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq22_HTML.gif is defined by (for ease of presentation, we also call https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq23_HTML.gif the state of stage https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq24_HTML.gif in this paper)
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ1_HTML.gif
(1)
where https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq25_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq26_HTML.gif , and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq27_HTML.gif . For the computation of the actual service rate https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq28_HTML.gif , we improve the approach provided in [12]. We define a time window https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq29_HTML.gif which is sufficiently long. If session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq30_HTML.gif is assigned for transmission, the actual service rate of session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq31_HTML.gif at the end of the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq32_HTML.gif th stage is updated by https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq33_HTML.gif ; otherwise, https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq34_HTML.gif . The set of all possible states is called state space, denoted by https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq35_HTML.gif , which is a multiple dimension space with huge size.

3.2.3. Decision

At stage https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq36_HTML.gif , with the occurrence of event https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq37_HTML.gif , the scheduler should make a decision https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq38_HTML.gif according to the state https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq39_HTML.gif . The decision is defined by the session number of the next transmission, that is, https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq40_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq41_HTML.gif . The decision space is denoted by https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq42_HTML.gif . At stage https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq43_HTML.gif , if event https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq44_HTML.gif is packet departure, since the wireless channel is idle, the decision subset https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq45_HTML.gif contains all sessions that can transmit. (A session that "can transmit" is a session with nonempty queue and nonzero channel rate.) If event https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq46_HTML.gif is packet arrival, in the case that the channel is busy (being used for transmission), https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq47_HTML.gif is a singleton; in the case that the channel is idle (as no session can transmit, e.g., an empty system), the decision subset https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq48_HTML.gif contains all sessions that can transmit.

3.2.4. Policy

Policy https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq49_HTML.gif is a sequence of decision functions, where decision function https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq50_HTML.gif is a mapping from state space https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq51_HTML.gif to the decision space https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq52_HTML.gif . A policy is said to be stationary if for all https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq53_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq54_HTML.gif , which means that the decision function does not change with stages. The space of stationary policy is denoted by https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq55_HTML.gif . We only consider stationary policies as candidate policies in this paper, and without confusion, we simply use https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq56_HTML.gif as the denotation of stationary policy. Note that, under policy https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq57_HTML.gif , the decision at stage https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq58_HTML.gif is represented by https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq59_HTML.gif .

3.2.5. Cost Function

As mentioned above, the definition of system cost of scheduling problem in BWA networks should take into account three objectives: (i) wireless bandwidth utilization maximization, (ii) QoS differentiation and guarantee, and (iii) fairness provisioning. Since cost function has major influence to the scheduler's performance, we place the construction of cost function in a single subsection which will be presented shortly. In the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq60_HTML.gif th stage, the cost with state https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq61_HTML.gif and decision https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq62_HTML.gif is represented by https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq63_HTML.gif . Under policy https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq64_HTML.gif , the average cost of the system is provided by
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ2_HTML.gif
(2)
where https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq65_HTML.gif is the total number of stages. The target of the SMDP problem is to find the optimal policy which leads to the minimum average cost. A policy https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq66_HTML.gif is said to be optimal if
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ3_HTML.gif
(3)
for arbitrary other policy https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq67_HTML.gif . Usually, the average cost under the optimal policy is called optimal average cost, denoted by https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq68_HTML.gif .

3.2.6. Bellman Equation

Generally, the optimal policy https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq69_HTML.gif of the SMDP problem could be obtained by solving the Bellman optimality equation. In the average cost SMDP problem, Bellman optimality equation takes the following form [20]:
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ4_HTML.gif
(4)
Here, https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq70_HTML.gif stands for expectation; https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq71_HTML.gif represents the duration of current stage; https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq72_HTML.gif denotes the decision set at current time; https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq73_HTML.gif is the state at next stage. The value https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq74_HTML.gif is the optimal average cost, and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq75_HTML.gif is called the optimal differential cost, which has the following interpretation: when we operate the system under an optimal policy, then https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq76_HTML.gif equals to the expectation of the difference of the total cost (over the infinite horizon) for a system with initial state https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq77_HTML.gif , compared with a system with initial state https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq78_HTML.gif . State https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq79_HTML.gif is a recurrent state with zero differential cost (e.g., the state in which all queues are empty).
Once the optimal differential cost function https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq80_HTML.gif is available, the optimal scheduling policy can be obtained by
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ5_HTML.gif
(5)
As shown in (5), the optimal policy is composed by a series of optimal decisions at all stages, which maximizes the value of the righthand side of (5).
The desire for the exact solution of https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq81_HTML.gif in (4) is unrealistic because of the huge size of state space and the overwhelming computational requirement. Hence, we employ the methodology of Reinforcement Learning (RL) [2] to produce a near optimal solution. The central idea is to approach https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq82_HTML.gif by a parameterized approximating function https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq83_HTML.gif . Online learning is carried out to continuously regulate the parameter vector https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq84_HTML.gif and make https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq85_HTML.gif more and more close to https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq86_HTML.gif . The details of the RL solution are presented in the next section. Before that, we explain the construction of cost function in a separated subsection.

3.3. Construction of Cost Function

The definition of cost function directly determines the performance of the scheduling algorithm. In our problem, we restrict the cost function of interest to have the following important properties.
(i)
Simplicity. A simple form of cost function can significantly reduce the computation complexity of the whole scheduling algorithm.
 
(ii)
Effectiveness. Since the wireless packet scheduler is expected to concurrently address performance issues of QoS, bandwidth utility and fairness, the definition of cost function should take into account all these three aspects.
 
(iii)
Scalability. In different practical services and applications, the requirements on every single performance objective may be quite different. Therefore, the structure of the cost function may have some tunable coefficients which allow the network supervisor to regulate according to actual situations.
 
Our discussion of the cost function starts from a draining problem, where given a number of packets in the system, and assuming no more packet arrivals, the scheduler should try to arrange the order of packet transmissions so that the total cost is minimized. Draining problem is indeed a simplified version of the original scheduling problem. By assuming that there is no new packet arrival, draining problem is much more analyzable than the original scheduling problem. Here, we consider a draining problem with the following setup. There are https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq87_HTML.gif sessions in the system. The initial queue length of session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq88_HTML.gif is https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq89_HTML.gif in packets, https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq90_HTML.gif . At the beginning, the queueing delay of all the packets is supposed to be zero. We separately define three different forms of cost function below. Each definition actually involves one performance objective.

3.3.1. Bandwidth Utilization

The first definition of cost function relates to the performance issue of bandwidth utilization. At the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq91_HTML.gif th stage, let
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ6_HTML.gif
(6)
where https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq92_HTML.gif is the head-of-line (HOL) delay of session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq93_HTML.gif at the beginning of the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq94_HTML.gif th stage; https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq95_HTML.gif is the indicator function, satisfying that https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq96_HTML.gif when https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq97_HTML.gif is true, and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq98_HTML.gif when https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq99_HTML.gif is false. By this definition, the total cost is equal to sum of the delay of all packets. To minimize the total cost, the scheduler will always assign the session with the highest channel rate to transmit. Obviously, this scheduling policy will result in bandwidth utilization maximization.

3.3.2. QoS Differentiation

The second definition of cost function involves the performance issue of QoS differentiation. At the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq100_HTML.gif th stage, let
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ7_HTML.gif
(7)
where https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq101_HTML.gif is the weight with session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq102_HTML.gif . The existence of weights makes the scheduling rule different from the one under definition (6). To be concrete, consider two sessions https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq103_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq104_HTML.gif with weights https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq105_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq106_HTML.gif , respectively, and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq107_HTML.gif . Suppose at a certain stage that https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq108_HTML.gif , and both are the highest channel rates of all sessions. The scheduler will choose to assign https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq109_HTML.gif for transmission first, because for a same period of waiting time, the cost of https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq110_HTML.gif is larger than https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq111_HTML.gif . We can see that different weights give the session different priority levels. Following the definition of cost in (7), the scheduler can achieve QoS differentiation.

3.3.3. Fairness

The performance issue of fairness is integrated into the third definition of cost function. At the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq112_HTML.gif th stage, let
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ8_HTML.gif
(8)
where https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq113_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq114_HTML.gif are, respectively, the nominal and actual average service rate of session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq115_HTML.gif . Under this definition, the scheduler is insensitive to the bandwidth utilization or QoS differentiation. What concerns the scheduler is whether every traffic gets its ordered service. The following theorem explains how the scheduler provides fair services for multiple traffic.
Theorem 1.
Considering a scheduling system with cost function of (8) and working under the Fluid limit model [21], if all sessions have the same channel rate https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq116_HTML.gif , the proportion of service that session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq117_HTML.gif receives from the scheduler is given by https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq118_HTML.gif , where the sum of service proportion is https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq119_HTML.gif .
Proof.
See the appendix.
Theorem 1 indicates that under the definition of (8), each traffic flow will get its fair share of service being proportional to the square root of its nominal service rate. A more general form of cost function that relates to fairness issue is provided by extending the definition of (8) as follows:
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ9_HTML.gif
(9)
It is easy to prove that the service proportion of session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq120_HTML.gif under the definition of (9) has the form as https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq121_HTML.gif .
Although none of the above three cost functions can simultaneously relate to all the three performance issues, they lead us to construct an appropriate one. Actually, by simply combining the definitions given by (6), (7), and (8), we obtain the cost function as follows:
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ10_HTML.gif
(10)
where https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq122_HTML.gif 's are constants called the priority factors, and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq123_HTML.gif 's are called the fairness factors, defined by
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ11_HTML.gif
(11)
Here https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq124_HTML.gif is a constant associated with session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq125_HTML.gif that relates to fairness. The specific values of https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq126_HTML.gif 's and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq127_HTML.gif 's are left to be determined according to the practical situations by the network supervisor. Regarding the form of the cost function in (10), it is worth to mention that a cost function of similar form is introduced in [11]. The authors assembled the factors of HOL delay, channel rate, and constant weight in a multiplication manner to produce an effective cost function. It is easy to see that the form of cost function definition by (10) has three important properties as mentioned at the beginning of this subsection. Firstly, it is simple and easy to compute in each stage. Secondly, as a combination of (6), (7), and (8), it effectively takes into account the three performance issues: bandwidth utilization, QoS guarantee, and fairness. Thirdly, the scalable coefficients https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq128_HTML.gif 's and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq129_HTML.gif 's provide flexibility for the network supervisor to regulate the scheduler according to the practical requirements.

4. Reinforcement Learning Solution

4.1. Function Approximation Architecture

A common approximation architecture of RL is shown in Figure 3. The architecture consists of two components: the feature extractor and the function approximator. The feature extractor is used to produce a feature vector https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq130_HTML.gif , which is capable of capturing the most important aspects of state https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq131_HTML.gif . Features are usually handcraft, based on human intelligence or experience. Our choice of the feature vector will be presented shortly.
The function approximation architectures could be broadly categorized into two classes: the nonlinear and the linear structures. One typical nonlinear architecture is the multilayer perceptron or feedforward neural network with a single hidden layer (see, e.g., [22]). Under this architecture, the output approximating function https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq132_HTML.gif is the nonlinear combination of input feature vector https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq133_HTML.gif with internal tunable weight vector https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq134_HTML.gif . This architecture is powerful because it can approximate arbitrary functions of https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq135_HTML.gif . The drawback, however, is that the dependence on https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq136_HTML.gif is nonlinear, and tuning can be time-consuming and unreliable.
Alternatively, a linear feature-based approximation architecture [22] is much simpler and easier to implement, in which the output approximating function https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq137_HTML.gif is the linear combination of the input feature https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq138_HTML.gif , given by
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ12_HTML.gif
(12)
where the superscript https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq139_HTML.gif represents transpose. In this paper, we choose linear approximation architecture in the RL framework.
Since the linear combination of feature vector should approximate the optimal differential average cost, the definition of feature vector explicitly relates to the definition of cost function. As (10) has shown, the cost at each stage is determined by the weighted queueing delay of the packet being assigned to transmit. Thus, the components of the feature vector should be defined with respect to all packets in the system, each component corresponding to one packet. Let https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq140_HTML.gif denote the packet size of session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq141_HTML.gif . At stage https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq142_HTML.gif , the remaining waiting time for the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq143_HTML.gif th packet in session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq144_HTML.gif could be estimated by https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq145_HTML.gif , where https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq146_HTML.gif is the current actual service rate of session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq147_HTML.gif . Thus, the feature component with respect to the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq148_HTML.gif th packet of session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq149_HTML.gif is defined by
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ13_HTML.gif
(13)
Here, the item https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq150_HTML.gif could be viewed as the estimation of the queueing delay for the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq151_HTML.gif th packet in session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq152_HTML.gif . Note that if https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq153_HTML.gif provides an accurate prediction of the service rate of session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq154_HTML.gif , (13) coincides well with (10) and the loss of optimality of the linear architecture could be sufficiently small. Extending (12) yields
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ14_HTML.gif
(14)
where https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq155_HTML.gif is the number of packet in session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq156_HTML.gif ; https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq157_HTML.gif is the parameter variable associated with feature https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq158_HTML.gif . The total number of tunable parameters in https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq159_HTML.gif is equal to https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq160_HTML.gif .

4.2. Online Parameter Tuning

After establishing the approximating architecture, we employ the technique of Temporal-Difference (TD) learning [2] to perform online parameter tuning. The algorithm starts with an arbitrary initial parameter vector https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq161_HTML.gif and improves the approximation as more and more state transitions are observed. At each stage, a greedy decision is made based on current https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq162_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq163_HTML.gif . Specifically, the decision should be made by
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ15_HTML.gif
(15)
where https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq164_HTML.gif is the estimation of state at the next stage. The computation of https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq165_HTML.gif is trivial if we simply assume that no packet arrives during the current stage. More concretely, let https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq166_HTML.gif denote the number of packets in session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq167_HTML.gif . At the beginning of a stage, given the state
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ16_HTML.gif
(16)
if no more packet arrives before the current transmission completes, then the state at the end of current stage (i.e., the beginning of next stage) is estimated by
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ17_HTML.gif
(17)
where for https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq168_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq169_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq170_HTML.gif ; for https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq171_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq172_HTML.gif , https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq173_HTML.gif .
In the process of parameter tuning, the update of parameter vector https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq174_HTML.gif and scalar https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq175_HTML.gif is carried out stage by stage, according to
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ18_HTML.gif
(18)
where https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq176_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq177_HTML.gif are small step size parameters, and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq178_HTML.gif is called temporal difference. Generally, https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq179_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq180_HTML.gif should satisfy the following conditions:
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ19_HTML.gif
(19)
In (18), the temporal difference https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq181_HTML.gif is found as
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ20_HTML.gif
(20)
Note that we use the linear approximation structure as (12), which leads to https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq182_HTML.gif in (18). We can see the benefit of using the linear feature-based approximation architecture which significantly reduced the complexity of gradient computation in (18). Figure 4 presents the flow chart of the proposed Reinforcement Learning-based Scheduling (RLS) algorithm, which summarizes the main operations of the algorithm.

5. Performance Evaluation

In this section, we evaluate the performance of RLS by network simulation. Two simulation experiments are carried out. The first experiment is to investigate the convergence performance of the parameter tuning procedure. The second one is to compare the performance of RLS with two existing packet scheduling algorithms: CSDP-WRR [17] and CIF-Q [9].

5.1. Simulation Setup

We consider three classes of traffic for typical BWA networks, including audio, video, and data traffic flows. As listed in Table 1, there are five sessions. All traffic flows are generated based on the models presented in [23]. Note that the traffic models are especially proposed for one of the BWA standards IEEE 802.16. In Table 1, the link conditions of audio, video, and data-1 are supposed to be stably full-rate, while link conditions of data-2 and data-3 are assumed to suffer drastic variation. More specifically, link variability of data-2 and data-3 sessions is characterized by a six-state Markov channel model [18, 19]. Transmission rates of link states https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq183_HTML.gif are, respectively, 100, 80, 60, 40, 20, 0 percents of the full channel rate (2 Mbps). The steady-state probability vector for error pattern-1 is https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq184_HTML.gif , while the one for pattern-2 is https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq185_HTML.gif . The channel state transition probabilities of the data sessions are listed in Table 2.
Table 1
Properties of the 5 sessions in the simulation.
Session number ( https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq186_HTML.gif )
1
2
3
4
5
Traffic type
Audio
Video
Data-1
Data-2
Data-3
Source model
IDP
2IRP
4IPP
4IPP
4IPP
Packet size
66 B
188 B
192 B
192 B
192 B
Mean rate
22.4 kbps
0.19 Mbps
0.8 Mbps
0.8 Mbps
0.8 Mbps
Peak rate
64 kbps
0.4 Mbps
1.8 Mbps
1.8 Mbps
1.8 Mbps
Link variation
None
None
None
Pattern-1
Pattern-2
Table 2
Channel state transition probabilities and steady-state probabilities https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq187_HTML.gif .
Parameter
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq188_HTML.gif
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq189_HTML.gif
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq190_HTML.gif
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq191_HTML.gif
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq192_HTML.gif
Error-free
1
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq193_HTML.gif
Error pattern-1
0.92
0.40
0.40
0.50
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq194_HTML.gif
Error pattern-2
0.70
0.30
0.30
0.70
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq195_HTML.gif
Generally, audio and video traffic has higher priority than data traffic, due to the requirement of realtime transmission. Thus, we set the priority factors of the audio and video traffic flows as https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq196_HTML.gif , that is, https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq197_HTML.gif , and the priority factors of data traffic flows as https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq198_HTML.gif , that is, https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq199_HTML.gif . In the aspect of providing fair service, data traffic is usually more sensitive than audio and video traffic. This is because data traffic flows, such as FTP and e-mail, are generally served in the mode of best effort (BE). There is no guarantee how much service the system has to provide for these traffic flows. Without consideration of fair service, multiple data traffic flows in a same system may receive fairly different amounts of service, due to the different wireless channel conditions. Hence, we set larger fairness factors for the data traffic flows. The fairness factors of audio and video traffic are set as https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq200_HTML.gif , that is, https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq201_HTML.gif , and the fairness factors of data traffic are set as https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq202_HTML.gif , that is, https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq203_HTML.gif .

5.2. Convergence of Parameter Tuning

The first experiment investigates the convergence of RLS in the procedure of parameter tuning. Since the optimal average cost is the most important metric of the system performance, the variation of https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq204_HTML.gif reflects the convergence of the learning algorithm. We run the simulation and record the value of https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq205_HTML.gif . The step size of the learning algorithm should be set before the simulation. As mentioned in Section 4, the setting of step size parameters https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq206_HTML.gif and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq207_HTML.gif should satisfy the conditions of (19). One efficient way to determine these parameters is to set the value of step size roughly inversely proportional to the stage number (or running time) [22]. Let https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq208_HTML.gif represent the step size with respect to feature component https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq209_HTML.gif at stage https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq210_HTML.gif . We set
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ21_HTML.gif
(21)
and the step size with the approximate optimal average cost https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq211_HTML.gif at stage https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq212_HTML.gif :
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ22_HTML.gif
(22)
After running for 800 seconds, we record the variation of https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq213_HTML.gif and plot as curve in Figure 5.
It is observed that RLS converges very fast in the procedure of parameter tuning. The approximate optimal average cost https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq215_HTML.gif has reached a relatively stable value after 100 seconds. Although https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq216_HTML.gif keeps varying as time goes by, the fluctuation is still in an acceptable range. It is clear that the learning algorithm has converged. For comparison, we also choose another group of step size parameters. We set
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ23_HTML.gif
(23)
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ24_HTML.gif
(24)
where we can see that the step size parameters https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq217_HTML.gif 's in (23) are https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq218_HTML.gif of the ones in (21), and parameters https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq219_HTML.gif 's in (24) are https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq220_HTML.gif of the ones in (22). The second group of step size parameters is much smaller than the first group. The resulted curve of https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq221_HTML.gif under the second group of step size parameters is plotted in Figure 6.
We observe in Figure 6 that RLS converges at a smaller speed than in the situation of Figure 5. It takes about 300 seconds for the scheduler to regulate the parameters and reach a stable approximate optimal average cost https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq223_HTML.gif . But the variation of the value of https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq224_HTML.gif in Figure 6 is much more moderate than that in Figure 5. These results indicate that the smaller the step size parameters, the slower the convergence speed, but the stabler the system performance. The selection of step size parameters https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq225_HTML.gif 's and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq226_HTML.gif 's depends on the actual network conditions. The scheduler should not only provide a sufficiently stable performance but also be fast reactive to follow the variability of outside environment.

5.3. Scheduling Performance

In the second experiment, we examine the performance of RLS in the aspects of bandwidth utilization, QoS guarantee, and fairness. We compare RLS with two existing scheduling algorithms CSDP-WRR [17] and CIF-Q [9]. CSDP-WRR is a combination of CSDP mechanism and Weighted Round Robin (WRR) scheduling algorithm, which has been demonstrated to be simple, robust, and effective [17]. The major feature of CIF-Q lies in the fairness it provides. CIF-Q integrates the well-known CIF-property, which takes into account both long-term and short-term fairness [9]. The simulation lasts for 1200 seconds. During the simulation time, channel errors occur only in the first 400 seconds. The remaining error-free time is to demonstrate the long-term fairness property of RLS. In the simulation, we keep track the delay, throughput, and service amount of the traffic flows. The results are reported in Figures 7, 8, and 9.
We know from Figures 7(a) and 7(b) that both the average and maximum delays of audio and video sessions in RLS are clearly smaller than in either of the other two algorithms. In other words, RLS is able to provide better QoS guarantee for realtime traffic flows. It should be noticed that the satisfying QoS guarantee for audio and video sessions in RLS is not obtained by sacrificing the throughput of the other sessions. This is validated by Figure 8.
Figure 8 consists of three subfigures. Figures 8(a) and 8(b), respectively, report the throughput of data traffic flows in the first 400 seconds and the entire 1200 seconds. Figure 8(c) reports the total throughput of all the three data traffic flows. In Figure 8(a), compared to the other two algorithms, RLS significantly improves the throughput of data-1 and data-2, except that the throughput of data-3 in CIF-Q is a little more than that in RLS. But we know from Figure 8(c) that the total throughput of the three data traffic flows in RLS is much more than that in the other two algorithms. More specifically, in the first 400 seconds, the total throughput of data traffic is 1.62 Mbps, which is https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq227_HTML.gif higher than that in CSDP-WRR (1.19 Mbps) and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq228_HTML.gif higher than that in CIF-Q (1.22 Mbps). In the entire 1200 seconds of simulation, the overall throughput of data traffic in RLS is 1.77 Mbps, which is https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq229_HTML.gif higher than that in CSDP-WRR (1.57 Mbps) and https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq230_HTML.gif higher than that in CIF-Q (1.63 Mbps). Results of Figure 8 indicate that RLS has a considerable improvement in bandwidth utilization.
To demonstrate the short-term and long-term fairness properties of RLS, we record the service received by the data sessions over the whole simulation time in Figure 9. It is observed that the curves diverge moderately in the error-prone period and then converge gradually in the error-free period. We give interpretation as follows. To promise short-term fairness, RLS does not force the leading sessions to give up all their leads but just makes them degrade gracefully. So data-1 session receives relatively more service in the first 400 seconds. However, when the system becomes error-free, the lagging sessions will gradually get back their lags, and finally all data sessions obtain same throughput. This observation validates that RLS offers both short-term and long-term fairness guarantee in the sense that it provides short-term fairness for error-free sessions and long-term fairness for error-prone sessions. This exactly coincides with the spirit of CIF properties introduced in [9].
Finally, we plot in Figure 10 the curve of approximate optimal average cost https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq231_HTML.gif in the second experiment. Note that we use the group of small step size parameters defined by (23) and (24). As we set in the simulation, the network conditions are different in the first 400 seconds and the last 800 seconds (the channel models of data-2 and data-3 change at the https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq232_HTML.gif th second). It is observed in Figure 10 that RLS, however, could still guarantee the convergence through online learning under the actual network conditions. Benefiting from its in-built learning ability, RLS could work adaptively according to the outside environment.

6. Conclusion

In this paper, we address the problem of packet scheduling in BWA networks. We cast the problem into the framework of semi-Markov Decision Process and solve it by the method of reinforcement learning. The proposed scheduling algorithm RLS simultaneously considers three performance issues in BWA networks: (i) QoS differentiation and guarantee, (ii) bandwidth utilization, and (iii) fair service. Being different from traditional scheduling algorithms, RLS is learning-based. It has the potential to learn from outside environment and adaptively regulate its scheduling policy. Simulation experiments are carried out to evaluate the performance of RLS. The numeric results indicate that RLS outperforms two existing scheduling algorithms in the sense that it provides better QoS for realtime traffic flows and meanwhile significantly improves the throughput of nonrealtime ones. Moreover, RLS achieves both long-term and short-term fairness.

Appendix

Proof of Theorem 1

Suppose that the total bandwidth of the system is https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq234_HTML.gif . Let https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq235_HTML.gif denote the proportion of bandwidth allocated by the scheduler to session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq236_HTML.gif under the Fluid limit, that is, https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq237_HTML.gif . The scheduling problem could be formulated as a linear programming:
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ25_HTML.gif
(A.1)
By introducing a positive Lagrangian multiplier https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq238_HTML.gif , (A.1) can be rewritten as the following unconstrained optimization problem:
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ26_HTML.gif
(A.2)
For session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq239_HTML.gif with nonzero service proportion https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq240_HTML.gif , we let https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq241_HTML.gif and get
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ27_HTML.gif
(A.3)
By substituting (A.3) into the constraint https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq242_HTML.gif , we have
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ28_HTML.gif
(A.4)
The proportion of service for session https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_IEq243_HTML.gif is finally given by
https://static-content.springer.com/image/art%3A10.1155%2F2009%2F482764/MediaObjects/13638_2009_Article_1676_Equ29_HTML.gif
(A.5)

Acknowledgment

The work in this paper is supported by programs of NSFC under Grant nos. 60903170, U0835003, and U0635001.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://​creativecommons.​org/​licenses/​by/​2.​0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Literature
1.
go back to reference Cao Y, Victor OK: Scheduling algorithms in broad-band wireless networks. Proceedings of the IEEE 2001, 89(1):76-87. 10.1109/5.904507CrossRef Cao Y, Victor OK: Scheduling algorithms in broad-band wireless networks. Proceedings of the IEEE 2001, 89(1):76-87. 10.1109/5.904507CrossRef
2.
go back to reference Sutton RS: Learning to predict by the methods of temporal differences. Machine Learning 1988, 3(1):9-44. Sutton RS: Learning to predict by the methods of temporal differences. Machine Learning 1988, 3(1):9-44.
3.
go back to reference Wischhof L, Lockwood JW: Packet scheduling for link-sharing and quality of service support in wireless local area. November 2001., (WUCS-01-35): Wischhof L, Lockwood JW: Packet scheduling for link-sharing and quality of service support in wireless local area. November 2001., (WUCS-01-35):
4.
go back to reference Shreedhar M, Varghese G: Efficient fair queuing using deficit round-robin. IEEE/ACM Transactions on Networking 1996, 4(3):375-385. 10.1109/90.502236CrossRef Shreedhar M, Varghese G: Efficient fair queuing using deficit round-robin. IEEE/ACM Transactions on Networking 1996, 4(3):375-385. 10.1109/90.502236CrossRef
5.
go back to reference Parekh AK, Gallage RG: A generalized processor sharing approach to fow control in integrated services networks: the single-node case. IEEE/ACM Transactions on Networking 1993, 1(3):344-357. 10.1109/90.234856CrossRef Parekh AK, Gallage RG: A generalized processor sharing approach to fow control in integrated services networks: the single-node case. IEEE/ACM Transactions on Networking 1993, 1(3):344-357. 10.1109/90.234856CrossRef
7.
go back to reference Elliott EO: Estimates of error rates for codes on burst-noise channels. Bell System Technical Journal 1963, 42(9):1977-1997.CrossRef Elliott EO: Estimates of error rates for codes on burst-noise channels. Bell System Technical Journal 1963, 42(9):1977-1997.CrossRef
8.
go back to reference Lu S, Bharghavan V, Srikant R: Fair scheduling in wireless packet networks. IEEE/ACM Transactions on Networking 1999, 7(4):473-489. 10.1109/90.793003CrossRef Lu S, Bharghavan V, Srikant R: Fair scheduling in wireless packet networks. IEEE/ACM Transactions on Networking 1999, 7(4):473-489. 10.1109/90.793003CrossRef
9.
go back to reference Eugen Ng TS, Stoica I, Zhang H: Packet fair queueing algorithms for wireless networks with location-dependent errors. Proceedings of the 17th Annual IEEE Conference on Computer Communications (INFOCOM '98), March 1998, San Francisco, Calif, USA 3: 1103-1111. Eugen Ng TS, Stoica I, Zhang H: Packet fair queueing algorithms for wireless networks with location-dependent errors. Proceedings of the 17th Annual IEEE Conference on Computer Communications (INFOCOM '98), March 1998, San Francisco, Calif, USA 3: 1103-1111.
10.
go back to reference Wong WK, Zhu H, Leung VCM: Soft QoS provisioning using the token bank fair queuing scheduling algorithm. IEEE Wireless Communications 2003, 10(3):8-16. 10.1109/MWC.2003.1209591CrossRef Wong WK, Zhu H, Leung VCM: Soft QoS provisioning using the token bank fair queuing scheduling algorithm. IEEE Wireless Communications 2003, 10(3):8-16. 10.1109/MWC.2003.1209591CrossRef
11.
go back to reference Andrews M, Kumaran K, Ramanan K, Stolyar A, Whiting P, Vijayakumar R: Providing quality of service over a shared wireless link. IEEE Communications Magazine 2001, 39(2):150-153. 10.1109/35.900644CrossRef Andrews M, Kumaran K, Ramanan K, Stolyar A, Whiting P, Vijayakumar R: Providing quality of service over a shared wireless link. IEEE Communications Magazine 2001, 39(2):150-153. 10.1109/35.900644CrossRef
12.
go back to reference Jalali A, Padovani R, Pankaj R: Data throughput of CDMA-HDR a high efficiency-high data rate personal communication wireless system. Proceedings of the IEEE Vehicular Technology Conference (VTC '00), 2000 3: 1854-1858. Jalali A, Padovani R, Pankaj R: Data throughput of CDMA-HDR a high efficiency-high data rate personal communication wireless system. Proceedings of the IEEE Vehicular Technology Conference (VTC '00), 2000 3: 1854-1858.
13.
go back to reference Liu X, Chong EKP, Shroff NB: Opportunistic transmission scheduling with resource-sharing constraints in wireless networks. IEEE Journal on Selected Areas in Communications 2001, 19(10):2053-2064. 10.1109/49.957318CrossRef Liu X, Chong EKP, Shroff NB: Opportunistic transmission scheduling with resource-sharing constraints in wireless networks. IEEE Journal on Selected Areas in Communications 2001, 19(10):2053-2064. 10.1109/49.957318CrossRef
14.
go back to reference Borst S, Whiting P: Dynamic rate control algorithms for HDR throughput optimization. Proceedings of the 20th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM '01), April 2001, Anchorage, Alaska, USA 2: 976-985. Borst S, Whiting P: Dynamic rate control algorithms for HDR throughput optimization. Proceedings of the 20th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM '01), April 2001, Anchorage, Alaska, USA 2: 976-985.
15.
go back to reference Park D, Seo H, Kwon H, Lee BG: Wireless packet scheduling based on the cumulative distribution function of user transmission rates. IEEE Transactions on Communications 2005, 53(11):1919-1929. 10.1109/TCOMM.2005.858675CrossRef Park D, Seo H, Kwon H, Lee BG: Wireless packet scheduling based on the cumulative distribution function of user transmission rates. IEEE Transactions on Communications 2005, 53(11):1919-1929. 10.1109/TCOMM.2005.858675CrossRef
16.
go back to reference Wongthavarawat K, Ganz A: Packet scheduling for QoS support in IEEE 802.16 broadband wireless access systems. International Journal of Communication Systems 2003, 16(1):81-96. 10.1002/dac.581CrossRefMATH Wongthavarawat K, Ganz A: Packet scheduling for QoS support in IEEE 802.16 broadband wireless access systems. International Journal of Communication Systems 2003, 16(1):81-96. 10.1002/dac.581CrossRefMATH
17.
go back to reference Bhagwat P, Bhattacharya P, Krishna A, Tripathi SK: Enhancing throughput over wireless LANs using channel state dependent packet scheduling. Proceedings of the 15th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM '96), March 1996 3: 1133-1140.CrossRef Bhagwat P, Bhattacharya P, Krishna A, Tripathi SK: Enhancing throughput over wireless LANs using channel state dependent packet scheduling. Proceedings of the 15th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM '96), March 1996 3: 1133-1140.CrossRef
18.
go back to reference Wang HS, Moayeri N: Finite-state Markov channel—a useful model for radio communication channels. IEEE Transactions on Vehicular Technology 1995, 44(1):163-171. 10.1109/25.350282CrossRef Wang HS, Moayeri N: Finite-state Markov channel—a useful model for radio communication channels. IEEE Transactions on Vehicular Technology 1995, 44(1):163-171. 10.1109/25.350282CrossRef
19.
go back to reference Zhang Q, Kassam SA: Finite-state markov model for Rayleigh fading channels. IEEE Transactions on Communications 1999, 47(11):1688-1692. 10.1109/26.803503CrossRef Zhang Q, Kassam SA: Finite-state markov model for Rayleigh fading channels. IEEE Transactions on Communications 1999, 47(11):1688-1692. 10.1109/26.803503CrossRef
20.
go back to reference Bertsekas DP: Dynamic Programming and Optimal Control, Vol. 1 and Vol. 2. Athena Scientific, Belmont, Mass, USA; 1995.MATH Bertsekas DP: Dynamic Programming and Optimal Control, Vol. 1 and Vol. 2. Athena Scientific, Belmont, Mass, USA; 1995.MATH
21.
go back to reference Liu P, Berry R, Honig ML: A fluid analysis of utility-based wireless scheduling policies. Proceedings of the 43rd IEEE Conference on Decision and Control (CDC '04), December 2004 3: 3283-3288. Liu P, Berry R, Honig ML: A fluid analysis of utility-based wireless scheduling policies. Proceedings of the 43rd IEEE Conference on Decision and Control (CDC '04), December 2004 3: 3283-3288.
22.
go back to reference Bertsekas DP, Tsitsiklis JN: Neuro-Dynamic Programming. Athena Scientific, Belmont, Mass, USA; 1996.MATH Bertsekas DP, Tsitsiklis JN: Neuro-Dynamic Programming. Athena Scientific, Belmont, Mass, USA; 1996.MATH
Metadata
Title
QoS Differentiated and Fair Packet Scheduling in Broadband Wireless Access Networks
Authors
Rong Yu
Yan Zhang
Shengli Xie
Publication date
01-12-2009
Publisher
Springer International Publishing
DOI
https://doi.org/10.1155/2009/482764

Other articles of this Issue 1/2009

EURASIP Journal on Wireless Communications and Networking 1/2009 Go to the issue

Premium Partner