A data-driven approximate dynamic programming approach based on association rule learning: Spacecraft autonomy as a case study
Introduction
Dynamic programming (DP) and Markov Decision Processes (MDPs) offer powerful tools for formulating, modeling, and solving decision-making problems under uncertainty [36]. In real-world applications, however, the applicability of DP-based methods is limited by severe scalability issues, also known as the curse of dimensionality [36]. Approximate Dynamic Programming (ADP) has proven to be a powerful approach for addressing these issues in certain classes of multistage stochastic and dynamic problems [4]. ADP is a flourishing research area, born of a fruitful cross-fertilization of ideas from artificial intelligence, optimal control theory, and operations research. Very recent applications can be found in different fields, e.g., smart home energy management systems [24], multi-agent robotic systems [14], and spacecraft mission operations [34].
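As a minimal illustration of the DP machinery mentioned above, the sketch below runs standard value iteration on a toy two-state, two-action MDP. The transition tensor `P`, reward matrix `R`, and discount factor are invented purely for demonstration and do not come from the paper's case study.

```python
# Minimal value iteration for a finite MDP (toy data, illustrative only).
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[a][s][s2]: transition probability; R[s][a]: expected reward."""
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    while True:
        # Bellman backup: Q-values for every state-action pair
        Q = [[R[s][a] + gamma * sum(P[a][s][s2] * V[s2]
                                    for s2 in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        V_new = [max(Q[s]) for s in range(n_states)]
        if max(abs(V_new[s] - V[s]) for s in range(n_states)) < tol:
            # Greedy policy extracted from the (near-)converged Q-values
            policy = [max(range(n_actions), key=lambda a: Q[s][a])
                      for s in range(n_states)]
            return V_new, policy
        V = V_new

# Toy two-state, two-action instance (invented numbers)
P = [[[0.8, 0.2], [0.3, 0.7]],   # action 0
     [[0.5, 0.5], [0.1, 0.9]]]   # action 1
R = [[1.0, 0.0], [0.0, 2.0]]     # R[s][a]
V, policy = value_iteration(P, R)
```

The returned `policy` maps each state to the action maximizing the expected discounted return under this particular instantiation of `P`; as the paper stresses, a different `P` can yield a quite different policy.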
ADP methods are based on the assumption of having either a good estimation of the underlying state transition probability distributions or a computer simulation framework capable of generating samples according to such probability distributions [4]. In ADP-based solutions, planning is actually performed through the construction of sub-optimal policies with respect to specific probability distributions. However, in real applications, these distributions are often difficult to define for many practical reasons, e.g., insufficient or conflicting data needed to estimate state transition models precisely, system modeling uncertainties, and partial observability of the system variables. In this context, MDPs are also referred to as Markov Decision Processes with Imprecise Probabilities (MDP-IP) [45]. Computing the exact solution of an MDP-IP problem implies solving a max-min optimization problem [45]. Although many solutions have been proposed, they have proven effective only for a few special cases with a small number of states [12], [18]. To counteract such difficulties, model-free approaches can also be adopted. In this respect, Reinforcement Learning-based solutions offer the possibility of producing autonomous agents that interact with their environments in order to improve their own behaviors over time through trial-and-error procedures [3]. However, such experience-driven learning methods cannot always be applied, e.g., in the case of complex safety-critical systems. For instance, in space mission projects, early design stages are usually supported by abstract spacecraft models, where complex system-level requirements have to be proved at a more conceptual level [33]. Moreover, spacecraft can interact with their final environment only during their long operational phase, when the probability distributions may change, thus invalidating the policy calculated for a specific stochastic set-up [34].
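To illustrate the max-min flavor of an MDP-IP solution, the following hedged sketch assumes the imprecise model is given as a finite set of candidate transition tensors (one common discretization of an imprecise model); each Bellman backup then maximizes over actions and minimizes over the candidate models. All numbers are invented for demonstration.

```python
# Robust (max-min) value iteration, assuming the imprecise transition
# model is discretized into a finite set of candidate tensors P_set.
def robust_value_iteration(P_set, R, gamma=0.9, tol=1e-8):
    """P_set: candidate tensors P[a][s][s2]; R[s][a]: expected reward."""
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    while True:
        V_new = []
        for s in range(n_states):
            q_vals = []
            for a in range(n_actions):
                # adversarial 'nature' picks the worst candidate model
                worst = min(
                    R[s][a] + gamma * sum(P[a][s][s2] * V[s2]
                                          for s2 in range(n_states))
                    for P in P_set)
                q_vals.append(worst)
            V_new.append(max(q_vals))   # agent maximizes over actions
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new

# Two invented candidate models over two states and two actions
P1 = [[[0.8, 0.2], [0.3, 0.7]],
      [[0.5, 0.5], [0.1, 0.9]]]
P2 = [[[0.6, 0.4], [0.5, 0.5]],
      [[0.4, 0.6], [0.3, 0.7]]]
R = [[1.0, 0.0], [0.0, 2.0]]
V_rob = robust_value_iteration([P1, P2], R)
```

Even in this tiny instance, the inner minimization must be re-evaluated at every backup; with interval-valued (rather than finitely enumerated) probabilities it becomes an optimization in its own right, which is why exact MDP-IP solutions scale so poorly.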
In this paper, we present a data-driven ADP-based approach, which can offer an alternative practical solution when the above-mentioned ADP assumption cannot be guaranteed and model-free approaches cannot be used. Such a solution enables the usage of ADP methods for MDP-IP, which, to the best of our knowledge, is an unexplored research field. The proposed framework leverages Data-Driven Computing, a new computational analysis field which uses gathered data to predict unknown results [25]. In particular, Machine Learning (ML) techniques have gained great relevance in solving several decision-making problems, and are essential in many application fields, such as natural language processing [47], human sentiment analysis [38], financial predictions [2], aeronautical crack detection [7], distributed computational services [9], and more [29]. ML takes its inspiration from many academic disciplines, including computer science, statistics, biology, and psychology. Its primary goal is to automatically extract useful information hidden in data in order to implement an event predictor based on past experience. Its main strength is the capability to handle data affected by uncertainty and inaccuracy. In view of this, we adopt an ML-based approach to address the aforementioned ADP issues.
Our approach can be summarized as shown in Fig. 1. As depicted, given an MDP/ADP problem and the set of state transition probability matrices, different policies are first calculated via exact DP or ADP methods. The resulting policies are then processed by an Apriori-based association rule mining algorithm to uncover relevant relationships hidden within them. A pruning procedure selects the most appropriate association rules, and finally an Association Classifier [44] infers the optimal policy in all possible circumstances. In other words, a stochastic optimization approach is combined with an association rule learning method to select a policy scheme that is less vulnerable to changes in the underlying MDP state transition probability distributions [34].
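To make the pipeline of Fig. 1 concrete, the toy sketch below turns policy decisions into transactions, mines (state-attribute → action) rules with an Apriori-style frequency pass, prunes them by support and confidence, and uses the surviving rules as a simple associative classifier. The state attributes, actions, and thresholds are illustrative inventions, not the paper's spacecraft model nor the exact algorithm of [44].

```python
# Toy sketch of the rule-mining and classification steps of the pipeline.
from itertools import combinations
from collections import Counter

def mine_rules(transactions, min_support=0.4, min_conf=0.7):
    """Each transaction: (frozenset of state-attribute items, action)."""
    n = len(transactions)
    rules = []
    for size in (1, 2):  # antecedents of up to two attributes
        pair_counts, ante_counts = Counter(), Counter()
        for items, action in transactions:
            for ante in combinations(sorted(items), size):
                pair_counts[(ante, action)] += 1
                ante_counts[ante] += 1
        for (ante, action), c in pair_counts.items():
            support, conf = c / n, c / ante_counts[ante]
            if support >= min_support and conf >= min_conf:  # pruning step
                rules.append((frozenset(ante), action, conf))
    return sorted(rules, key=lambda r: -r[2])  # best confidence first

def classify(rules, state_items, default=None):
    """Associative classifier: first (highest-confidence) matching rule."""
    for ante, action, _conf in rules:
        if ante <= state_items:
            return action
    return default

# Invented policy decisions: state attributes -> action chosen by a policy
T = [
    (frozenset({'battery=low', 'sunlit=yes'}), 'charge'),
    (frozenset({'battery=low', 'sunlit=no'}), 'idle'),
    (frozenset({'battery=low', 'sunlit=yes'}), 'charge'),
    (frozenset({'battery=high', 'sunlit=yes'}), 'observe'),
    (frozenset({'battery=high', 'sunlit=no'}), 'observe'),
]
rules = mine_rules(T, min_support=0.2, min_conf=0.9)
action = classify(rules, frozenset({'battery=high', 'sunlit=yes'}))
```

In the actual framework the transactions would come from policies computed under different transition matrices, so a rule surviving the pruning step captures a decision that holds across many instantiations of the uncertain model.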
This approach was first explored by the authors in [10]. Compared to that work, this manuscript analyzes the motivations of the data-driven ADP approach, offers a more rigorous formulation, revisits the proposed solution in some parts, and provides a more detailed explanation of how to apply it, as well as some performance measurements in terms of execution time. The paper is organized as follows. Section 2 provides some background on the techniques used. In Section 3, we identify the basic assumptions for ADP to work, with the intent of justifying the applicability of the proposed method in case they cannot be guaranteed. Section 4 describes the overall solution, which is based on combining well-established ADP techniques with an association rule learning-based approach. Section 5 shows a detailed application of the proposed approach for calculating a suboptimal mission operations plan for spacecraft with a high level of on-board autonomy. Section 6 presents a comparative analysis of the related works, as well as some performance measurements and further remarks on the proposed solution. Section 7 concludes the paper and discusses some future developments.
Section snippets
Preliminaries
This section provides some background on optimal control problems for dynamic systems under uncertainty, showing how to formulate them via stochastic DP frameworks and how to solve them via ADP techniques. It also gives a short introduction to machine learning and association rules.
Motivations for the proposed methodology
This section analyzes the above-mentioned ADP assumption for the hard aggregation method, which is applied to the case study described in Section 5. As shown hereafter, this concern provides the basis for justifying the proposed data-driven ADP approach. Such justification can be extended to other ADP methods.
The proposed methodology
This section proposes a solution to deal with the issue of having neither enough statistical data to instantiate the right matrix P nor the possibility of constructing an accurate system simulator to calculate a proper policy for stochastic optimal control problems modeled via a DP/ADP framework. We remark that, by varying the state transition probability matrix, the selected policy could be quite different from the one calculated in a specific instantiation of such matrix. As a consequence, we
A case study: spacecraft model-based autonomy
This section shows how to apply the data-driven ADP approach to calculate a proper mission operations plan for spacecraft endowed with a high level of on-board autonomy. The spacecraft decision-making process is modeled via an MDP framework. After having outlined some important concepts for spacecraft on-board autonomy, we show how to use the ADP hard aggregation method to solve the curse of dimensionality. Then, we apply the solution proposed in the previous section to define an appropriate
Related works, performance evaluation, and further remarks
As mentioned earlier, the application of MDP frameworks for solving decision making problems in real cases often requires working with imprecise state transition probabilities [13], [32], [45]. In other words, MDP frameworks cannot be defined by using specific state transition probability distributions, but have to be instantiated through a set of probability distributions [11]. Although the MDP-IP formulation is meant to address this setting, its exact solution can be computed only for
Conclusion and future work
In this paper, we have presented a data-driven Approximate Dynamic Programming (ADP) approach to solve large-scale stochastic optimal control problems, wherein the main assumption for ADP to work is not guaranteed. In particular, we have addressed the case of having neither a good estimation of the underlying state transition probability distributions nor an accurate system simulator with the capability of generating samples according to such probability distributions. This issue can happen
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (50)
- et al., Detecting unfair recommendations in trust-based pervasive environments, Inf. Sci. (2019)
- et al., Real-time dynamic programming for Markov decision processes with imprecise probabilities, Artif. Intell. (2016)
- et al., Efficient solutions to factored MDPs with imprecise transition probabilities, Artif. Intell. (2011)
- et al., Bounded-parameter Markov decision processes, Artif. Intell. (2000)
- et al., A machine learning approach for result caching in web search engines, Inf. Process. Manage. (2017)
- et al., Performance evaluation of distributed association rule mining algorithms, Procedia Comput. Sci. (2016)
- et al., Fast algorithms for mining association rules in large databases, Proceedings of the 20th International Conference on Very Large Data Bases (1994)
- et al., Financial Signal Processing and Machine Learning (2016)
- et al., Deep reinforcement learning: a brief survey, IEEE Signal Process. Mag. (2017)
- Approximate policy iteration: a survey and some new methods, J. Control Theor. Appl.
- FeelTrust: providing trustworthy communications in ubiquitous mobile environment, 2013 IEEE 27th International Conference on Advanced Information Networking and Applications (AINA)
- Solving uncertain Markov decision problems: an interval-based method, Proceedings of Advances in Natural Computation, Second International Conference, ICNC
- Fast eddy current testing defect classification using Lissajous figures, IEEE Trans. Instrum. Meas.
- An artificial intelligence-based trust model for pervasive computing, 2015 10th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC)
- Spacecraft autonomy modeled via Markov decision process and associative rule-based machine learning, 2017 IEEE International Workshop on Metrology for AeroSpace (MetroAeroSpace)
- Representing and solving factored Markov decision processes with imprecise probabilities, ISIPTA 2009 - Proceedings of the 6th International Symposium on Imprecise Probability: Theories and Applications
- An approximate dynamic programming approach to multi-agent persistent monitoring in stochastic environments with temporal logic constraints, IEEE Trans. Autom. Control
- Onboard Computers, Onboard Software and Satellite Operations: An Introduction
- Multilinear and integer programming for Markov decision processes with imprecise probabilities, ISIPTA 2007 - Proceedings of the 5th International Symposium on Imprecise Probability: Theories and Applications
- Acceleration of association-rule based Markov decision processes, J. Appl. Res. Technol.
- Hierarchical solution of Markov decision processes using macro-actions, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence
- Algorithms for association rule mining - a general survey and comparison, SIGKDD Explor. Newsl.
- SPUDD: stochastic planning using decision diagrams, Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence
- An evolutionary random policy search algorithm for solving Markov decision processes, INFORMS J. Comput.