A data-driven approximate dynamic programming approach based on association rule learning: Spacecraft autonomy as a case study
Introduction
Dynamic programming (DP) and Markov Decision Processes (MDPs) offer powerful tools for formulating, modeling, and solving decision-making problems under uncertainty [36]. In real-world applications, however, the applicability of DP-based methods is limited by severe scalability issues, also known as the curse of dimensionality [36]. Approximate Dynamic Programming (ADP) has proven to be a powerful approach for addressing these issues in certain classes of multistage stochastic and dynamic problems [4]. ADP is a flourishing research area, born of a fruitful cross-fertilization of ideas from artificial intelligence, optimal control theory, and operations research. Very recent applications can be found in different fields, e.g., smart home energy management systems [24], multi-agent robotic systems [14], and spacecraft mission operations [34].
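As a minimal illustration of the DP machinery mentioned above, the sketch below runs standard value iteration on a toy two-state, two-action MDP. The transition tensor `P`, reward matrix `R`, and discount factor are invented purely for demonstration and do not come from the paper's case study.

```python
# Minimal value iteration for a finite MDP (toy data, illustrative only).
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P[a][s][s2]: transition probability; R[s][a]: expected reward."""
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    while True:
        # Bellman backup: Q-values for every state-action pair
        Q = [[R[s][a] + gamma * sum(P[a][s][s2] * V[s2]
                                    for s2 in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        V_new = [max(Q[s]) for s in range(n_states)]
        if max(abs(V_new[s] - V[s]) for s in range(n_states)) < tol:
            # Greedy policy extracted from the (near-)converged Q-values
            policy = [max(range(n_actions), key=lambda a: Q[s][a])
                      for s in range(n_states)]
            return V_new, policy
        V = V_new

# Toy two-state, two-action instance (invented numbers)
P = [[[0.8, 0.2], [0.3, 0.7]],   # action 0
     [[0.5, 0.5], [0.1, 0.9]]]   # action 1
R = [[1.0, 0.0], [0.0, 2.0]]     # R[s][a]
V, policy = value_iteration(P, R)
```

The returned `policy` maps each state to the action maximizing the expected discounted return under this particular instantiation of `P`; as the paper stresses, a different `P` can yield a quite different policy.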
ADP methods are based on the assumption of having either a good estimation of the underlying state transition probability distributions or a computer simulation framework capable of generating samples according to such probability distributions [4]. In ADP-based solutions, planning is actually performed through the construction of sub-optimal policies with respect to specific probability distributions. However, in real applications, these distributions are often difficult to define for many practical reasons, e.g., insufficient or conflicting data needed to estimate state transition models precisely, system modeling uncertainties, and partial observability of the system variables. In this context, MDPs are also referred to as Markov Decision Processes with Imprecise Probabilities (MDP-IP) [45]. Computing the exact solution of an MDP-IP problem implies solving a max-min optimization problem [45]. Although many solutions have been proposed, they have proven effective only for a few special cases with a small number of states [12], [18]. To counteract such difficulties, model-free approaches can also be adopted. In this respect, Reinforcement Learning-based solutions offer the possibility of producing autonomous agents that interact with their environments in order to improve their own behaviors over time through trial-and-error procedures [3]. However, such experience-driven learning methods cannot always be applied, e.g., in the case of complex safety-critical systems. For instance, in space mission projects, early design stages are usually supported by abstract spacecraft models, where complex system-level requirements have to be proved at a more conceptual level [33]. Moreover, spacecraft can interact with their final environment only during their long operational phase, when the probability distributions may change, thus invalidating the policy calculated for a specific stochastic set-up [34].
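To illustrate the max-min flavor of an MDP-IP solution, the following hedged sketch assumes the imprecise model is given as a finite set of candidate transition tensors (one common discretization of an imprecise model); each Bellman backup then maximizes over actions and minimizes over the candidate models. All numbers are invented for demonstration.

```python
# Robust (max-min) value iteration, assuming the imprecise transition
# model is discretized into a finite set of candidate tensors P_set.
def robust_value_iteration(P_set, R, gamma=0.9, tol=1e-8):
    """P_set: candidate tensors P[a][s][s2]; R[s][a]: expected reward."""
    n_states, n_actions = len(R), len(R[0])
    V = [0.0] * n_states
    while True:
        V_new = []
        for s in range(n_states):
            q_vals = []
            for a in range(n_actions):
                # adversarial 'nature' picks the worst candidate model
                worst = min(
                    R[s][a] + gamma * sum(P[a][s][s2] * V[s2]
                                          for s2 in range(n_states))
                    for P in P_set)
                q_vals.append(worst)
            V_new.append(max(q_vals))   # agent maximizes over actions
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            return V_new
        V = V_new

# Two invented candidate models over two states and two actions
P1 = [[[0.8, 0.2], [0.3, 0.7]],
      [[0.5, 0.5], [0.1, 0.9]]]
P2 = [[[0.6, 0.4], [0.5, 0.5]],
      [[0.4, 0.6], [0.3, 0.7]]]
R = [[1.0, 0.0], [0.0, 2.0]]
V_rob = robust_value_iteration([P1, P2], R)
```

Even in this tiny instance, the inner minimization must be re-evaluated at every backup; with interval-valued (rather than finitely enumerated) probabilities it becomes an optimization in its own right, which is why exact MDP-IP solutions scale so poorly.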
In this paper, we present a data-driven ADP-based approach, which can offer an alternative practical solution when the above-mentioned ADP assumption cannot be guaranteed and model-free approaches cannot be used. Such a solution enables the usage of ADP methods for MDP-IP, which, to the best of our knowledge, is an unexplored research field. The proposed framework leverages Data-Driven Computing, a new computational analysis field which uses gathered data to predict unknown results [25]. In particular, Machine Learning (ML) techniques have gained great relevance in solving several decision-making problems, and are essential in many application fields, such as natural language processing [47], human sentiment analysis [38], financial predictions [2], aeronautical crack detection [7], distributed computational services [9], and more [29]. ML takes its inspiration from many academic disciplines, including computer science, statistics, biology, and psychology. Its primary goal is to automatically extract useful information hidden in data in order to implement an event predictor based on past experience. Its main strength is the capability to handle data affected by uncertainty and inaccuracy. In view of this, we adopt an ML-based approach to address the aforementioned ADP issues.
Our approach can be summarized as shown in Fig. 1. As depicted, given an MDP/ADP problem and the set of state transition probability matrices, different policies are first calculated via exact DP or ADP methods. The resulting policies are then processed by an Apriori-based association rule mining algorithm to uncover relevant relationships hidden within them. A pruning procedure selects the most appropriate association rules, and finally an Association Classifier [44] infers the optimal policy in all possible circumstances. In other words, a stochastic optimization approach is combined with an association rule learning method to select a policy scheme that is less vulnerable to changes in the underlying MDP state transition probability distributions [34].
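To make the pipeline of Fig. 1 concrete, the toy sketch below turns policy decisions into transactions, mines (state-attribute → action) rules with an Apriori-style frequency pass, prunes them by support and confidence, and uses the surviving rules as a simple associative classifier. The state attributes, actions, and thresholds are illustrative inventions, not the paper's spacecraft model nor the exact algorithm of [44].

```python
# Toy sketch of the rule-mining and classification steps of the pipeline.
from itertools import combinations
from collections import Counter

def mine_rules(transactions, min_support=0.4, min_conf=0.7):
    """Each transaction: (frozenset of state-attribute items, action)."""
    n = len(transactions)
    rules = []
    for size in (1, 2):  # antecedents of up to two attributes
        pair_counts, ante_counts = Counter(), Counter()
        for items, action in transactions:
            for ante in combinations(sorted(items), size):
                pair_counts[(ante, action)] += 1
                ante_counts[ante] += 1
        for (ante, action), c in pair_counts.items():
            support, conf = c / n, c / ante_counts[ante]
            if support >= min_support and conf >= min_conf:  # pruning step
                rules.append((frozenset(ante), action, conf))
    return sorted(rules, key=lambda r: -r[2])  # best confidence first

def classify(rules, state_items, default=None):
    """Associative classifier: first (highest-confidence) matching rule."""
    for ante, action, _conf in rules:
        if ante <= state_items:
            return action
    return default

# Invented policy decisions: state attributes -> action chosen by a policy
T = [
    (frozenset({'battery=low', 'sunlit=yes'}), 'charge'),
    (frozenset({'battery=low', 'sunlit=no'}), 'idle'),
    (frozenset({'battery=low', 'sunlit=yes'}), 'charge'),
    (frozenset({'battery=high', 'sunlit=yes'}), 'observe'),
    (frozenset({'battery=high', 'sunlit=no'}), 'observe'),
]
rules = mine_rules(T, min_support=0.2, min_conf=0.9)
action = classify(rules, frozenset({'battery=high', 'sunlit=yes'}))
```

In the actual framework the transactions would come from policies computed under different transition matrices, so a rule surviving the pruning step captures a decision that holds across many instantiations of the uncertain model.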
This approach was first explored by the authors in [10]. Compared to that work, this manuscript analyzes the motivations of the data-driven ADP approach, offers a more rigorous formulation, revisits the proposed solution in some parts, and provides a more detailed explanation of how to apply it, as well as some performance measurements in terms of execution time. The paper is organized as follows. Section 2 provides some background on the techniques used. In Section 3, we identify the basic assumptions for ADP to work, with the intent of justifying the applicability of the proposed method in case they cannot be guaranteed. Section 4 describes the overall solution, which is based on combining well-established ADP techniques with an association rule learning-based approach. Section 5 shows a detailed application of the proposed approach for calculating a suboptimal mission operations plan for spacecraft with a high level of on-board autonomy. Section 6 presents a comparative analysis of the related works, as well as some performance measurements and further remarks on the proposed solution. Section 7 concludes the paper and discusses some future developments.
Section snippets
Preliminaries
This section provides some background on optimal control problems for dynamic systems under uncertainty, showing how to formulate them via stochastic DP frameworks and how to solve them via ADP techniques. It also gives a short introduction to machine learning and association rules.
Motivations for the proposed methodology
This section analyzes the above-mentioned ADP assumption for the hard aggregation method, which is applied to the case study described in Section 5. As shown hereafter, this concern provides the basis for justifying the proposed data-driven ADP approach. Such justification can be extended to other ADP methods.
The proposed methodology
This section proposes a solution to deal with the issue of having neither enough statistical data to instantiate the right matrix P nor the possibility of constructing an accurate system simulator to calculate a proper policy for stochastic optimal control problems modeled via a DP/ADP framework. We remark that, by varying the state transition probability matrix, the selected policy could be quite different from the one calculated in a specific instantiation of such matrix. As a consequence, we
A case study: spacecraft model-based autonomy
This section shows how to apply the data-driven ADP approach to calculate a proper mission operations plan for spacecraft endowed with a high level of on-board autonomy. The spacecraft decision-making process is modeled via an MDP framework. After having outlined some important concepts for spacecraft on-board autonomy, we show how to use the ADP hard aggregation method to solve the curse of dimensionality. Then, we apply the solution proposed in the previous section to define an appropriate
Related works, performance evaluation, and further remarks
As mentioned earlier, the application of MDP frameworks for solving decision making problems in real cases often requires working with imprecise state transition probabilities [13], [32], [45]. In other words, MDP frameworks cannot be defined by using specific state transition probability distributions, but have to be instantiated through a set of probability distributions [11]. Although the MDP-IP formulation is meant to address this setting, its exact solution can be computed only for
Conclusion and future work
In this paper, we have presented a data-driven Approximate Dynamic Programming (ADP) approach to solve large-scale stochastic optimal control problems, wherein the main assumption for ADP to work is not guaranteed. In particular, we have addressed the case of having neither a good estimation of the underlying state transition probability distributions nor an accurate system simulator with the capability of generating samples according to such probability distributions. This issue can happen
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (50)
- et al., Detecting unfair recommendations in trust-based pervasive environments, Inf. Sci. (2019)
- et al., Real-time dynamic programming for Markov decision processes with imprecise probabilities, Artif. Intell. (2016)
- et al., Efficient solutions to factored MDPs with imprecise transition probabilities, Artif. Intell. (2011)
- et al., Bounded-parameter Markov decision processes, Artif. Intell. (2000)
- et al., A machine learning approach for result caching in web search engines, Inf. Process. Manage. (2017)
- et al., Performance evaluation of distributed association rule mining algorithms, Procedia Comput. Sci. (2016)
- et al., Fast algorithms for mining association rules in large databases, Proceedings of the 20th International Conference on Very Large Data Bases (1994)
- et al., Financial Signal Processing and Machine Learning (2016)
- et al., Deep reinforcement learning: a brief survey, IEEE Signal Process. Mag. (2017)
- Approximate policy iteration: a survey and some new methods, J. Control Theor. Appl.
- FeelTrust: providing trustworthy communications in ubiquitous mobile environment, 2013 IEEE 27th International Conference on Advanced Information Networking and Applications (AINA)
- Solving uncertain Markov decision problems: an interval-based method, Proceedings of Advances in Natural Computation, Second International Conference, ICNC
- Fast eddy current testing defect classification using Lissajous figures, IEEE Trans. Instrum. Meas.
- An artificial intelligence-based trust model for pervasive computing, 2015 10th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC)
- Spacecraft autonomy modeled via Markov decision process and associative rule-based machine learning, 2017 IEEE International Workshop on Metrology for AeroSpace (MetroAeroSpace)
- Representing and solving factored Markov decision processes with imprecise probabilities, ISIPTA 2009 - Proceedings of the 6th International Symposium on Imprecise Probability: Theories and Applications
- An approximate dynamic programming approach to multi-agent persistent monitoring in stochastic environments with temporal logic constraints, IEEE Trans. Autom. Control
- Onboard Computers, Onboard Software and Satellite Operations: An Introduction
- Multilinear and integer programming for Markov decision processes with imprecise probabilities, ISIPTA 2007 - Proceedings of the 5th International Symposium on Imprecise Probability: Theories and Applications
- Acceleration of association-rule based Markov decision processes, J. Appl. Res. Technol.
- Hierarchical solution of Markov decision processes using macro-actions, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence
- Algorithms for association rule mining - a general survey and comparison, SIGKDD Explor. Newsl.
- SPUDD: stochastic planning using decision diagrams, Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence
- An evolutionary random policy search algorithm for solving Markov decision processes, INFORMS J. Comput.