Introduction
Related work
Symbol | Quantity
---|---
**Reinforcement learning** |
\(S\) | State space, set of states \(s\)
\(A\) | Action space, set of actions \(a\)
\(t \in {\mathbb {N}}\) | Time step within an episode
\(K \in {\mathbb {R}}^+\) | Maximum length of an episode/processing path
\(P(s_{t+1}|s_t,a_t)\) | Probability of the transition to \(s_{t+1}\) when taking \(a_t\) in \(s_t\)
\(R(s_t, a_t, s_{t+1})\) | Reward for the transition from \(s_t\) to \(s_{t+1}\) via \(a_t\)
\(\gamma \in [0,1]\) | Discount factor
\(\pi, \pi^*\) | Control policy (possibly stochastic), optimal policy
\(\mathcal {V}_\pi :S\rightarrow {\mathbb {R}}\) | Expected future reward for \(\pi\)
\(\mathcal {V}^*:S\rightarrow {\mathbb {R}}\) | Expected future reward for \(\pi^*\)
\(\mathcal {Q}^*:S\times A\rightarrow {\mathbb {R}}\) | Expected future reward for \(\pi^*\) when taking \(a\) in \(s\)
\(x\) | Experience tuple \((s_t, a_t, s_{t+1}, R)\)
\(\mathcal {D}\) | Replay memory, consisting of experience tuples \(x\)
\(\alpha \in [0,1]\) | Q-learning learning rate
\(\boldsymbol{\theta}, \boldsymbol{\theta}^{-}\) | Function approximation weights (online and target network)
**Application** |
\(\mathcal {S}\) | Space of microstructures \(\sigma\), \(\sigma_t\), \(\sigma^*\)
\(\mathcal {K}\) | Space of material properties \(\kappa\)
\(\mathcal {P}\) | Process path
\(\mathcal {P}^*\) | Optimal process path
\(\mathcal {G}\) | Set of equivalent target microstructures \({\check{\sigma}}_\mathcal {G}\)
\(\mathcal {S}_\mathcal {P}\subseteq \mathcal {S}\) | Reachable microstructures
\(d_\sigma :\mathcal {S}\times \mathcal {S}\rightarrow {\mathbb {R}}\) | Microstructure distance function
\(f(h)\) | Orientation density function
**Taylor model** |
\(T^{(i)}\) | Cauchy stress of the \(i\)-th crystal
\({\boldsymbol{F}}\) | Deformation gradient
\({\boldsymbol{L}}\) | Velocity gradient
\({\boldsymbol{R}}\) | Rotation matrix
\(E\) | Young's modulus
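For orientation, the symbols \(\alpha\), \(\gamma\), \(R\), and \(\mathcal{Q}\) from the table combine in the standard tabular Q-learning update (a textbook relation, not a detail specific to this work):

\[
\mathcal{Q}(s_t, a_t) \leftarrow \mathcal{Q}(s_t, a_t) + \alpha \Big( R(s_t, a_t, s_{t+1}) + \gamma \max_{a'} \mathcal{Q}(s_{t+1}, a') - \mathcal{Q}(s_t, a_t) \Big)
\]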
Contribution
Paper structure
Background
Markov decision process and dynamic programming
Deep reinforcement learning
Taylor-type material model
Representation of crystallographic textures
Method
Objective
Single-goal structure-guided processing path optimization
Multi-equivalent-goal structure-guided processing path optimization
Application to crystallographic texture evolution
ODF distance
Application scenario
Implementation details
- A discount factor of \(\gamma =1.0\).
- Deep Q-learning as described in Sect. 2.2 as the basic algorithm, where the target network is updated every \(n_\theta =250\) time steps.
- Double Q-learning and dueling Q-learning as described in Sect. 2.2.
- An \(\epsilon\)-greedy policy with an initial exploration rate \(\epsilon_0=0.5\) and a final exploration rate \(\epsilon_\text{f}=0.1\), with \(n_\epsilon =50\).
- Q-networks with hidden layer sizes of [128, 64, 32], layer normalization, and ReLU activation functions. The learning process starts after 100 control steps. The networks are trained after each control step with a mini-batch of size 32 (see the network sketch after this list).
- Adam is used as the optimizer for neural network training, with a learning rate of \(5\times 10^{-4}\).
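As a minimal sketch of how these settings might be wired together, assuming PyTorch; the state dimension and action count (`STATE_DIM`, `N_ACTIONS`) are hypothetical placeholders, and double Q-learning only changes the training target, not this module:

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 512, 24  # hypothetical sizes, not from the paper

class DuelingQNet(nn.Module):
    """Hidden layers [128, 64, 32] with layer normalization and ReLU,
    split into value and advantage streams (dueling Q-learning)."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        sizes = [state_dim, 128, 64, 32]
        layers = []
        for d_in, d_out in zip(sizes[:-1], sizes[1:]):
            layers += [nn.Linear(d_in, d_out), nn.LayerNorm(d_out), nn.ReLU()]
        self.trunk = nn.Sequential(*layers)
        self.value = nn.Linear(32, 1)              # state value V(s)
        self.advantage = nn.Linear(32, n_actions)  # advantages A(s, .)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.trunk(s)
        adv = self.advantage(h)
        # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
        return self.value(h) + adv - adv.mean(dim=-1, keepdim=True)

q_net = DuelingQNet(STATE_DIM, N_ACTIONS)
target_net = DuelingQNet(STATE_DIM, N_ACTIONS)
target_net.load_state_dict(q_net.state_dict())  # re-synchronized every n_theta steps
optimizer = torch.optim.Adam(q_net.parameters(), lr=5e-4)
```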
For the multi-equivalent-goal scenario, the following settings differ:

- Due to the higher data complexity, we expanded the neural network to 4 hidden layers of sizes [128, 256, 256, 128].
- Per time step, the network is trained with four mini-batches instead of one.
- For improved stability, we reduced the target-network update frequency to \(n_\theta = 500\) steps.
- Exploration parameters are \(\epsilon_0=0.5\) and a final exploration rate \(\epsilon_\text{f}=0.0\), with \(n_\epsilon =190\). The \(\epsilon_\text{G}\)-greedy target choice is parametrized with \(\epsilon^\text{G}_0=1.0\), \(\epsilon^\text{G}_f=0.0\), and \(n_{\epsilon^\text{G}}=190\) (see the schedule sketch after this list).
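A sketch of the two exploration schedules, assuming a linear decay over \(n_\epsilon\) episodes; the exact decay shape is our assumption, not stated in this excerpt:

```python
def linear_decay(eps_0: float, eps_f: float, n_eps: int, episode: int) -> float:
    """Linearly interpolate from eps_0 to eps_f over n_eps episodes, then hold."""
    frac = min(episode / n_eps, 1.0)
    return eps_0 + frac * (eps_f - eps_0)

# Multi-equivalent-goal setting: epsilon drives action exploration,
# epsilon_G drives the random-vs-greedy target choice.
eps   = lambda ep: linear_decay(0.5, 0.0, 190, ep)   # epsilon_0, epsilon_f, n_epsilon
eps_G = lambda ep: linear_decay(1.0, 0.0, 190, ep)   # epsilon_G schedule
```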
Results and discussion
Orientation histogram parameters
 | k = 1 | k = 3 | k = 25
---|---|---|---
J = 256 | 574 | 238 | 401
J = 512 | 493 | 201 | 275
J = 8192 | 196 | 77 | 57
Single-goal processing path optimization
Goal | \(E_{11}\) | \(E_{22}\) | \(E_{33}\)
---|---|---|---
\({\check{\sigma }}^{0}\) | 221 | 223 | 221
\({\check{\sigma }}^{1}\) | 216 | 221 | 212
\({\check{\sigma }}^{2}\) | 223 | 219 | 214
\({\check{\sigma }}^{3}\) | 222 | 218 | 223
\({\check{\sigma }}^{4}\) | 224 | 218 | 219
\({\check{\sigma }}^{5}\) | 227 | 226 | 233
Multi-equivalent-goal processing path optimization
- \(\mathcal {G}^{\sigma \text {4-equiv}}\) consists of 10 distinct \({\check{\sigma }}^{4}\)-equivalent microstructures \({\check{\sigma }}^{g}_{4\text {-equiv}}\), and
- \(\mathcal {G}^{\sigma \text {2-equiv}}\) consists of 10 distinct \({\check{\sigma }}^{2}\)-equivalent microstructures \({\check{\sigma }}^{g}_{2\text {-equiv}}\) (see the goal-selection sketch below).
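A hypothetical sketch of the \(\epsilon_\text{G}\)-greedy target choice over such a goal set \(\mathcal{G}\), assuming the greedy branch picks the goal closest to the current microstructure under the distance \(d_\sigma\); the greedy criterion is our assumption:

```python
import random

def choose_goal(goals, sigma_t, d_sigma, eps_G: float):
    """With probability eps_G pick a random equivalent goal (exploration);
    otherwise pick the goal nearest to the current state (exploitation)."""
    if random.random() < eps_G:
        return random.choice(goals)
    return min(goals, key=lambda g: d_sigma(sigma_t, g))
```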