Introduction
Literature review
Theoretical framework
Inventory management
Buffer
Red zone
Yellow zone
Green zone
Qualified demand
Net flow inventory
Optimal level of inventory
Purchase order
Reinforcement learning
Q-Learning
Shaping function
Proposed model
Purchase order
Our optimal inventory level
Markov decision process
Actions
Rewards
States
Variables and assumptions
- Demand: because the historical demand data available for the proposed scenarios were very limited, pseudo-random demand data for training the model were generated with the Mersenne Twister algorithm, bounded by the minimum and maximum demand observed in the historical data. The Mersenne Twister algorithm was selected for two reasons. First, it is one of the best pseudo-random number generators (Matsumoto & Nishimura, 1998). Second, its properties can significantly improve the convergence time of an algorithm (Bonato et al., 2013).
- ADU-OSH: for the real scenarios, these variables were calculated from past demand using a 60-day moving average.
- DLT-Lead Time-OSH: taken as the median lead time over the last year of historical data.
- Initial OH: the initial on-hand (OH) inventory was set to the closing inventory of the period immediately preceding the test period in the historical data.
- MOQ: the minimum order quantity, calculated as the smallest purchase order in the historical data.
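The data preparation described in this list can be sketched in Python. Python's standard `random.Random` class implements the Mersenne Twister (MT19937). This is a minimal sketch under stated assumptions and not the authors' code. Demand is assumed to be drawn uniformly between the observed minimum and maximum. ADU is taken as a trailing 60-day mean. The function names, the seed, and the sample lead times and orders are all hypothetical.

```python
import random
import statistics
from typing import Sequence


def generate_demand(min_demand: int, max_demand: int, n_days: int,
                    seed: int = 42) -> list[int]:
    """Synthetic daily demand bounded by the historical min and max.

    random.Random is CPython's Mersenne Twister (MT19937) generator.
    Drawing uniformly between the bounds is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [rng.randint(min_demand, max_demand) for _ in range(n_days)]


def moving_average_adu(demand: Sequence[float], window: int = 60) -> list[float]:
    """Trailing moving average: each value averages the `window` days before it."""
    return [sum(demand[t - window:t]) / window
            for t in range(window, len(demand) + 1)]


def decoupled_lead_time(last_year_lead_times: Sequence[float]) -> float:
    """DLT taken as the median lead time over the last year of history."""
    return statistics.median(last_year_lead_times)


def minimum_order_quantity(historical_orders: Sequence[int]) -> int:
    """MOQ taken as the smallest purchase order in the history."""
    return min(historical_orders)


# P1 bounds from the case-study table; lead times and orders are hypothetical.
demand = generate_demand(min_demand=2, max_demand=30, n_days=365)
adu = moving_average_adu(demand)
dlt = decoupled_lead_time([7, 6, 8, 7, 9])
moq = minimum_order_quantity([20, 40, 25])
```

Fixing the seed makes the synthetic demand reproducible. As a result, different reward functions can be trained and compared on identical series.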
Experimentation
Case studies
Case study | Min demand | Mean demand | Median demand | Max demand | Standard dev. demand | LT | MOQ |
---|---|---|---|---|---|---|---|
P1 | 2 | 10.53 | 9 | 30 | 6.03 | 7 | 20 |
P2 | 0 | 2.52 | 0 | 72 | 8.20 | 9 | 40 |
P3 | 0 | 3.61 | 0 | 256 | 14.78 | 15 | 16 |
P4 | 0 | 1.05 | 0 | 26 | 2.61 | 7 | 4 |
P1: theoretical case study
P2: case study of product 39,933
P3: case study of product 28,440
P4: case study of product 43,387
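For reference, the sketch below shows how parameters like those in the case-study table feed the standard DDMRP buffer equations: the red, yellow, and green zones, the net flow position, and the replenishment rule. It illustrates conventional DDMRP sizing only. It is not the paper's proposed optimal inventory level. Several inputs are assumptions made solely for this example:

- P1's mean demand stands in for ADU.
- LT is assumed to be expressed in days.
- The lead-time factor (0.5) and variability factor (0.5) are hypothetical.
- The on-hand, on-order, and qualified-demand figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    """Sizes of the three DDMRP zones, in units of product."""
    red: float
    yellow: float
    green: float

    @property
    def top_of_yellow(self) -> float:
        return self.red + self.yellow

    @property
    def top_of_green(self) -> float:
        return self.red + self.yellow + self.green


def size_buffer(adu: float, dlt: float, moq: float,
                lead_time_factor: float, variability_factor: float,
                order_cycle_days: float = 0.0) -> Buffer:
    # Red zone = red base (ADU * DLT * LTF) plus red safety (red base * VF).
    red_base = adu * dlt * lead_time_factor
    red = red_base * (1 + variability_factor)
    # Yellow zone covers average demand over the decoupled lead time.
    yellow = adu * dlt
    # Green zone: largest of the lead-time-based quantity, the MOQ,
    # and the demand over a desired order cycle.
    green = max(red_base, moq, adu * order_cycle_days)
    return Buffer(red, yellow, green)


def net_flow_position(on_hand: float, on_order: float,
                      qualified_demand: float) -> float:
    return on_hand + on_order - qualified_demand


def recommended_order(buffer: Buffer, nfp: float) -> float:
    # Replenish up to top of green whenever NFP is at or below top of yellow.
    if nfp <= buffer.top_of_yellow:
        return buffer.top_of_green - nfp
    return 0.0


# Case P1: ADU approximated by mean demand; factors are hypothetical.
buf = size_buffer(adu=10.53, dlt=7, moq=20,
                  lead_time_factor=0.5, variability_factor=0.5)
nfp = net_flow_position(on_hand=40, on_order=20, qualified_demand=12)  # hypothetical
qty = recommended_order(buf, nfp)
```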
Evaluation metrics
Logistic metrics
RL metrics
Learning and evaluation periods
Results analysis
P1: Theoretical case study
(a) Results based on DDMRP OH*

Episodes | P1R1 AAR | P1R1 PBAR | P1R2 AAR | P1R2 PBAR | P1R3 AAR | P1R3 PBAR |
---|---|---|---|---|---|---|
100 | 0.06 | 0.93 | 0.21 | 0.80 | 0.10 | 0.91 |
200 | 0.02 | 1.00 | 0.19 | 0.80 | 0.17 | 0.98 |
500 | 0.07 | 1.00 | 0.15 | 0.80 | 0.33 | 0.99 |
1000 | 0.05 | 1.00 | 0.08 | 0.85 | 0.39 | 0.99 |
2000 | 0.04 | 1.00 | 0.04 | 0.85 | 0.51 | 1.00 |
5000 | 0.20 | 1.00 | 0.23 | 0.92 | 0.44 | 1.00 |
10,000 | 0.58 | 1.00 | 0.48 | 0.94 | 0.61 | 1.00 |
20,000 | 0.89 | 1.00 | 0.84 | 1.00 | 0.90 | 1.00 |
30,000 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | 1.00 |

(b) Results based on our proposed RL OH*

Episodes | P1R1P AAR | P1R1P PBAR | P1R2P AAR | P1R2P PBAR | P1R3P AAR | P1R3P PBAR |
---|---|---|---|---|---|---|
100 | 0.06 | 0.93 | 0.21 | 0.80 | 0.42 | 0.97 |
200 | 0.02 | 1.00 | 0.19 | 0.80 | 0.42 | 0.97 |
500 | 0.07 | 1.00 | 0.15 | 0.80 | 0.39 | 0.97 |
1000 | 0.05 | 1.00 | 0.08 | 0.85 | 0.37 | 1.00 |
2000 | 0.04 | 1.00 | 0.04 | 0.85 | 0.39 | 1.00 |
5000 | 0.20 | 1.00 | 0.23 | 0.92 | 0.60 | 1.00 |
10,000 | 0.58 | 1.00 | 0.48 | 0.94 | 0.64 | 1.00 |
20,000 | 0.89 | 1.00 | 0.84 | 1.00 | 0.83 | 1.00 |
30,000 | 1.00 | 1.00 | 0.99 | 1.00 | 0.92 | 1.00 |

Product | Reward function | AAR | AOHD | BS | REL |
---|---|---|---|---|---|
(a) Results based on DDMRP OH* | | | | | |
P1 | R1 | 0.15 | 15 | 0 | 8.96 |
P1 | R2 | 0.16 | 13 | 0 | 5.82 |
P1 | R3 | 0.88 | 15 | 0 | 5.82 |
P1 | DDMRP | N/A | 16 | 0 | 10.05 |
(b) Results based on our proposed RL OH* | | | | | |
P1 | R1P | 0.69 | 33 | 0 | 27.31 |
P1 | R2P | 0.09 | 48 | 0 | 25.42 |
P1 | R3P | 0.70 | 33 | 0 | 25.42 |
P2: Product 39,933
(a) Results based on DDMRP OH*

Episodes | P2R1 AAR | P2R1 PBAR | P2R2 AAR | P2R2 PBAR | P2R3 AAR | P2R3 PBAR |
---|---|---|---|---|---|---|
100 | 0.09 | 0.32 | 0.01 | 0.06 | 0.07 | 0.29 |
200 | 0.11 | 0.49 | 0.01 | 0.07 | 0.11 | 0.34 |
500 | 0.36 | 0.49 | 0.01 | 0.07 | 0.24 | 0.70 |
1000 | 0.39 | 0.51 | 0.01 | 0.07 | 0.56 | 0.87 |
2000 | 0.43 | 0.54 | 0.01 | 0.10 | 0.79 | 0.92 |
5000 | 0.53 | 0.79 | 0.14 | 0.73 | 0.92 | 0.94 |
10,000 | 0.69 | 0.88 | 0.56 | 0.87 | 0.96 | 0.99 |
20,000 | 0.87 | 0.97 | 0.89 | 1.00 | 0.99 | 1.00 |
30,000 | 0.95 | 1.00 | 0.95 | 1.00 | 0.99 | 1.00 |

(b) Results based on our proposed RL OH*

Episodes | P2R1P AAR | P2R1P PBAR | P2R2P AAR | P2R2P PBAR | P2R3P AAR | P2R3P PBAR |
---|---|---|---|---|---|---|
100 | 0.09 | 0.23 | 0.01 | 0.04 | 0.01 | 0.10 |
200 | 0.13 | 0.25 | 0.01 | 0.04 | 0.01 | 0.10 |
500 | 0.25 | 0.42 | 0.02 | 0.24 | 0.02 | 0.11 |
1000 | 0.44 | 0.56 | 0.20 | 0.48 | 0.03 | 0.18 |
2000 | 0.63 | 0.73 | 0.43 | 0.60 | 0.09 | 0.42 |
5000 | 0.83 | 0.88 | 0.62 | 0.70 | 0.59 | 0.81 |
10,000 | 0.92 | 0.88 | 0.75 | 0.89 | 0.83 | 0.91 |
20,000 | 0.97 | 1.00 | 0.89 | 0.96 | 0.92 | 0.98 |
30,000 | 0.99 | 1.00 | 0.95 | 0.98 | 1.00 | 1.00 |

Product | Reward function | AAR | AOHD | BS | REL |
---|---|---|---|---|---|
(a) Results based on DDMRP OH* | | | | | |
P2 | R1 | 0.71 | 24 | 1 | 1.47 |
P2 | R2 | 0.14 | 17 | 14 | 1.38 |
P2 | R3 | 0.23 | 50 | 1 | 1.38 |
P2 | DDMRP | N/A | 30 | 1 | 1.42 |
(b) Results based on our proposed RL OH* | | | | | |
P2 | R1P | 0.71 | 20 | 4 | 1.38 |
P2 | R2P | 0.15 | 17 | 0 | 1.38 |
P2 | R3P | 0.51 | 34 | 1 | 1.80 |
P3: Product 28,440
(a) Results based on DDMRP OH*

Episodes | P3R1 AAR | P3R1 PBAR | P3R2 AAR | P3R2 PBAR | P3R3 AAR | P3R3 PBAR |
---|---|---|---|---|---|---|
100 | 0.09 | 0.23 | 0.01 | 0.04 | 0.04 | 0.39 |
200 | 0.13 | 0.25 | 0.01 | 0.04 | 0.04 | 0.39 |
500 | 0.25 | 0.42 | 0.02 | 0.24 | 0.04 | 0.39 |
1000 | 0.44 | 0.56 | 0.20 | 0.48 | 0.09 | 0.47 |
2000 | 0.63 | 0.73 | 0.43 | 0.60 | 0.27 | 0.72 |
5000 | 0.83 | 0.88 | 0.62 | 0.70 | 0.57 | 0.89 |
10,000 | 0.92 | 0.88 | 0.75 | 0.89 | 0.76 | 0.97 |
20,000 | 0.97 | 1.00 | 0.89 | 0.96 | 0.94 | 1.00 |
30,000 | 0.99 | 1.00 | 0.95 | 0.98 | 0.98 | 1.00 |

(b) Results based on our proposed RL OH*

Episodes | P3R1P AAR | P3R1P PBAR | P3R2P AAR | P3R2P PBAR | P3R3P AAR | P3R3P PBAR |
---|---|---|---|---|---|---|
100 | 0.01 | 0.11 | 0.02 | 0.08 | 0.04 | 0.39 |
200 | 0.01 | 0.13 | 0.02 | 0.08 | 0.04 | 0.39 |
500 | 0.07 | 0.31 | 0.02 | 0.09 | 0.04 | 0.39 |
1000 | 0.20 | 0.34 | 0.05 | 0.39 | 0.09 | 0.47 |
2000 | 0.39 | 0.56 | 0.20 | 0.52 | 0.27 | 0.72 |
5000 | 0.70 | 0.69 | 0.50 | 0.63 | 0.57 | 0.89 |
10,000 | 0.84 | 0.83 | 0.73 | 0.84 | 0.76 | 0.97 |
20,000 | 0.95 | 0.90 | 0.92 | 0.91 | 0.94 | 1.00 |
30,000 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 |

Product | Reward function | AAR | AOHD | BS | REL |
---|---|---|---|---|---|
(a) Results based on DDMRP OH* | | | | | |
P3 | R1 | 0.82 | 50 | 0 | 0.96 |
P3 | R2 | 0.48 | 67 | 0 | 0.86 |
P3 | R3 | 0.48 | 67 | 0 | 0.86 |
P3 | DDMRP | N/A | 68 | 0 | 2.13 |
(b) Results based on our proposed RL OH* | | | | | |
P3 | R1P | 0.84 | 23 | 0 | 0.96 |
P3 | R2P | 0.26 | 17 | 0 | 1.05 |
P3 | R3P | 0.48 | 67 | 0 | 0.86 |
P4: Product 43,387
(a) Results based on DDMRP OH*

Episodes | P4R1 AAR | P4R1 PBAR | P4R2 AAR | P4R2 PBAR | P4R3 AAR | P4R3 PBAR |
---|---|---|---|---|---|---|
100 | 0.56 | 0.78 | 0.07 | 0.29 | 0.98 | 0.88 |
200 | 0.76 | 0.84 | 0.11 | 0.34 | 0.99 | 0.96 |
500 | 0.92 | 0.94 | 0.24 | 0.70 | 0.98 | 0.96 |
1000 | 0.98 | 0.98 | 0.56 | 0.87 | 0.97 | 0.96 |
2000 | 0.99 | 0.98 | 0.79 | 0.92 | 0.97 | 0.96 |
5000 | 1.00 | 0.98 | 0.92 | 0.94 | 0.96 | 0.98 |
10,000 | 1.00 | 0.98 | 0.96 | 0.99 | 0.96 | 0.98 |
20,000 | 0.99 | 1.00 | 0.99 | 1.00 | 0.96 | 1.00 |
30,000 | 0.99 | 1.00 | 0.99 | 1.00 | 0.96 | 1.00 |

(b) Results based on our proposed RL OH*

Episodes | P4R1P AAR | P4R1P PBAR | P4R2P AAR | P4R2P PBAR | P4R3P AAR | P4R3P PBAR |
---|---|---|---|---|---|---|
100 | 0.56 | 0.78 | 0.64 | 0.64 | 0.73 | 0.80 |
200 | 0.76 | 0.84 | 0.80 | 0.70 | 0.88 | 0.82 |
500 | 0.92 | 0.94 | 0.91 | 0.85 | 0.97 | 0.88 |
1000 | 0.98 | 0.98 | 0.96 | 0.87 | 0.98 | 0.88 |
2000 | 0.99 | 0.98 | 0.98 | 0.90 | 0.99 | 0.92 |
5000 | 1.00 | 0.98 | 0.99 | 0.92 | 1.00 | 0.92 |
10,000 | 1.00 | 0.98 | 1.00 | 0.92 | 1.00 | 0.95 |
20,000 | 0.99 | 1.00 | 1.00 | 0.93 | 1.00 | 0.95 |
30,000 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |

Product | Model | AAR | AOHD | BS | REL |
---|---|---|---|---|---|
(a) Results based on DDMRP OH* | | | | | |
P4 | R1 | 0.85 | 4 | 5 | 1.25 |
P4 | R2 | 0.52 | 3 | 14 | 1.54 |
P4 | R3 | 0.82 | 4 | 5 | 1.21 |
P4 | DDMRP | N/A | 6 | 2 | 1.86 |
(b) Results based on our proposed RL OH* | | | | | |
P4 | R1P | 0.66 | 1.5 | 0 | 1.44 |
P4 | R2P | 0.66 | 1.5 | 0 | 1.44 |
P4 | R3P | 0.90 | 1.5 | 1 | 1.47 |

Episodes | PBAR (R1) | PBAR (R2) | PBAR (R3) |
---|---|---|---|
100 | 0.55 | 0.31 | 0.60 |
200 | 0.67 | 0.34 | 0.61 |
500 | 0.72 | 0.53 | 0.79 |
1000 | 0.77 | 0.67 | 0.88 |
2000 | 0.86 | 0.73 | 0.92 |
5000 | 0.93 | 0.89 | 0.93 |
10,000 | 0.95 | 0.93 | 0.98 |
20,000 | 1.00 | 1.00 | 1.00 |
30,000 | 1.00 | 1.00 | 1.00 |

BS | P1 DDMRP's OH* | P1 Our OH* | P2 DDMRP's OH* | P2 Our OH* | P3 DDMRP's OH* | P3 Our OH* | P4 DDMRP's OH* | P4 Our OH* |
---|---|---|---|---|---|---|---|---|
R1 | 0 | 0 | 1 | 4 | 0 | 0 | 5 | 0 |
R2 | 0 | 0 | 14 | 0 | 0 | 0 | 14 | 0 |
R3 | 0 | 0 | 1 | 1 | 0 | 0 | 5 | 1 |
DDMRP | 0 | N/A | 1 | N/A | 0 | N/A | 2 | N/A |

REL | P1 DDMRP's OH* | P1 Our OH* | P2 DDMRP's OH* | P2 Our OH* | P3 DDMRP's OH* | P3 Our OH* | P4 DDMRP's OH* | P4 Our OH* |
---|---|---|---|---|---|---|---|---|
R1 | 8.96 | 27.31 | 1.47 | 1.38 | 0.96 | 0.96 | 1.25 | 1.44 |
R2 | 5.82 | 25.42 | 1.38 | 1.38 | 0.86 | 1.05 | 1.54 | 1.44 |
R3 | 5.82 | 25.42 | 1.38 | 1.80 | 0.86 | 0.86 | 1.21 | 1.47 |
DDMRP | 10.05 | N/A | 1.42 | N/A | 2.13 | N/A | 1.86 | N/A |
Comparison with other works
- Technique: the techniques used in each work.
- Bullwhip effect: whether the proposed model includes a strategy to avoid the distortions associated with the bullwhip effect.
- Adaptability: whether the proposed method can be applied to demand scenarios with different seasonal and trend behaviors.

Paper | Techniques | Bullwhip effect | Adaptability |
---|---|---|---|
Ours | DDMRP and Q-Learning | Yes | High |
Giannoccaro and Pontrandolfo (2002) | Q-Learning | No | Medium |
Karimi et al. (2017) | Q-Learning | No | Low |
Kara and Dogan (2018) | Q-Learning and SARSA | No | Medium |
Paraschos et al. (2020) | Q-Learning | No | Medium |
Wang et al. (2020) | Economic Order Quantity (EOQ), optimization | No | Low |
Abdelhalim et al. (2021) | Optimization techniques | No | Low |
Saputro et al. (2021) | Optimization techniques | Yes | Medium |
Ran (2021) | Optimization techniques with prediction methods | No | Low |
Aguilar et al. (2022) | Optimization techniques with prediction methods | No | Low |
Thürer et al. (2022) | Optimization techniques | No | Medium |

Paper | Metrics | Average value |
---|---|---|
Ours | REL; BS; AAR; PBAR | 4.62; 2.30; 0.92; 0.98 |
Kara and Dogan (2018) | Average cost % | 5.06 |
Paraschos et al. (2020) | Profit; average rewards | 6.1 × 10⁻⁶; 5.2 × 10¹³ |
Saputro et al. (2021) | Minimization of average annual total costs associated with supplier selection-related cost, plant inventory cost, transportation cost, and imperfect quality-related cost | Minimize 94.2% |
Abdelhalim et al. (2021) | Minimization of storage costs | Minimize 97.1% |
Ran (2021) | Average relative error | 13.5% |
Aguilar et al. (2022) | % error between the ideal inventory and the inventory obtained | 5.4% |
Thürer et al. (2022) | Service level: fraction of customer orders delivered on time | 94.3% |