Introduction
Related work
Recommender system
Knowledge distillation
Reinforcement learning in education
Preliminaries
Problem statement
General RL structure
-
State \({\mathcal {S}}\): At each time step t, the current state \(s_t \in {\mathcal {S}}\) denotes the student's preceding exercising history, in which each exercise record \((e, k, \textrm{y})\) is taken into account.
-
Action \({\mathcal {A}}\): An action a is a vector. Based on state \(s_t\), taking action \(a_t \in {\mathcal {A}}\) is defined as selecting an exercise \(e_t\) for a given student, after which the agent enters a new state \(s_{t+1}\).
-
Reward \({\mathcal {R}}\): An immediate reward \(r_t\) is a scalar value obtained from the environment. When an exercise e is selected, the agent receives the reward \(r(s_t, a_t)\) according to the feedback on the various objectives.
-
Transitions \({\mathcal {P}}\): Once the test’s feedback is collected, the agent will enter the next state based on the transition probability \(p(s_{t+1} \mid s_t, a_t)\).
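To make the four components concrete, the following is a minimal sketch of an environment that exposes them. It is not the paper's implementation: the class name `ToyExerciseEnv`, the reading of \((e, k, \textrm{y})\) as (exercise id, concept id, response), the random student model, and the use of a plain exercise index in place of an action vector are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# One exercising record (e, k, y). The field meanings are an assumption;
# the text only names the triple.
Record = Tuple[int, int, int]  # (exercise id, concept id, response)


@dataclass
class ToyExerciseEnv:
    """Hypothetical environment exposing the components S, A, R and P."""

    n_exercises: int
    seed: int = 0

    def __post_init__(self) -> None:
        self.rng = random.Random(self.seed)
        self.history: List[Record] = []  # state s_t: the exercising history

    def step(self, action: int) -> Tuple[List[Record], float]:
        """Apply a_t (simplified to an exercise index); return (s_{t+1}, r_t)."""
        if not 0 <= action < self.n_exercises:
            raise ValueError("action must index an exercise")
        # Transition p(s_{t+1} | s_t, a_t): sample the student's response.
        correct = int(self.rng.random() < 0.5)  # placeholder student model
        concept = action % 3                    # placeholder concept mapping
        self.history.append((action, concept, correct))
        # Reward r(s_t, a_t): placeholder scalar derived from the feedback.
        reward = float(correct)
        return list(self.history), reward


env = ToyExerciseEnv(n_exercises=5)
state, reward = env.step(2)
print(len(state), reward)
```

In a real system the placeholder student model and the placeholder reward would be replaced by the collected test feedback and the objective-based reward described above.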
Remarks
Proposed methodologies
Model description
The teacher network
-
ER-LOAF [33]: It designs a hybrid many-objective framework to recommend suitable exercises that accord with learners’ mastery level and knowledge concept coverage.
-
HB-DeepCF [34]: It embeds the students and exercises into a low-dimensional continuous vector space via auto-encoder techniques, and then integrates the recommender component and the auto-encoder component into a new hybrid recommendation model that adaptively recommends exercises to each student.
-
DKVMN-RL [35]: It first acquires students’ mastery level of skills using the improved Dynamic Key-Value Memory Network (DKVMN), and a Q-learning algorithm is then used to learn an exercise recommendation policy.
-
LSTMCQP [36]: It uses a personalized LSTM approach to trace and model students’ knowledge mastery states and further designs a “recommend non-mastered exercises” recommendation strategy.
-
KCP-ER [13]: It develops a knowledge concept prediction-based Top-N recommendation model that finds a set of recommendation lists trading off accuracy, coverage, novelty, and diversity.
-
TP-GNN [37]: It applies a graph neural network to the Top-N recommendation task, in which aggregate functions and an attention mechanism are employed together to generate a high-quality ranking list.
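Among the teacher candidates listed above, DKVMN-RL learns its recommendation policy with Q-learning. As a reminder of what that update looks like, here is a minimal tabular sketch. It is the textbook rule, not the DKVMN-RL code: the function name, the string state labels, and the values of `alpha` and `gamma` are illustrative assumptions.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, int], float]


def q_learning_update(
    q: QTable,
    state: Hashable,
    action: int,
    reward: float,
    next_state: Hashable,
    actions: Sequence[int],
    alpha: float = 0.1,   # learning rate (assumed value)
    gamma: float = 0.9,   # discount factor (assumed value)
) -> float:
    """Apply one tabular Q-learning step and return the updated Q(s, a)."""
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move Q(s, a) a fraction alpha towards the temporal-difference target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q: QTable = defaultdict(float)
actions = [0, 1, 2]  # three candidate exercises
new_value = q_learning_update(q, "s0", 1, 1.0, "s1", actions)
print(new_value)
```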
The student network
Network structure of the proposed DDQN agents
Definition of reinforcement learning components
Training mechanism
Theoretical analysis of TERD
Experiment
-
RQ 1: Does the student network play a critical role in advancing the performance of top-enhanced recommendation?
-
RQ 2: Compared with existing well-known learning-to-rank models and RL-based exercise recommendation techniques, how does our proposed TERD perform when K takes different values?
-
RQ 3: How does TERD perform in terms of model efficiency compared to other state-of-the-art methods?
-
RQ 4: How do the key hyper-parameter settings affect TERD?
-
RQ 5: How can the behavior of TERD be interpreted in the top-enhanced recommendation scenario?
Experimental settings
Dataset descriptions
Statistics | ASSISTments0910 | Algebra0506 | Statics2011 |
---|---|---|---|
No. of concepts | 110 | 436 | 87 |
No. of students | 4151 | 574 | 335 |
No. of exercises | 16,891 | 1084 | 300 |
No. of records | 325,637 | 607,025 | 45,002 |
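The statistics above allow a rough comparison of how sparse the three datasets are. The sketch below computes the number of records per student-exercise pair. This ratio is only a proxy chosen for illustration, not a measure defined in the paper, and it can overstate density because the records may contain repeated attempts on the same exercise.

```python
from typing import Dict

# Figures copied from the dataset statistics table.
stats: Dict[str, Dict[str, int]] = {
    "ASSISTments0910": {"students": 4151, "exercises": 16891, "records": 325637},
    "Algebra0506": {"students": 574, "exercises": 1084, "records": 607025},
    "Statics2011": {"students": 335, "exercises": 300, "records": 45002},
}


def records_per_pair(entry: Dict[str, int]) -> float:
    """Records divided by the number of possible student-exercise pairs."""
    return entry["records"] / (entry["students"] * entry["exercises"])


ratio = {name: records_per_pair(entry) for name, entry in stats.items()}
for name, value in sorted(ratio.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.4f}")
```

Under this proxy ASSISTments0910 has by far the lowest ratio of the three datasets.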
Teacher and student settings
Model | Parameter | ASSISTments2009 | ASSISTments2015 | Statics2011 |
---|---|---|---|---|
ER-LOAF | \(\eta _{1}\) | 0.001 | 0.001 | 0.01 |
ER-LOAF | g | 0 | 0 | 0 |
HB-DeepCF | \(\eta _{1}\) | 0.01 | 0.01 | 0.01 |
HB-DeepCF | \(\lambda _{l}\) | 4 | 4 | 4 |
DKVMN-RL | \(\eta _{1}\) | 0.001 | 0.001 | 0.001 |
DKVMN-RL | D | 200 | 100 | 50 |
DKVMN-RL | \(\beta _{1}\) | 0.9 | 0.9 | 0.9 |
LSTMCQP | \(\eta _{1}\) | 0.001 | 0.001 | 0.001 |
LSTMCQP | dr | 0.4 | 0.5 | 0.4 |
KCP-ER | \(\eta _{1}\) | 0.001 | 0.001 | 0.01 |
KCP-ER | g | 0 | 0 | 0 |
KCP-ER | J | 100 | 100 | 100 |
KCP-ER | c | 0.095 | 0.095 | 0.195 |
TP-GNN | \(\eta _{1}\) | 0.0005 | 0.0001 | 0.001 |
TP-GNN | n | 50 | 30 | 10 |
TP-GNN | p | 2 | 2 | 2 |
Evaluation protocols
Competitive models
TERD evaluation results and analysis (RQ1)
-
We notice that the improvements in all three metrics are more pronounced on Statics2011 than on the other two datasets. This indicates that our proposed TERD achieves better performance on dense datasets. On the sparsest dataset, ASSISTments0910, the TERD framework still achieves a substantial performance improvement.
-
The performance of the pure KCP-ER method is the closest to ours among all the benchmark models, because it carefully designs four flexible optimization goals. In particular, the difficulty goal emphasized by the KCP-ER method proves advantageous in promoting the performance of the algorithm.
-
The results also reveal that Top-N recommendation models based on cognitive diagnosis, such as KCP-ER and LSTMCQP, outperform representative recommendation methods such as ER-LOAF and HB-DeepCF. This is because the representative recommendation methods focus only on explicit student-exercise interaction information, whereas the recommendation methods of the cognitive diagnosis paradigm (i.e., KCP-ER and LSTMCQP) are required to provide exercises that cohere with the student's proficiency level.
-
All of the above evidence indicates that TERD can generate excellent Top-N recommendations while remaining flexible: the teacher network can be replaced without redesigning the strategy. This is the strongest validation of the advantage of being fully adaptive.
Method | Precision@2 | Precision@5 | Precision@10 | MAP@2 | MAP@5 | MAP@10 | NDCG@2 | NDCG@5 | NDCG@10 | Avg |
---|---|---|---|---|---|---|---|---|---|---|
ER-LOAF | 0.3731 | 0.3654 | 0.3645 | 0.3310 | 0.2691 | 0.2342 | 0.3745 | 0.5345 | 0.6049 | |
TERD | 0.5084 | 0.4491 | 0.4352 | 0.4556 | 0.3515 | 0.3016 | 0.5083 | 0.6897 | 0.7730 | |
Improve(%) | 36.26 | 22.91 | 19.40 | 37.64 | 30.62 | 28.78 | 35.73 | 29.04 | 27.79 | 29.7967 |
HB-DeepCF | 0.3359 | 0.3400 | 0.3478 | 0.2901 | 0.2382 | 0.2118 | 0.3365 | 0.4895 | 0.5596 | |
TERD | 0.4864 | 0.4374 | 0.4211 | 0.4254 | 0.3314 | 0.2836 | 0.4851 | 0.6659 | 0.7430 | |
Improve(%) | 44.81 | 28.65 | 21.08 | 46.64 | 39.13 | 33.90 | 44.16 | 36.04 | 32.77 | 36.3533 |
DKVMN-RL | 0.3588 | 0.3726 | 0.3763 | 0.3214 | 0.2769 | 0.2452 | 0.3608 | 0.5319 | 0.6052 | |
TERD | 0.4921 | 0.4565 | 0.4402 | 0.4440 | 0.3568 | 0.3048 | 0.4964 | 0.69 | 0.7709 | |
Improve(%) | 37.15 | 22.52 | 16.98 | 38.15 | 28.86 | 24.31 | 37.58 | 29.72 | 27.38 | 29.1833 |
LSTMCQP | 0.3626 | 0.3841 | 0.3964 | 0.3180 | 0.2836 | 0.2611 | 0.3608 | 0.5389 | 0.6204 | |
TERD | 0.4859 | 0.4573 | 0.4537 | 0.4284 | 0.3522 | 0.3134 | 0.4828 | 0.6782 | 0.7679 | |
Improve(%) | 34.00 | 19.06 | 14.46 | 34.72 | 24.19 | 20.03 | 33.81 | 25.85 | 23.77 | 25.5433 |
KCP-ER | 0.4288 | 0.4136 | 0.4053 | 0.3867 | 0.3231 | 0.2800 | 0.4296 | 0.6104 | 0.6863 | |
TERD | 0.5321 | 0.4827 | 0.4575 | 0.4779 | 0.3864 | 0.3283 | 0.5311 | 0.7327 | 0.8138 | |
Improve(%) | 24.09 | 16.71 | 12.88 | 23.58 | 19.59 | 17.25 | 23.63 | 20.04 | 18.58 | 19.5944 |
TP-GNN | 0.3307 | 0.3531 | 0.3699 | 0.2912 | 0.255 | 0.2324 | 0.3286 | 0.4932 | 0.5707 | |
TERD | 0.4638 | 0.4507 | 0.4428 | 0.4128 | 0.3413 | 0.2990 | 0.4633 | 0.6605 | 0.7439 | |
Improve(%) | 40.25 | 27.64 | 19.71 | 41.76 | 33.84 | 28.66 | 40.99 | 33.92 | 30.35 | 33.0133 |
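The Improve(%) and Avg columns can be checked directly from the raw scores. The sketch below assumes that Improve(%) is the relative gain of TERD over its teacher and that Avg is the mean of the nine rounded improvements. Both readings are inferred from the numbers rather than stated in the text, and the variable names are illustrative. With the ER-LOAF rows of the table above, the computation reproduces the reported values.

```python
from typing import List

METRICS = ["P@2", "P@5", "P@10", "MAP@2", "MAP@5", "MAP@10",
           "NDCG@2", "NDCG@5", "NDCG@10"]

# Scores copied from the ER-LOAF and TERD rows of the table above.
er_loaf = [0.3731, 0.3654, 0.3645, 0.3310, 0.2691, 0.2342, 0.3745, 0.5345, 0.6049]
terd = [0.5084, 0.4491, 0.4352, 0.4556, 0.3515, 0.3016, 0.5083, 0.6897, 0.7730]


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain over the baseline, in percent, rounded to two decimals."""
    return round(100.0 * (improved - baseline) / baseline, 2)


improvements: List[float] = [
    relative_improvement(b, t) for b, t in zip(er_loaf, terd)
]
# Mean of the rounded per-metric improvements (assumed definition of Avg).
average = round(sum(improvements) / len(improvements), 4)

for name, value in zip(METRICS, improvements):
    print(f"{name}: {value:.2f}%")
print(f"Avg: {average:.4f}")
```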
Method | Precision@2 | Precision@5 | Precision@10 | MAP@2 | MAP@5 | MAP@10 | NDCG@2 | NDCG@5 | NDCG@10 | Avg |
---|---|---|---|---|---|---|---|---|---|---|
ER-LOAF | 0.2018 | 0.2128 | 0.2145 | 0.1686 | 0.1319 | 0.1143 | 0.2018 | 0.2958 | 0.3315 | |
TERD | 0.3349 | 0.2839 | 0.2696 | 0.2815 | 0.1954 | 0.1594 | 0.3395 | 0.4473 | 0.4909 | |
Improve(%) | 65.96 | 33.41 | 25.69 | 66.96 | 48.14 | 39.46 | 68.24 | 51.22 | 48.08 | 49.8775 |
HB-DeepCF | 0.2099 | 0.2154 | 0.2264 | 0.176 | 0.1354 | 0.1194 | 0.2096 | 0.3031 | 0.3465 | |
TERD | 0.3383 | 0.3012 | 0.2874 | 0.2821 | 0.2028 | 0.1683 | 0.3344 | 0.4533 | 0.4996 | |
Improve(%) | 61.17 | 39.83 | 26.94 | 60.28 | 49.78 | 40.95 | 59.54 | 49.55 | 44.18 | 48.0244 |
DKVMN-RL | 0.2523 | 0.2644 | 0.2675 | 0.2076 | 0.1646 | 0.143 | 0.2507 | 0.3710 | 0.4185 | |
TERD | 0.3601 | 0.3154 | 0.301 | 0.2947 | 0.2119 | 0.1743 | 0.3533 | 0.4795 | 0.5301 | |
Improve(%) | 42.73 | 19.29 | 12.52 | 41.96 | 28.74 | 21.89 | 40.93 | 29.25 | 26.67 | 29.3311 |
LSTMCQP | 0.2764 | 0.2631 | 0.2692 | 0.2311 | 0.1709 | 0.1466 | 0.2746 | 0.3867 | 0.4390 | |
TERD | 0.3704 | 0.3057 | 0.3050 | 0.3154 | 0.2147 | 0.1771 | 0.3686 | 0.4845 | 0.5450 | |
Improve(%) | 34.01 | 16.19 | 13.30 | 36.48 | 25.63 | 20.80 | 34.23 | 25.29 | 24.15 | 25.5644 |
KCP-ER | 0.2798 | 0.2768 | 0.2715 | 0.2317 | 0.1795 | 0.1497 | 0.2793 | 0.4006 | 0.4470 | |
TERD | 0.3727 | 0.3314 | 0.3057 | 0.3096 | 0.2254 | 0.1814 | 0.3709 | 0.5056 | 0.5520 | |
Improve(%) | 33.20 | 19.73 | 12.60 | 33.62 | 25.57 | 21.18 | 32.80 | 26.21 | 23.49 | 25.3778 |
TP-GNN | 0.2241 | 0.2160 | 0.2323 | 0.1950 | 0.1401 | 0.1224 | 0.2269 | 0.3206 | 0.3715 | |
TERD | 0.3402 | 0.2724 | 0.2809 | 0.2956 | 0.1901 | 0.1583 | 0.3421 | 0.4429 | 0.5054 | |
Improve(%) | 51.81 | 26.11 | 20.92 | 51.81 | 35.69 | 29.33 | 50.77 | 38.15 | 36.04 | 37.8478 |
Method | Precision@2 | Precision@5 | Precision@10 | MAP@2 | MAP@5 | MAP@10 | NDCG@2 | NDCG@5 | NDCG@10 | Avg |
---|---|---|---|---|---|---|---|---|---|---|
ER-LOAF | 0.2169 | 0.2024 | 0.2316 | 0.1884 | 0.1358 | 0.1191 | 0.2269 | 0.3080 | 0.3620 | |
TERD | 0.4504 | 0.3752 | 0.3224 | 0.3934 | 0.2795 | 0.2104 | 0.4483 | 0.5888 | 0.6257 | |
Improve(%) | 107.65 | 85.38 | 39.21 | 108.81 | 105.82 | 76.66 | 97.58 | 91.17 | 72.85 | 87.2367 |
HB-DeepCF | 0.2140 | 0.2020 | 0.2437 | 0.1863 | 0.136 | 0.1271 | 0.224 | 0.2989 | 0.3578 | |
TERD | 0.4594 | 0.3968 | 0.3403 | 0.405 | 0.3039 | 0.2321 | 0.4607 | 0.6089 | 0.6416 | |
Improve(%) | 114.67 | 96.44 | 39.64 | 117.39 | 123.46 | 82.61 | 105.67 | 103.71 | 79.32 | 95.8788 |
DKVMN-RL | 0.1937 | 0.2082 | 0.3055 | 0.1522 | 0.1345 | 0.1565 | 0.1875 | 0.2628 | 0.3516 | |
TERD | 0.4483 | 0.3890 | 0.3421 | 0.3921 | 0.3016 | 0.2399 | 0.4496 | 0.5857 | 0.6135 | |
Improve(%) | 131.44 | 86.84 | 11.98 | 157.62 | 124.24 | 53.29 | 139.79 | 122.87 | 74.49 | 100.2844 |
LSTMCQP | 0.2224 | 0.2980 | 0.3371 | 0.1811 | 0.1782 | 0.1813 | 0.2203 | 0.3701 | 0.4418 | |
TERD | 0.4522 | 0.3980 | 0.3625 | 0.3925 | 0.2977 | 0.2268 | 0.4505 | 0.6076 | 0.6412 | |
Improve(%) | 103.33 | 33.56 | 7.50 | 116.73 | 67.06 | 25.10 | 104.49 | 64.17 | 45.13 | 63.0078 |
KCP-ER | 0.3474 | 0.3347 | 0.3161 | 0.3042 | 0.2361 | 0.1926 | 0.3537 | 0.4961 | 0.5423 | |
TERD | 0.4743 | 0.395 | 0.3525 | 0.4191 | 0.2978 | 0.2340 | 0.4768 | 0.6245 | 0.6727 | |
Improve(%) | 36.53 | 18.02 | 11.52 | 37.77 | 26.13 | 21.50 | 34.80 | 25.88 | 24.05 | 26.2445 |
TP-GNN | 0.2165 | 0.2369 | 0.2925 | 0.1849 | 0.1565 | 0.1632 | 0.2238 | 0.3156 | 0.3795 | |
TERD | 0.4425 | 0.3610 | 0.3208 | 0.3975 | 0.2850 | 0.2322 | 0.4542 | 0.5737 | 0.6002 | |
Improve(%) | 104.39 | 52.38 | 9.68 | 114.98 | 82.11 | 42.28 | 102.95 | 81.78 | 58.16 | 72.0789 |
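The three tables above report Precision@K, MAP@K and NDCG@K. As a reference for how such scores are typically obtained for a single student, the sketch below uses the common binary-relevance definitions. These are assumptions: the paper's exact normalisation, especially for MAP@K, may differ, and the exercise identifiers are made up.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended exercises that are relevant."""
    return sum(1 for e in ranked[:k] if e in relevant) / k


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """AP@K, normalised by min(k, number of relevant items) (one common variant)."""
    hits, total = 0, 0.0
    for i, e in enumerate(ranked[:k], start=1):
        if e in relevant:
            hits += 1
            total += hits / i  # precision at each position that is a hit
    denom = min(k, len(relevant))
    return total / denom if denom else 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """NDCG@K with binary gains and a log2 position discount."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, e in enumerate(ranked[:k], start=1) if e in relevant)
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


ranked = ["e3", "e1", "e7", "e2", "e5"]  # hypothetical recommendation list
relevant = {"e1", "e2"}                  # hypothetical ground truth
p2 = precision_at_k(ranked, relevant, 2)
ap5 = average_precision_at_k(ranked, relevant, 5)
ndcg5 = ndcg_at_k(ranked, relevant, 5)
print(p2, ap5, round(ndcg5, 4))
```

MAP@K is then the mean of AP@K over all students, and the other two metrics are averaged over students in the same way.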
Comparative results (RQ2)
Dataset | Model | Precision@2 | Precision@5 | Precision@10 | MAP@2 | MAP@5 | MAP@10 | NDCG@2 | NDCG@5 | NDCG@10 |
---|---|---|---|---|---|---|---|---|---|---|
ASSISTments2009 | SQL-Rank | 0.3896 | 0.3856 | 0.3816 | 0.3514 | 0.2909 | 0.2502 | 0.3929 | 0.5640 | 0.6365 |
ASSISTments2009 | DeepRank | 0.4506 | 0.4185 | 0.4087 | 0.4082 | 0.3300 | 0.2844 | 0.4517 | 0.6290 | 0.7068 |
ASSISTments2009 | DQN | 0.3562 | 0.3538 | 0.3530 | 0.3042 | 0.2482 | 0.2179 | 0.3543 | 0.5110 | 0.5796 |
ASSISTments2009 | MOOCERS | 0.3632 | 0.3592 | 0.3517 | 0.3046 | 0.2511 | 0.2229 | 0.3497 | 0.5119 | 0.5834 |
ASSISTments2009 | DDQN | 0.3877 | 0.3751 | 0.3680 | 0.3427 | 0.2771 | 0.2369 | 0.3894 | 0.5532 | 0.6223 |
ASSISTments2009 | DDPG | 0.3800 | 0.3717 | 0.3636 | 0.3232 | 0.2615 | 0.2365 | 0.3806 | 0.5494 | 0.6066 |
Algebra0506 | SQL-Rank | 0.2741 | 0.2709 | 0.2587 | 0.2322 | 0.1786 | 0.1455 | 0.2769 | 0.3957 | 0.4356 |
Algebra0506 | DeepRank | 0.3211 | 0.2943 | 0.2885 | 0.2706 | 0.2012 | 0.1678 | 0.3201 | 0.4427 | 0.4947 |
Algebra0506 | DQN | 0.2477 | 0.2489 | 0.2433 | 0.2093 | 0.1610 | 0.1312 | 0.2482 | 0.3582 | 0.3991 |
Algebra0506 | MOOCERS | 0.2489 | 0.2484 | 0.2472 | 0.2099 | 0.1589 | 0.1318 | 0.2470 | 0.3567 | 0.3997 |
Algebra0506 | DDQN | 0.3234 | 0.2787 | 0.2655 | 0.2781 | 0.1904 | 0.1515 | 0.3265 | 0.4360 | 0.4818 |
Algebra0506 | DDPG | 0.2706 | 0.2557 | 0.2575 | 0.2242 | 0.1624 | 0.1381 | 0.2686 | 0.3764 | 0.4251 |
Statics2011 | SQL-Rank | 0.3235 | 0.3094 | 0.2878 | 0.2858 | 0.2196 | 0.1751 | 0.3277 | 0.4565 | 0.4939 |
Statics2011 | DeepRank | 0.4191 | 0.3796 | 0.3470 | 0.3722 | 0.2835 | 0.2271 | 0.4216 | 0.5759 | 0.6248 |
Statics2011 | DQN | 0.3456 | 0.3186 | 0.3029 | 0.2969 | 0.2221 | 0.1815 | 0.3448 | 0.4738 | 0.5215 |
Statics2011 | MOOCERS | 0.3180 | 0.3112 | 0.2672 | 0.2270 | 0.1943 | 0.1513 | 0.2868 | 0.4220 | 0.4458 |
Statics2011 | DDQN | 0.4283 | 0.3737 | 0.3172 | 0.3888 | 0.2820 | 0.2082 | 0.4362 | 0.5829 | 0.6148 |
Statics2011 | DDPG | 0.3291 | 0.3103 | 0.2917 | 0.2719 | 0.2183 | 0.1993 | 0.3367 | 0.4631 | 0.5179 |
Comparisons of efficiency (RQ3)
Model | Phase | ASSISTments0910 Time(s) | ASSISTments0910 #Parameters | Algebra0506 Time(s) | Algebra0506 #Parameters | Statics2011 Time(s) | Statics2011 #Parameters |
---|---|---|---|---|---|---|---|
SQL-Rank | Train | 552s | 0.2195M | 65s | 0.0329M | 35s | 0.0214M |
SQL-Rank | Test | 31s | | 4.3s | | 2.9s | |
DeepRank | Train | 698s | 0.2679M | 81s | 0.0542M | 50s | 0.0496M |
DeepRank | Test | 36s | | 5.9s | | 4.6s | |
DQN | Train | 493s | 1.3134M | 67s | 0.1228M | 41s | 0.0308M |
DQN | Test | 25s | | 5.3s | | 3.6s | |
MOOCERS | Train | 764s | 1.3265M | 83s | 0.1358M | 43s | 0.0438M |
MOOCERS | Test | 40s | | 5.6s | | 4.1s | |
DDQN | Train | 513s | 2.6269M | 89s | 0.2455M | 43s | 0.0616M |
DDQN | Test | 29s | | 5.6s | | 3.8s | |
DDPG | Train | 493s | 1.3134M | 67s | 0.1228M | 41s | 0.0308M |
DDPG | Test | 25s | | 5.3s | | 3.6s | |
TERD | Train | 468s | 0.1324M | 49s | 0.0471M | 31s | 0.0186M |
TERD | Test | 13s | | 3.2s | | 2.3s | |
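Because the efficiency table embeds units in its cells ("513s", "2.6269M"), comparing models requires parsing them first. The sketch below does this for the ASSISTments0910 training figures of DDQN and TERD and derives two ratios. The helper names are hypothetical, and the ratios are simple arithmetic on the reported numbers rather than results stated in the paper.

```python
from typing import Dict, Tuple


def parse_seconds(cell: str) -> float:
    """Convert a cell such as '513s' to seconds."""
    return float(cell.rstrip("s"))


def parse_millions(cell: str) -> float:
    """Convert a cell such as '2.6269M' to an absolute parameter count."""
    return float(cell.rstrip("M")) * 1e6


# (training time, #parameters) on ASSISTments0910, copied from the table.
train_cells: Dict[str, Tuple[str, str]] = {
    "DDQN": ("513s", "2.6269M"),
    "TERD": ("468s", "0.1324M"),
}

parsed = {
    model: (parse_seconds(time_cell), parse_millions(param_cell))
    for model, (time_cell, param_cell) in train_cells.items()
}

# How many times larger and slower DDQN is than TERD on this dataset.
param_ratio = parsed["DDQN"][1] / parsed["TERD"][1]
time_ratio = parsed["DDQN"][0] / parsed["TERD"][0]
print(f"parameter ratio: {param_ratio:.2f}x, training-time ratio: {time_ratio:.2f}x")
```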