Introduction
- Design a novel PGIS algorithm and calculate the weights of paths.
- Agents can deal with multiple entities and multiple relations and switch paths intelligently.
- Our system considers long rational paths, which ensures better recommendation performance and interpretability.
- Our method has a definite terminal goal that enables the agent to learn the optimal path.
Related work
Deep recommendation algorithms
Reinforcement learning for recommendation systems
Applying knowledge graphs in recommendation
Problem formulation
Methodology
Framework
Knowledge graph embedding
Notations | Descriptions |
---|---|
G | Knowledge graph |
I | Set of items |
U | Set of users |
E | Set of entities |
R | Set of relations |
S | Set of states |
u | User |
s | State |
a | Action |
\({\bar{R}}\) | Reward |
P | Transition probability |
A | Set of actions |
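The notation above describes the recommendation task as a Markov decision process over the knowledge graph G: a state s pairs the user u with the entity reached so far, and an action a follows an outgoing relation edge. A minimal sketch of the tuple (S, A, P) is given below; the class and field names are our own illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KGState:
    """State s: the user u plus the entity reached on the current path."""
    user: str
    entity: str
    history: tuple = ()  # relations traversed so far

@dataclass
class KGEnvironment:
    """Minimal view of (S, A, P) over a knowledge graph G."""
    graph: dict  # entity -> list of (relation, next_entity) edges

    def actions(self, state: KGState):
        """A: the outgoing (relation, entity) edges of the current entity."""
        return self.graph.get(state.entity, [])

    def step(self, state: KGState, action):
        """P: here a deterministic transition along the chosen edge."""
        relation, next_entity = action
        return KGState(state.user, next_entity, state.history + (relation,))
```

In the full model the agent would also receive the reward \({\bar{R}}\) at each step; that is omitted here since the reward design is method-specific.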
Deep reinforcement learning
Path-guided intelligent switching
Intelligent switching
The path-switching based on weight
Experiments
Experimental settings
Dataset description
Dataset | Users | Items | Interactions | Entities | Relations |
---|---|---|---|---|---|
Movie | 2720 | 63,300 | 1,066,247 | 62,600 | 123,260 |
Book | 1633 | 48,800 | 102,386 | 48,100 | 4230 |
KKBOX | 2000 | 64,555 | 142,004 | 55,360 | 23,340 |
Evaluation protocols
- NDCG The most frequently used list evaluation measure; it takes into account the positions of correctly recommended items and is averaged across all testing users.
- HR Hit Ratio, the percentage of users with at least one correctly recommended item in their list.
- Precision The percentage of correctly recommended items in a user's recommendation list, averaged across all testing users.
- Recall The percentage of purchased items that actually appear in the recommendation list, averaged across all testing users.
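The four metrics above can be sketched for a single user's top-k list as follows; this is a minimal reference implementation (the function names are ours, not from the paper), with each per-user score averaged over all testing users in practice.

```python
import math

def ndcg_at_k(recommended, relevant, k=10):
    """NDCG@k: position-discounted gain, normalized by the ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

def hit_ratio_at_k(recommended, relevant, k=10):
    """HR@k: 1 if at least one relevant item appears in the top-k list."""
    return 1.0 if any(item in relevant for item in recommended[:k]) else 0.0

def precision_at_k(recommended, relevant, k=10):
    """Prec@k: fraction of the top-k list that is relevant (divides by k)."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k=10):
    """Recall@k: fraction of the relevant items that appear in the top-k."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0
```

For example, with `recommended = [1, 2, 3]` and `relevant = {2}`, HR@3 and Recall@3 are both 1.0 while Prec@3 is 1/3, since the single relevant item is found but fills only one of three slots.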
Baselines
- User-based CF This method predicts the rating a user will give an item by aggregating the ratings of similar users.
- LFM [40] The latent factor model (LFM) is a state-of-the-art feature-based factorization model widely used in recommendation systems. In our test, we used each user's rating of each movie as the input feature.
- NCF [6] Neural Collaborative Filtering is an extended CF framework that has proven to be a powerful DNN-based CF model. It consists of a generalized matrix factorization (GMF) layer and a multi-layer perceptron (MLP). Both the GMF and MLP layers are fed randomly initialized user-item latent vectors, and all NCF network parameters are learned from observed user-item interactions.
- DeepCF [19] Deep Collaborative Filtering is a deep version of CF. It combines representation learning with a matching function, using an MLP to learn the complex matching function and low-rank relations underlying user-item interactions.
- DeepWalk [41] The DeepWalk algorithm has two main components: a random-walk generator and an update procedure. The generator takes a graph, samples a random vertex as the root of a walk, and then repeatedly samples from the neighbors of the last visited vertex until the maximum walk length is reached.
- One-Hot One-hot encoding, also known as one-bit-effective encoding, uses an N-bit status register to encode N states. In the experiment, an item the user has seen is recorded as 1, and 0 otherwise.
- MetaPath [38] metapath2vec formalizes the network representation learning problem so as to preserve both the structures and the semantics of a heterogeneous network; the objective is to learn low-dimensional latent embeddings for multiple types of nodes.
- DQN [39] A deep Q-network models the complex dynamic user-item interactions and user preferences. The DQN method considers both the current reward and future rewards in pursuit of better recommendations.
- ECKG [35] In this work, propagation paths are constructed from meta-paths, inter-node influence scores along the paths are calculated with a self-attention network (SAN), and the resulting paths provide interpretability for the recommendations.
- MSAN [36] This work uses knowledge graphs to explore the entity-level relational semantics behind user-item interactions. Based on user-item interactions and item-entity connections, multiple meta-paths from users to entities are constructed.
- PGPR Xian et al. [10] proposed Policy-Guided Path Reasoning (PGPR) for interpretable recommendation over a knowledge graph. However, PGPR lacks a predefined target and has an overly large action space, so we compare our work against it.
- PDN [14] Path-based Deep Network (PDN) incorporates both personalization and diversity to enhance matching performance. The model aggregates the relevance weights of the related two-hop paths.
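The DeepWalk baseline's walk generator is simple enough to sketch directly. Below is a simplified, uniform-sampling version matching the description above (the function name and graph representation are our own assumptions): from the start vertex, it repeatedly hops to a randomly chosen neighbor of the last visited vertex until the maximum walk length is reached or a dead end is hit.

```python
import random

def random_walk(graph, start, max_length, rng=random):
    """DeepWalk-style walk over an adjacency-list graph.

    graph: dict mapping a vertex to a list of its neighbors.
    Returns a list of at most max_length visited vertices.
    """
    walk = [start]
    while len(walk) < max_length:
        neighbors = graph.get(walk[-1], [])
        if not neighbors:
            break  # dead end: stop early
        walk.append(rng.choice(neighbors))
    return walk
```

In DeepWalk proper, many such walks (rooted at every vertex) are then fed to a skip-gram update procedure to learn the vertex embeddings; that second component is not shown here.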
Parameter settings
Dataset | Methods | NDCG | HR | Prec | Recall |
---|---|---|---|---|---|
Movie | User-based CF | 0.2967 | 0.2395 | 0.2341 | 0.8670 |
Movie | LFM | 0.3056 | 0.2402 | 0.2335 | 0.8730 |
Movie | DeepWalk | 0.3191 | 0.2524 | 0.2429 | 0.8828 |
Movie | DQN | 0.3633 | 0.2663 | 0.2897 | 0.8870 |
Movie | One-Hot encoding | 0.3334 | 0.2785 | 0.2669 | 0.8947 |
Movie | MetaPath | 0.3436 | 0.2846 | 0.2745 | 0.8993 |
Movie | NCF | 0.3498 | 0.2907 | 0.2787 | 0.9057 |
Movie | DeepCF | 0.3535 | 0.2932 | 0.2881 | 0.9089 |
Movie | ECKG | 0.3503 | 0.2947 | 0.2987 | 0.9098 |
Movie | MSAN | 0.3605 | 0.2985 | 0.2956 | 0.9109 |
Movie | PGPR | 0.3647 | 0.3029 | 0.2998 | 0.9139 |
Movie | PDN | 0.3698 | 0.3090 | 0.3015 | 0.9193 |
Movie | DKRL | 0.3712 | 0.3473 | 0.3171 | 0.9221 |
Movie | Imp | 3.7% | 12.39% | 5.17% | 3.01% |
Book | User-based CF | 0.0520 | 0.04919 | 0.0351 | 0.8345 |
Book | LFM | 0.0533 | 0.0525 | 0.0344 | 0.8369 |
Book | DeepWalk | 0.0667 | 0.0550 | 0.0413 | 0.8392 |
Book | DQN | 0.1162 | 0.1024 | 0.0988 | 0.8478 |
Book | One-Hot encoding | 0.0820 | 0.0796 | 0.0627 | 0.8414 |
Book | MetaPath | 0.0958 | 0.0948 | 0.0741 | 0.8434 |
Book | NCF | 0.1032 | 0.0998 | 0.0820 | 0.8450 |
Book | DeepCF | 0.1151 | 0.1015 | 0.0939 | 0.8463 |
Book | ECKG | 0.1204 | 0.1018 | 0.0984 | 0.8466 |
Book | MSAN | 0.1253 | 0.1028 | 0.1001 | 0.8478 |
Book | PGPR | 0.1276 | 0.1039 | 0.1022 | 0.8491 |
Book | PDN | 0.1288 | 0.1046 | 0.1076 | 0.8605 |
Book | DKRL | 0.1297 | 0.1358 | 0.1093 | 0.8967 |
Book | Imp | 9.3% | 20.8% | 3.57% | 4.2% |
KKBOX | User-based CF | 0.2227 | 0.2132 | 0.1815 | 0.8957 |
KKBOX | LFM | 0.2301 | 0.2153 | 0.1926 | 0.9090 |
KKBOX | DeepWalk | 0.2341 | 0.2166 | 0.1959 | 0.9122 |
KKBOX | DQN | 0.2488 | 0.2234 | 0.2065 | 0.9174 |
KKBOX | One-Hot encoding | 0.2369 | 0.2178 | 0.1981 | 0.9256 |
KKBOX | MetaPath | 0.2402 | 0.2194 | 0.1998 | 0.9288 |
KKBOX | NCF | 0.2438 | 0.2194 | 0.2026 | 0.9317 |
KKBOX | DeepCF | 0.2469 | 0.2208 | 0.2048 | 0.9348 |
KKBOX | ECKG | 0.2477 | 0.2231 | 0.2057 | 0.9371 |
KKBOX | MSAN | 0.2485 | 0.2244 | 0.2066 | 0.9382 |
KKBOX | PGPR | 0.2500 | 0.2254 | 0.2078 | 0.9397 |
KKBOX | PDN | 0.2514 | 0.2272 | 0.2096 | 0.9418 |
KKBOX | DKRL | 0.2526 | 0.2587 | 0.2109 | 0.9736 |
KKBOX | Imp | 4.7% | 13.86% | 6.2% | 3.37% |
Performance comparison
- User-based CF had the second-worst performance among all methods because it is a traditional CF-based method with low efficiency.
- DeepCF and NCF produced better results than user-based CF and LFM, which suggests that deep models are effective at capturing non-linear relations and improve recommendation performance.
- One-hot encoding had the worst performance of all methods, and the MetaPath embedding outperformed it: the one-hot representation is too sparse, so its performance is very low.
- DQN outperformed MetaPath and DeepWalk because DQN moves to the next state with the maximum value, whereas DeepWalk only selects the next state at random. MetaPath uses entity embeddings to compute similarity between entities and recommend items, so the DQN model is more accurate than MetaPath.
- DQN performed slightly better than DeepCF and NCF, indicating that deep reinforcement learning handles complex dynamic user-item interactions better than plain neural networks.
- PGPR performed better than DQN because PGPR incorporates a knowledge graph into reinforcement learning.
- Finally, DKRL's advantage over user-based CF and LFM shows that knowledge graph embedding performs better than similar-user ratings and factorization models. LFM and CF do not consider the contextual information between nodes, and information is not shared between nodes. Our model, in contrast, makes full use of the rich contextual semantic information of the knowledge graph, which makes it easy to find similarities between nodes; recommendation performance improves accordingly. DKRL's advantage over DeepWalk, MetaPath, and DQN shows that applying knowledge graphs in reinforcement learning improves recommendation accuracy. DeepWalk and MetaPath make recommendations from node embeddings and select the next node at random; without interaction with an external environment, the expected maximum reward of a node cannot be obtained. DQN, for its part, uses reinforcement learning alone, without external knowledge as auxiliary information, so its recommendation performance is relatively low. Our model fully combines the knowledge graph with reinforcement learning: the knowledge graph acts as an external feedback environment that interacts with the agent and returns information, so the agent can reach the best recommendation performance through constant trial and error. DKRL also outperforms PGPR because PGPR has no predefined target items, some of its paths may not exist, and its recommendation results are therefore worse; our model has a predefined target, so each time the agent explores, it follows an appropriate path. Lastly, PDN uses only two-hop paths, while DKRL uses multi-hop paths of up to 10 hops, so DKRL outperforms PDN. This also shows that long paths (of at most 10 hops) yield better recommendation performance than short paths.
Ablation study
Influence of action sizes
Action size | NDCG@10 (Movie) | Prec@10 (Movie) | NDCG@10 (Book) | Prec@10 (Book) |
---|---|---|---|---|
PGPR | 0.3698 | 0.3015 | 0.1288 | 0.1076 |
2-actions | 0.3712 | 0.3171 | 0.1297 | 0.1093 |
3-actions | 0.3798 | 0.3215 | 0.1324 | 0.1152 |
4-actions | 0.3842 | 0.3327 | 0.1442 | 0.1231 |
10-actions | 0.3911 | 0.3444 | 0.1521 | 0.1288 |
Influence of long rational paths
Action size | NDCG@10 (Movie) | Prec@10 (Movie) | PLR (Movie) | NDCG@10 (Book) | Prec@10 (Book) | PLR (Book) |
---|---|---|---|---|---|---|
2-actions | 0.3712 | 0.3171 | (2,130) | 0.1297 | 0.1093 | (2,140) |
3-actions | 0.3798 | 0.3215 | (2,79) | 0.1324 | 0.1152 | (2,132) |
4-actions | 0.3842 | 0.3327 | (2,59) | 0.1442 | 0.1231 | (2,42) |
10-actions | 0.3911 | 0.3444 | (2,25) | 0.1521 | 0.1288 | (2,30) |
Action size | NDCG@10 (Movie) | Prec@10 (Movie) | PLR (Movie) | NDCG@10 (Book) | Prec@10 (Book) | PLR (Book) |
---|---|---|---|---|---|---|
2-actions | 0.3655 | 0.3679 | (2,145) | 0.1210 | 0.1004 | (2,155) |
3-actions | 0.3702 | 0.3110 | (2,102) | 0.1226 | 0.1111 | (2,140) |
4-actions | 0.3749 | 0.3207 | (2,78) | 0.1340 | 0.1198 | (2,48) |
10-actions | 0.3816 | 0.3355 | (2,37) | 0.1432 | 0.1206 | (2,40) |