Introduction
- Design a novel PGIS algorithm and calculate the weights of paths.
- Agents can deal with multiple entities and multiple relations and switch paths intelligently.
- Our system considers long rational paths, which ensures better recommendation performance and interpretability.
- Our method has a definite terminal goal that enables the agent to learn the optimal path.
Related work
Deep recommendation algorithms
Reinforcement learning for recommendation systems
Applying knowledge graphs in recommendation
Problem formulation
Methodology
Framework
Knowledge graph embedding
Notations | Descriptions |
---|---|
G | Knowledge graph |
I | Set of items |
U | Set of users |
E | Set of entities |
R | Set of relations |
S | Set of states |
u | User |
s | State |
a | Action |
\({\bar{R}}\) | Reward |
P | Transition probability |
A | Set of actions |
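The notation above describes the recommendation task as a Markov decision process over the knowledge graph G: a state s pairs the user u with the entity reached so far, and an action a follows an outgoing relation edge. A minimal sketch of the tuple (S, A, P) is given below; the class and field names are our own illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KGState:
    """State s: the user u plus the entity reached on the current path."""
    user: str
    entity: str
    history: tuple = ()  # relations traversed so far

@dataclass
class KGEnvironment:
    """Minimal view of (S, A, P) over a knowledge graph G."""
    graph: dict  # entity -> list of (relation, next_entity) edges

    def actions(self, state: KGState):
        """A: the outgoing (relation, entity) edges of the current entity."""
        return self.graph.get(state.entity, [])

    def step(self, state: KGState, action):
        """P: here a deterministic transition along the chosen edge."""
        relation, next_entity = action
        return KGState(state.user, next_entity, state.history + (relation,))
```

In the full model the agent would also receive the reward \({\bar{R}}\) at each step; that is omitted here since the reward design is method-specific.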
Deep reinforcement learning
Path-guided intelligent switching
Intelligent switching
The path-switching based on weight
Experiments
Experimental settings
Dataset description
Dataset | Users | Items | Interactions | Entities | Relations |
---|---|---|---|---|---|
Movie | 2720 | 63,300 | 1,066,247 | 62,600 | 123,260 |
Book | 1633 | 48,800 | 102,386 | 48,100 | 4230 |
KKBOX | 2000 | 64,555 | 142,004 | 55,360 | 23,340 |
Evaluation protocols
- NDCG The most frequently used list evaluation measure; it takes into account the positions of correctly recommended items and is averaged across all testing users.
- HR Hit Ratio, the percentage of users with at least one correctly recommended item in their list.
- Precision The percentage of correctly recommended items in a user's recommendation list, averaged across all testing users.
- Recall The percentage of purchased items that actually appear in the recommendation list, averaged across all testing users.
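The four metrics above can be sketched for a single user's top-k list as follows; this is a minimal reference implementation (the function names are ours, not from the paper), with each per-user score averaged over all testing users in practice.

```python
import math

def ndcg_at_k(recommended, relevant, k=10):
    """NDCG@k: position-discounted gain, normalized by the ideal ranking."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

def hit_ratio_at_k(recommended, relevant, k=10):
    """HR@k: 1 if at least one relevant item appears in the top-k list."""
    return 1.0 if any(item in relevant for item in recommended[:k]) else 0.0

def precision_at_k(recommended, relevant, k=10):
    """Prec@k: fraction of the top-k list that is relevant (divides by k)."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k=10):
    """Recall@k: fraction of the relevant items that appear in the top-k."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0
```

For example, with `recommended = [1, 2, 3]` and `relevant = {2}`, HR@3 and Recall@3 are both 1.0 while Prec@3 is 1/3, since the single relevant item is found but fills only one of three slots.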
Baselines
- User-based CF This method predicts the rating a user will give an item by aggregating the ratings of similar users.
- LFM [40] The latent factor model (LFM) is a state-of-the-art feature-based factorization model widely used in recommendation systems. In our test, we used each user's rating of each movie as the input feature.
- NCF [6] Neural Collaborative Filtering is an extended CF framework that has proven to be a powerful DNN-based CF model. It consists of a generalized matrix factorization (GMF) layer and a multi-layer perceptron (MLP). Both the GMF and MLP layers are fed randomly initialized user-item latent vectors, and all NCF network parameters are learned from observed user-item interactions.
- DeepCF [19] Deep Collaborative Filtering is a deep version of CF. It combines representation learning with a matching function, using an MLP to learn the complex matching function and low-rank relations underlying user-item interactions.
- DeepWalk [41] The DeepWalk algorithm has two main components: a random-walk generator and an update procedure. The generator takes a graph, samples a random vertex as the root of a walk, and then repeatedly samples from the neighbors of the last visited vertex until the maximum walk length is reached.
- One-Hot One-hot encoding, also known as one-bit-effective encoding, uses an N-bit status register to encode N states. In the experiment, an item the user has seen is recorded as 1, and 0 otherwise.
- MetaPath [38] metapath2vec formalizes the network representation learning problem so as to preserve both the structures and the semantics of a heterogeneous network; the objective is to learn low-dimensional latent embeddings for multiple types of nodes.
- DQN [39] A deep Q-network models the complex dynamic user-item interactions and user preferences. The DQN method considers both the current reward and future rewards in pursuit of better recommendations.
- ECKG [35] In this work, propagation paths are constructed from meta-paths, inter-node influence scores along the paths are calculated with a self-attention network (SAN), and the resulting paths provide interpretability for the recommendations.
- MSAN [36] This work uses knowledge graphs to explore the entity-level relational semantics behind user-item interactions. Based on user-item interactions and item-entity connections, multiple meta-paths from users to entities are constructed.
- PGPR Xian et al. [10] proposed Policy-Guided Path Reasoning (PGPR) for interpretable recommendation over a knowledge graph. However, PGPR lacks a predefined target and has an overly large action space, so we compare our work against it.
- PDN [14] Path-based Deep Network (PDN) incorporates both personalization and diversity to enhance matching performance. The model aggregates the relevance weights of the related two-hop paths.
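The DeepWalk baseline's walk generator is simple enough to sketch directly. Below is a simplified, uniform-sampling version matching the description above (the function name and graph representation are our own assumptions): from the start vertex, it repeatedly hops to a randomly chosen neighbor of the last visited vertex until the maximum walk length is reached or a dead end is hit.

```python
import random

def random_walk(graph, start, max_length, rng=random):
    """DeepWalk-style walk over an adjacency-list graph.

    graph: dict mapping a vertex to a list of its neighbors.
    Returns a list of at most max_length visited vertices.
    """
    walk = [start]
    while len(walk) < max_length:
        neighbors = graph.get(walk[-1], [])
        if not neighbors:
            break  # dead end: stop early
        walk.append(rng.choice(neighbors))
    return walk
```

In DeepWalk proper, many such walks (rooted at every vertex) are then fed to a skip-gram update procedure to learn the vertex embeddings; that second component is not shown here.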
Parameter settings
Dataset | Methods | NDCG | HR | Prec | Recall |
---|---|---|---|---|---|
Movie | User-based CF | 0.2967 | 0.2395 | 0.2341 | 0.8670 |
Movie | LFM | 0.3056 | 0.2402 | 0.2335 | 0.8730 |
Movie | DeepWalk | 0.3191 | 0.2524 | 0.2429 | 0.8828 |
Movie | DQN | 0.3633 | 0.2663 | 0.2897 | 0.8870 |
Movie | One-Hot encoding | 0.3334 | 0.2785 | 0.2669 | 0.8947 |
Movie | MetaPath | 0.3436 | 0.2846 | 0.2745 | 0.8993 |
Movie | NCF | 0.3498 | 0.2907 | 0.2787 | 0.9057 |
Movie | DeepCF | 0.3535 | 0.2932 | 0.2881 | 0.9089 |
Movie | ECKG | 0.3503 | 0.2947 | 0.2987 | 0.9098 |
Movie | MSAN | 0.3605 | 0.2985 | 0.2956 | 0.9109 |
Movie | PGPR | 0.3647 | 0.3029 | 0.2998 | 0.9139 |
Movie | PDN | 0.3698 | 0.3090 | 0.3015 | 0.9193 |
Movie | DKRL | 0.3712 | 0.3473 | 0.3171 | 0.9221 |
Movie | Imp | 3.7% | 12.39% | 5.17% | 3.01% |
Book | User-based CF | 0.0520 | 0.04919 | 0.0351 | 0.8345 |
Book | LFM | 0.0533 | 0.0525 | 0.0344 | 0.8369 |
Book | DeepWalk | 0.0667 | 0.0550 | 0.0413 | 0.8392 |
Book | DQN | 0.1162 | 0.1024 | 0.0988 | 0.8478 |
Book | One-Hot encoding | 0.0820 | 0.0796 | 0.0627 | 0.8414 |
Book | MetaPath | 0.0958 | 0.0948 | 0.0741 | 0.8434 |
Book | NCF | 0.1032 | 0.0998 | 0.0820 | 0.8450 |
Book | DeepCF | 0.1151 | 0.1015 | 0.0939 | 0.8463 |
Book | ECKG | 0.1204 | 0.1018 | 0.0984 | 0.8466 |
Book | MSAN | 0.1253 | 0.1028 | 0.1001 | 0.8478 |
Book | PGPR | 0.1276 | 0.1039 | 0.1022 | 0.8491 |
Book | PDN | 0.1288 | 0.1046 | 0.1076 | 0.8605 |
Book | DKRL | 0.1297 | 0.1358 | 0.1093 | 0.8967 |
Book | Imp | 9.3% | 20.8% | 3.57% | 4.2% |
KKBOX | User-based CF | 0.2227 | 0.2132 | 0.1815 | 0.8957 |
KKBOX | LFM | 0.2301 | 0.2153 | 0.1926 | 0.9090 |
KKBOX | DeepWalk | 0.2341 | 0.2166 | 0.1959 | 0.9122 |
KKBOX | DQN | 0.2488 | 0.2234 | 0.2065 | 0.9174 |
KKBOX | One-Hot encoding | 0.2369 | 0.2178 | 0.1981 | 0.9256 |
KKBOX | MetaPath | 0.2402 | 0.2194 | 0.1998 | 0.9288 |
KKBOX | NCF | 0.2438 | 0.2194 | 0.2026 | 0.9317 |
KKBOX | DeepCF | 0.2469 | 0.2208 | 0.2048 | 0.9348 |
KKBOX | ECKG | 0.2477 | 0.2231 | 0.2057 | 0.9371 |
KKBOX | MSAN | 0.2485 | 0.2244 | 0.2066 | 0.9382 |
KKBOX | PGPR | 0.2500 | 0.2254 | 0.2078 | 0.9397 |
KKBOX | PDN | 0.2514 | 0.2272 | 0.2096 | 0.9418 |
KKBOX | DKRL | 0.2526 | 0.2587 | 0.2109 | 0.9736 |
KKBOX | Imp | 4.7% | 13.86% | 6.2% | 3.37% |
Performance comparison
- User-based CF had the second-worst performance among all methods because it is a traditional CF-based method with low efficiency.
- DeepCF and NCF produced better results than user-based CF and LFM, which suggests that deep models are effective at capturing non-linear relations and improve recommendation performance.
- One-hot encoding had the worst performance of all methods, and the MetaPath embedding outperformed it: the one-hot representation is too sparse, so its performance is very low.
- DQN outperformed MetaPath and DeepWalk because DQN moves to the next state with the maximum value, whereas DeepWalk only selects the next state at random. MetaPath uses entity embeddings to compute similarity between entities and recommend items, so the DQN model is more accurate than MetaPath.
- DQN performed slightly better than DeepCF and NCF, indicating that deep reinforcement learning handles complex dynamic user-item interactions better than plain neural networks.
- PGPR performed better than DQN because PGPR incorporates a knowledge graph into reinforcement learning.
- Finally, DKRL's advantage over user-based CF and LFM shows that knowledge graph embedding performs better than similar-user ratings and factorization models. LFM and CF do not consider the contextual information between nodes, and information is not shared between nodes. Our model, in contrast, makes full use of the rich contextual semantic information of the knowledge graph, which makes it easy to find similarities between nodes; recommendation performance improves accordingly. DKRL's advantage over DeepWalk, MetaPath, and DQN shows that applying knowledge graphs in reinforcement learning improves recommendation accuracy. DeepWalk and MetaPath make recommendations from node embeddings and select the next node at random; without interaction with an external environment, the expected maximum reward of a node cannot be obtained. DQN, for its part, uses reinforcement learning alone, without external knowledge as auxiliary information, so its recommendation performance is relatively low. Our model fully combines the knowledge graph with reinforcement learning: the knowledge graph acts as an external feedback environment that interacts with the agent and returns information, so the agent can reach the best recommendation performance through constant trial and error. DKRL also outperforms PGPR because PGPR has no predefined target items, some of its paths may not exist, and its recommendation results are therefore worse; our model has a predefined target, so each time the agent explores, it follows an appropriate path. Lastly, PDN uses only two-hop paths, while DKRL uses multi-hop paths of up to 10 hops, so DKRL outperforms PDN. This also shows that long paths (of at most 10 hops) yield better recommendation performance than short paths.
Ablation study
Influence of action sizes
Action size | NDCG@10 (Movie) | Prec@10 (Movie) | NDCG@10 (Book) | Prec@10 (Book) |
---|---|---|---|---|
PGPR | 0.3698 | 0.3015 | 0.1288 | 0.1076 |
2-actions | 0.3712 | 0.3171 | 0.1297 | 0.1093 |
3-actions | 0.3798 | 0.3215 | 0.1324 | 0.1152 |
4-actions | 0.3842 | 0.3327 | 0.1442 | 0.1231 |
10-actions | 0.3911 | 0.3444 | 0.1521 | 0.1288 |
Influence of long rational paths
Action size | NDCG@10 (Movie) | Prec@10 (Movie) | PLR (Movie) | NDCG@10 (Book) | Prec@10 (Book) | PLR (Book) |
---|---|---|---|---|---|---|
2-actions | 0.3712 | 0.3171 | (2,130) | 0.1297 | 0.1093 | (2,140) |
3-actions | 0.3798 | 0.3215 | (2,79) | 0.1324 | 0.1152 | (2,132) |
4-actions | 0.3842 | 0.3327 | (2,59) | 0.1442 | 0.1231 | (2,42) |
10-actions | 0.3911 | 0.3444 | (2,25) | 0.1521 | 0.1288 | (2,30) |
Action size | NDCG@10 (Movie) | Prec@10 (Movie) | PLR (Movie) | NDCG@10 (Book) | Prec@10 (Book) | PLR (Book) |
---|---|---|---|---|---|---|
2-actions | 0.3655 | 0.3679 | (2,145) | 0.1210 | 0.1004 | (2,155) |
3-actions | 0.3702 | 0.3110 | (2,102) | 0.1226 | 0.1111 | (2,140) |
4-actions | 0.3749 | 0.3207 | (2,78) | 0.1340 | 0.1198 | (2,48) |
10-actions | 0.3816 | 0.3355 | (2,37) | 0.1432 | 0.1206 | (2,40) |