Abstract

Scholarly recommender systems often recommend academic papers based on users’ personalized retrieval demands. Typically, a recommender system analyzes the keywords typed by a user and then returns his or her preferred papers in an efficient and economical manner. In practice, a single paper often covers only some of the keywords that a user is interested in, so the recommender system needs to return a set of papers that collectively covers all the queried keywords. However, existing recommender systems rely only on exact keyword matching for recommendation decisions and neglect the correlation relationships among different papers. As a consequence, they may output a set of papers drawn from multiple disciplines that differ from the user’s actual research field. In view of this shortcoming, we propose a keyword-driven and popularity-aware paper recommendation approach based on an undirected paper citation graph, named PRkeyword+pop. Finally, we conduct large-scale experiments on the real-life Hep-Th dataset to demonstrate the usefulness and feasibility of PRkeyword+pop. The experimental results confirm the advantages of PRkeyword+pop in searching for a set of satisfactory papers compared with other competitive approaches.

1. Introduction

With the increasing maturity of recommender systems [1], users are apt to employ existing academic paper recommender websites (e.g., Google Scholar and Baidu Academic) to search for papers of interest based on a set of typed keywords. Generally, an academic paper covers only some of the keywords that a user is interested in. Therefore, a paper recommender system needs to analyze the user’s search requirements and return a set of papers that collectively covers all the queried keywords.

Next, we use Figure 1 to introduce the common paper recommendation process [2], which mainly consists of three phases. The first phase is keyword entry: users analyze their research requirements and enter all query keywords (e.g., k1, k2, k3, and k6) into a recommender system. The second phase is paper discovery [3]: the recommender system automatically identifies diverse sets of candidate papers. The third phase is paper selection [4, 5]: the recommender system recommends to users the candidate papers containing the query keywords. However, the returned papers may fail to satisfy users’ requirements for deep and continuous research on a certain topic, as these papers may belong to a variety of research domains.

Keyword search methods [6, 7] have long been popular for finding papers, but they can hardly find a set of satisfactory papers. In fact, a set of satisfactory papers must meet the following requirements: on the one hand, the papers collectively cover the user’s query keywords [8-10]; on the other hand, each candidate paper containing query keywords has direct or indirect correlation relationships [11] with the other candidate papers covering the remaining query keywords. In short, recommending a set of satisfactory papers still calls for in-depth analysis and study [12].

To recommend a set of satisfactory papers, we propose PRkeyword+pop (a keyword-driven and popularity-aware paper recommendation approach) that assists users in searching for a set of satisfactory papers, i.e., papers that not only cover all queried keywords but also have high popularity and strong correlation with one another. PRkeyword+pop runs on an undirected paper relationship graph, where each paper is modeled as a node and an edge represents a correlation relationship between two papers. In practice, PRkeyword+pop may return one or multiple subgraphs of the paper relationship graph according to the user’s query keywords; the returned subgraphs include keyword papers covering the query keywords, bridging papers (if any) that are needed but not specified by the query keywords, and the composability and popularity of the recommended papers. Note that we use a paper and its corresponding node interchangeably in the remainder of this paper.

In summary, we make the following contributions:
(1) We propose a novel keyword-driven and popularity-aware paper recommendation approach, which efficiently recommends a set of satisfactory papers.
(2) We build an undirected paper citation graph [13] and cast the users’ keyword query problem as the Steiner tree problem. We then employ papers’ popularity to select the optimal solutions.
(3) We conduct large-scale experiments on the Hep-Th dataset [14] to evaluate the usefulness and feasibility of PRkeyword+pop.

The rest of this paper is structured as follows: Section 2 presents the research motivation, Section 3 defines the undirected paper citation graph, Section 4 formulates the main research problems, Section 5 describes how PRkeyword+pop answers users’ keyword queries on the undirected paper citation graph, Section 6 evaluates PRkeyword+pop through experiments, Section 7 reviews related work, and Section 8 concludes the paper and points out future research directions.

2. Research Motivation

In this section, we use the examples in Figures 2 and 3 to illustrate the research motivation. Figure 2 shows that a user needs to perform the following keyword research tasks [15] before starting his or her own work: (1) paper recommendation (i.e., k1), to study the paper recommendation process [16]; (2) keyword search (i.e., k2), to study keyword search and apply it to the paper recommendation process; (3) Steiner tree (i.e., k3), to study the Steiner tree algorithm [17] and apply it to keyword search; and (4) dynamic programming (i.e., k6), to study the dynamic programming technique and apply it to solving the Steiner tree problem. In Figure 2, the user obtains four corresponding keywords (i.e., Q = {k1, k2, k3, k6}) through a preliminary analysis of his or her research content [18]. Next, the user can search for the corresponding papers in Figure 3.

Figure 3 shows part of an undirected paper citation graph and contains 14 nodes covering diverse keywords. A keyword label such as {k11, k13} indicates that the corresponding node offers keywords k11 and k13, and an edge indicates that the two connected nodes have a correlation relationship. Thus, given the query Q = {k1, k2, k3, k6}, the user can easily pick out a set of papers from Figure 3 that covers these keywords.

Even if the user fortunately obtains a set of papers covering all query keywords, he or she may still have no idea whether these papers suffice for the intended work, because the correlation relationships among the papers are not visible to him or her. In fact, each user must manually pick out the required papers from massive numbers of candidates [19, 20]; worse still, this process is very time consuming and challenging. To tackle these issues, we propose a novel keyword-driven and popularity-aware paper recommendation approach, named PRkeyword+pop, which is presented in detail in Section 5.

3. Undirected Paper Citation Graph

The citation relationships in a paper citation graph [21] can sufficiently attest to the correlation among papers’ research content. If we used a directed paper citation graph in our proposal, knowledge would be regarded as flowing in only one direction; in fact, knowledge can be transferred bidirectionally along citation relationships. Thus, we use undirected citation relationships to denote the papers’ correlations; an undirected citation relationship indicates a correlation between the two papers it connects. As more citation relationships are mined and more papers are included in an undirected paper citation graph, the graph grows larger and denser [22], offering a solid basis for recommending a set of satisfactory papers.

PRkeyword+pop runs on any undirected paper citation graph that fulfills the requirements specified by the following definitions:

Definition 1. (nodes). For each paper, an undirected paper citation graph has a corresponding node. Each node contains multiple keywords representing the research content of the paper. Furthermore, the node set of the graph is the set of all such nodes.

Definition 2. (edges). For each pair of nodes connected by a citation relationship, the graph contains a corresponding edge, which denotes the correlation between the two nodes. Furthermore, the edge set of the graph is the set of all such edges.

Definition 3. (undirected paper citation graph). An undirected paper citation graph is expressed as , where and denote its sets of nodes and edges, respectively.
According to Definition 2, relevant papers in the same domain are connected, either directly or indirectly, forming a connected undirected paper citation graph. Note that if users enter entirely unrelated query keywords (e.g., privacy preserving [23-25] and protein engineering), we will fail to recommend a set of satisfactory papers to them.
To answer users’ queries, PRkeyword+pop prebuilds an inverted index S(K) [17] on the graph, i.e., nodes containing the same keyword are stored in a common inverted list. For example, the nodes in Figure 3 that contain a given keyword are stored together in one list. In this way, given an individual keyword k, PRkeyword+pop can easily find all papers that address keyword k.
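As a small illustration (the data layout, identifiers, and toy values below are our own assumptions for exposition, not part of the original system), the undirected paper citation graph and the inverted index S(K) can be built along the following lines in Python:

from collections import defaultdict

# Hypothetical toy input: each paper's keywords and the mined citation pairs.
paper_keywords = {"v1": {"k1", "k2"}, "v2": {"k3"}, "v3": {"k2", "k6"}}
citations = [("v1", "v2"), ("v3", "v2")]      # (citing paper, cited paper)

# Undirected paper citation graph: every citation yields one undirected edge,
# because knowledge is assumed to flow in both directions (Section 3).
graph = defaultdict(set)
for citing, cited in citations:
    graph[citing].add(cited)
    graph[cited].add(citing)

# Inverted index S(K): keyword -> nodes whose papers cover that keyword, so
# that all papers addressing an individual keyword k are found directly.
inverted_index = defaultdict(set)
for node, keywords in paper_keywords.items():
    for k in keywords:
        inverted_index[k].add(node)

print(sorted(inverted_index["k2"]))           # ['v1', 'v3']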

4. Problem Formulation

In fact, our proposal centers on one key task: recommending a set of satisfactory papers. Specifically, answering a keyword query Q mainly consists of two steps: (1) finding Steiner trees based on an undirected paper citation graph, denoted as T(Q), where each tree in T(Q) not only covers all query keywords but also has the fewest nodes (i.e., higher correlation); (2) obtaining the optimal Steiner trees T1(Q) from T(Q), where the trees in T1(Q) have the highest popularity (i.e., more trust [26]). To better clarify our paper, we summarize the symbols in Table 1.

Likewise, our proposal recommends papers to the user based on the undirected paper citation graph of Figure 3 and the query keywords of Figure 2. Different nodes in Figure 3 contain different subsets of the query keywords. Thus, given Q = {k1, k2, k3, k6}, we look for a Steiner tree that connects, for each query keyword, at least one node containing it. Furthermore, the Steiner tree may also include nodes that do not cover any query keyword (bridging papers). Therefore, the Steiner tree of Figure 4 can satisfy the user’s requirements for deep and continuous research on a certain topic.

Thus, a Steiner tree is defined as follows.

Definition 4. (Steiner tree). Given an undirected paper citation graph and a set of required nodes, a connected subgraph of the graph that covers all of the required nodes forms a Steiner tree.
Given a query keyword k in Q = {k1, …, kl}, we use the inverted indexes of Section 3 to identify one set of nodes (a group) per query keyword, where every node in a group contains the corresponding keyword. Next, we need to find a group Steiner tree, which is formally defined as follows.

Definition 5. (group Steiner tree). Given the undirected paper citation graph and multiple groups of nodes, where each group consists of the nodes containing one query keyword, a Steiner tree that contains exactly one node from each group forms a group Steiner tree.
We may obtain multiple group Steiner trees for a query Q. PRkeyword+pop aims to find the minimum group Steiner trees, which not only cover the user’s query keywords but also have higher correlation (i.e., fewer nodes). Thus, a minimum group Steiner tree is defined as follows.

Definition 6. (minimum group Steiner tree). Given the set of all group Steiner trees for a query, a group Steiner tree whose number of nodes (papers) is no larger than that of any other tree in the set is a minimum group Steiner tree.

5. PRkeyword+pop Approach

The basic steps of PRkeyword+pop are as follows (see Figure 5): first, we generate multiple minimum group Steiner trees (i.e., T(Q)) by employing the DP (dynamic programming) technique [17]; then, we generate the optimal solutions (i.e., T1(Q)) by employing the PP (paper popularity) method.

Step 1. Generation of minimum group Steiner trees based on the undirected paper citation graph.
This step employs the DP technique to solve the MGST (minimum group Steiner tree) problem. Specifically, the DP technique first breaks the MGST problem into a series of simpler subproblems; next, each distinct subproblem is solved only once and its result is stored; finally, multiple solutions are provided efficiently by combining the stored results, i.e., T(Q).
In this section, we treat the set of all query keywords as K, i.e., K = Q. In the DP model, a tree rooted at a node and covering a subset of the query keywords is a state, and the cost of a state is the number of nodes in the tree. The state transition of the DP model is given by formulas (1)-(6), where the neighbors of a node are its adjacent nodes in the graph. Formula (1) states that the weight of an initial tree is 1 in the DP model, because it covers only one keyword node [27]. Formula (2) states that a larger tree is obtained through two operations: the tree growth operation (formulas (3) and (4)) and the tree merging operation (formulas (5) and (6)). In Figure 6(a), the tree growth operation generates a new tree by adding a new node u (one of the root’s neighbors) to an existing tree. In Figure 6(b), the tree merging operation generates a new tree by merging two trees that share the same root node. The pseudocode of these two operations is specified more formally in Algorithms 1 and 2, respectively.
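For concreteness, the recurrence described above can be written compactly in the standard form of the classic group Steiner tree DP [17] (the paper’s formulas (1)-(6) separate the base case, growth, and merging; the shorthand T(v, K1) for the minimum-node tree rooted at node v and covering keyword subset K1 ⊆ K is our own notation, not the original symbols):

T(v, \{k\}) = 1 \quad \text{if node } v \text{ contains keyword } k,

T(v, K_1) = \min\Big\{ \min_{u \in N(v)} \big[\, T(u, K_1) + 1 \,\big],\; \min_{\emptyset \neq K' \subsetneq K_1} \big[\, T(v, K') + T(v, K_1 \setminus K') \,\big] \Big\},

where N(v) is the set of v’s neighbors in the graph. The first inner minimum corresponds to the tree growth operation (a tree rooted at a neighbor u is extended by one node to become a tree rooted at v), and the second corresponds to the tree merging operation (two trees rooted at the same node v and covering disjoint keyword subsets are merged).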

Input: K = {}
Output: Q1
(1)For each u ∈  do
(2)  If 1 +  < 
(3)    = 1 + 
(4)   
(5)   enqueue into Q1
(6)   update Q1
(7)  End If
(8)  If
(9)    +  < 
(10)    =  + 
(11)   enqueue into Q1
(12)   update Q1
(13)  End If
(14)  Return Q1
(15)End For
Input: K = 
Output: Q1
(1)For each u ∈  do
(2)  If
(3)   
(4)   enqueue into Q1
(5)   update Q1
(6)  End If
(7)  Return Q1
(8)End For
(9)
(10)For each is contained in K s.t do
(11)  If  < 
(12)   
(13)   enqueue into Q1
(14)   update Q1
(15)  End If
(16)  Return Q1
(17)End For

In Step 1, we repeat the tree growth and tree merging operations to obtain a queue Q1. The pseudocode for obtaining Q1 and T(Q) is specified more formally in the Steiner tree algorithm (Algorithm 3).

Input: K = 
Output: Q1 and T(Q)
(1)Let Q1 = Ф
(2)For each do
(3)  If contains any nonempty keyword set
(4)  enqueue (, ) into Q1
(5)  End If
(6)End for
(7) Min_cou = ∞ // the number of nodes
(8)While Q1 ≠ Ф do
(9)  dequeue Q1 to (,)
(10)  If  = K
(11)   If  < Min_cou
(12)    Min_cou = 
(13)   End If
(14)    Break
(15)  End If
(16)  Else tree growth
(17)  Else tree merging
(18)  Return Q1
(19)  Return T(Q)
(20)End While
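To make Algorithms 1-3 concrete, the following Python sketch implements the same priority-queue-driven DP with the tree growth and tree merging operations. The data layout (adjacency dictionary, node-to-keyword dictionary) and all identifiers are our own assumptions for illustration; this is a reconstruction of the idea rather than the authors’ original code.

import heapq
from itertools import count

def min_group_steiner_trees(graph, node_keywords, query):
    """Find the smallest trees in `graph` that cover every keyword in `query`.

    graph: dict mapping a node to the set of its neighbours (undirected).
    node_keywords: dict mapping a node to the set of keywords its paper covers.
    query: iterable of query keywords (the set K = Q).
    Returns a list of frozensets of nodes, one per minimum group Steiner tree.
    """
    query = frozenset(query)
    best = {}                  # (root, covered keywords) -> (node count, tree nodes)
    heap, tie = [], count()    # queue Q1, ordered by node count
    solutions, best_cost = [], float("inf")

    def push(root, covered, nodes):
        cost, state = len(nodes), (root, covered)
        if cost < best.get(state, (float("inf"),))[0]:
            best[state] = (cost, nodes)
            heapq.heappush(heap, (cost, next(tie), root, covered, nodes))

    # Enqueue one single-node tree for every node that covers a query keyword.
    for v, kws in node_keywords.items():
        covered = frozenset(kws) & query
        if covered:
            push(v, covered, frozenset({v}))

    while heap:
        cost, _, root, covered, nodes = heapq.heappop(heap)
        if cost > best_cost:
            break                                  # no smaller tree can appear later
        if best.get((root, covered), (None,))[0] != cost:
            continue                               # stale queue entry
        if covered == query:                       # all query keywords are covered
            best_cost = cost                       # keep every tree of minimum size
            if nodes not in solutions:
                solutions.append(nodes)
            continue
        # Tree growth: extend the tree towards each neighbour of its root.
        for u in graph.get(root, ()):
            if u not in nodes:
                gained = frozenset(node_keywords.get(u, ())) & query
                push(u, covered | gained, nodes | {u})
        # Tree merging: merge with trees rooted at the same node that cover
        # a disjoint subset of the query keywords.
        for (other_root, other_cov), (_, other_nodes) in list(best.items()):
            if other_root == root and not (other_cov & covered):
                push(root, covered | other_cov, nodes | other_nodes)
    return solutions

On the toy graph from the sketch in Section 3, for instance, min_group_steiner_trees(graph, paper_keywords, {"k1", "k3"}) would return [frozenset({"v1", "v2"})], the smallest connected node set covering both keywords.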

Next, the intuitive example in Figure 7 shows the T(Q) generation process for K = Q. First, the trees rooted at nodes containing query keywords are enqueued in Figure 7(b). Since these eight trees each contain only one node, the tree growth operation is performed in Figure 7(b). Because these nodes all have neighbor nodes, trees connecting one additional node are generated in Figure 7(c). Next, the tree growth operation is performed on Figure 7(c). For example, two trees may generate a new tree rooted at a common node, but this is not a tree merging operation if the new tree contains no new query keywords. Furthermore, some newly generated trees are deleted because they are of no use, e.g., a grown tree that covers the same query keywords as an existing tree but has more nodes. Therefore, the five required trees are retained in Figure 7(d). Next, we execute tree merging operations in Figures 7(d) and 7(f) and the tree growth operation in Figure 7(e). Finally, the user obtains four minimum group Steiner trees in Figure 7(g).

Note that, in the worst case, the output of the Steiner tree algorithm may be the entire graph; moreover, the algorithm may even fail to recommend any papers to users (e.g., when the query keywords are entirely unrelated).

Step 2. Generation of optimal solutions based on the minimum group Steiner trees.
According to the abovementioned algorithm, multiple qualified candidates may be returned, e.g., the output of Figure 7. To ease the burden of users’ paper selection decisions, we select the optimal solutions (i.e., T1(Q)) from the output of Step 1. Generally, a higher citation frequency of a paper means higher popularity. Thus, we use the PP (paper popularity) [28] method to select T1(Q): the popularity of a candidate tree is the total number of times its papers are cited in the paper citation graph, where a citation indicator equals 1 if one paper cites another and 0 otherwise.
Finally, we produce a ranking list in descending order of the popularity of each candidate. Thus, PRkeyword+pop returns the T1(Q) with the highest popularity among the candidates. Note that the recommendation result of PRkeyword+pop could be the whole of T(Q) if all candidates have the same popularity.
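As a small illustration of Step 2 (again under our own assumed data layout, with citations given as (citing, cited) pairs and candidate trees as node sets), the ranking can be sketched as follows:

def popularity(tree_nodes, citations):
    """Popularity of a candidate tree: how often its papers are cited
    anywhere in the paper citation graph."""
    return sum(1 for _, cited in citations if cited in tree_nodes)

def select_optimal(candidate_trees, citations):
    """Rank the minimum group Steiner trees T(Q) by popularity in descending
    order and return the candidates sharing the highest popularity (T1(Q))."""
    if not candidate_trees:
        return []
    ranked = sorted(candidate_trees,
                    key=lambda tree: popularity(tree, citations),
                    reverse=True)
    top = popularity(ranked[0], citations)
    return [tree for tree in ranked if popularity(tree, citations) == top]

If every candidate has the same popularity, select_optimal simply returns all of T(Q), matching the note above.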

6. Experiments

To demonstrate the usefulness of PRkeyword+pop, we design and run large-scale experiments.

6.1. Experimental Settings

The paper citation graph is extracted from the Hep-Th dataset [14]; the graph covers 8721 papers, and each paper contains keyword information.

Generally, an author is allowed to list up to 6 index terms (i.e., keywords) in an article, so we create queries with up to 6 keywords in our research. We first set up a series of experiment sets, i.e., set A, set B, and set C. In set A, all keywords of one paper are used as a query Q; this scenario emulates users who provide query keywords that exactly match their research content. In set B, the query keywords are selected at random from several different papers (more than one); this scenario emulates users who provide query keywords randomly. In set C, the query keywords are selected at random from two papers, which further verifies the feasibility of the Steiner tree algorithm; here, we do not execute the PP method. In addition, each experiment set is repeated 50 times and the average results are reported.
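The three query sets can be generated along the following lines; this is a sketch under our own assumptions about the dataset layout (a node-to-keywords dictionary), not the original experiment code:

import random

def build_query(node_keywords, mode, max_keywords=6):
    """Sample a keyword query for experiment set 'A', 'B', or 'C'.

    Set A: all keywords of one randomly chosen paper.
    Set B: keywords drawn at random from more than one paper.
    Set C: keywords drawn at random from exactly two papers.
    """
    papers = list(node_keywords)
    if mode == "A":
        return set(list(node_keywords[random.choice(papers)])[:max_keywords])
    n_sources = 2 if mode == "C" else random.randint(2, 4)
    pool = [k for p in random.sample(papers, n_sources) for k in node_keywords[p]]
    size = min(len(pool), random.randint(2, max_keywords))
    return set(random.sample(pool, size))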

We conduct the following experimental evaluation (the sketch after this list spells out the standard measures):
(1) Number of nodes: the fewer recommended papers in a tree (i.e., the higher the correlation of the tree), the better the recommendation approach.
(2) Success rate [17]: a recommendation result is counted as successful if the number of recommended papers is smaller than twice the number of query keywords.
(3) Average paper popularity (APP) [29]: the average popularity of the papers in the optimal trees T1(Q), where m is the number of optimal trees and the number of nodes in each tree is taken into account.
(4) Computation time: the time consumed to generate T1(Q) in sets A and B and T(Q) in set C, respectively.
(5) Precision [30]: the fraction of recommended papers that contain query keywords, where TP denotes the set of recommended papers containing query keywords.
(6) Recall [30]: the fraction of the reference papers that are recommended, where in set C the reference set is the set of papers cited by the papers from which the query keywords were drawn.
(7) F1 score: the harmonic mean of precision and recall.
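For completeness, the standard forms of precision, recall, and F1 used above can be written as follows; the choice of reference sets follows the descriptions in the list, and the function names are our own:

def precision(recommended, keyword_papers):
    """Fraction of recommended papers that contain query keywords (TP)."""
    if not recommended:
        return 0.0
    return len(recommended & keyword_papers) / len(recommended)

def recall(recommended, cited_papers):
    """Fraction of the reference papers (in set C, the papers cited by the
    source papers) that appear among the recommendations."""
    if not cited_papers:
        return 0.0
    return len(recommended & cited_papers) / len(cited_papers)

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0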

To the best of our knowledge, several approaches address the paper recommendation issue by using papers’ relationships. Thus, we compare PRkeyword+pop with four approaches adapted from [17, 31, 32].
Baseline 1 (Paper-Random [17]): this approach randomly selects a set of nodes that collectively cover all query keywords. Next, it finds minimum spanning trees that interconnect the selected nodes. Finally, the optimal minimum spanning trees are obtained by executing the PP method.
Baseline 2 (Paper-Greedy [17]): likewise, this approach randomly selects a set of nodes that collectively cover all query keywords. Next, it regards the selected nodes as initial root nodes and continuously grows trees until these nodes are interconnected, applying a greedy heuristic during tree growth. Finally, we also use the PP method to obtain the optimal solutions.
Baseline 3 (Random Walk (RW) [32]): RW runs on a 2-layer graph, i.e., the undirected paper citation graph plus a built paper-keyword graph. Each query uses only the user’s entered keywords, and this approach is executed only for the keyword queries of set C.
Baseline 4 (Random Walk with Restart (RWR) [31]): RWR runs on the same 2-layer graph and is likewise executed only for the query keywords of set C. Here, if the state vector of RWR grows linearly during the experiments, the approach achieves linear convergence.

The experiments are conducted on a machine with an Intel(R) Core CPU @ 3.0 GHz and 16 GB RAM, running Windows 10 (version 1809) and Python 3.6.

6.2. Experimental Results
6.2.1. Profile 1: The Number of Recommended Nodes of Different Approaches

In this profile, we compare the number of papers returned by PRkeyword+pop with that of two approaches (i.e., Paper-Greedy and Paper-Random). As shown in Figure 8, the number of the users’ query keywords ranges from 2 to 6. The quantity of papers recommended by our approach increases as the number of query keywords increases, because solutions with more papers are needed to satisfy more query keyword requirements. For Paper-Greedy and Paper-Random, the maximum number of papers is obtained when the number of query keywords equals 6 in sets A and C, or 4 in set B. In addition, these experimental results show that our proposal returns a smaller number of recommended papers than the two baselines. As a smaller number of recommended papers guarantees higher correlation among the papers, PRkeyword+pop is superior to Paper-Greedy and Paper-Random.

6.2.2. Profile 2: The Success Rate of Different Approaches

In this profile, we compare the success rates of the different approaches. As shown in Figure 9, the results vary considerably across the experiment sets. Across the different experimental scenarios, our proposal can effectively answer the users’ keyword queries and its success rate is 100%. However, Paper-Random and Paper-Greedy find it increasingly difficult to obtain successful solutions as the number of query keywords increases; in particular, the success rates of these two approaches both drop to 0 in set B. Again, the experimental results show that our proposal acquires solutions more effectively than Paper-Greedy and Paper-Random.

6.2.3. Profile 3: The Average Paper Popularity of Different Approaches

In both sets A and B, we compare the different approaches in terms of average paper popularity. As shown in Figure 10, the average paper popularity of Paper-Random and Paper-Greedy is larger than that of PRkeyword+pop. This is because the number of papers recommended by Paper-Random and Paper-Greedy exceeds that of our approach, and each recommended paper is cited more than once. In practice, the solutions of Paper-Random and Paper-Greedy will seldom be selected, since they cost users a serious amount of time and energy on unnecessary reading. Overall, Figure 10 shows that the average paper popularity of PRkeyword+pop is acceptable while still satisfying the users’ query keyword requirements.

6.2.4. Profile 4: The Computation Time of Different Approaches

In Profile 4, we compare the time consumption of the different recommendation approaches. As shown in Figure 11, PRkeyword+pop, Paper-Random, and Paper-Greedy spend more time obtaining solutions as the number of query keywords increases. We only measure the time of RW and RWR in set C, where their time is a constant value. Because Paper-Random and Paper-Greedy use extremely simple heuristics for selecting papers, these two approaches spend less time than PRkeyword+pop in most cases. In addition, RW and RWR both spend much more time than our proposal, as they require a significant amount of iterative and matrix operations. Although our proposal takes time to obtain solutions, the time consumption of PRkeyword+pop is acceptable in most realistic cases: it is the price to pay for allowing users to achieve their research goals with less time and energy.

6.2.5. Profile 5: The Precision of Different Approaches

As shown in Figure 12, the three figures present the precision of the different approaches. Our proposal can accurately answer the users’ keyword queries, and its precision in all three experiment sets is 100%.

For Paper-Random and Paper-Greedy, the precision ranges from 10% to 45%. Therefore, whether users offer query keywords accurately or randomly, our approach can accurately answer the users’ keyword queries, and the recommended results better satisfy users’ query requirements. Furthermore, these experimental results indicate that users may spend less time and energy on achieving their research aims.

6.2.6. Profile 6: The Recall Rate and F1 Score of Different Approaches

In this profile, we first compare the recall rates of the different approaches in set C. According to Figure 8, the number of papers recommended by our approach does not exceed 30, so the numbers of papers recommended by RW and RWR are set to 10, 20, and 30, respectively, and the recall rates of RW and RWR are averaged over these three settings. In Figure 13(a), the recall rate of PRkeyword+pop ranges from 4% to 21%; the recall rates of Paper-Random and Paper-Greedy range from 39% to 54%; and the recall rates of RW and RWR are less than 9.5%. In addition, we also compare the F1 score of our approach with those of Paper-Random and Paper-Greedy. In Figure 13(b), the F1 score of our proposal ranges from 9% to 34%, and the F1 scores of Paper-Random and Paper-Greedy range from 33% to 44%. As the numbers of papers returned by Paper-Random and Paper-Greedy exceed that of PRkeyword+pop, the recall rate and F1 score of our proposal are lower than those of these two approaches. Furthermore, when the number of query keywords is not equal to 3, the recall rates of RW and RWR are both lower than that of our approach. In conclusion, the recall rates and F1 scores of Figure 13 directly verify the feasibility of our proposal.

7. Related Work

Currently, recommender techniques play vital roles in many research areas. Recommendation methods can be mainly classified into three categories: collaborative filtering (CF), content-based filtering (CBF), and graph-based approaches.

7.1. Collaborative Filtering

Early work on paper recommendation mainly explored the use of collaborative filtering (CF) techniques. For example, McNee et al. [33] mainly focused on the rating matrices in paper citation networks. Pennock [34] proposed a personality diagnosis method based on a Bayesian network, considering that the rating frequency of other users affects a user’s ratings of items. Furthermore, McNee et al. [33] combined the CF method with papers’ citation frequencies to recommend papers, because they considered that the number of citations of a paper has a vital effect on papers’ ratings. In addition, in implicit collaborative filtering, an interaction between a user and an item is recorded as 1 and its absence as 0; however, 1 or 0 does not indicate the positive or negative factors behind the interaction [35]. Given users’ query keywords, CF approaches can effectively recommend papers, but they are generally limited by problems such as the cold start problem and the data sparsity problem [36].

7.2. Content-Based Filtering

To further improve paper recommendation, some researchers have explored content-based filtering (CBF) approaches. Generally, CBF approaches [37] attempt to retrieve papers based on textual content rather than rating relationships. For example, Alzoghbi et al. [38] examined the preferences among papers through two proposed validation mechanisms and recommended interesting papers to users. Furthermore, Wang and Blei [39] combined a topic model with collaborative filtering to propose a paper recommendation approach named CTR: CTR first uses LDA to find latent topics for papers and then infers user-item relations through matrix factorization. In fact, CBF approaches suffer from traditional information retrieval issues, e.g., the semantic ambiguity problem. Furthermore, gathering and processing the relevance information of papers is often time consuming.

7.3. Graph Model

Currently, papers’ relationships reflect the future research trends of paper recommendation, mainly because the correlation relationships among papers indicate the correlation of their research content. For example, Meng et al. [31] regarded authors, papers, topics, and keywords as nodes and their relationships as edges, and recommended academic papers by executing a random walk on a four-layer heterogeneous graph. Furthermore, Gori and Pucci [32] proposed a graph-based, PageRank-like recommendation approach that performs a biased random walk on paper citation graphs and further emphasizes the correlations among citations. In addition, Wu and Sun [40] argued that three different types of paper citation networks can be constructed from papers’ citation relationships, i.e., the directly connected network, the coupling network, and the cocitation network. Liang et al. [41] proved that the cocitation relationships of a cocitation network can be employed in paper recommendation: if two papers are both cited by many of the same papers, they have high relevance and are highly likely to be recommended simultaneously.

In fact, correlation relationships [42] can be formed in a paper citation graph because most papers select their references based on content similarity. Thus, we use the paper citation graph to establish an undirected paper citation graph. On this graph, our proposal (i.e., PRkeyword+pop) efficiently recommends a set of satisfactory papers to users. Finally, extensive experimental results validate the usefulness and feasibility of the PRkeyword+pop approach.

8. Conclusions and Future Work

Whether a set of satisfactory papers can be recommended to users is very important for the paper discovery and paper selection tasks, which is known as the paper recommendation problem. Here, we propose a novel keyword-driven and popularity-aware approach (i.e., PRkeyword+pop) that returns a set of satisfactory papers, i.e., papers that not only collectively cover the users’ query keywords but also have high correlation and popularity. Furthermore, these recommendation results support users in doing deep and continuous research on a certain topic or domain. In addition, the experimental results further show the usefulness and feasibility of our proposal.

Although our work shows desirable results, some aspects are still worth further research and improvement. If users cannot analyze their research requirements in detail (e.g., the required data types [43-45]), the recommended results may fail to satisfy them. Furthermore, we may face the sparsity problem of the existing paper citation graph. Hence, the abovementioned research topics merit further study.

Data Availability

The experiment dataset Hep-Th used to support the findings of this study is available at http://snap.stanford.edu/data/cit-HepTh.html.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported in part by the National Key Research and Development Program of China (Grant no. 2017YFB1001800), National Natural Science Foundation of China (Grant no. 61872219), and Natural Science Foundation of Shandong Province (Grant no. ZR2019MF001).