1 Introduction
2 Research methodology
3 Basic techniques in recommender systems
3.1 Aspects of recommender systems
3.2 Basic recommendation techniques
4 Recommendation scenarios in the healthcare domain
Rec. scenarios | Papers | Rec. techniques | Functionalities |
---|---|---|---|
Food recommendation | (Aberg 2006) | CF, CB | resolves the malnutrition of the elderly |
Food recommendation | (Achananuparp and Weber 2016) | context-based rec. | proposes healthy food for users |
Food recommendation |  | CF, group rec. | generates food recommendations for a family |
Food recommendation | (Elsweiler et al. 2017) | ingredient network | generates similar but healthier recipe replacements |
Food recommendation | (Rehman et al. 2017) | ant colony algorithm | generates optimal food lists for users |
Food recommendation | (Rokicki et al. 2015) | clustering | recommends healthy menus to users |
Food recommendation | (Ueta et al. 2011) | goal-oriented recipe rec. | provides suitable nutrients to treat a user’s health problems |
Drug recommendation | (Bankhele et al. 2017) | CF | suggests proper drugs to diabetes patients |
Drug recommendation | (Bresso et al. 2013) | decision trees, inductive logic programming | specifies side-effect profiles of drugs |
Drug recommendation |  | Pharmacosafety Networks | provides accurate drug side-effect predictions |
Drug recommendation |  | ontologies, multi-criteria decision making | proposes anti-diabetic drug recommendations |
Drug recommendation |  | ontologies, rule-based rec. | recommends proper drugs |
Drug recommendation |  | structure-activity and structure-property relationships | predicts drug side effects |
Drug recommendation | (Huang et al. 2011) | support vector machine, logistic regression | predicts drug side effects |
Drug recommendation | (Mahmoud and Elbeh 2016) | ontologies, rule-based decision making | suggests medicines with dose prescriptions |
Drug recommendation | (Medvedeva et al. 2007) | case-similarity retrieval system | assists doctors in optimizing treatments for their patients |
Drug recommendation | (Rodríguez et al. 2009) | semantic web | provides patients with drugs to heal a pathology |
Drug recommendation | (Shimada et al. 2005) | risk-level classification | assists doctors in selecting first-line drugs |
Drug recommendation | (Stark et al. 2017) | CF, graph-based rec. | provides precise drugs to migraine patients |
Drug recommendation | (Yamanishi et al. 2012) | sparse canonical correlation | predicts potential side-effect profiles of drug candidate molecules |
Drug recommendation | (Zhang et al. 2016) | CB | predicts missing side effects of approved drugs |
Health status prediction | (Hussein et al. 2012) | random forest classification | predicts disease risks of patients |
Health status prediction |  | CF | predicts risk factors of chronic-disease patients |
Physical activity recommendation | (Ali et al. 2018) | context-aware rec., knowledge-based rec. | provides patients with physical activities |
Physical activity recommendation | (Dharia et al. 2016) | CF, CB | suggests personalized workout-session recommendations |
Physical activity recommendation |  | ontologies, semantic technologies | recommends proper exercises to users |
Healthcare professionals recommendation | (Gujar et al. 2018) | coreNLP techniques | generates doctor recommendations for patients |
Healthcare professionals recommendation | (Han et al. 2018) | CF, CB | recommends family doctors to patients |
Healthcare professionals recommendation | (Narducci et al. 2015) | CF | suggests doctors and hospitals according to the patient’s profile |
Healthcare professionals recommendation | (Zhang et al. 2016) | hybrid matrix factorization | predicts the rating of doctors for patients |
4.1 Food recommendation
4.2 Drug recommendation
4.2.1 Drug recommendation for curing diseases
patient | age | insulin | glucose | BMI | BP | triceps thickness |
---|---|---|---|---|---|---|
Tom (active patient) | 25 | 25.4 | 86.0 | 27.9 | 67 | 23 |
patient1 | 27 | 25.9 | 166 | 25.8 | 72 | 19 |
patient2 | 48 | 33 | 118 | 46 | 82 | 41 |
patient3 | 27 | 28 | 89 | 31 | 70 | 20 |
patient4 | 53 | 43.2 | 195 | 30.7 | 69.7 | 45 |
- Select the patients who are similar to the active patient in terms of gender (male/female), aura (yes/no), and the type of migraine (acute/chronic).
- Calculate the similarity between each neighbor and the active patient according to the following features: age, allergies, disease history, preexisting conditions, current drug prescription, and blood pressure. Each feature is weighted according to its importance. For instance, age and disease history are more important than the other features and therefore receive higher weights: \(w_{age} = w_{diseaseHistory} = 3\) and \(w_{allergies} = w_{preexistingConditions} = w_{bloodPressure} = 1\).
- Sum up the scores of all features. Only drugs taken by patients who are at least 80% similar to the active patient are included in the recommendation.
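The three steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the method of Stark et al. (2017): the text does not say how an individual feature is scored or which weight the current drug prescription receives, so the sketch assumes a 0/1 match per feature, an age tolerance of five years, and a weight of 1 for the prescription. All field names, patients, and drug names are hypothetical.

```python
# Feature weights taken from the text. The weight of the current drug
# prescription is not stated there and is ASSUMED to be 1.
WEIGHTS = {
    "age": 3,
    "disease_history": 3,
    "allergies": 1,
    "preexisting_conditions": 1,
    "blood_pressure": 1,
    "current_prescription": 1,  # assumption
}
PREFILTER = ("gender", "aura", "migraine_type")


def feature_score(name, a, b):
    """ASSUMED per-feature score: 1.0 on a match, otherwise 0.0."""
    if name == "age":
        return 1.0 if abs(a - b) <= 5 else 0.0  # assumed tolerance of 5 years
    return 1.0 if a == b else 0.0


def similarity(active, other):
    """Weighted sum of the feature scores, normalized to the range [0, 1]."""
    total = sum(WEIGHTS.values())
    score = sum(w * feature_score(f, active[f], other[f]) for f, w in WEIGHTS.items())
    return score / total


def recommend_drugs(active, patients, threshold=0.8):
    """Collect drugs of pre-filtered patients who are at least `threshold` similar."""
    drugs = set()
    for p in patients:
        if any(p[k] != active[k] for k in PREFILTER):
            continue  # step 1: gender, aura, and migraine type must match
        if similarity(active, p) >= threshold:  # steps 2 and 3
            drugs.update(p["drugs"])
    return sorted(drugs)


# Hypothetical data for illustration only.
base = {
    "gender": "female", "aura": "yes", "migraine_type": "chronic",
    "age": 34, "disease_history": "none", "allergies": "none",
    "preexisting_conditions": "none", "blood_pressure": "normal",
    "current_prescription": "none",
}
active = dict(base)
p1 = dict(base, age=36, drugs=["drug_A"])                     # similarity 1.0
p2 = dict(base, disease_history="asthma", drugs=["drug_B"])   # similarity 0.7
p3 = dict(base, allergies="pollen", drugs=["drug_C"])         # similarity 0.9
p4 = dict(base, gender="male", drugs=["drug_D"])              # removed in step 1
recommended = recommend_drugs(active, [p1, p2, p3, p4])
```

With these toy patients, only the drugs of `p1` and `p3` pass the 80% threshold; `p2` differs in a heavily weighted feature and `p4` is removed by the pre-filter.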
4.2.2 Predict drug side effects
- Step 1: Calculate drug-drug similarity based on side effect profiles. Given two drugs \(d_{i}\) and \(d_{k}\) whose side effect profiles are \(S_{i}\) and \(S_{k}\), the Jaccard similarity is used to calculate their similarity \(sim(i,k)\) (see Formula (2)).

  $$ sim(i,k) = \frac{|S_{i} \cap S_{k}|}{|S_{i} \cup S_{k}|} $$ (2)

  For the target drug d of the example: \(sim(d,d_{1}) = \frac {2}{4} = \textbf {0.5} \checkmark \); \(sim(d,d_{2}) = \frac {1}{4} = 0.25\); \(sim(d,d_{3}) = \frac {3}{4} = \textbf {0.75} \checkmark \); \(sim(d,d_{4}) = \frac {1}{4} = 0.25\).
- Step 2: The neighbor drugs of the target drug d are determined by filtering the similarity scores with a pre-defined threshold 𝜃. In this example, we assume 𝜃 = 0.5, which means that only drugs \(d_{1}\) and \(d_{3}\) are selected as neighbors of d.
- Step 3: Calculate the probability \(prob(d_{i}, s_{j})\) that drug \(d_{i}\) induces side effect \(s_{j}\) by aggregating the known side effect \(s_{j}\) of its neighbors (see Formula (3)).
Drugs / Side effects | s1 | s2 | s3 | s4 | s5 | s6 |
---|---|---|---|---|---|---|
d (target drug) | 1 | 1 | 0 | 0 | ? | ? |
d1 | 0 | 1 | 0 | 1 | 0 | 1 |
d2 | 1 | 0 | 1 | 1 | 0 | 1 |
d3 | 1 | 1 | 0 | 1 | 1 | 1 |
d4 | 0 | 1 | 1 | 1 | 0 | 1 |
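The three steps can be traced on the table above with a short Python sketch. Two assumptions are made and should be read as such. First, the worked values in Step 1 (0.5, 0.25, 0.75, 0.25) are reproduced when similarity is taken as the fraction of the four known side effects (s1 to s4) on which two drugs agree, so the sketch uses that reading. Second, Formula (3) is not shown in this section, so the aggregation in `predict` is an assumed similarity-weighted average over the neighbors, not necessarily the original formula.

```python
# Side-effect indicators from the table (1 = known side effect, 0 = none);
# None marks the entries to be predicted for the target drug d.
PROFILES = {
    "d":  [1, 1, 0, 0, None, None],
    "d1": [0, 1, 0, 1, 0, 1],
    "d2": [1, 0, 1, 1, 0, 1],
    "d3": [1, 1, 0, 1, 1, 1],
    "d4": [0, 1, 1, 1, 0, 1],
}


def agreement(target, other):
    """ASSUMPTION: share of the target's known entries on which both drugs agree."""
    known = [i for i, v in enumerate(target) if v is not None]
    return sum(target[i] == other[i] for i in known) / len(known)


def neighbors(target_name, theta):
    """Step 2: keep the drugs whose similarity to the target is at least theta."""
    target = PROFILES[target_name]
    sims = {n: agreement(target, p) for n, p in PROFILES.items() if n != target_name}
    return {n: s for n, s in sims.items() if s >= theta}


def predict(target_name, theta=0.5):
    """Step 3 with an ASSUMED similarity-weighted average over the neighbors."""
    target = PROFILES[target_name]
    nbrs = neighbors(target_name, theta)
    total = sum(nbrs.values())
    if total == 0:
        return {}  # no neighbor passed the threshold
    return {
        f"s{j + 1}": sum(s * PROFILES[n][j] for n, s in nbrs.items()) / total
        for j, v in enumerate(target)
        if v is None
    }


selected = neighbors("d", theta=0.5)   # d1 and d3, as in Step 2
scores = predict("d", theta=0.5)       # scores for s5 and s6
```

Under these assumptions, s6 receives the highest possible score because both neighbors are known to induce it, while s5 is supported only by d3.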
4.3 Health status prediction
- Step 1: Calculate the similarity between patient a and each patient i ∈ I using Formula (4), where \(v_{a,j}\) is the vote of patient a for risk factor j:

  $$ w(a,i) = \sum\limits_{j \in J}\frac{v_{a,j}}{\sqrt{\sum_{k \in J_{a}}v_{a,k}^{2}}}\frac{v_{i,j}}{\sqrt{\sum_{k \in J_{i}}v_{i,k}^{2}}} $$ (4)
- Step 2: Find the patients most similar to patient a based on the similarity scores. The most similar patient has the greatest similarity score.
- Step 3: Calculate the prediction score of each risk factor j that patient a has not yet faced using Formulae (5) - (7), where \(p(a, j)\) is the prediction score for patient a on risk factor j, \(\overline {v_{j}}\) is the average vote of all patients who have faced risk factor j, \(w(a, i)\) is the similarity between patients a and i (see Formula (4)), and \(|I_{j}|\) is the number of patients who have faced risk factor j. The normalizing constant \(k_{a}\) keeps the prediction score within the range of possible votes.

  $$ p(a,j) = \overline{v_{j}} + k_{a}(1-\overline{v_{j}})\sum\limits_{i \in I_{j}}w(a,i) $$ (5)

  $$ \overline{v_{j}}=\frac{|I_{j}|}{|I|} $$ (6)

  $$ k_{a} = \frac{1}{\sum_{i \in I}w(a,i)} $$ (7)
similar patients | nerve damage | eye damage | slow healing | skin issues | kidney damage | hearing impairment | heart disease | \(w(Maria, p_{i})\) |
---|---|---|---|---|---|---|---|---|
Maria | 1 | 1 | 1 | 1 | ? | ? | ? | |
p1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0.82 |
p2 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0.89 |
p3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.82 |
p4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.76 |
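Formulae (4) to (7) can be sketched in Python as shown below. The sketch assumes that unknown votes count as 0 in Formula (4) and that I is the set of the four similar patients listed in the table. The prediction step takes the similarity column of the table as given rather than recomputing it.

```python
import math


def cosine_weight(votes_a, votes_i):
    """Formula (4): cosine similarity; unknown votes (None) are treated as 0."""
    a = [v or 0 for v in votes_a]
    b = [v or 0 for v in votes_i]
    norm_a = math.sqrt(sum(v * v for v in a))
    norm_b = math.sqrt(sum(v * v for v in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def predict(weights, faced):
    """Formulae (5)-(7).

    weights: mapping patient -> w(a, i) for all patients in I
    faced:   set of patients in I who have faced risk factor j (I_j)
    """
    v_bar = len(faced) / len(weights)   # Formula (6)
    k_a = 1 / sum(weights.values())     # Formula (7)
    return v_bar + k_a * (1 - v_bar) * sum(weights[i] for i in faced)  # Formula (5)


# Votes of Maria and two of the similar patients, copied from the table.
maria = [1, 1, 1, 1, None, None, None]
p2 = [1, 1, 1, 1, 0, 1, 0]
p4 = [1, 1, 1, 1, 1, 1, 1]
w_p2 = cosine_weight(maria, p2)   # about 0.89, as listed for p2
w_p4 = cosine_weight(maria, p4)   # about 0.76, as listed for p4

# Similarity column of the table, used as given.
W = {"p1": 0.82, "p2": 0.89, "p3": 0.82, "p4": 0.76}
kidney = predict(W, {"p1", "p3", "p4"})
hearing = predict(W, {"p2", "p3", "p4"})
heart = predict(W, {"p3", "p4"})
```

With the listed similarities, the scores for Maria come out at roughly 0.93 for kidney damage, 0.94 for hearing impairment, and 0.74 for heart disease, so heart disease is ranked as the least likely of the three under these assumptions.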
4.4 Physical activity recommendation
- Module 1 (Data acquisition and processing), which stores the demographic information and preferred activities of users collected from sensory devices.
- Module 2 (Context generation), which saves the current activity, location, weather conditions, and emotional state of the user.
- Module 3 (Expert knowledge repository), which represents rules in IF-THEN form; these rules are then used to create recommendations. For instance, “IF a patient is pregnant and facing the gestational diabetes mellitus, THEN she should do a 20-30 minute moderate-intensity exercise on almost every day of the week” (Colberg et al. 2016).
- Module 4 (Multi-stage recommender), which utilizes the user information collected by Modules 1 and 2 to create a comprehensive recommendation for the user. The recommendation process consists of two stages. In Stage 1, the system calculates the user’s calorie-burn and intake targets and a generic set of physical activity recommendations. Additionally, a case-based reasoning mechanism is used to infer the most relevant rules from the knowledge base. In Stage 2, the recommendations generated in Stage 1 are refined in a personalized manner. A contextual matrix is created to recommend suitable activities to the user at a given time. This matrix is calculated from the user’s survey results and is used to select suitable physical activities for different contexts. For instance, “since the user is now staying at home, stretching seems to be more appropriate for him than running”.
- Module 5 (Explanations of suggested activities), which are sent together with the recommendations to explain why a specific physical activity has been recommended to the user. For instance, “you should run at least one hour daily to improve your current health condition and meet one of your calorie-burn targets”. Additional explanations based on the context can also be provided, e.g., “it is quite cold today, hence consider to bring a sports jacket with you before going out”.
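The interplay of the rule repository, the two recommendation stages, and the explanations can be illustrated with a minimal Python sketch. It is not the system of Ali et al. (2018): plain rule matching is swapped in for the case-based reasoning mechanism of Stage 1, and the rules, the contextual matrix, the field names, and the explanation texts (apart from the quoted gestational-diabetes rule) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    condition: Callable[[dict], bool]  # IF part, evaluated on the user profile
    activity: str                      # THEN part
    explanation: str                   # sent together with the recommendation


# Hypothetical rule repository (Module 3).
RULES = [
    Rule(lambda u: bool(u.get("pregnant") and u.get("gestational_diabetes")),
         "moderate-intensity exercise",
         "20-30 minutes on almost every day of the week (Colberg et al. 2016)"),
    Rule(lambda u: u.get("calorie_burn_target", 0) > 0,
         "running",
         "helps to meet one of your calorie-burn targets"),
    Rule(lambda u: True,
         "stretching",
         "a light activity that fits most situations"),
]

# Hypothetical contextual matrix (Stage 2): activities suitable per location.
CONTEXT_MATRIX = {
    "home": {"stretching", "moderate-intensity exercise"},
    "outdoors": {"running", "stretching", "moderate-intensity exercise"},
}


def recommend(user, context):
    """Stage 1: select matching rules. Stage 2: keep context-suitable activities."""
    stage1 = [r for r in RULES if r.condition(user)]
    suitable = CONTEXT_MATRIX.get(context["location"], set())
    return [(r.activity, r.explanation) for r in stage1 if r.activity in suitable]


user = {"calorie_burn_target": 500}
at_home = recommend(user, {"location": "home"})
outdoors = recommend(user, {"location": "outdoors"})
```

For this toy user, running is dropped while the user is at home and returned again once the location changes to outdoors, which mirrors the stretching-versus-running example in Module 4.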
4.5 Healthcare professional recommendations
- Use case 1 (New patient): The patient has recently joined the network, and only basic demographic information is available. The CB recommendation approach is used to create recommendations based on similar demographic profiles.
- Use case 2 (Existing patient with no interactions with primary care doctors): The patient has already visited specialists or hospitals, but has not visited family doctors yet. The activities of other patients in previous visits are utilized to narrow down the doctor list. In addition, a complementary data set describing hospital inpatient procedures and certain types of diseases of patients is used to create the patient profiles and then generate recommendations using the CB recommendation approach.
- Use case 3 (Existing patient with prior interactions with primary care doctors): The CF recommendation approach is applied to look for doctors visited by similar patients (i.e., patients who have visited the same doctors earlier).
- k and n are the numbers of conditions of patients p and p′, respectively.
- z and r are the numbers of treatments of patients p and p′, respectively.
- \(p_{c}\) and \(p'_{c}\) are the conditions of patients p and p′, respectively.
- \(p_{t}\) and \(p'_{t}\) are the treatments of patients p and p′, respectively.
- \(s_{c}(p_{c_{i}},p'_{c_{j}})\) is the similarity score between the condition \(c_{i}\) of patient p and the condition \(c_{j}\) of patient p′ (see Formula (9)). If these two conditions are the same, then this score is the logarithm of the ratio between the number of conditions in the database (#C) and the number of patients affected by that condition (\(P_{c_{i}}\)). Otherwise, \(s_{c}\) is computed as the number of edges in the shortest path sp that connects the two conditions in the disease hierarchy. The idea of this rule is to determine whether two patients are affected by similar disease conditions. For instance, dilated cardiomyopathy and coronary artery conditions of two patients can be considered the same since they both refer to heart-muscle failures. In this context, the experience of one patient with the consulted doctors/hospitals could be useful for the other (Narducci et al. 2015).
- \(s_{t}(p_{t_{i}}, p^{\prime }_{t_{j}})\) is the similarity score between the treatment \(t_{i}\) of patient p and the treatment \(t_{j}\) of patient \(p^{\prime }\) (see Formula (10)).
- α refers to the contribution of conditions and treatments to patients’ similarity.
- β indicates the weight of the community (patients) and the ministry indicator.
patient | condition | treatment | consulted doctor (name) | consulted doctor (rating) | visited hospital (name) | visited hospital (rating) |
---|---|---|---|---|---|---|
p1 | coronary artery | statins | X | 4.1 | A | 4.2 |
p2 | mitral regurgitation | surgery | Y | 4.9 | B | 4.3 |
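The condition score \(s_{c}\) described above can be sketched in Python as follows. Formula (9) itself is not reproduced in this section, so the sketch follows only the verbal description: the logarithm of #C divided by the number of affected patients when the two conditions are identical, and otherwise the number of edges on the shortest path in the disease hierarchy, taken as stated in the text. The small hierarchy and all counts are hypothetical and serve only to make the example runnable.

```python
import math
from collections import deque

# HYPOTHETICAL disease hierarchy as an undirected adjacency list.
HIERARCHY = {
    "heart disease": ["coronary artery", "valve disease"],
    "coronary artery": ["heart disease"],
    "valve disease": ["heart disease", "mitral regurgitation"],
    "mitral regurgitation": ["valve disease"],
}


def shortest_path_edges(start, goal):
    """Breadth-first search; returns the number of edges or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in HIERARCHY.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def condition_score(c_i, c_j, num_conditions, patients_with):
    """Score of two conditions, following the verbal description only.

    num_conditions: number of conditions in the database (#C)
    patients_with:  mapping condition -> number of affected patients
    """
    if c_i == c_j:
        return math.log(num_conditions / patients_with[c_i])
    return shortest_path_edges(c_i, c_j)


# Hypothetical counts; the two conditions are those of p1 and p2 in the table.
counts = {"coronary artery": 10, "mitral regurgitation": 4}
same = condition_score("coronary artery", "coronary artery", 100, counts)
different = condition_score("coronary artery", "mitral regurgitation", 100, counts)
```

In this toy hierarchy, the conditions of p1 and p2 are three edges apart, whereas two patients who share the coronary artery condition obtain a score that grows as the condition becomes rarer in the database.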