Introduction
-
The hardness of modeling user preferences and item features from sparse explicit feedback. In most past studies, ratings are used as the only feedback information to measure the degree of user preference for a specified item. However, ratings only reflect the overall satisfaction with an item, without further details. As shown in Fig. 1, on Amazon both users bought the album “California Girls” with positive reviews, while user1 also bought the album “Pet Sounds” but gave a negative review. Without considering the reviews, existing rating-based methods would recommend “Pet Sounds” to user2 because it was rated by user1, which turns out to be an improper recommendation. Thus, improper recommendations happen when only user connectivity is considered and the reviews are ignored.
-
The semantic and item irregularities hidden in a user’s successive actions. For the majority of users, the sequential dependencies of their interaction behaviors are not strictly ordered. Due to the uncertainty of user shopping behaviors, the main purpose behind a user behavior sequence is not clear. Thus, in the real world, some user behavior sequences are not strictly ordered, i.e., not all adjacent interactions in a sequence are sequentially dependent. For instance, as shown in Fig. 2, consider the historical interaction sequence of a user: S = {Music_Player 1, Music_Player 2, Music_Player 3, Sports Tracker}. The first three items in S suggest that the user has a higher probability of buying a music player next than of buying sports tracker software. However, this is not a valid recommendation, since the user chose the sports tracker app three months later. Hence, this kind of temporal distance deserves specific handling.
-
R.Q. How to capture collective sequential dependencies with flexible orders is a key challenge in the sequential recommendation domain.
-
First, review texts are not only much more expressive than ratings, but also provide a strong tool for explaining the underlying dimensions behind users’ decisions. On that basis, we employ reviews to build a sequential recommendation model with more accurate user and item representations. Specifically, user-provided reviews can be viewed from the user aspect and the item aspect. From the user aspect, a user’s review set reflects the experience of buying diverse items. From the item aspect, an item’s review set contains the reviews written for an item and usually exhibits the various features of that specific item. Therefore, learning user and item representations from reviews has a strengthening effect on sequential recommendation. In Fig. 3, we can observe from the user-aspect reviews that user u has diverse purchase behavior, which reflects the user’s preference more accurately. Meanwhile, the item-aspect reviews from different users exhibit the various features of the specific item v. In this case, matching the two aspects of reviews significantly helps the model predict the user’s next purchase, and u indeed gave a 5.0 score to v after purchasing it.
-
Second, we also notice that there is no strict order or regular timing between a user’s sequential behaviors, due to the uncertainty of user behaviors. Existing time-aware sequential recommenders [11, 12] always assume that the items in a sequence are evenly spaced and semantically consistent. However, in practical recommendation scenarios, the user’s behavior sequence is complex. As shown in Fig. 4, the time intervals between two adjacent reviews vary widely, with the maximum interval between two adjacent reviews being about 160 days. Intuitively, two actions within a short time interval tend to share a closer relationship than two actions separated by a long time interval. Thus, this kind of temporal distance deserves special handling. To this end, we leverage a set of convolutional filters with varying sizes to effectively learn user and item latent factors with flexible order.
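The idea of applying convolutional filters of varying sizes over an embedded review sequence can be sketched as follows. This is a minimal NumPy illustration, not the paper’s exact architecture: the filter heights (2, 3, 4), the number of filters, and the tanh/max-pooling aggregation are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

L, d = 8, 16                  # sequence length, review-embedding size
E = rng.normal(size=(L, d))   # embedded review sequence, viewed as an L x d "image"

def conv_features(E, heights=(2, 3, 4), n_filters=4):
    """Slide full-width convolution filters of several heights over the
    sequence and max-pool each filter map, so that dependencies of
    flexible order (pairs, triples, ...) are captured rather than
    point-wise, one-item-at-a-time patterns."""
    feats = []
    for h in heights:
        # one set of filters per height: shape (n_filters, h, d)
        W = rng.normal(size=(n_filters, h, E.shape[1]))
        # valid convolution along the sequence axis
        maps = np.array([
            [np.sum(W[f] * E[i:i + h]) for i in range(E.shape[0] - h + 1)]
            for f in range(n_filters)
        ])
        feats.append(np.tanh(maps).max(axis=1))   # max-pool each filter map
    return np.concatenate(feats)

z = conv_features(E)
print(z.shape)   # (12,) = 3 heights x 4 filters
```

A filter of height h reacts to a pattern spanning h consecutive reviews, so mixing several heights lets the model match sub-sequences of different lengths at once.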
-
We propose RTiSR, a novel review-driven time interval-aware framework that exploits reviews with time interval information for sequential recommendation, capturing sequential dependencies from the user-aspect and item-aspect review sequences, respectively.
-
We introduce a flexible sequential pattern learning layer to learn collective sequential dependencies with flexible order. We regard the embedded sequential reviews with explicit time interval information as an image, then employ multi-size convolution filters to capture collective sequential dependencies with flexible order, rather than in a point-wise way.
-
We conduct extensive experiments on five real-world datasets. The experimental results demonstrate that RTiSR achieves competitive and superior HR/NDCG results compared to SOTA methods.
Related work
Sequential recommender
Review-based recommender
Methodology
Problem statement
Symbols | Definitions |
---|---|
\( \mathcal {U} \), \( \mathcal {V} \) | The user set and item set |
\( u \in \mathcal {U} \), \( v \in \mathcal {V} \) | A user and an item |
\( \mathcal {S}^u \), \( \mathcal {S}^v \) | The behavior sequences of user u and item v |
\( \mathcal {D}^u \), \( \mathcal {D}^v \) | The review sequences of user u and item v, corresponding to \( \mathcal {S}^u \), \( \mathcal {S}^v \) |
\( T^u \), \( T^v \) | The timestamp sequences of user u and item v, corresponding to \( \mathcal {S}^u \), \( \mathcal {S}^v \) |
\( D_v^u \) | The review user u writes on item v |
\( y^u_v \), \( \hat{y}^u_v \) | The true and estimated ratings |
\( \textbf{o}^u \), \( \textbf{o}^v \) | The user and item representations, each of size \( d_o \) |
m | The number of words in a review text |
\( n_c \) | The number of convolution filters |
\( d_w \) | The dimension of each word vector |
\( d_l \) | The dimension of the hidden state in the LSTM |
\( d_o \) | The dimension of the user/item representation |
\( \oplus \) | The concatenation operator |
\( \odot \) | The element-wise product operator |
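Using the notation above, the way the user representation \( \textbf{o}^u \) and item representation \( \textbf{o}^v \) could feed a rating prediction \( \hat{y}^u_v \) can be sketched as below. The concatenation-plus-linear prediction head is an illustrative assumption (the paper defines its own prediction layer later); only the \( \oplus \) and \( \odot \) operators come from the symbol table.

```python
import numpy as np

rng = np.random.default_rng(1)
d_o = 8                      # dimension of user/item representation

o_u = rng.normal(size=d_o)   # user representation o^u
o_v = rng.normal(size=d_o)   # item representation o^v

# oplus = concatenation, odot = element-wise product (per the symbol table);
# combining both gives the prediction head first- and second-order signals
features = np.concatenate([o_u, o_v, o_u * o_v])

# hypothetical linear head mapping the combined features to a rating
w = rng.normal(size=features.shape[0])
y_hat = features @ w
print(features.shape)   # (24,) = 3 * d_o
```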
The architecture of RTiSR
The time interval-aware review embedding layer
The flexible sequence pattern learning layer
The prediction layer
Model optimization
Experiments
-
RQ1: How does RTiSR perform compared with state-of-the-art sequence- and review-based recommendation models?
-
RQ2: What is the influence of the aspect reviews, the time interval-aware embedding, and the convolutional layer in RTiSR?
-
RQ3: How do the key hyperparameters affect the performance of RTiSR, such as the dimension (\( d_o \)) of the user/item latent factors and the height (h) of the convolution filters?
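For reference, the HR@K and NDCG@K metrics used to answer RQ1 can be computed as follows for a single ranked list, in the standard leave-one-out setting with one ground-truth item per user (the helper names are ours):

```python
import math

def hit_ratio_at_k(ranked_items, target, k):
    """HR@K: 1 if the ground-truth item appears in the top-K list, else 0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """NDCG@K with a single relevant item: the ideal DCG is 1, so the
    score is 1/log2(rank + 2) if the item is in the top K, else 0."""
    if target in ranked_items[:k]:
        rank = ranked_items.index(target)      # 0-based position
        return 1.0 / math.log2(rank + 2)
    return 0.0

ranked = ["v3", "v7", "v1", "v9", "v5"]
print(hit_ratio_at_k(ranked, "v1", 5))   # 1.0
print(ndcg_at_k(ranked, "v1", 5))        # 1/log2(4) = 0.5
```

Dataset-level scores are the averages of these per-user values, which is what the tables below report.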
Experimental settings
Datasets | # users | # items | # reviews | Density (%) |
---|---|---|---|---|
Yelp | 366,715 | 60,785 | 1,569,264 | 0.007 |
Beer | 40,213 | 110,419 | 2,924,127 | 0.066 |
LB | 19,947 | 1798 | 23,799 | 0.066 |
Auto | 2928 | 1835 | 20,473 | 0.381 |
MIs | 1429 | 900 | 10,261 | 0.798 |
-
BPRMF [37]: It is a Matrix Factorization-based ranking algorithm that optimizes a pairwise ranking loss with implicit feedback. It is a popular baseline for item recommendation.
-
DeepCoNN [27]: This is a state-of-the-art review-based recommendation method, which leverages convolutional neural networks to jointly model users and items from reviews.
-
SLRC [38]: It introduces Hawkes Process into Collaborative Filtering (CF), and explicitly addresses two item-specific temporal dynamics: short-term effects and life-time effects.
-
CFKG [39]: It is a knowledge-based representation learning approach that embeds heterogeneous entities for personalized recommendation, which incorporates the defined user-item knowledge-graph structure to improve the recommendation performance.
-
CORE [40]: It is a sequential recommendation method with a representation consistency encoder for representing sequence embeddings and item embeddings in the same space. This method is the state-of-the-art baseline for sequential recommendation.
-
SINE [41]: It is a state-of-the-art sequential recommendation method that simultaneously considers multiple interests of a user and aggregates them to predict the user’s current intention.
-
LightSANs [42]: It is a novel transformer-based sequential recommender that introduces low-rank decomposed self-attention, projecting the user’s historical items onto a small constant number of latent interests and leveraging item-to-interest interaction to generate the context-aware representation.
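The pairwise ranking objective behind BPRMF [37] can be sketched as below, for a single (user, observed item, unobserved item) triple. This is a minimal NumPy version of the standard BPR loss; the variable names and the regularization weight are ours.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 16
p_u = rng.normal(scale=0.1, size=d)   # user latent factors
q_i = rng.normal(scale=0.1, size=d)   # positive (observed) item factors
q_j = rng.normal(scale=0.1, size=d)   # negative (sampled) item factors

def bpr_loss(p_u, q_i, q_j, reg=0.01):
    """BPR asks that the observed item i score above the unobserved j:
    minimize -ln(sigmoid(x_ui - x_uj)) plus L2 regularization."""
    x_uij = p_u @ q_i - p_u @ q_j
    sigma = 1.0 / (1.0 + np.exp(-x_uij))
    l2 = reg * (p_u @ p_u + q_i @ q_i + q_j @ q_j)
    return -np.log(sigma) + l2

loss = bpr_loss(p_u, q_i, q_j)
print(float(loss))   # positive scalar; smaller when i outranks j by a margin
```

In training, such triples are sampled over all users and the factors are updated by (stochastic) gradient descent on this loss.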
Overall performance (RQ1)
Dataset | Metric | BPRMF | CORE | SINE | LightSANs | SLRC | DeepCoNN | CFKG | RTiSR | Impv. | p-value |
---|---|---|---|---|---|---|---|---|---|---|---|
MIs | HR@5 | 0.4033 | 0.5976 | 0.5153 | 0.5477 | 0.5213 | 0.4136 | 0.5250 | 0.5951 | – | – |
NDCG@5 | 0.2921 | 0.3942 | 0.3957 | 0.4078 | 0.3890 | 0.2923 | 0.3625 | 0.4207 | 3.16% | 8.9e–2 | |
HR@10 | 0.4551 | 0.5869 | 0.5612 | 0.5698 | 0.5890 | 0.4741 | 0.5548 | 0.6874 | 16.71% | 1.1e–4\(^\star \) | |
NDCG@10 | 0.3007 | 0.4056 | 0.4034 | 0.4248 | 0.4038 | 0.3136 | 0.3890 | 0.4569 | 7.56% | 4.9e–2\(^\star \) | |
HR@20 | 0.4568 | 0.5935 | 0.5655 | 0.5786 | 0.5992 | 0.4947 | 0.5660 | 0.6957 | 16.10% | 8.2e–4\(^\star \) | |
NDCG@20 | 0.3215 | 0.4104 | 0.4099 | 0.4292 | 0.4127 | 0.3531 | 0.3977 | 0.4599 | 7.15% | 4.2e–2\(^\star \) | |
Auto | HR@5 | 0.4417 | 0.6097 | 0.5559 | 0.6068 | 0.5023 | 0.4319 | 0.5501 | 0.6207 | 1.80% | 1.3e–2\(^\star \) |
NDCG@5 | 0.2721 | 0.4660 | 0.4146 | 0.4703 | 0.3140 | 0.2812 | 0.3007 | 0.3861 | – | – | |
HR@10 | 0.4668 | 0.6107 | 0.5663 | 0.6456 | 0.5662 | 0.4638 | 0.5945 | 0.7039 | 9.03% | 4.9e–2\(^\star \) | |
NDCG@10 | 0.3102 | 0.4448 | 0.4200 | 0.4874 | 0.3871 | 0.3008 | 0.3915 | 0.4833 | – | – | |
HR@20 | 0.4725 | 0.6187 | 0.5925 | 0.6591 | 0.5810 | 0.4850 | 0.6110 | 0.7099 | 7.71% | 7.2e–2 | |
NDCG@20 | 0.3320 | 0.4471 | 0.3274 | 0.3842 | 0.4020 | 0.3122 | 0.4201 | 0.4929 | 10.24% | 4.8e–2\(^\star \) | |
LB | HR@5 | 0.4359 | 0.5269 | 0.5311 | 0.4973 | 0.4903 | 0.4759 | 0.4779 | 0.5531 | 4.14% | 7.5e–3\(^\star \) |
NDCG@5 | 0.2805 | 0.3639 | 0.3295 | 0.3538 | 0.3656 | 0.3032 | 0.3137 | 0.3992 | 9.19% | 6.1e–4\(^\star \) | |
HR@10 | 0.4620 | 0.5279 | 0.5347 | 0.5016 | 0.5243 | 0.4904 | 0.5021 | 0.6075 | 13.62% | 2.0e–3\(^\star \) | |
NDCG@10 | 0.2917 | 0.3791 | 0.3362 | 0.3581 | 0.3907 | 0.3426 | 0.3383 | 0.4251 | 8.80% | 2.0e–2\(^\star \) | |
HR@20 | 0.4801 | 0.5323 | 0.5404 | 0.5189 | 0.5309 | 0.5011 | 0.5201 | 0.6089 | 12.68% | 3.1e–3\(^\star \) | |
NDCG@20 | 0.3382 | 0.4421 | 0.3434 | 0.3618 | 0.4102 | 0.3213 | 0.3566 | 0.4315 | – | – | |
Beer | HR@5 | 0.3625 | 0.4938 | 0.4611 | 0.5173 | 0.4526 | 0.3820 | 0.3661 | 0.5617 | 8.58% | 4.1e–2\(^\star \) |
NDCG@5 | 0.2496 | 0.3624 | 0.3633 | 0.3558 | 0.2926 | 0.2817 | 0.2457 | 0.3715 | 2.26% | 2.7e–3 | |
HR@10 | 0.3998 | 0.5793 | 0.5673 | 0.5769 | 0.4725 | 0.3947 | 0.3870 | 0.5921 | 2.21% | 7.3e–3\(^\star \) | |
NDCG@10 | 0.2651 | 0.3963 | 0.3706 | 0.3616 | 0.3322 | 0.2920 | 0.2681 | 0.4107 | 3.63% | 0.17 | |
HR@20 | 0.4195 | 0.5868 | 0.5908 | 0.5838 | 0.5026 | 0.4218 | 0.4102 | 0.5026 | – | – | |
NDCG@20 | 0.3005 | 0.3556 | 0.3987 | 0.3348 | 0.3539 | 0.3198 | 0.3108 | 0.4211 | 5.62% | 0.18 | |
Yelp | HR@5 | 0.4331 | 0.4868 | 0.5166 | 0.5326 | 0.4903 | 0.4741 | 0.4726 | 0.5637 | 5.84% | 3.9e–2\(^\star \) |
NDCG@5 | 0.2521 | 0.3601 | 0.3492 | 0.3439 | 0.3107 | 0.3112 | 0.2635 | 0.3826 | 6.25% | 3.6e–2\(^\star \) | |
HR@10 | 0.4537 | 0.5159 | 0.5227 | 0.5411 | 0.5130 | 0.5001 | 0.4807 | 0.6139 | 13.45% | 1.2e–2\(^\star \) | |
NDCG@10 | 0.2813 | 0.3956 | 0.3682 | 0.3457 | 0.3261 | 0.3138 | 0.2840 | 0.4217 | 6.60% | 0.10 | |
HR@20 | 0.4807 | 0.5338 | 0.5556 | 0.5469 | 0.5337 | 0.5238 | 0.5010 | 0.6219 | 11.93% | 3.8e–3\(^\star \) | |
NDCG@20 | 0.3101 | 0.3964 | 0.3685 | 0.3514 | 0.3740 | 0.3329 | 0.3124 | 0.4361 | 10.02% | 4.5e–2\(^\star \) | |
Statistic | Win/Loss | (18/0) | (17/1) | (17/1) | (16/2) | (18/0) | (18/0) | (18/0) | |||
F-rank\(^{a} \) | 1.33 | 6.23 | 5.13 | 5.67 | 4.65 | 2.30 | 3.00 | 7.68 |
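The F-rank row reports each method’s average Friedman rank across the dataset-metric rows of the table; here a higher average rank is better, since rank 1 goes to the worst score in each row (matching the table, where RTiSR’s 7.68 is the largest). A sketch of how such average ranks are computed, with toy scores rather than the table’s values:

```python
import numpy as np

# toy score matrix: rows = dataset/metric pairs, columns = methods
scores = np.array([
    [0.40, 0.59, 0.55, 0.60],
    [0.29, 0.41, 0.42, 0.46],
    [0.45, 0.58, 0.57, 0.69],
])

def average_ranks(scores):
    """Rank the methods within each row (1 = worst score, so a higher
    score earns a higher rank), then average the ranks over rows.
    Ties would need average ranks in a full Friedman test; this toy
    data has none."""
    # argsort of argsort yields each entry's 0-based rank in ascending order
    ranks = scores.argsort(axis=1).argsort(axis=1) + 1
    return ranks.mean(axis=0)

print(average_ranks(scores))   # last column wins every row -> rank 4.0
```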
-
RTiSR consistently outperforms most baselines on all datasets, achieving the best overall performance and the highest F-rank value. Compared to the strongest baseline on HR or NDCG, RTiSR still achieves different degrees of improvement on the five datasets: on average, RTiSR outperforms the strongest baselines by 8.4%, 4.7%, 8.1%, 3.7%, and 9.0% on MIs, Auto, LB, Beer, and Yelp, respectively. This demonstrates the effectiveness of RTiSR, which we attribute to better capturing union-level sequential dependencies with a flexible order. Moreover, the superior performance of RTiSR reflects the rationality of utilizing user reviews and exact time interval information to improve recommendation performance.
-
In most conditions, sequential recommendation methods such as CORE and SINE perform better than the CF-based methods (i.e., BPRMF, CFKG) and the review-based method (i.e., DeepCoNN). The main reason for this significant performance gap is that, although DeepCoNN and CFKG utilize user reviews and a knowledge graph to enhance the user and item representations, the capacities of BPRMF, DeepCoNN, and CFKG are limited in modeling user preference without considering the sequential dependencies hidden in user behaviors.
-
Our proposed method shows significant improvement \((\textit{p}\hbox {-value}\leqslant 0.05)\) on all datasets compared to the SOTA baselines. The reasons for this performance gap are: (1) Compared with CF-based methods (e.g., CFKG and SLRC), RTiSR can capture users’ dynamic preferences by modeling users’ sequential patterns in historical interactions. (2) Compared with sequential recommendation models (e.g., CORE, SINE, and LightSANs), RTiSR introduces aspect review information to improve the quality of the user and item embeddings. On this basis, RTiSR regards the embedded sequential reviews with explicit time interval information as an image, then employs multi-size convolution filters to capture collective sequential dependencies with flexible order, rather than in a point-wise way. (3) Compared with the review-based recommendation method (i.e., DeepCoNN), RTiSR improves sequential recommendation performance by considering time interval information to assign dynamic weights to the different reviews in the aspect-aware review sequence.
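The idea in point (3) of weighting reviews by their time intervals can be illustrated with a simple recency-based softmax weighting. The exponential-decay form and the decay constant `tau` are our assumptions for illustration, not RTiSR’s exact scheme:

```python
import numpy as np

def interval_weights(timestamps, now, tau=30.0):
    """Weight each review by recency: a review separated from the present
    by a short interval gets a larger weight; weights are normalized
    with a softmax so they sum to 1."""
    gaps = now - np.asarray(timestamps, dtype=float)   # days since each review
    logits = -gaps / tau                               # shorter gap -> larger logit
    w = np.exp(logits - logits.max())                  # numerically stable softmax
    return w / w.sum()

# reviews written 160, 40, and 5 days before "now"
w = interval_weights([0.0, 120.0, 155.0], now=160.0)
print(w)   # the most recent review receives the largest weight
```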
Ablation study (RQ2)
-
RTiSR-NoTC: Neither the time interval model nor the convolution filter is used in this variant.
-
RTiSR-NoC: It only uses the time interval model to enhance the user/item representation.
-
RTiSR-NoT: It only integrates the convolution filter into the RTiSR framework.
-
RTiSR-UT: We introduce the time interval-aware model in the U-Net, which is equivalent to only considering the temporal pattern of the user.
-
RTiSR-IT: Instead of introducing the time interval-aware model to the U-Net, this variant utilizes it in the I-Net to explore the temporal pattern of the item.
-
RTiSR-NoUI: Neither the user-side model nor the item-side model is used in this variant; the user/item representations are generated from an MLP without any aspect review information embedded.
-
RTiSR-UR: It only uses the user-side model to generate the user representations, which contain the user-aspect review information.
-
RTiSR-IR: It only uses the item-side model to generate the item representations, which contain the item-aspect review information.
Study of RTiSR (RQ3)
-
Increasing the size of the review embedding substantially improves recommendation performance on all datasets in terms of HR@10 and NDCG@10. In detail, RTiSR improves more consistently on the denser datasets MIs and Auto than on the other three. We attribute the improvement to the effective exploitation of user reviews for user and item representations: reviews contain valuable sentiment information about user preferences and item features. Thus, RTiSR can significantly enhance sequential recommendation performance with user reviews.
-
When the size of the review embedding is further increased beyond 100, we find that it leads to overfitting on all datasets, possibly because a larger review embedding size introduces noise into the representation learning. This verifies that setting \( d_{l} = 100 \) is sufficient to represent reviews.