3.2.1 Compatibility scores
On their personal homepages, users state their requirements for potential mates on 7 attributes: age, avatar, education level, height, credit rating, place of residence and marital status (see Figs. 1–4 in Additional file 1 for the selection requirements of several attributes). Credit rating works as follows on the dating site: after a user passes the quick identity authentication, or uploads one of three documents (the ID card, the passport or the Hong Kong and Macau Pass) and passes the review, he/she obtains the first star, i.e. a credit rating of 1. Each additional document that is uploaded and approved adds one star, up to a maximum of five stars (a five-star member). Although the minimum age of users on the platform is 18, a small number of users still set their minimum or maximum age requirement below 18 (see Fig. 3 in Additional file 1 for details). We use the concept of a compatibility score to describe the match between users, based on whether or not a user meets another user’s selection requirements. When women send messages to men, for each attribute we can compute, over all messages, the proportion of women who match the mate preferences of men and the proportion of men who meet the preferences of women, which gives two vectors of 7 proportions each. From the data we obtain
\(\mathbf{w}_{\mathrm{FMm}}= (0.701,0.886,0.462,0.826,0.919,0.786,0.920)\), and
\(\mathbf{w}_{\mathrm{FMf}}=(0.912,0.976,0.681,0.962,0.994,0.864,0.912)\), where
\(\mathbf{w}_{\mathrm{FMm}}\) contains the proportions of female attributes meeting male preferences and
\(\mathbf{w}_{\mathrm{FMf}}\) contains the proportions of male attributes meeting female preferences. Similarly, when men send messages to women, we obtain
\(\mathbf{w}_{\mathrm{MFm}}=(0.877,0.977,0.402,0.980,0.992,0.831,0.960)\) and
\(\mathbf{w}_{\mathrm{MFf}}=(0.671,0.867,0.572,0.678,0.758,0.771,0.892)\). Thus the compatibility scores of women sending messages to men are
$$\begin{aligned}& c_{\mathrm{FMm}} = \frac{\mathbf{w}_{\mathrm{FMm}} \cdot { (\textrm{female attr. in male pref.})}}{ {\operatorname{sum}(\mathbf{w}_{\mathrm{FMm}} )}}, \end{aligned}$$
(1)
$$\begin{aligned}& c_{\mathrm{FMf}} = \frac{\mathbf{w}_{\mathrm{FMf}} \cdot (\textrm{male attr. in female pref.})}{ {\operatorname{sum}(\mathbf{w}_{\mathrm{FMf}} )}}, \end{aligned}$$
(2)
and the compatibility scores of men sending messages to women are
$$\begin{aligned}& c_{\mathrm{MFm}} = \frac{\mathbf{w}_{\mathrm{MFm}} \cdot (\textrm{female attr. in male pref.})}{ {\operatorname{sum}(\mathbf{w}_{\mathrm{MFm}} )}}, \end{aligned}$$
(3)
$$\begin{aligned}& c_{\mathrm{MFf}} = \frac{\mathbf{w}_{\mathrm{MFf}} \cdot (\textrm{male attr. in female pref.})}{ {\operatorname{sum}(\mathbf{w}_{\mathrm{MFf}} )}}, \end{aligned}$$
(4)
where (female attr. in male pref.) is a vector characterizing whether female attributes meet male preferences for a pair of users (1 for yes and 0 for no), and similarly (male attr. in female pref.) is a vector characterizing whether male attributes meet female preferences for a pair of users. Equations
1 and
3 are the compatibility scores between a male preference and the profile of his chosen mate, and Eqs.
2 and
4 are the compatibility scores between a female preference and the profile of her chosen mate. For a pair of users,
\(u_{a}\) and
\(u_{b}\), we use a reciprocal score to quantify how much the attributes of
\(u_{b}\) match the preferences of
\(u_{a}\) and how much the attributes of
\(u_{a}\) match the preferences of
\(u_{b}\). The reciprocal score between
\(u_{a}\) and
\(u_{b}\) is the mean of the compatibility scores of these two users, that is, for women sending messages to men the reciprocal score is
\(\mathit{rs} = (c_{\mathrm {FMm}} + c_{\mathrm{FMf}} )/2\), and for men sending messages to women
\(\mathit{rs} = (c_{\mathrm{MFm}} + c_{\mathrm{MFf}} )/2\).
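The computation in Eqs. 1 and 2 and the reciprocal score can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the weights are listed in the same attribute order as in the text (age, avatar, education level, height, credit rating, place of residence, marital status), and the indicator vectors describe a hypothetical pair of users.

```python
# Weights reported in the text for women sending messages to men.
# Assumed order: age, avatar, education level, height, credit rating,
# place of residence, marital status.
W_FMM = (0.701, 0.886, 0.462, 0.826, 0.919, 0.786, 0.920)
W_FMF = (0.912, 0.976, 0.681, 0.962, 0.994, 0.864, 0.912)


def compatibility(weights, meets):
    """Weighted share of satisfied requirements, as in Eqs. 1-4.

    `meets` is a 0/1 sequence: 1 if the attribute satisfies the other
    user's stated requirement, 0 otherwise.
    """
    if len(weights) != len(meets):
        raise ValueError("weights and indicator vector must have equal length")
    return sum(w * m for w, m in zip(weights, meets)) / sum(weights)


def reciprocal_score(c_a, c_b):
    """Mean of the two directional compatibility scores."""
    return (c_a + c_b) / 2


# Hypothetical pair (illustrative only): the woman fails the man's education
# requirement, while the man satisfies all of the woman's requirements.
female_in_male_pref = (1, 1, 0, 1, 1, 1, 1)
male_in_female_pref = (1, 1, 1, 1, 1, 1, 1)

c_fmm = compatibility(W_FMM, female_in_male_pref)
c_fmf = compatibility(W_FMF, male_in_female_pref)
rs = reciprocal_score(c_fmm, c_fmf)
```

Because the weights are normalized by their sum, each compatibility score lies between 0 and 1, and failing a requirement that most pairs satisfy costs more than failing one that few pairs satisfy.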
3.2.2 Logistic regression
Let
click be the number of times a user is clicked,
msg be the number of messages received by a user, and
rec be the number of times a user is recommended and shown on other users’ homepages. We define
\(\mathit{pop}_{1} = \mathit{click}/\mathit{rec}\) and
\(\mathit{pop}_{2} = \mathit{msg}/\mathit{rec}\), which characterize the popularity of a user based on actions. We also use PageRank centrality (
\(\mathit{pop}_{3}\)) to quantify how focal or popular a user is in a network by considering all connections in the network. Attractive people, such as the people with advantageous demographic attributes and higher socio-economic status, tend to be more demanding than average people in terms of potential mate choice, which can be revealed in the preference analysis of income and education level in Sect.
3.1.2. Those who are perceived as attractive by attractive people can be even more popular/attractive. The variables used in the paper and their meanings are shown in Table
1.
Table 1
Variables and their corresponding meanings
Variable | Meaning |
MobileF | Whether a female mobile phone is verified |
HouseF | Whether a female has a flat |
AutoF | Whether a female has a car |
LevelF | Female credit rating |
Pop1F | Female \(\mathit{pop}_{1}\) |
Pop2F | Female \(\mathit{pop}_{2}\) |
Pop3F | Female PageRank (\(\mathit{pop}_{3}\), the damping factor is 0.85) in the messaging network |
IndegreeF | Female indegree in the click network |
OutdegreeF | Female outdegree in the click network |
CompatFM | The compatibility score between a female preference and the profile of the corresponding other side |
MsgFM | Whether females send messages to males |
MobileM | Whether a male mobile phone is verified |
HouseM | Whether a male has a flat |
AutoM | Whether a male has a car |
LevelM | Male credit rating |
Pop1M | Male \(\mathit{pop}_{1}\) |
Pop2M | Male \(\mathit{pop}_{2}\) |
Pop3M | Male PageRank (\(\mathit{pop}_{3}\), the damping factor is 0.85) in the messaging network |
IndegreeM | Male indegree in the click network |
OutdegreeM | Male outdegree in the click network |
CompatMF | The compatibility score between a male preference and the profile of the corresponding other side |
MsgMF | Whether males send messages to females |
RS | Mean of the compatibility scores of a sender and the corresponding receiver |
We introduce several centrality indices, namely \(\mathit{pop}_{1}\), \(\mathit{pop}_{2}\), \(\mathit{pop}_{3}\) and indegree, to evaluate their correlation with messaging behaviors. Note that the centrality indices are aggregated indicators of users’ desirability or popularity; users know neither their own indices nor those of others. We use outdegree to characterize users’ activity level, and users on the dating site likewise do not know the outdegree of other users. In practice, rather than using these indices to identify or select attractive partners, users message one another based on more specific cues, such as higher income, better educational background, attractive photos or good demographic and socio-economic compatibility. In this paper, we evaluate whether the indices are significantly associated with messaging behaviors.
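The three popularity indices can be sketched as follows. The ratio definitions and the damping factor 0.85 come from the text; the power-iteration routine, its handling of users with no outgoing messages, and the toy edge list are assumptions made for illustration, since the paper does not state which PageRank implementation was used.

```python
def popularity_ratios(click, msg, rec):
    """Return (pop1, pop2) = (click / rec, msg / rec) for one user."""
    if rec <= 0:
        raise ValueError("rec must be positive")
    return click / rec, msg / rec


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on directed (sender, receiver) pairs.

    Repeated pairs count as repeated links. Rank held by nodes without
    outgoing links is spread uniformly (one common convention, assumed here).
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical counts and messaging network, for illustration only.
pop1, pop2 = popularity_ratios(click=30, msg=6, rec=120)
pop3 = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
```

In the toy network, user c receives messages from two users and therefore obtains the highest PageRank, while user b, who receives none, obtains the lowest.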
Suppose
\(p_{i}\) is the probability of sending messages for a female user
i,
\(1-p_{i}\) is the probability of not sending messages, then
\(L_{f_{i}}=\ln(\frac{p_{i}}{1-p_{i}})\), i.e., for all women,
\(L_{f}=\ln(\frac{p}{1-p})\). Similarly, suppose
\(q_{j}\) is the probability of sending messages for a male user
j,
\(1-q_{j}\) is the probability of not sending messages, then
\(L_{m_{j}}=\ln (\frac{q_{j}}{1-q_{j}})\), i.e., for all males,
\(L_{m}= \ln(\frac{q}{1-q})\). We obtain logistic regression models as follows:
$$\begin{aligned}& L_{f} = \alpha _{1} + {\boldsymbol{\beta} }_{1} \cdot {\mathbf{attribute}} + \varepsilon _{\mathrm{1}}, \end{aligned}$$
(5)
$$\begin{aligned}& L_{m} = \alpha _{2} + {\boldsymbol{\beta }}_{2} \cdot {\mathbf{attribute}} + \varepsilon _{\mathrm{2}}. \end{aligned}$$
(6)
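The log-odds transformation behind Eqs. 5 and 6 can be illustrated with a short sketch. The function names, coefficients and attribute values below are hypothetical and chosen only to show how a linear predictor maps to a messaging probability; they are not estimates from the paper.

```python
import math


def logit(p):
    """Log-odds ln(p / (1 - p)) for a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(x):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_predictor(alpha, beta, attribute):
    """alpha + beta . attribute, the systematic part of Eqs. 5 and 6."""
    if len(beta) != len(attribute):
        raise ValueError("beta and attribute must have equal length")
    return alpha + sum(b * x for b, x in zip(beta, attribute))


# Hypothetical intercept, coefficients and attribute values (illustrative only).
alpha = -5.0
beta = (0.4, 0.1)
attribute = (1.0, 2.0)

log_odds = linear_predictor(alpha, beta, attribute)
p = inverse_logit(log_odds)
```

A strongly negative intercept, as in this example, corresponds to a low baseline probability of sending a message, and each positive coefficient raises the log-odds linearly in the corresponding attribute.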
To limit multicollinearity, we retain only independent variables whose pairwise correlation coefficients are below 0.5 (see Tables 7 and 8 in Additional file
1 for details). The logistic regression results for women sending messages to men are shown in Table
2. We find that almost all the variables are significant when only considering the attributes of women (model 1), i.e., the attributes of senders, but only housing and outdegree of women are positively associated with the probability of women sending messages to men. When only considering the male attributes (model 2), except male mobile phone verification and credit rating, all the others are significant and are positively associated with the probability of women’s sending messages. When considering the two parties’ attributes and compatibility scores (model 3), among the significant variables, female mobile phone verification, car ownership, credit rating and popularity levels (
\(\mathit{pop}_{1}\) and
\(\mathit{pop}_{3}\)) are negatively associated with the probability of women’s sending messages, while the other variables are positively associated. We find that, when women send messages to men, they are concerned about not only whether they meet the requirements of men but also whether men meet their own requirements.
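The correlation screening mentioned at the start of this paragraph could be sketched as follows. This is an illustration under assumptions: the paper does not specify the correlation measure, so Pearson correlation compared in absolute value against 0.5 is assumed, and the column names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dev_x, dev_y))
    norm = math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))
    return cov / norm


def correlated_pairs(columns, threshold=0.5):
    """Return name pairs whose absolute correlation reaches the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(columns), 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


# Hypothetical predictor columns (illustrative only).
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [2, 1, 2, 1, 2],
}
flagged = correlated_pairs(columns)
```

Any pair returned by `correlated_pairs` would call for dropping or combining one of the two variables before fitting the regression.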
Table 2
Logistic regression results for female users sending messages to male users
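Variable | Model 1 coef. | Model 1 std. err. | Model 2 coef. | Model 2 std. err. | Model 3 coef. | Model 3 std. err. |
Intercept | −5.322∗∗∗ | 0.014 | −5.548∗∗∗ | 0.016 | −5.640∗∗∗ | 0.017 |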
MobileF | −0.092∗∗∗ | 0.014 | | | −0.090∗∗∗ | 0.014 |
HouseF | 0.061∗∗∗ | 0.014 | | | 0.038∗∗ | 0.014 |
AutoF | −0.118∗∗∗ | 0.016 | | | −0.116∗∗∗ | 0.016 |
LevelF | −0.059∗∗∗ | 0.014 | | | −0.072∗∗∗ | 0.014 |
Pop1F | −0.162∗∗∗ | 0.018 | | | −0.167∗∗∗ | 0.018 |
Pop2F | 0.016 | 0.014 | | | 0.012 | 0.014 |
Pop3F | −0.110∗∗∗ | 0.021 | | | −0.121∗∗∗ | 0.021 |
OutdegreeF | 0.209∗∗∗ | 0.005 | | | 0.211∗∗∗ | 0.006 |
MobileM | | | 0.015 | 0.014 | 0.025 | 0.014 |
HouseM | | | 0.238∗∗∗ | 0.015 | 0.243∗∗∗ | 0.015 |
AutoM | | | 0.153∗∗∗ | 0.013 | 0.157∗∗∗ | 0.013 |
LevelM | | | 0.016 | 0.013 | 0.029∗ | 0.013 |
Pop1M | | | 0.393∗∗∗ | 0.007 | 0.390∗∗∗ | 0.007 |
Pop2M | | | 0.053∗∗∗ | 0.004 | 0.051∗∗∗ | 0.004 |
Pop3M | | | 0.142∗∗∗ | 0.006 | 0.146∗∗∗ | 0.006 |
OutdegreeM | | | 0.028∗∗ | 0.011 | 0.029∗∗ | 0.011 |
CompatFM | | | | | 0.057∗∗∗ | 0.014 |
CompatMF | | | | | 0.061∗∗∗ | 0.014 |
AIC | 72,160 | | 67,462 | | 65,958 | |
N | 1,115,363 | | 1,115,363 | | 1,115,363 | |
The logistic regression results for men sending messages to women are shown in Table
3. We find that when only the female attributes are considered (model 1), all variables are significant except female mobile phone verification, credit rating and outdegree, and among the significant variables only female house ownership is negatively associated with the probability of men sending messages. When only male attributes are considered (model 2), all the variables are significant, but only male outdegree is positively associated with messaging behaviors; the others are negatively associated. With all variables considered (model 3), all variables are significant except female credit rating, female outdegree, and the compatibility score between a female preference and the profile of the corresponding other side. Among the significant variables, female mobile phone verification, car ownership, popularity (
\(\mathit{pop}_{1}\),
\(\mathit{pop}_{2}\) and
\(\mathit{pop}_{3}\)), male outdegree and the compatibility score between a male preference and the profile of the corresponding other side are positively correlated with messaging behaviors, while all the other variables are negatively correlated. In addition, by analyzing the significance of the two compatibility scores, we find that men only pay attention to whether women meet their own requirements when sending messages to women.
Table 3
Logistic regression results for male users sending messages to female users
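Variable | Model 1 coef. | Model 1 std. err. | Model 2 coef. | Model 2 std. err. | Model 3 coef. | Model 3 std. err. |
Intercept | −4.800∗∗∗ | 0.008 | −4.732∗∗∗ | 0.008 | −4.973∗∗∗ | 0.009 |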
MobileF | 0.007 | 0.007 | | | 0.022∗∗ | 0.007 |
HouseF | −0.021∗∗ | 0.007 | | | −0.015∗ | 0.007 |
AutoF | 0.037∗∗∗ | 0.006 | | | 0.038∗∗∗ | 0.006 |
LevelF | −0.013 | 0.007 | | | 0.012 | 0.007 |
Pop1F | 0.468∗∗∗ | 0.004 | | | 0.462∗∗∗ | 0.004 |
Pop2F | 0.023∗∗∗ | 0.003 | | | 0.022∗∗∗ | 0.003 |
Pop3F | 0.126∗∗∗ | 0.003 | | | 0.141∗∗∗ | 0.003 |
OutdegreeF | −0.004 | 0.008 | | | 0.000 | 0.008 |
MobileM | | | −0.146∗∗∗ | 0.007 | −0.147∗∗∗ | 0.007 |
HouseM | | | −0.062∗∗∗ | 0.008 | −0.071∗∗∗ | 0.008 |
AutoM | | | −0.100∗∗∗ | 0.008 | −0.107∗∗∗ | 0.008 |
LevelM | | | −0.195∗∗∗ | 0.009 | −0.197∗∗∗ | 0.009 |
Pop1M | | | −0.036∗∗∗ | 0.009 | −0.044∗∗∗ | 0.009 |
Pop2M | | | −0.026∗ | 0.012 | −0.026∗ | 0.012 |
Pop3M | | | −0.196∗∗∗ | 0.013 | −0.215∗∗∗ | 0.014 |
OutdegreeM | | | 0.338∗∗∗ | 0.003 | 0.335∗∗∗ | 0.004 |
CompatFM | | | | | −0.013 | 0.007 |
CompatMF | | | | | 0.062∗∗∗ | 0.008 |
AIC | 219,384 | | 227,215 | | 210,184 | |
N | 2,066,668 | | 2,066,668 | | 2,066,668 | |
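The significance of the two compatibility scores in model 3 of Table 3 can be checked with a Wald test. The sketch below assumes that the value next to each coefficient is its standard error and that significance is judged against a standard normal reference; neither assumption is stated explicitly in the table.

```python
import math


def wald_test(coef, std_err):
    """Return (z, two-sided p-value) under a standard normal reference."""
    z = coef / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Model 3 entries for the two compatibility scores in Table 3.
z_fm, p_fm = wald_test(-0.013, 0.007)  # CompatFM
z_mf, p_mf = wald_test(0.062, 0.008)   # CompatMF
```

Under these assumptions, CompatFM does not reach the 0.05 level whereas CompatMF is significant well below 0.001, in line with the observation that only the match between women and men's own requirements is associated with men's messaging.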
As can be seen from Tables
2 and
3, for males or females sending messages, popularity of the other side is significantly positively associated with messaging behaviors. On the one hand,
\(\mathit{pop}_{1}\) and
\(\mathit{pop}_{2}\) values, according to their calculation method, represent a user’s local popularity. On the other hand,
\(\mathit{pop}_{3}\) value, i.e. PageRank, represents the popularity of a user from a global perspective.
For females sending messages to males, the odds ratio for male \(\mathit{pop}_{1}\), \(\exp (0.390) = 1.477\), is larger than that for male \(\mathit{pop}_{3}\), \(\exp (0.146) = 1.157\); for males sending messages to females, the odds ratio for female \(\mathit{pop}_{1}\), \(\exp (0.462) = 1.587\), is likewise larger than that for female \(\mathit{pop}_{3}\), \(\exp (0.141) = 1.151\). Thus, for both males and females, the other party’s \(\mathit{pop}_{1}\) is more strongly associated with messaging than \(\mathit{pop}_{3}\). Comparing the two directions, the value for male \(\mathit{pop}_{1}\) when females send messages (1.477) is smaller than the value for female \(\mathit{pop}_{1}\) when males send messages (1.587), which indicates that the other side’s \(\mathit{pop}_{1}\) is more strongly associated with messaging behavior for males than for females. In contrast, the value for male \(\mathit{pop}_{3}\) when females send messages (1.157) is slightly larger than the value for female \(\mathit{pop}_{3}\) when males send messages (1.151), which indicates that the other side’s \(\mathit{pop}_{3}\) is somewhat more strongly associated with messaging behavior for females than for males.
In China, owning an apartment and a car is a symbol of a person’s wealth and social status, and in some regions both have become necessities for getting married. When women send messages to men, both male house ownership and male car ownership are significantly positively associated with messaging. When men send messages to women, female house ownership is not positively associated with messaging (its coefficient is slightly negative), whereas female car ownership is positively associated. We find that \(\exp(0.038) = 1.039\) for the other side’s car ownership when men send messages to women is smaller than \(\exp (0.157) = 1.170\) for the other side’s car ownership when women send messages to men, indicating that women pay more attention than men to whether the other side has a car.
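The exponentiated coefficients quoted in the two preceding paragraphs can be reproduced directly from the model 3 columns of Tables 2 and 3. The dictionary keys below are labels introduced only for this sketch.

```python
import math

# Model 3 coefficients for the receiver's attributes, taken from Tables 2 and 3.
coefficients = {
    "Pop1M, women to men": 0.390,
    "Pop3M, women to men": 0.146,
    "AutoM, women to men": 0.157,
    "Pop1F, men to women": 0.462,
    "Pop3F, men to women": 0.141,
    "AutoF, men to women": 0.038,
}

# exp(coefficient) is the multiplicative change in the odds of sending a
# message per one-unit increase in the variable, other variables held fixed.
odds_ratios = {name: round(math.exp(b), 3) for name, b in coefficients.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: {ratio:.3f}")
```

An odds ratio above 1 indicates higher odds of messaging as the variable increases, so the comparisons in the text amount to comparing these ratios across variables and across the two sending directions.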
A user’s outdegree quantifies the user’s activity. On the surface, high activity means contacting many other users; more fundamentally, it may imply that the user invests more time and resources in trying to find potential partners. The role of outdegree differs between men and women. When a woman sends a message to a man, the other side’s outdegree is significantly positively associated with the messaging behavior, whereas this is not the case when a man sends a message to a woman. In other words, when women send messages to men, network measures of both the popularity and the activity of the men they contact are significantly positively associated with their messaging behaviors, but when men send messages to women, only the network measures of popularity of the women they contact are.