Abstract
Logit models are popular tools for analyzing discrete choice and ranking data. These models assume that each judge assigns a measurable utility to every item, and that the ordering of a judge’s utilities determines the outcome. Logit models have proven to be powerful tools, but they become difficult to interpret when they contain nonlinear and interaction terms. We extend logit models by adding a decision tree structure to overcome this difficulty. We introduce a new method of selecting tree-splitting variables that distinguishes nonlinear from linear effects: the variable with the strongest nonlinear effect is selected for splitting, on the view that linear effects are best modeled by the logit model itself. Decision trees built in this fashion are shown to be smaller than those grown with loglikelihood-based splitting criteria. In addition, the proposed splitting methods save computational time and avoid bias in choosing the optimal splitting variable. Variable selection in logit models is also investigated, and a forward selection criterion is shown to work well with logit tree models. Focusing on ranking data, we carry out simulations whose results show that our proposed splitting methods are unbiased. Finally, to demonstrate the feasibility of logit tree models, we apply them to two datasets, one with a binary outcome and the other with a ranking outcome.
References
Alexander WP, Grimshaw SD (1996) Treed regression. J Comput Graph Stat 5(2):156–175
Allison PD, Christakis NA (1994) Logit models for sets of ranked items. Sociol Methodol 24:199–228
Armitage P (1955) Tests for linear trends in proportions and frequencies. Biometrics 11:375–386
Beggs S, Cardell S, Hausman J (1981) Assessing the potential demand for electric cars. J Econom 16:1–19
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, Belmont, CA
Chan KY, Loh WY (2004) Lotus: an algorithm for building accurate and comprehensible logistic regression trees. J Comput Graph Stat 13(4):826–852
Chapman RG, Staelin R (1982) Exploiting rank ordered choice set data within the stochastic utility model. J Mark Res 19:288–301
Chaudhuri P, Lo WD, Loh WY, Yang CC (1995) Generalized regression trees. Stat Sin 5:641–666
Cochran WG (1954) Some methods of strengthening the common chi-square tests. Biometrics 10:417–451
Dusseldorp E, Meulman JJ (2004) The regression trunk approach to discover treatment covariate interaction. Psychometrika 69:355–374
Dusseldorp E, Conversano C, van Os BJ (2010) Combining an additive and tree-based regression model simultaneously: STIMA. J Comput Graph Stat 19(3):514–530
Erosheva EA, Fienberg SE, Joutard C (2007) Describing disability through individual-level mixture models for multivariate binary data. Ann Appl Stat 1(2):502–537
Goldberg LR (1997) A broad-bandwidth, public-domain, personality inventory measuring the lower-level facets of several five-factor models. In: Mervielde I, Deary I, Fruyt FD, Ostendorf F (eds) Personality psychology in Europe. Tilburg University Press, Tilburg
Hausman J, Ruud PA (1987) Specifying and testing econometric models for rank-ordered data. J Econom 34:83–104
Hosmer DW, Lemeshow S (2000) Applied logistic regression, 2nd edn. Wiley, New York
Hothorn T, Hornik K, Zeileis A (2006) Unbiased recursive partitioning: a conditional inference framework. J Comput Graph Stat 15(3):651–674
Hui CH, Ng ECW, Mok DSY, Lau EYY, Cheung SF (2011) Faith maturity scale for Chinese: a revision and construct validation. Int J Psychol Relig 21:308–322
Landwehr N, Hall M, Frank E (2005) Logistic model trees. Mach Learn 59:161–205
Lee PH, Yu PLH (2010) Distance-based tree models for ranking data. Comput Stat Data Anal 54:1672–1682
Lee PH, Yu PLH (2012) Mixtures of weighted distance-based models for ranking data with applications in political studies. Comput Stat Data Anal 56:2486–2500
Lee PH, Yu PLH (2013) An R package for analyzing and modeling ranking data. BMC Med Res Methodol 13:65
Leung KF, Tay M, Cheng S, Lin F (1997) Hong Kong Chinese version World Health Organization quality of life—abbreviated version. Hong Kong Hospital Authority, Hong Kong
Leung K, Bond MH, de Carrasquel SR, Munoz C, Hernandez M, Murakami F, Yamaguchi S, Bierbrauer G, Singelis TM (2002) Social axioms: the search for universal dimensions of general beliefs about how the world functions. J Cross Cult Psychol 33:286–302
Luce RD (1959) Individual choice behavior. Wiley, New York
Maydeu-Olivares A, Brown A (2010) Item response modeling of paired-comparison and ranking data. Multivar Behav Res 45(6):935–974
Poon WY, Xu L (2009) On the modelling and estimation of attribute rankings with ties in the Thurstonian framework. Br J Math Stat Psychol 62:507–527
Schwartz SH (1996) Value priorities and behavior: applying a theory of integrated value systems. In: Seligman C, Olson JM, Zanna MP (eds) The psychology of values: the Ontario symposium, vol 8. Erlbaum, Hillsdale, NJ
Strobl C, Wickelmaier F, Zeileis A (2011) Accounting for individual differences in Bradley–Terry models by means of recursive partitioning. J Educ Behav Stat 36:135–153
Su XG, Wang M, Fan JJ (2004) Maximum likelihood regression trees. J Comput Graph Stat 13(3):586–598
Thurstone LL (1927) A law of comparative judgment. Psychol Rev 34:273–286
Train K (2003) Discrete choice methods with simulation. Cambridge University Press, Cambridge
Tsai RC, Yao G (2000) Testing Thurstonian case V ranking models using posterior predictive checks. Br J Math Stat Psychol 53:275–292
Yao G, Bockenholt U (1999) Bayesian estimation of Thurstonian ranking models based on the Gibbs sampler. Br J Math Stat Psychol 52:79–92
Yu PLH, Wan WM, Lee PH (2008) Analyzing ranking data using decision tree. In: European conference on machine learning and principles and practice of knowledge discovery in databases
Yu PLH, Wan WM, Lee PH (2010) Decision tree modeling for ranking data. In: Furnkranz J, Hullermeier E (eds) Preference learning. Springer, Berlin, pp 83–106
Zeileis A, Hothorn T, Hornik K (2008) Model-based recursive partitioning. J Comput Graph Stat 17(2):492–514
Acknowledgments
The research of Philip L. H. Yu was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. HKU 7473/05H). We thank the associate editor and two anonymous referees for their helpful suggestions for improving this article.
Appendices
Appendix 1: Tree pruning: the cost-complexity algorithm
In the cost-complexity algorithm, we seek the optimal tree T by minimizing
$$\begin{aligned} R_g(T) = R(T) + g|\tilde{T}|, \end{aligned}$$
where R(T) and \(|\tilde{T}|\) are the cost function (or error) and the number of terminal nodes of the tree T respectively, and g controls the strength of the penalty on tree size. The pruning stage can be divided into two steps: (1) generating subtrees and (2) choosing the best subtree. We discuss each in the following sections.
1.1 Generating subtrees
Before explaining how the subtrees are generated, we need some definitions:

- R(T) is defined as the sum of deviance (i.e. \(-2 \times \) loglikelihood) over all terminal nodes of T.
- \(T_t\) is the subtree with root t, where t is an internal node of tree T.
- R(t) is the deviance of internal node t.
- g(t) is the strength of the link from t and is defined as
  $$\begin{aligned} g(t) = \frac{R(t)-R(T_t)}{|\tilde{T}_t|-1}. \end{aligned}$$
  It can be viewed as the reduction in error per additional terminal node if node t is further split rather than stopped.
Note that when \(g = g(t)\), it makes no difference whether or not the nodes under t are pruned.
After the growing stage, we arrive at a tree \(T^0\). The pruning stage begins by calculating g(t) for all internal nodes of \(T^0\) and finding the node \(t^1\) with the minimum value \(g(t^1)\). We then cut all the descendant nodes under \(t^1\), turning \(t^1\) into a terminal node, which yields the pruned tree \(T^1\). This process is repeated until \(T^0\) has been pruned back to the root \(T^m\). Afterwards, we obtain a sequence of nested trees
$$\begin{aligned} T^0 \supset T^1 \supset \cdots \supset T^m. \end{aligned}$$
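As a concrete illustration, the weakest-link step above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the `Node` class and the helpers `leaves`, `g`, `internal_nodes`, and `prune_weakest_link` are hypothetical names, and each node's deviance is assumed to have been computed already.

```python
class Node:
    def __init__(self, deviance, children=()):
        self.deviance = deviance        # R(t): deviance of this node
        self.children = list(children)  # empty list => terminal node

def leaves(t):
    # Terminal nodes of the subtree T_t rooted at t
    return [t] if not t.children else [u for c in t.children for u in leaves(c)]

def internal_nodes(t):
    # All non-terminal nodes of the subtree rooted at t
    if not t.children:
        return []
    return [t] + [u for c in t.children for u in internal_nodes(c)]

def g(t):
    # Link strength g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1)
    subtree_leaves = leaves(t)
    r_subtree = sum(u.deviance for u in subtree_leaves)
    return (t.deviance - r_subtree) / (len(subtree_leaves) - 1)

def prune_weakest_link(root):
    # Collapse the internal node with the smallest g(t) into a terminal node,
    # producing the next tree in the nested sequence T^0 ⊃ T^1 ⊃ …
    weakest = min(internal_nodes(root), key=g)
    weakest.children = []
    return weakest
```

Calling `prune_weakest_link` repeatedly until the root has no children reproduces the nested sequence described above.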
Note that among this sequence, the tree \(T^y\) will be chosen as our final model if
$$\begin{aligned} g(t^y) \le g < g(t^{y+1}). \end{aligned}$$
1.2 Choosing the best subtree
Breiman et al. (1984) suggested two methods to choose the best model among trees \(T^0, T^1,\ldots ,T^m\). When the dataset is large, the independent testing dataset method can be used. Otherwise, the cross-validation method is suggested. Here, the latter method is applied.
To select the best subtree, we first need to compute the selection criterion, the cross-validation deviance (DEV-CV). The N observations are divided randomly into V approximately equal-sized subsets \(L_1, L_2, \ldots , L_V\). It is common practice to take \(V=10\). For \(v=1,\ldots ,V\), \(L_v\) is the vth validation dataset and its complement \(L_v^C\) is the vth training dataset. Based on the training dataset \(L_v^C\), the vth sequence of nested trees,
$$\begin{aligned} T_v^0 \supset T_v^1 \supset \cdots \supset T_v^{m_v}, \end{aligned}$$
is constructed. We then use the validation dataset \(L_v\) to compute the validation deviance. The validation deviance in any particular terminal node of tree \(T_v^y\), denoted \(D_v^y\), is \(-2 \times \) loglikelihood, where in the loglikelihood function the probabilities are computed using \(L_v^C\) and the observed frequencies are those in \(L_v\). Adding up the validation deviances over all terminal nodes, we obtain the validation deviance of tree \(T_v^y\).
After computing the validation deviances of all nested trees \(T_v^0, T_v^1,\ldots ,T_v^{m_v}\), \(v=1,\ldots ,V\), we can proceed to the next step, namely evaluating the DEV-CV for each of the nested trees \(T^0, T^1,\ldots ,T^m\). Breiman et al. (1984) recommended using the geometric average \(\sqrt{g(t^y)g(t^{y+1})}\) as the estimate of \(g^y\), and hence the cross-validation deviance for tree \(T^y\), \(R^{CV}(T^y)\), is evaluated as
$$\begin{aligned} R^{CV}(T^y) = \sum _{v=1}^{V} D\big (T_v(g^y)\big ), \end{aligned}$$
where \(D(\cdot )\) denotes the validation deviance of a tree and \(T_v(g)\) is the pruned subtree \(T_v^y\) such that \(g(t_v^y) \le g \le g(t_v^{y+1})\).
Now we can return to selecting the best subtree from \(T^0, T^1,\ldots ,T^m\). Let \(T^*\) be the tree with minimum DEV-CV. It sounds reasonable to select \(T^*\) as our final model. However, since the position of the minimum \(R^{CV}(T^*)\) is uncertain (Breiman et al. 1984), we adopt the 1-SE rule to select another tree \(T^{**}\) as our final model: \(T^{**}\) is the smallest subtree which satisfies
$$\begin{aligned} R^{CV}(T^{**}) \le R^{CV}(T^*) + SE\big (R^{CV}(T^*)\big ). \end{aligned}$$
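In code, the 1-SE rule amounts to the following small sketch (hypothetical names, not the authors' code): `dev_cv[y]` holds \(R^{CV}(T^y)\) for \(y=0,\ldots ,m\), with larger y meaning a more heavily pruned tree, and `se` is the standard error of the minimum DEV-CV.

```python
def select_1se(dev_cv, se):
    # dev_cv[y] is R^CV(T^y); larger y means a smaller (more pruned) tree
    best = min(dev_cv)           # R^CV(T*)
    threshold = best + se        # 1-SE threshold
    # Return the index of the smallest subtree (largest y)
    # whose DEV-CV is within one SE of the minimum
    return max(y for y, d in enumerate(dev_cv) if d <= threshold)
```

The rule trades a small increase in cross-validated deviance for a smaller, more interpretable tree.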
To summarize, below is the pseudo-code for demonstrating how our pruning algorithm works:
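In outline, the pruning algorithm described in Appendices 1.1 and 1.2 can be paraphrased as follows (our reconstruction from the steps above):

```
Input: fully grown tree T^0, number of folds V
1. Compute g(t) for every internal node t of T^0; repeatedly prune the
   weakest link to obtain the nested sequence T^0 ⊃ T^1 ⊃ … ⊃ T^m and
   the thresholds g(t^1) ≤ … ≤ g(t^m).
2. For v = 1, …, V:
   a. Grow a tree on the training set L_v^C and prune it to obtain the
      sequence T_v^0 ⊃ … ⊃ T_v^{m_v}.
   b. Compute the validation deviance of each T_v^y using L_v.
3. For each y, set g^y = sqrt(g(t^y) · g(t^{y+1})) and
   R^CV(T^y) = Σ_v [validation deviance of T_v(g^y)].
4. Let T* minimize R^CV over y; return the smallest tree T** with
   R^CV(T**) ≤ R^CV(T*) + SE(R^CV(T*)).
```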
Appendix 2: Covariates used in the “formation and transformation of beliefs in Chinese” study
The summary statistics of these covariates are listed in Table 14.
All these covariates, with the exception of demographic information, are the participants’ raw answers to a number of items. The meaning of each covariate is described below.
Personality. The 50-item International Personality Item Pool Big-Five Domain scale (IPIP; Goldberg 1997) was used to measure five personality dimensions, namely intellect (openness to experience), conscientiousness, extraversion, agreeableness, and emotional stability. The items were translated into Chinese (followed by the usual back-translation procedure). Participants rated each item on a 5-point scale (1 \(=\) extremely disagree; 5 \(=\) extremely agree). The Cronbach’s alpha for the five dimensions in our sample were 0.79, 0.76, 0.86, 0.77, and 0.90 respectively.
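The Cronbach’s alpha reported here and for the scales below is the standard internal-consistency coefficient, \(\alpha = \frac{k}{k-1}\big (1 - \sum _i \sigma _i^2 / \sigma _{total}^2\big )\). A minimal sketch of its computation (hypothetical function and data; not the authors' code):

```python
def cronbach_alpha(items):
    # items: one list of scores per item, same respondents in each list
    k = len(items)                         # number of items
    n = len(items[0])                      # number of respondents

    def var(xs):                           # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_var = sum(var(it) for it in items)               # sum of item variances
    totals = [sum(it[i] for it in items) for i in range(n)]  # total score per respondent
    return k / (k - 1) * (1 - item_var / var(totals))
```

Perfectly consistent items yield an alpha of 1; values such as the 0.79–0.90 reported for the IPIP dimensions indicate good internal consistency.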
Social axiom. Social Axiom (Leung et al. 2002) included social cynicism (negative assessment of human nature and social events), social complexity (belief in multiple and uncertain solutions to social issues), reward for application (belief that effort will bring good outcomes), religiosity (belief in the presence of supernatural influences on the world, and that religions are good), and fate control (belief in the presence of impersonal forces influencing social events). Participants rated each of the 25 items on a 5-point scale (1 \(=\) extremely disagree; 5 \(=\) extremely agree). The Cronbach’s alpha for the five dimensions in our sample were 0.61, 0.56, 0.71, 0.76, and 0.71 respectively.
Quality of life. From the 28 items in the Hong Kong Chinese version of the World Health Organization Quality of Life Measures (WHOQOL-BREF(HK); Leung et al. 1997) we chose 26 which we deemed suitable. The instrument provides several indices, namely Physical Health, Psychological Health (culturally adjusted for Hong Kong), Social Relationships, and Environment. Overall quality of life and overall health were each indicated by one item. Participants responded to each item on a 5-point scale. The anchors of the rating scales vary from item to item, using wordings from the items themselves. For example, for the item “How safe do you feel in your daily life?”, the scale was from 1 (not at all safe) to 5 (extremely safe). To allow comparison with the WHOQOL-100 (the origin of WHOQOL-BREF(HK)), the 6 indices (Physical health, Psychological health, Social relationships, Environment, Overall QOL, Overall health) were scaled to the range 4–20. The Cronbach’s alpha for the four dimensions in our sample were 0.67, 0.82, 0.59, and 0.68 respectively.
Personal values. Personal values were measured with the Schwartz Value Survey (Schwartz 1996). Participants indicated the importance of each of 57 items as a guiding principle in their life on a 9-point scale (\(-\)1 \(=\) opposed to my values; 0 \(=\) not important, 7 \(=\) of extreme importance). Ten value indices were computed: conformity, tradition, benevolence, universalism, self-direction, stimulation, hedonism, achievement, power, and security. The Cronbach’s alpha for the ten dimensions in our sample were 0.77, 0.56, 0.83, 0.84, 0.77, 0.64, 0.58, 0.81, 0.76, and 0.76 respectively.
Faith maturity. Faith Maturity Scale (FMS) was developed by the Search Institute (http://www.search-institute.org/). The scale has two dimensions. The FM vertical dimension reflects the person and his/her feelings of faith. The FM horizontal dimension reflects how the person’s behavior towards others approximates that prescribed in the religion. The Cronbach’s alpha for the two dimensions in our sample were 0.89 and 0.77 respectively.
Demographic information. Participants indicated if they were students, and for how long they had been a believer.
Yu, P.L.H., Lee, P.H., Cheung, S.F. et al. Logit tree models for discrete choice data with application to advice-seeking preferences among Chinese Christians. Comput Stat 31, 799–827 (2016). https://doi.org/10.1007/s00180-015-0588-4