On finite mixtures of Discretized Beta model for ordered responses

Simone, Rosaria

doi:10.1007/s11749-022-00800-7

Original Paper
Open access
Published: 24 February 2022

Volume 31, pages 828–855, (2022)

TEST

Rosaria Simone ORCID: orcid.org/0000-0002-6844-6418¹

Abstract

The paper discusses the specification of finite mixture models based on the Discretized Beta distribution for the analysis of ordered discrete responses, such as ratings and count data. The ultimate goal of the paper is to parameterize clusters of opposite and intermediate response outcomes. After a thorough discussion of model interpretation, identifiability and estimation, the proposal is illustrated by means of a case study on the probability to vote for German political parties and through a comparative discussion with the state of the art.

1 Motivation

In the run-up to any electoral competition, parties’ share of the electoral consensus can be measured by pollsters if voting intentions on nominal scales are surveyed. A more innovative approach consists in gauging the probability to vote for each candidate as ratings on ordered scales, in order to assess the extent to which respondents’ opinions hold. Similarly, marketing stakeholders prefer to survey the intention to take a certain decision in the future, rather than asking yes/no questions about respondents’ likings and habits. Thus, suitable statistical modelling of ordered evaluations is advocated to characterize clusters of both extreme and intermediate response choices.

Polarization is hereafter meant as the process by which evaluations about an item converge towards one of two opposing poles of the response spectrum, in the spirit of (Apouey 2007)^{Footnote 1}. Possibly, a further cluster may be expected as a result of un-polarized respondents, corresponding to a concentration of responses away from the extremes: the term floatation is hereafter used to indicate this circumstance as complementary to polarization.

A candidate model that directly parameterizes polarization towards the extremes is the two-component mixture of Inverse Hypergeometric distributions (mihg, (Simone and Iannario 2018)), whereas a mixture of Binomial and Discretized Beta models can be considered to analyse overall response feeling and certain symmetric response styles (caub, (Simone and Tutz 2018)). For count data, bimodality (not necessarily at the extremes of the response support) can be tackled via suitable adaptation of the (shifted) Poisson distribution (Gómez-Déniz et al. 2020) or by resorting to a two-component mixture of Conway–Maxwell–Poisson models (Sur et al. 2015).

With respect to the state of the art, the paper discusses the specification of mixture models based on the Discretized Beta distribution (Ursino 2014; Ursino and Gasparini 2018) as a flexible class of statistical models to parameterize polarization and floatation of ordered evaluations. The proposal is designed to attain broad and straightforward interpretation for marketing, psychology and socio-economic studies, as it allows one to characterize opposite and intermediate response clusters. Further relevant applications include self-reported wealth or health, or Net Promoter Score type evaluations (NPS, (Reichheld 2003)) to assess the extent to which attractors outclass detractors (Capecchi and Piccolo 2017).

The paper is organized as follows: Sect. 2 recalls the baseline framework of the Discretized Beta model. The core of the paper is Sect. 3, with a detailed discussion on mixtures based on the Discretized Beta distribution to jointly model polarization and floatation of ordered evaluations; goodness-of-fit criteria and inferential aspects are described in Sects. 3.2–3.5, whereas a comparative discussion of the state of the art is delivered in Sect. 3.6. A case study is pursued in Sect. 4 to support the proposal with empirical evidence. Concluding remarks are given in Sect. 5. A dedicated appendix supplements the presentation with a discussion on the optimal number of components for Discretized Beta mixtures and of the parameter constraints needed to prevent identifiability issues.

2 Discretized Beta mixtures for polarization and floatation of ordered data

Let R be a rating variable collected on a response scale with m ordered categories, say $c_1 \prec c_2 \prec \cdots \prec c_m$: the numeric scoring $c_r = r$ will be made merely for notational convenience. Without loss of generality, assume that the scale has a positive orientation with the trait being examined.

Definition 1

For $\alpha , \beta \in {\mathbb {R}}^{+}$, let $X \sim Beta(\alpha ,\beta )$ be a Beta distributed random variable over the real interval [0, 1]. For a given $m>3$, a discrete variable R, with support $\{1,2,\dots ,m\}$, is said to be distributed according to a Discretized Beta model of parameters $\alpha ,\beta $ ($R \sim \text {DB}(\alpha ,\beta )$, for short) if:

$$\begin{aligned} Pr(R = r |\alpha ,\beta )\,=\, Pr\bigg ( \frac{r-1}{m} \le X \le \frac{r}{m} \big | \alpha , \beta \bigg ), \qquad r=1,\dots ,m. \end{aligned}$$

(1)
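Equation (1) can be evaluated as a difference of Beta distribution functions, $Pr(R=r|\alpha ,\beta ) = I_{r/m}(\alpha ,\beta ) - I_{(r-1)/m}(\alpha ,\beta )$, with $I_x(\alpha ,\beta )$ the regularized incomplete Beta function. The Python sketch below is only an illustration under stated assumptions and not the author's implementation: the names `beta_cdf` and `db_pmf` are hypothetical, and $I_x$ is approximated with a standard continued-fraction expansion so that no external library is needed.

```python
import math


def _safe(v, tiny=1e-300):
    """Keep a value away from zero before dividing by it."""
    return v if abs(v) > tiny else tiny


def _beta_cf(a, b, x, max_iter=500, eps=1e-12):
    """Continued-fraction factor of the regularized incomplete Beta function
    (modified Lentz scheme)."""
    c = 1.0
    d = 1.0 / _safe(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for k in range(1, max_iter + 1):
        even = k * (b - k) * x / ((a + 2 * k - 1.0) * (a + 2 * k))
        odd = -(a + k) * (a + b + k) * x / ((a + 2 * k) * (a + 2 * k + 1.0))
        for num in (even, odd):
            d = 1.0 / _safe(1.0 + num * d)
            c = _safe(1.0 + num / c)
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Distribution function of Beta(a, b) at x, i.e. I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the expansion on the side where it converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def db_pmf(m, alpha, beta):
    """Probabilities db(r; alpha, beta), r = 1, ..., m, as in Eq. (1)."""
    cuts = [beta_cdf(r / m, alpha, beta) for r in range(m + 1)]
    return [cuts[r] - cuts[r - 1] for r in range(1, m + 1)]
```

For instance, `db_pmf(7, 1.0, 1.0)` returns equal probabilities on the seven categories, whereas `db_pmf(9, 0.4, 1.0)` concentrates mass on the first category.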

For notational convenience, set $db(r; \alpha ,\beta ) := Pr(R = r |\alpha ,\beta )$. This model has already been acknowledged in the literature on ordinal data analysis in view of the flexibility inherited from the underlying Beta distribution, which does not impose a predetermined shape for the latent continuous trait (Ursino 2014; Fasola and Sciandra 2015; Ursino and Gasparini 2018; Simone and Tutz 2018). Similar arguments can be advanced for the Beta-Binomial model (Morrison 1979), yet the Discretized Beta is more versatile as it can be either overdispersed or underdispersed (Ursino 2014). The uniform distribution arises as a special case when $\alpha =\beta =1$. Location and shape properties of the latent Beta model imply the following features of the DB distribution (Abramowitz and Stegun 1972; Forbes et al. 2011). Given that the discretization of the latent Beta model occurs at equi-spaced intervals for a fixed m, the modal value Mo(R) of $R \sim {\text {DB}}(\alpha ,\beta )$ satisfies:

- $Mo(R) = 1$ if $\alpha < 1$ and $\beta \ge 1$ or, in case $\min (\alpha ,\beta )>1$, if $\frac{\alpha -1}{\alpha +\beta -2}<\frac{1}{m}$;
- $Mo(R) = m$ if $\alpha \ge 1$ and $\beta <1$ or, in case $\min (\alpha ,\beta ) > 1$, if $\frac{\alpha -1}{\alpha +\beta -2} > 1-\frac{1}{m}$;
- $Mo(R) = r \in \{2,\dots ,m-1\}$ if and only if $\min (\alpha ,\beta ) > 1$ and $\frac{\alpha -1}{\alpha +\beta -2} \in (\frac{r-1}{m}, \frac{r}{m}]$. Thus, the following condition implies an inner mode:
  $$\begin{aligned} \frac{1}{m}< \frac{\alpha -1}{\alpha +\beta -2} < 1-\frac{1}{m}; \end{aligned}$$
  (2)
- The distribution is U-shaped with two modal values at the first and at the last categories if $\max (\alpha ,\beta ) <1$ and if, for the given m, parameters satisfy the following system of inequalities^{Footnote 2} based on the incomplete Beta function $I_x(\alpha ,\beta )$:
  $$\begin{aligned} {\left\{ \begin{array}{ll} 1+I_{\frac{m-2}{m}}(\alpha ,\beta ) &{}> \,2\,I_{\frac{m-1}{m}}(\alpha ,\beta );\\ 2\,I_{\frac{1}{m}}(\alpha ,\beta ) &{}> \,I_{\frac{2}{m}}(\alpha ,\beta ).\\ \end{array}\right. } \end{aligned}$$
  (3)
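The rules for an inner mode only involve the mode $\frac{\alpha -1}{\alpha +\beta -2}$ of the latent Beta distribution, so they can be checked with elementary arithmetic. The following Python sketch illustrates condition (2) and the interval rule for $Mo(R)$ when $\min (\alpha ,\beta )>1$; the function names are hypothetical and chosen only for this example.

```python
import math


def latent_mode(alpha, beta):
    """Mode of the latent Beta(alpha, beta); requires min(alpha, beta) > 1."""
    if min(alpha, beta) <= 1:
        raise ValueError("the latent mode formula needs min(alpha, beta) > 1")
    return (alpha - 1.0) / (alpha + beta - 2.0)


def has_inner_mode(alpha, beta, m):
    """Condition (2): the latent mode lies strictly between 1/m and 1 - 1/m."""
    if min(alpha, beta) <= 1:
        return False
    return 1.0 / m < latent_mode(alpha, beta) < 1.0 - 1.0 / m


def modal_category(alpha, beta, m):
    """Category r such that the latent mode falls in ((r - 1)/m, r/m]."""
    return max(1, math.ceil(m * latent_mode(alpha, beta)))
```

With $\alpha =3$, $\beta =4$ and $m=11$, the latent mode is $2/5$, condition (2) holds and the mode falls in the interval $(4/11, 5/11]$, that is, at the fifth category.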

As a consequence, a necessary condition for a Discretized Beta model to be applied for polarization of either favourable or unfavourable responses is the constraint $\min (\alpha , \beta ) < 1$. Under this circumstance, parameter $\alpha $ governs the polarization of the unfavourable responses: hereafter, this cluster will be referred to as opponents’ pole. If $\beta = \max (\alpha , \beta ) \ge 1$, the closer $\alpha $ is to 0, the stronger is the polarization of the opponents, with positive asymmetry increasing with growing $\beta $. Conversely, $\beta $ governs the polarization of the favourable responses (say, the supporters’ pole). If $\alpha = \max (\alpha , \beta ) \ge 1$, the closer $\beta $ is to 0, the higher is the probability assigned to the last category and thus the stronger is the polarization of the supporters, with negative asymmetry strengthening with growing $\alpha $. A Discretized Beta model with $\max (\alpha ,\beta )<1$, instead, can be specified to account for polarization towards both the extremes (provided that (3) holds), whereas floatation between the two response endpoints can be modelled by assuming a DB$(\alpha ,\beta )$ distribution with $\min (\alpha ,\beta )>1$, such that (2) holds true, given the number of categories^{Footnote 3}. Asymmetry and intensity of floatation can be measured in terms of skewness $\gamma _1(\alpha ,\beta )$ and excess kurtosis $\gamma _2(\alpha ,\beta )$ of the underlying Beta distribution^{Footnote 4}:

$$\begin{aligned} \gamma _1(\alpha ,\beta )= & {} 2\frac{\beta - \alpha }{\alpha +\beta +2}\sqrt{\dfrac{\alpha +\beta +1}{\alpha \,\beta }}; \end{aligned}$$

(4)

$$\begin{aligned} \gamma _2(\alpha ,\beta )= & {} \frac{6\big (\alpha ^3 + \alpha ^2(1-2\beta ) +\beta ^2(1+\beta )-2\alpha \beta (2+\beta )\big )}{\alpha \beta \big (\alpha +\beta +2\big )\big (\alpha +\beta +3\big )}; \end{aligned}$$

(5)

such that $\gamma _1(\alpha ,\beta ) = -\gamma _1(\beta ,\alpha )$ and $\gamma _2(\alpha ,\beta ) = \gamma _2(\beta ,\alpha )$. However, interpretation of excess kurtosis is not straightforward for asymmetric distributions: the measure of kurtosis adjusted for skewness introduced in (Blest 2003) can be considered to overcome this issue (see (15) in Appendix 1 for details).
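As a quick numerical companion to (4) and (5), the Python sketch below evaluates skewness and excess kurtosis of the latent Beta distribution and can be used to verify the stated symmetry relations. The function names are hypothetical.

```python
import math


def beta_skewness(alpha, beta):
    """Skewness gamma_1 of Beta(alpha, beta), Eq. (4)."""
    return (2.0 * (beta - alpha) / (alpha + beta + 2.0)
            * math.sqrt((alpha + beta + 1.0) / (alpha * beta)))


def beta_excess_kurtosis(alpha, beta):
    """Excess kurtosis gamma_2 of Beta(alpha, beta), Eq. (5)."""
    num = 6.0 * (alpha ** 3 + alpha ** 2 * (1.0 - 2.0 * beta)
                 + beta ** 2 * (1.0 + beta)
                 - 2.0 * alpha * beta * (2.0 + beta))
    den = alpha * beta * (alpha + beta + 2.0) * (alpha + beta + 3.0)
    return num / den
```

For example, a floatation component with $\alpha =3$, $\beta =4$ has positive skewness, and swapping the two parameters changes the sign of $\gamma _1$ while leaving $\gamma _2$ unchanged.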

3 Finite mixtures of Discretized Beta model

Given the flexibility in both shape and interpretation of the DB model, polarization and floatation in ordered data can be jointly parameterized by specifying suitable mixture distributions.

By virtue of the comments delivered in Sect. 2, if floatation can be shaped via a DB model with parameters $\alpha _2,\beta _2 > 1$ satisfying (2) for given m, alternative specifications are possible for the polarization effect:

1. a unique component DB$(\alpha _1,\beta _1)$ with $\min (\alpha _1,\beta _1) < 1$, and with $\max (\alpha _1,\beta _1) < 1$ satisfying (3) if two opposing clusters at the extremes are present, or with $\max (\alpha _1,\beta _1) \ge 1$ if only one pole of supporters or opponents is found, yielding the two-component mixture:
   $$\begin{aligned} Pr\big (R=r \mid \varvec{\theta }\big )= (1-\delta )\,db(r;\alpha _1,\beta _1)\,+\,\delta \,db(r;\alpha _2,\beta _2)\,,\quad r=1,\dots ,m\,; \end{aligned}$$
   (6)
2. a mixture of two J-shaped DB models, DB$(\alpha _1,\beta _1)$ and DB$(\alpha _3,\beta _3)$, yielding the 3-component mixture specification:
   $$\begin{aligned} Pr\big (R=r \mid \varvec{\theta }\big )= & {} \delta _1\,db(r;\alpha _1,\beta _1)\,+\,\delta _2\,db(r;\alpha _2,\beta _2)\,\nonumber \\&+\, \delta _3\,db(r;\alpha _3,\beta _3), \end{aligned}$$
   (7)
   so that $\delta _1+\delta _2+\delta _3=1$, and:
   - $\alpha _1 \in (0,1)$ and $\beta _1 \ge 1$ to shape the opponents’ pole;
   - $\alpha _2,\beta _2 >1$ satisfy (2) to shape floatation;
   - $\alpha _3\ge 1,\beta _3 \in (0,1)$ to shape the supporters’ pole.

Some identifiability issues may arise for the polarization components in both (6) and (7), due to a Beta approximation of the latent Beta models. Appendix 2 collects all the relevant discussion and results pertaining to these topics: the present section will focus on the proposed class of mixtures, stemming from (7) under suitable parameter constraints.

3.1 The OFS mixture for polarization and floatation of ordered evaluations

In order to overcome possible identifiability issues for mixtures of DB models, the proposed strategy is to constrain $\beta _1=1$ and $\alpha _3=1$ for the mixture specification (7).

Hereafter, the acronym OFS will stand for Opponent-Floatation-Supporter, and three 0-1 subscripts will indicate if each component is specified in the mixture (1) or not (0). Thus, models DB$(\alpha _1,1)$, with $\alpha _1 \in (0,1)$, and DB$(1,\beta _3)$, with $\beta _3 \in (0,1)$, will be referred to as ${\text {OFS}}_{100}$ and ${\text {OFS}}_{001}$ to indicate a DB distribution to model polarization towards the opponents’ and the supporters’ pole, respectively. Consequently, as a benchmark for bi-polarization towards the end-points, the proposal is to assume the following mixture specification.

Definition 2

If $\alpha _1, \beta _3, \delta \in (0,1)$, the ${\text {OFS}}_{101}$ model is defined by the mixture:

$$\begin{aligned} Pr(R=r|\varvec{\theta }) = \delta \, db(r;\alpha _1,1) + (1-\delta )\,db(r; 1,\beta _3),\qquad r=1,\dots ,m. \end{aligned}$$

(8)
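Since the Beta$(\alpha ,1)$ and Beta$(1,\beta )$ distribution functions are $x^{\alpha }$ and $1-(1-x)^{\beta }$, both components of (8) have closed forms: $db(r;\alpha _1,1) = (r/m)^{\alpha _1} - ((r-1)/m)^{\alpha _1}$ and $db(r;1,\beta _3) = (1-(r-1)/m)^{\beta _3} - (1-r/m)^{\beta _3}$. The Python sketch below uses them to evaluate the ${\text {OFS}}_{101}$ probabilities; the function names are hypothetical and the code is only an illustration.

```python
def db_opponents(r, m, alpha1):
    """db(r; alpha1, 1): the Beta(alpha1, 1) distribution function is x**alpha1."""
    return (r / m) ** alpha1 - ((r - 1) / m) ** alpha1


def db_supporters(r, m, beta3):
    """db(r; 1, beta3): the Beta(1, beta3) distribution function is 1 - (1 - x)**beta3."""
    return (1 - (r - 1) / m) ** beta3 - (1 - r / m) ** beta3


def ofs101_pmf(m, delta, alpha1, beta3):
    """OFS_101 probabilities of Eq. (8) for r = 1, ..., m."""
    for name, value in (("delta", delta), ("alpha1", alpha1), ("beta3", beta3)):
        if not 0.0 < value < 1.0:
            raise ValueError(name + " must lie in (0, 1)")
    return [delta * db_opponents(r, m, alpha1)
            + (1.0 - delta) * db_supporters(r, m, beta3)
            for r in range(1, m + 1)]
```

With $m=7$, $\delta =0.6$, $\alpha _1=0.4$ and $\beta _3=0.7$, the resulting distribution assigns more probability to the first category than to the second, and more to the last than to the second-to-last, as expected under bi-polarization.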

The mixture of ${\text {OFS}}_{101}$ for polarization with an ${\text {OFS}}_{010}$ distribution for floatation (so that (2) holds) can be safely considered to jointly model polarization towards either one or both the extremes and possible floatation in between.

Definition 3

If the above notation prevails, the ${\text {OFS}}_{111}$ model is defined by:

$$\begin{aligned} Pr(R=r|\varvec{\theta }) = \delta _1\, db(r;\alpha _1,1) \,+\, \delta _2 \,db(r; \alpha _2,\beta _2) \,+\, \delta _3 \,db(r; 1,\beta _3). \end{aligned}$$

(9)
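The three-component specification (9) can be evaluated along the same lines. In the Python sketch below, the polarization components use the closed-form distribution functions of Beta$(\alpha _1,1)$ and Beta$(1,\beta _3)$, whereas the floatation component is obtained, as a simplifying assumption of this example, by composite Simpson integration of the Beta kernel over each interval, followed by normalization. Function names and the number of integration steps are hypothetical choices.

```python
def db_floatation(m, alpha2, beta2, steps=200):
    """db(r; alpha2, beta2) for alpha2, beta2 > 1 via composite Simpson's rule."""
    if min(alpha2, beta2) <= 1:
        raise ValueError("floatation requires alpha2, beta2 > 1")

    def kernel(x):
        # Unnormalized Beta density; the constant cancels after normalization.
        return x ** (alpha2 - 1.0) * (1.0 - x) ** (beta2 - 1.0)

    cells = []
    for r in range(1, m + 1):
        lo, hi = (r - 1) / m, r / m
        h = (hi - lo) / steps  # steps must be even for Simpson's rule
        total = kernel(lo) + kernel(hi)
        for k in range(1, steps):
            total += (4.0 if k % 2 else 2.0) * kernel(lo + k * h)
        cells.append(total * h / 3.0)
    norm = sum(cells)
    return [c / norm for c in cells]


def ofs111_pmf(m, delta1, delta3, alpha1, alpha2, beta2, beta3):
    """OFS_111 probabilities of Eq. (9), with delta2 = 1 - delta1 - delta3."""
    delta2 = 1.0 - delta1 - delta3
    if min(delta1, delta2, delta3) <= 0:
        raise ValueError("mixing weights must be positive and sum to one")
    floatation = db_floatation(m, alpha2, beta2)
    pmf = []
    for r in range(1, m + 1):
        opponents = (r / m) ** alpha1 - ((r - 1) / m) ** alpha1
        supporters = (1 - (r - 1) / m) ** beta3 - (1 - r / m) ** beta3
        pmf.append(delta1 * opponents + delta2 * floatation[r - 1]
                   + delta3 * supporters)
    return pmf
```

A possible call is `ofs111_pmf(11, 0.25, 0.4, 0.2, 3, 4, 0.6)`, which mixes an opponents' pole, a right-skewed floatation component and a supporters' pole on an 11-point scale.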

Remark 1

With reference to the procedures outlined in Appendix 2 and unlike for (6) and (7), the Beta approximation of the latent polarization components in (9), and its combination with the latent floatation, does not correspond to an ${\text {OFS}}_{111}$ specification. The same arguments apply if either ${\text {OFS}}_{100}$ or ${\text {OFS}}_{001}$ are assumed for polarization. Thus, identifiability of parameters can be assumed for OFS mixture models.

Both asymmetric and symmetric floatation are encompassed by the ${\text {OFS}}_{111}$ model (the latter under the constraint $\alpha _2 = \beta _2$). In case the floatation component is symmetric, the superscript (s) will be used. If m is odd, a degenerate floatation component corresponds to neutrality (in case $\alpha _2=\beta _2$ tends to infinity), resulting in inflation in the middle of the response scale: in this case, the superscript (i) will replace (s), and the resulting ${\text {OFS}}_{111}^{(i)}$ model will denote a mixture of an ${\text {OFS}}_{101}$ model with a degenerate distribution $\mathbbm {1}_{c=r}$ with mass concentrated at $c=\frac{m+1}{2}$ (so that $\mathbbm {1}_{{c=r}} = 0$ if $r \ne c$, and $\mathbbm {1}_{{c=r}} = 1$ if $r=c$).

Remark 2

OFS models encompass also inflated responses at the extremes of the response support. Consider, for instance, the ${\text {OFS}}_{110}$ model: the DB$(\alpha _1,1)$ component identifies the opponents’ cluster, which is characterized by a mode at the first category and decreasing probabilities as scores increase, thus allowing to account also for scale usage diversity among opponents and for different strengths of opposition. As a limit case, the ${\text {OFS}}_{110}$ tends to an inflated DB model with inflation at the first category if $\alpha _1 \rightarrow 0$. The dual remark applies for the ${\text {OFS}}_{011}$ model^{Footnote 5}. Thus, the smoothed switch between extreme modal values and inner categories implied by the OFS approach is more general than DB models with inflation at either one of the end-points (see the example discussed in Sect. 3.6).

Remark 3

Covariate effects on model parameters can be investigated via suitable link functions. If $\varvec{x}_i, \varvec{y}_i, \varvec{u}_i, \varvec{z}_i, \varvec{t}_i, \varvec{w}_i$ are selected subjects’ characteristics, a logarithmic link can be set for individual floatation parameters $\alpha _{2i}, \beta _{2i} > 1$:

$$\begin{aligned} \log (\alpha _{2i})=\varvec{z}_i\, \varvec{\gamma }_2\,;\quad \log (\beta _{2i})=\varvec{u}_i\, \varvec{\eta }_2\,,\\ \end{aligned}$$

provided that the constraint (2) is taken into account also conditional to covariates, whereas a logit link can be set for polarization parameters $\alpha _1, \beta _3, \delta _1, \delta _3 \in (0,1)$:

$$\begin{aligned} {\text {logit}}(\alpha _{1i})=\varvec{y}_i\, \varvec{\gamma }_1\,;\; {\text {logit}}(\beta _{3i})=\varvec{w}_i\, \varvec{\eta }_3\,;\; {\text {logit}}(\delta _{1i})=\varvec{x}_i\, \varvec{\omega }_1;\; {\text {logit}}(\delta _{3i})=\varvec{t}_i\, \varvec{\omega }_3\,.\\ \end{aligned}$$

3.2 Fitting performances and model selection

Model selection within the OFS class can be performed in terms of likelihood ratio test for pairs of nested models (to compare the symmetric and asymmetric specification for floatation, for instance). More generally, fitting performance of an OFS model against competing alternatives can be assessed by resorting to information criteria: in the following, the BIC index will be considered to account also for model complexity. Standard goodness-of-fit tests relying on Pearson $X^2$ statistics could be performed provided that $m-1-k>0$, if k is the number of estimable parameters. For instance, $m>7$ is needed to apply this test for ${\text {OFS}}_{111}$ models.

The normalized Leti’s dissimilarity index (Leti 1983):

$$\begin{aligned} Diss(\varvec{f},\varvec{p}) = \dfrac{1}{2}\sum _{r=1}^m |f_r - p_r|,\quad Diss(\varvec{f},\varvec{p}) \in [0,1], \end{aligned}$$

(10)

will be considered to measure the goodness of fit of an estimated model $\varvec{p} = \varvec{p}(\varvec{\theta }) = (p_1,\dots ,p_m)$ to the observed relative frequency distribution $\varvec{f}= (f_1,\dots ,f_m)$. With respect to more traditional indicators, such as the Hellinger distance $H(\varvec{p},\varvec{q})$ (Gibbs and Su 2002), which satisfies:

$$\begin{aligned} H^2(\varvec{p},\varvec{q})\, \le \, Diss(\varvec{p}, \varvec{q}) \le \sqrt{2}\,H(\varvec{p},\varvec{q}), \quad H^2(\varvec{p},\varvec{q})= \dfrac{1}{2}\sum _{r=1}^m \big (\sqrt{p_r} - \sqrt{q_r}\big )^2,\nonumber \\ \end{aligned}$$

(11)

the Dissimilarity value is interpretable as the percentage of responses that are missed by the model^{Footnote 6}. For this reason, it can be also exploited to check the ability of a model $\varvec{p}$, estimated on a training set, to predict the test set distribution $\varvec{f}$. With the same goal and for comparative purposes, the Kullback–Leibler Divergence $KL(\varvec{f}\vert \vert \varvec{p}) = \sum \limits _{r=1}^m f_r \log (\frac{f_r}{p_r})$ will be also computed.
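The three indicators of this section are straightforward to compute from two probability vectors. The Python sketch below is a minimal illustration with hypothetical function names; it assumes that the fitted probabilities are strictly positive, so that the Kullback–Leibler divergence is finite.

```python
import math


def dissimilarity(f, p):
    """Normalized Leti dissimilarity index, Eq. (10)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(f, p))


def hellinger(p, q):
    """Hellinger distance H(p, q), with H^2 defined as in Eq. (11)."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))


def kl_divergence(f, p):
    """KL(f || p); categories with f_r = 0 contribute zero."""
    return sum(a * math.log(a / b) for a, b in zip(f, p) if a > 0)
```

For instance, with $\varvec{f} = (0.30, 0.10, 0.10, 0.15, 0.35)$ and $\varvec{p} = (0.25, 0.15, 0.10, 0.20, 0.30)$, the dissimilarity is $0.10$, and the bounds in (11) can be checked numerically.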

3.3 Inferential issues for the OFS model

Hereafter, the main steps of the expectation–maximization algorithm for mixtures (EM, (McLachlan and Krishnan 1997)) to perform maximum likelihood estimation of parameters are outlined for the general ${\text {OFS}}_{111}$ specification.

For a sample of ratings $\varvec{r} = (r_1,\dots ,r_n)$, the complete log-likelihood of the ${\text {OFS}}_{111}$ model, with parameter vector $\varvec{\theta } = (\delta _1,\delta _3,\alpha _1,\alpha _2,\beta _2,\beta _3)$, is given by:

$$\begin{aligned} l_c(\varvec{\theta }; \varvec{r})&= \log (\delta _1)\,\sum \limits _{i=1}^n Z_{1i} \, +\, \log (\delta _3)\,\sum \limits _{i=1}^n Z_{3i} \, +\, \log (1-\delta _1-\delta _3)\,\sum \limits _{i=1}^n Z_{2i} \; \end{aligned}$$

(12)

$$\begin{aligned}&\quad + \, \sum \limits _{i=1}^n Z_{1i} \,\log \big (db(r_i;\alpha _1,1) \big )\,+ \,\sum \limits _{i=1}^n Z_{2i} \log \big ( db(r_i; \alpha _2,\beta _2)\big ) \nonumber \\&\quad + \sum \limits _{i=1}^n Z_{3i} \,\log \big (db(r_i;1,\beta _3) \big ) \end{aligned}$$

(13)

where $Z_{ji}$ is a random variable with $Z_{ji} =1$ if the i-th rating is drawn from the j-th component in the mixture, and $Z_{ji}=0$ otherwise (so $Z_{2i} = 1-Z_{1i}-Z_{3i}$). Thus, if $\varvec{\theta }^{(k)}$ is the current estimate at the k-th iteration, the posterior probabilities of the i-th rating being drawn from the opponents’ component DB$(\alpha _1,1)$ and the supporters’ component DB$(1,\beta _3)$ are computed within the E-step as:

$$\begin{aligned}&{\mathbb {E}}[Z_{1i}|\varvec{\theta }^{(k)}] = \tau _{1i}^{(k)} = \dfrac{\delta _1^{(k)}\, db(r_i; \alpha _1^{(k)},1)}{Pr(R_i=r_i|\varvec{\theta }^{(k)})};\\&\quad {\mathbb {E}}[Z_{3i}|\varvec{\theta }^{(k)}] = \tau _{3i}^{(k)} = \dfrac{\delta _3^{(k)}\,db(r_i; 1,\beta _3^{(k)})}{Pr(R_i=r_i|\varvec{\theta }^{(k)})}, \end{aligned}$$

so that $\tau _{2i}^{(k)} = 1- \tau _{1i}^{(k)} - \tau _{3i}^{(k)}$. In case covariate effects are not specified in the model, one can write $\tau _{ji}^{(k)} = \tau _{jr}^{(k)}$ if $r_i=r$, $r=1,\dots ,m$, $j=1,2,3$, and the expected complete log-likelihood to be maximized at the M-step can be rewritten as:

$$\begin{aligned} {\mathbb {E}}[l_c(\varvec{\theta })| \varvec{\theta }^{(k)}] = Q_1^{(k)}(\delta _1,\delta _3) + Q_2^{(k)}(\alpha _1) + Q_3^{(k)}(\alpha _2,\beta _2) + Q_4^{(k)}(\beta _3), \end{aligned}$$

where $(n_1,n_2,\dots ,n_m)$ denotes the frequency distribution of the sample, and one sets:

- $Q_1^{(k)}(\delta _1,\delta _3) = \log (\delta _1)\,\sum \limits _{r=1}^m n_r \tau _{1r}^{(k)}\, +\, \log (\delta _3)\,\sum \limits _{r=1}^m n_r \tau _{3r}^{(k)} \,+ \,\log (1-\delta _1-\delta _3) \sum \limits _{r=1}^m n_r \tau _{2r}^{(k)}$, yielding, after differentiation, the updated estimates:
  $$\begin{aligned} \delta _1^{(k+1)} = \dfrac{1}{n} \sum \limits _{r=1}^m n_r\,\tau _{1r}^{(k)}; \quad \delta _3^{(k+1)} = \dfrac{1}{n} \sum \limits _{r=1}^m n_r \tau _{3r}^{(k)}; \quad \delta _2^{(k+1)} = 1- \delta _1^{(k+1)}-\delta _3^{(k+1)}\,; \end{aligned}$$
- $Q_2^{(k)}(\alpha _1) = \sum \limits _{r=1}^m n_r\, \tau _{1r}^{(k)}\,\log (db(r;\alpha _1,1))$; $Q_4^{(k)}(\beta _3) = \sum \limits _{r=1}^m n_r\, \tau _{3r}^{(k)}\,\log (db(r;1,\beta _3))$;
- $Q_3^{(k)}(\alpha _2,\beta _2) = \sum \limits _{r=1}^m n_r\, \tau _{2r}^{(k)}\,\log (db(r;\alpha _2,\beta _2))$.

At each step, the updated estimates of $\alpha _1, \alpha _2, \beta _2, \beta _3$ have to be obtained from numerical optimization of the corresponding functions, under the required bound constraints^{Footnote 7}.
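To make the E- and M-steps concrete, the Python sketch below runs the same scheme on the simpler ${\text {OFS}}_{101}$ model (8), for which both components have closed forms. It is an illustration under explicit assumptions, not the procedure used in the paper: the function names are hypothetical, the one-dimensional maximizations of $Q_2$ and $Q_4$ are carried out with a golden-section search on $(0,1)$ (a swapped-in optimizer), and a candidate update is accepted only if it increases the corresponding function, so that the log-likelihood cannot decrease.

```python
import math


def db_opp(r, m, a):
    """db(r; a, 1) in closed form."""
    return (r / m) ** a - ((r - 1) / m) ** a


def db_sup(r, m, b):
    """db(r; 1, b) in closed form."""
    return (1 - (r - 1) / m) ** b - (1 - r / m) ** b


def golden_max(fun, lo, hi, tol=1e-8):
    """Golden-section search for a maximizer of fun on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = fun(c), fun(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = fun(c)
        else:
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = fun(d)
    return (a + b) / 2.0


def loglik(counts, delta, a1, b3):
    """Observed log-likelihood of OFS_101 given category counts n_1, ..., n_m."""
    m = len(counts)
    return sum(n * math.log(delta * db_opp(r, m, a1)
                            + (1.0 - delta) * db_sup(r, m, b3))
               for r, n in enumerate(counts, start=1))


def em_ofs101(counts, delta=0.5, a1=0.5, b3=0.5, max_iter=500, tol=1e-10):
    """EM iterations for OFS_101; returns (delta, alpha1, beta3, log-likelihood)."""
    m, n = len(counts), sum(counts)
    cats = range(1, m + 1)
    eps = 1e-6  # keeps alpha1 and beta3 strictly inside (0, 1)
    old = loglik(counts, delta, a1, b3)
    new = old
    for _ in range(max_iter):
        # E-step: posterior probability of the opponents' component.
        tau = []
        for r in cats:
            num = delta * db_opp(r, m, a1)
            tau.append(num / (num + (1.0 - delta) * db_sup(r, m, b3)))
        # M-step: closed-form weight, numerical updates for alpha1 and beta3.
        delta = sum(nr * t for nr, t in zip(counts, tau)) / n
        q2 = lambda a: sum(nr * t * math.log(db_opp(r, m, a))
                           for r, nr, t in zip(cats, counts, tau))
        q4 = lambda b: sum(nr * (1.0 - t) * math.log(db_sup(r, m, b))
                           for r, nr, t in zip(cats, counts, tau))
        cand = golden_max(q2, eps, 1.0 - eps)
        if q2(cand) > q2(a1):
            a1 = cand
        cand = golden_max(q4, eps, 1.0 - eps)
        if q4(cand) > q4(b3):
            b3 = cand
        new = loglik(counts, delta, a1, b3)
        if new - old < tol:
            break
        old = new
    return delta, a1, b3, new
```

The function takes the frequency distribution $(n_1,\dots ,n_m)$ as input. Extending the sketch to ${\text {OFS}}_{111}$ would additionally require the weight $\delta _3$ and a two-dimensional bounded optimization of $Q_3^{(k)}(\alpha _2,\beta _2)$.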

3.4 Small simulation experiment

In order to show the performance of the estimation procedure, a small simulation experiment has been carried out: for each scenario, $B=200$ samples of size n were generated. Table 1 reports the mean squared error (MSE) of the sampling distribution of parameter estimators obtained over the simulation runs. The average dissimilarity between generating model $\varvec{p}$ and estimated distribution $\hat{\varvec{p}}$ (${\widehat{Diss}}(\varvec{p},\hat{\varvec{p}})$) and between frequency distribution of the sample $\varvec{f}$ and estimated distribution $({\widehat{Diss}}(\varvec{f},\hat{\varvec{p}})$) is reported. Analogous simulation experiments are pursued also for ${\text {OFS}}_{101}$ and ${\text {OFS}}_{110}$ for the sake of completeness (see Tables 2 and 3). Results are satisfactory and indicate that the model is correctly specified and estimated, with efficiency improving with sample size.

Table 1 MSE of the sampling estimators of ${\text {OFS}}_{111}$ parameters

Table 2 MSE of the sampling estimators of ${\text {OFS}}_{101}$ parameters

Table 3 MSE of the sampling estimators of ${\text {OFS}}_{110}$ parameters

3.5 Standard errors for OFS parameters

Uncertainty evaluation of parameter estimates could be performed by resorting to asymptotic information theory on the basis of the observed information matrix (see Appendix 1 for details). Potential drawbacks of this procedure may arise due to the possible occurrence of numerical overflow in the approximation of the involved integrals. In this respect, numerical derivatives of the log-likelihood can be computed directly with Richardson’s extrapolation method, as suggested in (Ursino and Gasparini 2018)^{Footnote 8}. Considering that information theory results apply only asymptotically under regularity conditions, re-sampling methods such as the bootstrap (Efron 1981) can be adopted as a general practice for OFS models, yielding stable accuracy evaluations of parameter estimates even for small sample sizes.
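The idea behind numerical differentiation with Richardson's extrapolation can be sketched for a single parameter as follows. The Python code is a generic illustration and not the scheme of the cited reference: it combines central second differences at step sizes $h$ and $h/2$ to cancel the leading error term, and the function names, the step size and the toy log-likelihood are assumptions made only for this example.

```python
import math


def second_difference(f, x, h):
    """Central second difference, with error of order h**2."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def richardson_second_derivative(f, x, h=1e-3):
    """One Richardson step: combine steps h and h/2 to remove the h**2 term."""
    return (4.0 * second_difference(f, x, h / 2.0)
            - second_difference(f, x, h)) / 3.0


def standard_error(loglik, theta_hat, h=1e-3):
    """Standard error from the observed information of a one-parameter model."""
    information = -richardson_second_derivative(loglik, theta_hat, h)
    return math.sqrt(1.0 / information)


# Toy example (not an OFS model): s successes out of n Bernoulli trials.
s, n = 30, 100
toy_loglik = lambda p: s * math.log(p) + (n - s) * math.log(1.0 - p)
toy_se = standard_error(toy_loglik, s / n)
```

In the toy example the result can be compared with the known value $\sqrt{{\hat{p}}(1-{\hat{p}})/n}$; for OFS models the same device would be applied to each entry of the Hessian of the log-likelihood.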

A small Monte-Carlo experiment has been pursued to compare the asymptotic performance of the different methods: for selected OFS models, n observations were sampled. For the general ${\text {OFS}}_{111}$ model, Table 4 reports standard errors’ estimates obtained on the basis of the observed information matrix (Inf.), numerical approximation of the derivatives of the log-likelihood function with the Richardson’s extrapolation method (Num.), and nonparametric bootstrap with $B=500$ replicates (Boot.). The three methods are asymptotically equivalent, but for small and moderate sample sizes, the data-driven procedure Boot entails more accurate results^{Footnote 9}. For instance, numerical divergence for some of the integrals involved in the computation of the observed information matrix occurred for $n=500$.

Table 4 Comparison of standard errors: ${\text {OFS}}_{111}$ model with $m=11, \delta _1=0.25; \delta _3=0.4; \alpha _1=0.2; \beta _3=0.6; \alpha _2=3; \beta _2=4$

The same check, limited to numerical and bootstrap methods, is pursued for instances of ${\text {OFS}}_{101}$ and ${\text {OFS}}_{110}$ models (see Tables 5 and 6, respectively).

Table 5 Comparison of standard errors obtained by numerical differentiation of log-likelihood to obtain the Hessian matrix (Num.) and nonparametric bootstrap: ${\text {OFS}}_{101}$ model with $m=7, \delta _1=0.6; \alpha _1=0.4; \beta _3=0.7$

Table 6 Comparison of standard errors obtained by numerical differentiation of log-likelihood to obtain the Hessian matrix (Num.) and nonparametric bootstrap: ${\text {OFS}}_{110}$ model with $m=9, \delta _1=0.6; \alpha _1=0.3; \alpha _2=4; \beta _2=1.5$

3.6 A comparative discussion with the state of the art

Like the OFS family, mihg (Simone and Iannario 2018) and caub (Simone and Tutz 2018) mixture models pursue a direct parameterization of the features of interest of the distribution, with easy interpretation and explicit location of modal values (yet the mihg does not consider floatation). In this context, a 3-component mixture of Binomial distributions could be also considered if suitable constraints are put on Binomial parameters to model polarization and floatation: its specification will not be discussed hereafter, since the Binomial model can be approximated by the DB model (Ursino 2014): see (Grilli et al. 2015) for further applications of Binomial mixtures to discrete data.

The proposal of the bimodal discrete shifted Poisson model (Bi-Poiss) advanced in (Gómez-Déniz et al. 2020), instead, deals with a construction to encompass bimodal count data starting from the Poisson model, with addition of an extra dispersion parameter $\theta $ responsible for bimodality (not necessarily at the extremes of support)^{Footnote 10}. After truncation at m, the main drawback of the Bi-Poiss model is the lack of an explicit link between parameter values and polarization and floatation of the response: for instance, theoretical values for the modes can be obtained in terms of parameters by solving numerically nonlinear equations. In addition, the Bi-Poiss does not encompass the scenario of three response clusters as the ${\text {OFS}}_{111}$ model. Conversely, the Bi-Poiss model is directly applicable in case of bimodality at inner categories, whereas specification of mixtures of DB models in this case should be designed carefully for identifiability issues (see Appendix 2).

For bimodal discrete data, a two-component mixture of (truncated) Conway–Maxwell–Poisson models can be considered as well (Mix-CMP, (Sur et al. 2015))^{Footnote 11}. With respect to computational aspects, the M-step within the EM algorithm needs to be performed with a computationally demanding grid search since the ML solution for Mix-CMP is highly dependent on initial values. With respect to the problem under examination, the main drawback of Mix-CMP concerns identifiability, which causes several limitations on interpretation of the response location and dispersion. Specifically, parameters are not straightforwardly interpretable in terms of polarization and floatation, as for the OFS family. As to fitting performances, a tentative approach to pursue a comparative analysis with the OFS family requires setting suitable parameter constraints to mitigate identifiability issues for the Mix-CMP, at the cost of lack of flexibility. For instance, the supporters’ pole can be shaped by restricting to a $CMP(\lambda _S,\nu _S)$ with $\lambda _S \in (m-2,m)$, $\nu _S \in (0,1)$, whereas a $CMP(\lambda _O,\nu _O)$ model with $\lambda _O,\nu _O \in (0,1)$ can be considered for the opponents’ pole. Floatation could be possibly considered explicitly if a component $CMP(\lambda _F, \nu _F), \lambda _F, \nu _F > 1$, is specified in the mixture. For count data, each component should be truncated from below at the minimum observed count, and from above at the largest observed count or at the censoring threshold, whereas it should be truncated from above at $m-1$ and then shifted upward by 1 in case of ratings on Likert-type scales, as argued for the Binomial component in cub mixtures (Piccolo and Simone 2019).
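For ratings, the truncation and shift described above can be sketched as follows, using the standard Conway–Maxwell–Poisson kernel $\lambda ^y/(y!)^{\nu }$ on $y=0,\dots ,m-1$ and setting $R=Y+1$. The Python code is only an illustration with a hypothetical function name; it covers a single component and none of the mixture estimation.

```python
import math


def cmp_rating_pmf(m, lam, nu):
    """CMP(lam, nu) truncated above at m - 1 and shifted by 1: P(R = r), r = 1..m."""
    if lam <= 0 or nu < 0:
        raise ValueError("lam must be positive and nu non-negative")
    # Work on the log scale: log kernel = y * log(lam) - nu * log(y!).
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1) for y in range(m)]
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]
```

With $\lambda _O, \nu _O \in (0,1)$, consecutive kernel ratios $\lambda _O/(y+1)^{\nu _O}$ are below one, so the probabilities decrease from the first category onwards, which is the shape assumed for the opponents' pole; with $\nu =1$ the function returns a truncated and shifted Poisson distribution.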

It is worth remarking that for data exhibiting bi-polarization and floatation, a 3-component mixture of CMP would have a higher model complexity than the ${\text {OFS}}_{111}$ model; similarly, the Mix-CMP would be less parsimonious than mihg and ${\text {OFS}}_{101}$ for U-shaped distributions and than ${\text {OFS}}_{110}$ or ${\text {OFS}}_{011}$ for bimodal data with one mode at one of the extremes.

In order to show that the OFS family is successfully applicable also in case of (truncated) distributions of count data, Table 7 reports some performance indicators of alternative models for the Health Heritage Competition data discussed in (Sur et al. 2015)^{Footnote 12}. Fitting results of a unique DB$(\alpha ,\beta )$ model with no parameter constraints, and of the cub mixture (Piccolo and Simone 2019), possibly allowing for inflation at the last category (cub with shelter), are also reported. The last column reports the average of the p values for the Pearson $X^2$ goodness-of-fit statistics, applied on each test set of a $K=30$-fold cross-validation^{Footnote 13} based on the model estimated on the remaining $K-1$ folds^{Footnote 14}: it follows that the ${\text {OFS}}_{111}$ entails very satisfactory performance.

Table 7 Models comparison for the Number of Days in Hospital dataset discussed in (Sur et al. 2015) (the two-component Mix-CMP has been fitted with the discussed constraints: ${\hat{p}}=0.95, {\hat{\lambda }}_1=0.64; {\hat{\nu }}_1=0.05; {\hat{\lambda }}_2=13; {\hat{\nu }}_2=0.88$)

Thus, OFS mixtures could be successfully applied to assess the efficiency of health care structures, for instance, as well as for other count data, thanks to good flexibility in both fitting and interpretation. For instance, in this case floatation covers the intermediate stays, whereas polarization should be interpreted as the predominance of short and long hospitalizations, with parameters $\alpha _1, \beta _3$ describing the concentration of brief and lengthy stays towards the lowest and largest count, respectively. Finally, OFS mixing weights quantify how frequent short, intermediate and long hospitalizations are overall. For the example, results indicate that intermediate hospitalizations tend to be as short as possible since the floatation component is right-skewed with modal value at the second category^{Footnote 15}.

For the subsequent case studies, fitting results of both the Bi-Poiss and the (constrained) Mix-CMP models will be reported for the sake of comparison.

Remark 4

Notably, the latent Beta polarization components $f(x;\alpha _1,1)$ and $f(x; 1,\beta _3)$ of the OFS family are particular cases of the Kumaraswamy distribution (Jones 2009), with density $g(x; \alpha ,\beta )= \alpha \,\beta \,x^{\alpha -1}(1-x^{\alpha })^{\beta -1}$ for $x \in [0,1]$, which is similar to the Beta distribution in several respects, yet more tractable from the mathematical point of view. Preliminary investigations seem to indicate that mixture specification within this family would not imply identifiability issues such as those of the Beta mixtures discussed in Appendix 2. Thus, a mixture of two discretized Kumaraswamy distributions, one with parameters $(\alpha _1,\beta _1)$ such that $\min (\alpha _1,\beta _1)<1$ for polarization, and one with parameters $(\alpha _2,\beta _2)$ such that $\min (\alpha _2,\beta _2)>1$ for floatation, could be an alternative model for the problem under examination, yet it would lack a straightforward and symmetrical interpretation of parameters with respect to polarization and floatation; further, non-uniform symmetric shapes would not be encompassed.
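As an illustration of Remark 4, the following minimal sketch discretizes the Kumaraswamy distribution over $m$ categories through its closed-form distribution function $1-(1-x^{\alpha })^{\beta }$, and compares it with the discretized Beta components $f(x;\alpha _1,1)$ and $f(x;1,\beta _3)$, whose distribution functions are $x^{\alpha _1}$ and $1-(1-x)^{\beta _3}$. Function names and parameter values are hypothetical and chosen only for the example.

```python
def kumaraswamy_cdf(x, alpha, beta):
    """Distribution function of the Kumaraswamy model on [0, 1]."""
    return 1.0 - (1.0 - x ** alpha) ** beta


def discretized_kumaraswamy(m, alpha, beta):
    """Mass assigned to category k = 1..m, i.e. to the interval ((k-1)/m, k/m]."""
    return [
        kumaraswamy_cdf(k / m, alpha, beta) - kumaraswamy_cdf((k - 1) / m, alpha, beta)
        for k in range(1, m + 1)
    ]


def db_alpha_one(m, alpha):
    """Discretized Beta(alpha, 1): its distribution function is x^alpha."""
    return [(k / m) ** alpha - ((k - 1) / m) ** alpha for k in range(1, m + 1)]


def db_one_beta(m, beta):
    """Discretized Beta(1, beta): its distribution function is 1 - (1 - x)^beta."""
    return [(1 - (k - 1) / m) ** beta - (1 - k / m) ** beta for k in range(1, m + 1)]


if __name__ == "__main__":
    m = 9  # illustrative number of categories
    print(discretized_kumaraswamy(m, 0.4, 1.0))  # coincides with db_alpha_one(m, 0.4)
    print(discretized_kumaraswamy(m, 1.0, 0.3))  # coincides with db_one_beta(m, 0.3)
```

Because the Kumaraswamy distribution function is available in closed form, no incomplete Beta function is needed for the discretization, which is one aspect of the tractability mentioned above.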

4 A case study on the probability to vote for German Political Parties

The data analysed in the present section are taken from the GESIS ALLBUS German Social Survey (Gesis 2016). On a rating scale ranging from $1=$ “very unlikely” to $10=$ “very likely”, respondents were asked to rate: “How likely it is that you would ever vote for this German party?”. Hereafter, ratings for the four main parties (CDU, SPD, FDP, The Greens) collected in 2002 and 2008 will be considered. The last two categories have been collapsed to yield rating measurements on a scale with $m=9$ categories. After list-wise omission of missing values, samples of $n=2738$ and $n=3056$ observations are analysed for 2002 and 2008 data, respectively. Within the OFS framework, polarization is meant as resoluteness of the opinion of opponents and supporters, whereas floatation can also be interpreted as indecision.

Table 8 reports the best model for each rating variable, selected on the basis of a joint analysis of multiple criteria, including $X^2$ statistics, likelihood ratio tests for nested models and BIC values. As a general rule, the most parsimonious specification has been preferred when the evidence for a more complex model is only weakly significant, provided that comparably satisfactory results hold for the other criteria (see Appendix 3 for details).

Table 8 Best OFS mixture (see Table 12 in Appendix 3 for details)

It follows that:

For the CDU, the structural components of the probability to vote have changed neither in size nor in intensity from 2002 to 2008;
For the SPD, the neutrality component in 2002 has turned into a more general yet symmetric indecision component;
For the Greens and the FDP, instead, evidence for the supporter pole was found only in 2008: given the positive asymmetry of the floatation component in 2002 (see Table 9) and its symmetry in 2008, it can be concluded that there has been a movement of the undecided opinions towards the supporter pole from 2002 to 2008.

The parameterization of polarization and indecision accomplished via OFS mixtures makes it possible to identify whether, and to what extent, changes have occurred in the probability to vote for German Parties. Figure 1 shows estimated polarization parameters ${\hat{\delta }}_1,{\hat{\delta }}_3,{\hat{\alpha }}_1,{\hat{\beta }}_3 \in (0,1)$ for all parties in 2002 (left panel) and in 2008 (right panel). Lower and upper bounds of 95% bootstrap confidence intervals are displayed with star symbols at the edge of the whiskers departing from the point estimates. It follows that:

Polarization and floatation components of the voting probabilities for the CDU are overall stable from 2002 to 2008, in both intensity and size;
For the SPD, a significant decrease is observed for both $\delta _3$ and $\beta _3$: thus, given that no relevant variation is observed for $\delta _1$, it can be inferred that indecision has increased, but positive evaluations have further polarized;
For the Greens and the FDP, a significant decrease is observed in both $\delta _1$ and $\alpha _1$, indicating that the opposition pole grew in intensity but decreased in size. As a result, it can be inferred that some negative yet un-polarized evaluations have floated towards a symmetric indecision (see also Table 9).

Figure 2 provides a joint representation of estimation results for the sizes of polarization and floatation with a ternary plot of mixing weights (left), whereas a scatter plot of polarization parameters $\alpha _1,\beta _3$ in the unit square is displayed to compare the strengths of unfavourable and favourable opinions over time (right).

Finally, Table 9 reports the chosen asymmetry measure $\gamma _1$ defined in (4) and the adjusted kurtosis value $\gamma _2^{\star }$ (15) for the estimated indecision component, for those parties and time points where it is not degenerate. The extent of floatation of negative opinions towards neutrality is then quantified, as is the extent to which un-polarized opinions became more homogeneous from 2002 to 2008 for both the FDP and the Greens (more for the FDP than for the Greens). The reverse circumstance is observed for the SPD, for which the neutrality component in 2002 gave way to a general yet symmetric indecision. The analysis and the proposed visualization tools for the results could be replicated conditional on covariate values (such as gender, geographical residence, etc.) to give local assessments of the polarization and floatation dimensions.

Table 9 Asymmetry and Adjusted kurtosis of the floatation component for the best OFS mixture (see Table 8)

Finally, a 10-fold cross-validation is performed to check the ability of the selected best model (Table 8) to predict the rating distribution. Table 10 reports some summarizing indicators: the average and the 9th decile over folds of the dissimilarity index between the best model $\varvec{p}_\mathrm{train}^{\star }$, estimated on the training set, and the response distribution on the test set ($\varvec{f}_\mathrm{test}$) are proposed as a proxy of prediction errors for the test set distribution. With the same goal and strategy, the average over folds of the Kullback–Leibler divergence is reported for candidate OFS models. Results indicate that, beyond fitting ability, the flexibility of OFS models yields satisfactory predictive performance.
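The two indicators used in the cross-validation can be sketched as follows. The dissimilarity index is computed both as half the sum of absolute differences and through the equivalent form $1-\sum _r \min (f_r,p_r)$ recalled in the Notes; the direction of the Kullback–Leibler divergence (observed relative to fitted) is an assumption made here for illustration, as are the function names and the toy distributions.

```python
import math


def dissimilarity(f, p):
    """Dissimilarity index: half the L1 distance between two distributions."""
    return 0.5 * sum(abs(fr - pr) for fr, pr in zip(f, p))


def dissimilarity_via_min(f, p):
    """Equivalent form, valid when both f and p sum to one."""
    return 1.0 - sum(min(fr, pr) for fr, pr in zip(f, p))


def kl_divergence(f, p):
    """Kullback-Leibler divergence of f from p (assumed direction).

    Categories with f_r = 0 contribute nothing; p_r must be positive
    wherever f_r is positive.
    """
    return sum(fr * math.log(fr / pr) for fr, pr in zip(f, p) if fr > 0)


if __name__ == "__main__":
    # Hypothetical test-set relative frequencies and fitted probabilities.
    f_test = [0.5, 0.3, 0.2]
    p_train = [0.4, 0.4, 0.2]
    print(dissimilarity(f_test, p_train))
    print(dissimilarity_via_min(f_test, p_train))
    print(kl_divergence(f_test, p_train))
```

In a $K$-fold scheme these functions would be evaluated once per fold and then summarized, for instance by the average and the 9th decile as in Table 10.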

Table 10 Summarizing results for goodness of fit and predicting ability of the best model, over 10-fold cross-validation

5 Final considerations

The paper has discussed mixture specification of Discretized Beta models to explicitly parameterize polarization and floatation of discrete ordered evaluations, such as ratings and (truncated) count data. The proposal is more flexible than alternative models in both fitting performance and interpretation: for instance, the method presented in (Gómez-Déniz et al. 2020) to induce bimodality in a distribution could also be applied to the DB model, at the cost of losing the direct a-priori parameterization of polarization features afforded by the OFS models. A dedicated R package for OFS implementation is under development.

Further research will be tailored to the analysis of tail dependencies of polarization and floatation of different survey items with suitable copula modelling, as well as to the implementation of model-based trees to derive response profiles in terms of covariates entailing a significant effect on at least one of the model’s features (see (Cappelli et al. 2019; Simone et al. 2019) for the case of cub models for rating data). A comparative analysis with mixtures of discretized Kumaraswamy distributions also deserves in-depth investigation in future research.

Notes

It is worth emphasizing that the term polarization for ordinal data analysis is not new: in general, it is meant as the extent to which the response distributions of a-priori groups (typically, identified by covariates) are concentrated around certain locations of the response support. A nonparametric approach to assess polarization in this sense is presented in (Mussini 2018), where a detailed summary of different concepts and measures of polarization for ordinal evaluations is provided.
Notice that these conditions, depending on m, are eventually needed only for values of $\max (\alpha ,\beta )$ very close to 1 from below.
The conditions $\frac{\alpha -1}{\alpha +\beta -2}<\frac{1}{m}$ and $\frac{\alpha -1}{\alpha +\beta -2}>1-\frac{1}{m}$ needed to achieve $Mo(R)=1$ or $Mo(R)=m$, respectively, if $\min (\alpha ,\beta )>1$, will not be considered for the sake of simplicity.
For a symmetric distribution ($\alpha =\beta $), the excess kurtosis $\gamma _2(\alpha ,\beta ) = -\frac{6}{2\alpha +3}$ tends to 0 as $\alpha $ grows to infinity, and the distribution becomes degenerate with mass concentrated at the middle of the support. Notice that a symmetric DB distribution has two modal values at the central categories if m is even.
It is worth remarking that the OFS model is reversible with respect to the scale, in the sense that if $R \sim {\text {OFS}}_{111}(\delta _1,\delta _3,\alpha _1,\alpha _2,\beta _2,\beta _3)$, then $m-R+1 \sim {\text {OFS}}_{111}(\delta _3,\delta _1,\beta _3,\beta _2,\alpha _2,\alpha _1)$.
Indeed, from the identity $\min (a,b) = \dfrac{1}{2}\big ((a+b)-|a-b|\big )$ holding for $a,b \in {\mathbb {R}}^{+}$, one can write $Diss(\varvec{f}, \varvec{p}) = 1- \sum \nolimits _{r=1}^m \min (f_r,p_r)$.
The optimization of $Q_3^{(k)}(\alpha _2,\beta _2)$ under the inequality constraints (2) can be implemented via the R package nloptr (Ypma 2018). The optimization procedure requires setting a finite upper bound for the floatation parameters $\alpha _2$, $\beta _2$, estimated on a logarithmic scale.
The R package numDeriv has been considered for this task (Gilbert and Varadhan 2019).
The circumstance that standard errors obtained from bootstrap methods outperform the ones obtained from the observed information matrix also occurred in (Basford et al. 1997).
If $g(x;\lambda )$ denotes the (shifted) Poisson probability function, $x \in {\mathbb {N}}$, the bimodal discrete shifted Poisson model is defined by the probability mass function:
$$\begin{aligned} f(x; \lambda , \theta ) = w(x; \lambda , \theta )\,g(x;\lambda ),\, x \ge 1 \end{aligned}$$
where $w(x; \lambda , \theta ) = \dfrac{2\lambda + \theta (1+\lambda -x)(2\sqrt{\lambda } + \theta (1+\lambda -x))}{\lambda (2+\theta ^2)}$. In particular, $\theta >0$ or $\theta <0$ for overdispersed or underdispersed distribution with respect to the Poisson.
A discrete random variable $X \sim CMP(\lambda ,\nu )$ has the Conway–Maxwell–Poisson distribution of parameters $\lambda ,\nu >0$ if $Pr(X=x) = \frac{\lambda ^x}{(x!)^{\nu }} \frac{1}{\sum _{j=0}^{\infty } \frac{\lambda ^j}{(j!)^{\nu }}}$, $x\ge 0$. If $\nu =1$, the Poisson model is recovered, whereas $\nu < 1$ or $\nu >1$ implies overdispersion or underdispersion, respectively.
After omitting zero counts, the observed distribution has been censored at 15, so that the observed scores $\{1,\dots ,15+\}$ have frequencies $(9299,4548,2882,1819,1093,660,474,316,263,209,145,135,111,65,479)$, with saturated log-likelihood $l_{sat}=\sum \nolimits _{r=1}^{15} n_r \log (\frac{n_r}{n})=-41180.28$.
The R package caret has been exploited to split the data (Kuhn 2020).
The choice of setting $K=30$ ensures that each fold has a moderate sample size of about 750 observations.
For the ${\text {OFS}}_{111}$, ${\hat{\delta }}_1=\underset{( 0.023 )}{0.452}, {\hat{\delta }}_3=\underset{( 0.0032 )}{0.0162}, {\hat{\alpha }}_1= \underset{( 0.007 )}{0.175}, \widehat{\log (\alpha _2)}=\underset{( 0.071 )}{0.478},\widehat{\log (\beta _2)}=\underset{( 0.060 )}{2.258}, {\hat{\beta }}_3=\underset{( 0.077 )}{0.010}$, so that ${\text {OFS}}_{111}$ is substantially an ${\text {OFS}}_{110}$ with inflation at $m=15$ (the supporters’ pole is degenerate: see Remark 2).
The approximation (17) stems from a straightforward adaptation to mixtures of the method of moments applied in (Jóhannesson and Giri 1995) to derive a Beta approximation to linear combinations of independent Beta random variables.
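The figures quoted in the notes on the Days in Hospital data can be recomputed directly from the reported frequencies. The sketch below, with hypothetical variable and function names, evaluates the saturated log-likelihood $l_{sat}=\sum _r n_r \log (n_r/n)$ and the average fold size for $K=30$.

```python
import math

# Observed frequencies of the scores 1, ..., 15+ as reported in the notes.
FREQUENCIES = [9299, 4548, 2882, 1819, 1093, 660, 474, 316,
               263, 209, 145, 135, 111, 65, 479]


def saturated_loglik(counts):
    """Log-likelihood of the saturated model: sum of n_r * log(n_r / n)."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c > 0)


if __name__ == "__main__":
    n = sum(FREQUENCIES)
    print("sample size:", n)
    print("saturated log-likelihood:", round(saturated_loglik(FREQUENCIES), 2))
    print("average fold size for K = 30:", round(n / 30, 1))
```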

References

Abramowitz M, Stegun IA (eds) (1972) Handbook of mathematical functions with formulas, graphs, and mathematical tables, 10th printing. Dover, New York, p 930
Ahmad KE, Al-Hussaini EK (1982) Remarks on the non-identifiability of mixtures of distributions. Ann Inst Stat Math 34:543–544
Apouey B (2007) Measuring health polarization with self-assessed health data. Health Econ 16(9):875–894
Basford KE, Greenway DR, McLachlan GJ, Peel D (1997) Standard errors of fitted component means of normal mixtures. Comput Stat 12(1):1–17
Blest DC (2003) A new measure of kurtosis adjusted for skewness. Aust N Z J Stat 45(2):175–179
Capecchi S, Piccolo D (2017) The distribution of Net Promoter Score in socio-economic surveys. In: SIS 2017. Statistics and Data Science: new challenges, new generations. Proceedings of the Conference of the Italian Statistical Society. Petrucci A., Verde R. Editors. 247–252. ISBN: 978-88-6453-521-0
Cappelli C, Simone R, Di Iorio F (2019) CUBREMOT: a model-based tree for ordinal responses. Expert Syst Appl 124:39–49
Efron B (1981) Nonparametric estimates of standard error: the jackknife, the bootstrap and other methods. Biometrika 68(3):589–599
Fasola S, Sciandra M (2015) New flexible probability distributions for ranking data. In: Morlini I, Minerva T, Vichi M (eds) Advances in statistical models for data analysis. Springer, Berlin, pp 117–124
Forbes C, Evans M, Hastings N, Peacock B (2011) Statistical distributions, 4th edn. Wiley, Hoboken
GESIS Leibniz Institute for the Social Sciences (2016) German General Social Survey (ALLBUS) - Cumulation 1980-2014, GESIS Data Archive, Cologne. ZA4584 Data file version 1.0.0. https://doi.org/10.4232/1.12574
Gibbs AL, Su FE (2002) On choosing and bounding probability metrics. Int Stat Rev 70:419–435
Gilbert P, Varadhan R (2019) numDeriv: accurate numerical derivatives. R package version 2016.8-1.1. https://CRAN.R-project.org/package=numDeriv
Gómez-Déniz E, Pérez-Rodríguez JV, Reyes J, Gómez HW (2020) A bimodal discrete shifted Poisson distribution. A case study of tourists’ length of stay. Symmetry 12(3):442
Grilli L, Rampichini C, Varriale R (2015) Binomial mixture modelling of University Credits. Commun Stat Theory Methods 44(22):4866–4879
Jóhannesson B, Giri N (1995) On approximations involving the Beta distribution. Commun Stat Simul Comput 24(2):489–503
Jones MC (2009) Kumaraswamy’s distribution: a beta-type distribution with some tractability advantages. Stat Methodol 6(1):70–81
Kuhn M (2020) caret: Classification and Regression Training. R package version 6.0-86. https://CRAN.R-project.org/package=caret
Leti G (1983) Statistica descrittiva. Il Mulino, Bologna
McLachlan GJ, Krishnan T (1997) The EM algorithm and extensions, 2nd edn. Wiley Series in Probability and Statistics
Morrison DG (1979) Purchase intentions and Purchase Behavior. J Mark 43(2):65–74
Mussini M (2018) On measuring polarization for ordinal data: an approach based on the decomposition of the Leti index. Stat Transit New Ser 19(2):277–296
Piccolo D, Simone R (2019) The class of cub models: statistical foundations, inferential issues and empirical evidence. Stat Method Appl 28(3):389–435 (with discussions and rejoinder)
Reichheld FF (2003) The one number you need to grow. Harv Bus Rev 81:46–54
Simone R (2021) An accelerated EM algorithm for mixture models with uncertainty for rating data. Comput Stat 36:691–714
Simone R, Iannario M (2018) Analysing sport data with clusters of opposite preferences. Stat Model 18(5–6):505–524
Simone R, Tutz G (2018) Modelling uncertainty and response styles in ordinal data. Stat Neerl 72:224–245
Simone R, Cappelli C, Di Iorio F (2019) Modelling marginal ranking distributions: the uncertainty tree. Pattern Recogn Lett 125:278–288
Sur P, Shmueli G, Bose S, Dubey P (2015) Modeling bimodal discrete data using Conway–Maxwell–Poisson mixture models. J Bus Econ Stat 33(3):352–365
Ursino M (2014) Ordinal Data: a new model with applications, PhD Thesis, http://porto.polito.it/2535701/, Politecnico di Torino, Italy
Ursino M, Gasparini M (2018) A new parsimonious model for ordinal longitudinal data with application to subjective evaluation of a gastrointestinal disease. Stat Methods Med Res 27(5):1376–1393
Yakowitz SJ, Spragins JD (1968) On the identifiability of finite mixtures. Ann Math Stat 39:209–214
Ypma J, with contributions by Borchers HW, Eddelbuettel D (2018) https://CRAN.R-project.org/package=nloptr

Author information

Authors and Affiliations

University of Naples Federico II: Universita degli Studi di Napoli Federico II, Naples, Italy
Rosaria Simone

Corresponding author

Correspondence to Rosaria Simone.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Miscellanea about the Discretized Beta model

For a generic random variable X, let $\mu $ and $\sigma $ be the mean value and the standard deviation, respectively. If $m_3$ and $m_4$ denote the standardized central moments around the mean of order three and four, respectively, then the meson f is the location index proposed in (Blest 2003), defined by:

$$\begin{aligned} f = \bigg (\sqrt{\frac{1}{4}\,m_3^2 +1} + \frac{1}{2}m_3\bigg )^{\frac{1}{3}} - \bigg (\sqrt{\frac{1}{4}m_3^2 +1} - \frac{1}{2}m_3\bigg )^{\frac{1}{3}} \end{aligned}$$

(14)

and verifying ${\mathbb {E}}[(X-\xi )^3] = 0 $ for $\xi = \mu + f\sigma $, so that $\xi $ is the location about which the fourth moment of X is minimal. The measure of kurtosis adjusted for skewness proposed in (Blest 2003) then allows distributions to be compared with respect to their degree of flatness, regardless of the location of their tails. It is defined as the standardized fourth moment around the location $\xi $ identified by the meson f, namely:

$$\begin{aligned} \gamma _2^{\star } = m_4 - 3\big ((1+f^2)^2 -1\big ). \end{aligned}$$

(15)

For the DB model, Fig. 3 displays contour lines for the adjusted kurtosis index (left) and selected DB distributions with corresponding values of $\gamma _1(\alpha ,\beta ), \gamma _2^{\star }(\alpha ,\beta )$ (right). As a benchmark, notice that $\gamma _2^{\star }=1.8$ for the uniform distribution, whereas $\gamma _2^{\star } \rightarrow 3$ if $\alpha =\beta \rightarrow \infty $.
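A minimal sketch of (14) and (15) follows, assuming that the standardized moments $m_3$ and $m_4$ have already been computed; function names and input values are illustrative. Expanding ${\mathbb {E}}[(Z-f)^3]=0$ for the standardized variable $Z$ shows that the meson solves $f^3+3f=m_3$, and expanding ${\mathbb {E}}[(Z-f)^4]$ under this constraint gives back (15); the code checks both identities numerically.

```python
import math


def meson(m3):
    """Meson f of (14), given the standardized third central moment m3."""
    root = math.sqrt(0.25 * m3 ** 2 + 1.0)
    # Both bases are strictly positive because root > |m3| / 2.
    return (root + 0.5 * m3) ** (1.0 / 3.0) - (root - 0.5 * m3) ** (1.0 / 3.0)


def adjusted_kurtosis(m3, m4):
    """Kurtosis adjusted for skewness as in (15)."""
    f = meson(m3)
    return m4 - 3.0 * ((1.0 + f ** 2) ** 2 - 1.0)


def fourth_moment_about_meson(m3, m4):
    """Direct expansion of E[(Z - f)^4] for a standardized variable Z."""
    f = meson(m3)
    return m4 - 4.0 * f * m3 + 6.0 * f ** 2 + f ** 4


if __name__ == "__main__":
    # Continuous uniform benchmark: m3 = 0 and m4 = 1.8.
    print(adjusted_kurtosis(0.0, 1.8))
    # Hypothetical skewed case.
    m3, m4 = 1.2, 5.0
    f = meson(m3)
    print(f ** 3 + 3.0 * f)  # reproduces m3
    print(adjusted_kurtosis(m3, m4), fourth_moment_about_meson(m3, m4))
```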

Appendix 1.2: On the observed information matrix for DB and OFS models

If $\Gamma (x)$ denotes the Euler Gamma function, let $\psi (x) = \frac{d}{dx} \log (\Gamma (x))$ be the digamma function, and $\psi _1(x) = \frac{d}{dx}\psi (x)$ be the trigamma function. To shorten the notation, for $k=1,\dots ,m$, let $db_k = Pr(R=k|a,b)= \dfrac{1}{B(a,b)} I_k(a,b)$, where $I_k(a,b) = \int \limits _{\frac{k-1}{m}}^{\frac{k}{m}}x^{a-1}(1-x)^{b-1}dx$, and consider the identity $ \dfrac{\partial B(x,y)}{\partial x} = B(x,y)\big (\psi (x) - \psi (x+y)\big ).$ By virtue of the symmetry of the Beta function ($B(x,y)=B(y,x)$), the first- and second-order derivatives of the logarithm of the DB probabilities, according to the chosen parameterization, are:

$\frac{\partial \log (db_k)}{\partial a} = \psi (a+b) - \psi (a) + \frac{1}{I_k(a,b)} \frac{\partial I_k(a,b)}{\partial a} $;
$\frac{\partial \log (db_k)}{\partial b} = \psi (a+b) - \psi (b) + \frac{1}{I_k(a,b)} \frac{\partial I_k(a,b)}{\partial b} $;
$\frac{\partial ^2 \log (db_k)}{\partial a^2} = \psi _1(a+b) - \psi _1(a) - \frac{1}{I_k(a,b)^2} \bigg (\frac{\partial I_k(a,b)}{\partial a}\bigg )^2 \,+\, \frac{1}{I_k(a,b)} \frac{\partial ^2 I_k(a,b)}{\partial a^2} $;
$\frac{\partial ^2 \log (db_k)}{\partial b^2} = \psi _1(a+b) - \psi _1(b) - \frac{1}{I_k(a,b)^2} \bigg (\frac{\partial I_k(a,b)}{\partial b} \bigg )^2 \,+\, \frac{1}{I_k(a,b)} \frac{\partial ^2 I_k(a,b)}{\partial b^2} $;
$\frac{\partial ^2 \log (db_k)}{\partial a \,\partial b} = \psi _1(a+b) - \frac{1}{I_k(a,b)^2} \bigg (\frac{\partial I_k(a,b)}{\partial a} \bigg ) \bigg (\frac{\partial I_k(a,b)}{\partial b} \bigg )\,+\, \frac{1}{I_k(a,b)} \frac{\partial ^2 I_k(a,b)}{\partial a \partial b} $;

where:

$\frac{\partial I_k(a,b)}{\partial a} = \int \nolimits _{\frac{k-1}{m}}^{\frac{k}{m}} x^{a-1}(1-x)^{b-1}(\log (x)) dx$; $\frac{\partial ^2 I_k(a,b)}{\partial a^2} = \int \nolimits _{\frac{k-1}{m}}^{\frac{k}{m}} x^{a-1}(1-x)^{b-1}(\log (x))^2 dx$;
$\frac{\partial I_k(a,b)}{\partial b} = \int \nolimits _{\frac{k-1}{m}}^{\frac{k}{m}} x^{a-1}(1-x)^{b-1}(\log (1-x)) dx$; $\frac{\partial ^2 I_k(a,b)}{\partial b^2} = \int \nolimits _{\frac{k-1}{m}}^{\frac{k}{m}} x^{a-1}(1-x)^{b-1}(\log (1-x))^2 dx$;
$\frac{\partial ^2 I_k(a,b)}{\partial a \partial b} = \int \nolimits _{\frac{k-1}{m}}^{\frac{k}{m}} x^{a-1}(1-x)^{b-1}(\log (1-x)\log (x)) dx$.

Then, from the results above, the observed information matrix for OFS models can be promptly derived (the chain rule has to be suitably applied if the log transform is considered for the parameters of a given DB component). Similar steps are needed to compute first- and second-order derivatives of the complete log-likelihood function in order to obtain the observed information matrix within the EM algorithm with Louis’ identity and to speed up convergence, as proposed in (Simone 2021) for cub models.
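The probabilities $db_k$ entering these derivatives can be evaluated as differences of the regularized incomplete Beta function at consecutive cut-points $k/m$. The sketch below is not the author's implementation: it relies on a textbook continued-fraction evaluation (modified Lentz scheme) so that only the Python standard library is needed, and all function names are hypothetical. It also illustrates the reversibility property recalled in the Notes, namely that reversing the scale amounts to swapping the two parameters.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete Beta function (modified Lentz scheme)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for i in range(1, max_iter + 1):
        # Even step of the recurrence.
        aa = i * (b - i) * x / ((a - 1.0 + 2 * i) * (a + 2 * i))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + i) * (a + b + i) * x / ((a + 2 * i) * (a + 1.0 + 2 * i))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """Distribution function of the Beta(a, b) model evaluated at x in [0, 1]."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def db_pmf(m, a, b):
    """Discretized Beta probabilities db_k, k = 1..m, over ((k-1)/m, k/m]."""
    cdf = [regularized_incomplete_beta(a, b, k / m) for k in range(m + 1)]
    return [cdf[k] - cdf[k - 1] for k in range(1, m + 1)]


if __name__ == "__main__":
    probs = db_pmf(9, 0.3, 2.5)           # illustrative parameter values
    mirrored = db_pmf(9, 2.5, 0.3)[::-1]  # reversing the scale swaps the parameters
    print(probs)
    print(max(abs(p - q) for p, q in zip(probs, mirrored)))
```

Numerical derivatives of $\log (db_k)$ obtained from such a routine could serve as a check on the analytical expressions above.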

Appendix 2: On identifiability of Discretized Beta mixture models

On the basis of a characterization proven in (Yakowitz and Spragins 1968), mixtures of Beta distributions are not identifiable, in general (Ahmad and Al-Hussaini 1982). Thus, suitable constraints should be put on parameters of mixture components when combining DB models for polarization with a DB model for floatation. The core of the discussion is the following theorem, whose proof follows straightforwardly.

Theorem 1

Let $f(x;\alpha ,\beta )$ be the Beta probability density function over [0, 1], with $\alpha , \beta >0$. Then:

$$\begin{aligned} f(x;\alpha ,\beta ) \,=\, c_1\, f(x; \alpha ,\beta +1) \,+\,(1-c_1) \, f(x; \alpha +1,\beta ) \end{aligned}$$

(16)

where $c_1= \dfrac{\beta }{\alpha +\beta }$. In particular, if $\max (\alpha ,\beta )<1$, the U-shaped Beta density function $f(x;\alpha ,\beta )$ can be written as a mixture of a J-shaped Beta $f(x; \alpha +1,\beta )$ and a reverse J-shaped Beta $f(x; \alpha ,\beta +1)$.
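The identity (16) follows by writing both terms on the right-hand side over the common normalizing constant $B(\alpha ,\beta )$ and using $(1-x)+x=1$. A quick numerical check can be sketched as follows; the function names and the evaluation points are chosen only for illustration.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density at x in (0, 1), computed on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))


def theorem1_rhs(x, a, b):
    """Right-hand side of (16): mixture of Beta(a, b + 1) and Beta(a + 1, b)."""
    c1 = b / (a + b)
    return c1 * beta_pdf(x, a, b + 1.0) + (1.0 - c1) * beta_pdf(x, a + 1.0, b)


if __name__ == "__main__":
    a, b = 0.3, 0.7  # U-shaped case, max(a, b) < 1
    for x in (0.05, 0.25, 0.5, 0.75, 0.95):
        print(x, beta_pdf(x, a, b), theorem1_rhs(x, a, b))
```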

Mixture specification within the DB family should consider also the following arguments leading to a Beta approximation of Beta mixtures^{Footnote 16}.

For a fixed $k \ge 2$, consider a mixture $g(x) = \sum \nolimits _{i=1}^k d_i f(x; \alpha _i,\beta _i)$ of Beta densities. If $\mu _{i1}$ and $\mu _{i2}$ denote the first and second moments of the i-th mixture component, let $\mu _1 = \sum \nolimits _{i=1}^k d_i \mu _{i1}$, $\mu _2 = \sum \nolimits _{i=1}^k d_i \mu _{i2}$ be the first and second moment of the mixture, respectively. If $s= \mu _2 - \mu _1^2$ is the variance of the mixture, and $h= \frac{\mu _1}{1-\mu _1}$, the following approximation can be derived:

$$\begin{aligned} g(x) \approx f(x; \alpha ,\beta ), \quad {\text {with }} \beta = \frac{h}{s(1+h)^3} - \frac{1}{1+h},\quad \alpha = h\beta . \end{aligned}$$

(17)

For instance, assume that $k=2$ and that $X_1 \sim f(\alpha _1,\beta _1)$ is J-shaped, whereas $X_2 \sim f(\alpha _2,\beta _2)$ is reverse J-shaped. Their mixture g(x) (with weights $d_1$ and $1-d_1$) can be approximated by a U-shaped Beta density $f(x; \alpha ,\beta )$ with parameters $(\alpha ,\beta )$ obtained as in (17). Table 11 reports some instances. The last 4 columns report the Dissimilarity index between the discretized versions of g(x) and its approximation $f(x; \alpha ,\beta )$ given in (17), for varying number of categories m. Results indicate that this approximation is satisfactory: thus, an ${\text {OFS}}_{101}$ model could be approximated by a DB$(\alpha ,\beta )$ model, which in turn can be written as a further mixture of two DB models after discretization of the representation in (16). Hence, specifying DB mixtures is a challenging task, especially for small m.
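The approximation (17) matches the first two moments of the mixture to those of a single Beta density. A minimal sketch is given below, with hypothetical function names; component moments are taken in closed form. As a sanity check, when the mixture is exactly the decomposition (16) of a Beta$(\alpha ,\beta )$, the procedure returns $(\alpha ,\beta )$ itself.

```python
def beta_moments(a, b):
    """First and second raw moments of a Beta(a, b) random variable."""
    mean = a / (a + b)
    second = a * (a + 1.0) / ((a + b) * (a + b + 1.0))
    return mean, second


def beta_approximation(weights, params):
    """Moment-matching Beta approximation (17) of a finite Beta mixture.

    weights: mixing weights d_i summing to one;
    params:  list of (alpha_i, beta_i) pairs, one per component.
    """
    mu1 = sum(d * beta_moments(a, b)[0] for d, (a, b) in zip(weights, params))
    mu2 = sum(d * beta_moments(a, b)[1] for d, (a, b) in zip(weights, params))
    s = mu2 - mu1 ** 2        # variance of the mixture
    h = mu1 / (1.0 - mu1)
    beta = h / (s * (1.0 + h) ** 3) - 1.0 / (1.0 + h)
    alpha = h * beta
    return alpha, beta


if __name__ == "__main__":
    # Mixture of a reverse J-shaped Beta(0.3, 1.7) and a J-shaped Beta(1.3, 0.7)
    # with weights 0.7 and 0.3, i.e. the decomposition (16) of Beta(0.3, 0.7).
    print(beta_approximation([0.7, 0.3], [(0.3, 1.7), (1.3, 0.7)]))
```

For mixtures that are not of the form (16) the match is only approximate, and its quality can be gauged after discretization with the Dissimilarity index, as done in Table 11.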

Table 11 Instances of Beta approximation of 2-component mixtures of Beta distributions according to (17)

Remark 5

As to explanatory performance, it is worth noticing that the ${\text {OFS}}_{101}$ model allows the prevalence of opponents with respect to supporters to be assessed directly in terms of $\delta _1$, whereas this can be assessed only indirectly from the mixture decomposition established in Theorem 1 for a DB$(\alpha _1,\beta _3)$ model.

Appendix 2.1: Two-component mixture of DB models for polarization and floatation

With reference to model specification (6), assume that $\max (\alpha _1,\beta _1)>1$: without loss of generality, assume that $\alpha _1 = \min (\alpha _1,\beta _1) < 1$ and that $\beta _1 = \max (\alpha _1,\beta _1) > 1$. By applying Theorem 1 on the latent polarization component with parameters $(\alpha _1,\beta _1)$, it follows that:

$$\begin{aligned} Pr(R=r|\varvec{\theta })&= \frac{(1-\delta )\beta _1}{\alpha _1+\beta _1} db(r;\alpha _1,\beta _1+1) \nonumber \\&\quad + \bigg (\frac{ (1-\delta )\alpha _1}{\alpha _1+\beta _1} db(r;\alpha _1 + 1,\beta _1) \,+\, \delta \,db(r;\alpha _2,\beta _2) \bigg ) \end{aligned}$$

(18)

$$\begin{aligned}&\approx (1-\delta ^{\star }) db(r;\alpha _1^{\star },\beta _1^{\star }) \,+\, \delta ^{\star } db(r;\alpha _2^{\star },\beta _2^{\star }) \,=\,Pr(R=r|\varvec{\theta }^{\star }), \end{aligned}$$

(19)

where:

$\alpha _1^{\star } = \alpha _1 \in (0,1)$, $\beta _1^{\star }=\beta _1+1 > 1$ (so that DB$(\alpha _1^{\star },\beta _1^{\star })$ is reverse J-shaped as DB$(\alpha _1,\beta _1)$);
$\delta ^{\star } = \delta + \frac{(1-\delta )\alpha _1}{\alpha _1+\beta _1}$;
$\alpha _2^{\star }, \beta _2^{\star }$ are obtained numerically following (17) and thus satisfy:
$$\begin{aligned} \delta ^{\star } db(r;\alpha _2^{\star },\beta _2^{\star }) \,\approx \, \frac{ (1-\delta )\alpha _1}{\alpha _1+\beta _1} db(r;\alpha _1 + 1,\beta _1) \,+\, \delta \,db(r;\alpha _2,\beta _2). \end{aligned}$$

If the constraints $\min (\alpha _2^{\star },\beta _2^{\star }) > 1$ and (2) are satisfied, identifiability concerns may arise depending on the goodness of the approximation $Pr(\,\cdot ;\varvec{\theta })\,$ $\approx \,$ $Pr(\,\cdot ;\varvec{\theta }^{\star })$ and on the distance between parameter vectors.

Similar issues may arise if $\max (\alpha _1,\beta _1)\le 1$. Applying Theorem 1 iteratively to the underlying Beta mixture, it follows that:

$$\begin{aligned} Pr(R=r|\varvec{\theta })&= (1-\delta )\bigg (\frac{\beta _1}{\alpha _1+\beta _1}db(r;\alpha _1,\beta _1+1) + \frac{\alpha _1}{\alpha _1+\beta _1}db(r;\alpha _1 +1,\beta _1)\bigg ) + \delta db(r;\alpha _2,\beta _2) \\&\approx (1-\delta ^{\star }) db(r;\alpha _1^{\star },\beta _1^{\star }) + \delta ^{\star } db(r;\alpha _2^{\star },\beta _2^{\star }) \, = \, Pr(R=r|\varvec{\theta }^{\star }), \end{aligned}$$

where (17) implies that:

$$\begin{aligned} f(x;\alpha _1^{\star },\beta _1^{\star }) \approx \frac{d_1}{d_1+d_2}\,f(x;\alpha _1,\beta _1+2) \,+ \frac{d_2}{d_1+d_2} f(x;\alpha _1+2,\beta _1); \end{aligned}$$
(20)
$d_1 = (1-\delta )\frac{\beta _1}{\alpha _1+\beta _1}\frac{\beta _1+1}{\alpha _1+\beta _1+1}$ and $d_2 = (1-\delta ) \frac{\alpha _1}{\alpha _1+\beta _1}\frac{\alpha _1+1}{\alpha _1+\beta _1+1}$;
$$\begin{aligned} \delta ^{\star }\, f(x;\alpha _2^{\star },\beta _2^{\star }) \approx \dfrac{2(1-\delta )\alpha _1\,\beta _1}{(\alpha _1+\beta _1)(\alpha _1+\beta _1+1)}f(x;\alpha _1+1, \beta _1+1)\, + \, \delta f(x;\alpha _2,\beta _2 ). \end{aligned}$$
(21)

Thus, despite immediate interpretation of parameters, the mixture specification in (6) does not ensure identifiability.
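The weights appearing in (20) and (21) come from applying Theorem 1 twice to the latent Beta$(\alpha _1,\beta _1)$ polarization component, which yields an exact three-term decomposition into Beta$(\alpha _1,\beta _1+2)$, Beta$(\alpha _1+1,\beta _1+1)$ and Beta$(\alpha _1+2,\beta _1)$; multiplying the outer weights by $(1-\delta )$ gives $d_1$ and $d_2$. The sketch below, with illustrative names and values, verifies this decomposition numerically.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density at x in (0, 1), computed on the log scale."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log1p(-x))


def two_step_weights(alpha, beta):
    """Weights obtained by applying Theorem 1 twice to Beta(alpha, beta)."""
    denom = (alpha + beta) * (alpha + beta + 1.0)
    w_low = beta * (beta + 1.0) / denom     # weight of Beta(alpha, beta + 2)
    w_mid = 2.0 * alpha * beta / denom      # weight of Beta(alpha + 1, beta + 1)
    w_high = alpha * (alpha + 1.0) / denom  # weight of Beta(alpha + 2, beta)
    return w_low, w_mid, w_high


def two_step_mixture(x, alpha, beta):
    """Three-component mixture that reproduces the Beta(alpha, beta) density."""
    w_low, w_mid, w_high = two_step_weights(alpha, beta)
    return (w_low * beta_pdf(x, alpha, beta + 2.0)
            + w_mid * beta_pdf(x, alpha + 1.0, beta + 1.0)
            + w_high * beta_pdf(x, alpha + 2.0, beta))


if __name__ == "__main__":
    alpha1, beta1, delta = 0.4, 0.6, 0.3  # hypothetical values, max(alpha1, beta1) <= 1
    w_low, w_mid, w_high = two_step_weights(alpha1, beta1)
    print("d1 =", (1.0 - delta) * w_low, "d2 =", (1.0 - delta) * w_high)
    for x in (0.1, 0.5, 0.9):
        print(x, beta_pdf(x, alpha1, beta1), two_step_mixture(x, alpha1, beta1))
```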

Appendix 2.2: Three-component mixture of DB models for polarization and floatation

From the specification point of view, model (7) has a higher model complexity than model (6), yet the two mixture specifications have the same explanatory power. Theorem 1 implies that, if $\max (\alpha _1,\beta _1) < 1$, one can write:

$$\begin{aligned}&R \sim (1-\delta ){\text {DB}}(\alpha _1,\beta _1) + \delta {\text {DB}}(\alpha _2,\beta _2)\\&\,\sim \, \delta _1 {\text {DB}}(\alpha _1^{\star },\beta _1^{\star }) + \delta _2 {\text {DB}}(\alpha _2^{\star },\beta _2^{\star }) + \delta _3 {\text {DB}}(\alpha _3^{\star },\beta _3^{\star }) \end{aligned}$$

with $\delta _1 = (1-\delta )\frac{\beta _1}{\alpha _1 + \beta _1}, \alpha _1^{\star } = \alpha _1$, $\beta _1^{\star } = 1+ \beta _1 > 1$; $\delta _2=\delta $, $\alpha _2^{\star } = \alpha _2$, $\beta _2^{\star }=\beta _2$; $\delta _3 = (1-\delta )\frac{\alpha _1}{\alpha _1 + \beta _1}$, $\alpha _3^{\star } = \alpha _1 + 1, \beta _3^{\star }=\beta _1$.

Vice versa, the convex combination of the two polarization components within the mixture (7) can be approximated by a unique U-shaped polarization component. Indeed, if a continuous random variable X has density:

$$\begin{aligned} g_X(x) = \frac{1}{\delta _1+\delta _3}\big (\delta _1 f(x;\alpha _1,\beta _1) + \delta _3 f(x;\alpha _3,\beta _3)\big ), \end{aligned}$$

then $g_X(x)$ will be U-shaped and can be numerically approximated by a Beta model according to (17). Thus, for the underlying Beta specification of the 3-component mixture (7), it follows that:

$$\begin{aligned} \delta _1 \, f(x;\alpha _1,\beta _1) + \delta _3 \, f(x;\alpha _3,\beta _3) \,\approx \, (\delta _1+\delta _3) \, f(x;\alpha ,\beta ). \end{aligned}$$

(22)

If parameters $(\alpha ,\beta )$ are obtained numerically as in (17), then the 3-component mixture (7) can be approximated by a 2-component mixture as in (6):

$$\begin{aligned} Pr(R=r|\varvec{\theta }) \,&\approx \,(\delta _1+\delta _3)\,db(r;\alpha ,\beta )\, +\, \delta _2\, db(r;\alpha _2,\beta _2). \end{aligned}$$

Then, identifiability issues may arise for the 3-component mixture (7) if $\varvec{\theta }^{\star }$ and $\varvec{\theta }$ are not close enough while the approximation is good: indeed, by applying Theorem 1 to the approximating Beta$(\alpha ,\beta )$ in (22), since $\alpha ,\beta \in (0,1)$, one could write:

$$\begin{aligned} Pr(R=r|\varvec{\theta })&\approx (\delta _1+\delta _3)\bigg (\frac{\beta }{\alpha +\beta }db(r;\alpha ,\beta +1) + \frac{\alpha }{\alpha +\beta }db(r;\alpha +1,\beta )\bigg )\\&+\delta _2 db(r;\alpha _2,\beta _2) \\&= \delta _1^{\star }db(r;\alpha _1^{\star },\beta _1^{\star }) + \delta _2^{\star } db(r;\alpha _2^{\star },\beta _2^{\star }) + \delta _3^{\star }db(r;\alpha _3^{\star },\beta _3^{\star }) \end{aligned}$$

with $\delta _1^{\star } = (\delta _1+\delta _3)\frac{\beta }{\alpha +\beta }$, $\alpha _1^{\star } = \alpha \in (0,1)$, $\beta _1^{\star } = \beta + 1 > 1$, $\delta _2^{\star } = \delta _2, \alpha _2^{\star }=\alpha _2, \beta _2^{\star } = \beta _2$, $\delta _3^{\star } = (\delta _1+\delta _3)\frac{\alpha }{\alpha +\beta }$, $\alpha _3^{\star } = \alpha + 1 > 1$, $\beta _3^{\star } = \beta \in (0,1)$.

Appendix 3: Fitting results for the case study

Table 12 reports the main fitting indicators for the competing models estimated on the data used in Sect. 4: for each criterion, the best performance is highlighted in bold.

Table 12 Summary of fitting results for the case study presented in Sect. 4


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Simone, R. On finite mixtures of Discretized Beta model for ordered responses. TEST 31, 828–855 (2022). https://doi.org/10.1007/s11749-022-00800-7


Received: 21 January 2021
Accepted: 21 January 2022
Published: 24 February 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s11749-022-00800-7
