The beta distribution is a well-known and widely used distribution for modeling and analyzing lifetime data, owing to its interesting characteristics. In this paper, a six-parameter beta distribution is introduced as a generalization of the two-parameter (standard) and four-parameter beta distributions. The distribution is closed under scaling and exponentiation, has a reflection symmetry property, and includes several well-known distributions as special cases, such as the two- and four-parameter beta, the generalized modification of the Kumaraswamy, the generalized beta of the first kind, the power function, the Kumaraswamy power function, the minimax, the exponentiated Pareto, and the generalized uniform distributions. Its moments about the origin, moment generating function, incomplete moments, and mean deviations are derived. The parameters are estimated by the maximum likelihood method, which is applied to six simulated data sets from this distribution; the performance of the estimation method is checked through the mean squared errors of the estimated parameters computed for the different simulated sample sizes. Finally, two real-life data sets, representing the waiting period of Muslim worshipers from the time of entering the mosque until the actual start of the Alfajir prayer in two different mosques, are used to illustrate the usefulness and flexibility of this distribution, which fits these data better than the gamma, exponential, four-parameter beta, and generalized beta of the first kind distributions.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1 Introduction
Owing to its interesting characteristics, the beta distribution is one of the best-known continuous distributions, with a wide range of applications in various fields, such as reliability and production quality control. Its flexible shape allows it to model a wide range of natural and empirical phenomena. Its domain, the interval from zero to one, adds another interesting characteristic: the distribution can serve as a probability distribution of probabilities, such as fractions of time, measurements whose values (or relative values) all lie between zero and one, or the random behavior of percentages and fractions. This is especially useful when nothing is known about the probability, so that the distribution is used to represent all possible probabilities. Another area in which the beta distribution represents possible values of probabilities, or a distribution of probabilities, is Bayesian analysis, where it is widely used as a prior distribution. In fact, together with the rectangular/uniform and normal distributions, it is one of the three distributions commonly employed within the framework of Bayesian analysis of continuous variables, Sheskin [1, p. 397]. Data mining methods and techniques need information about prior probabilities, hence the beta distribution is a candidate for such situations; see Shi [2], and Olson and Shi [3] for further details. For an extensive reference on the beta distribution see Johnson et al. [4, p. 210–275].
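The probability density function (pdf) of the four-parameter beta distribution, Johnson et al. [5, p. 210], is given by;
$$f\left( t \right) = \frac{{\left( {t - a} \right)^{\alpha - 1} \left( {b - t} \right)^{\beta - 1} }}{{B\left( {\alpha , \beta } \right)\left( {b - a} \right)^{\alpha + \beta - 1} }},\quad a < t < b$$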
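where the parameters \(\alpha ,\) \(\beta ,\) \(a\) and \(b\) satisfy \(\alpha > 0,\) \(\beta > 0,\) and \(a\) and \(b\) are real numbers such that \(a < b\), and \(B\left( {\alpha , \beta } \right)\) is the beta function, Abramowitz and Stegun [6, p. 258], defined by;
$$B\left( {\alpha , \beta } \right) = \mathop \int \limits_{0}^{1} t^{\alpha - 1} \left( {1 - t} \right)^{\beta - 1} dt$$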
The two-parameter form is sometimes called the standard beta distribution; it is obtained from (1) by the transformation \(x = \frac{t - a}{b - a}\).
One direction of research on the beta distribution is the generalization of the form given by (4), to make it even more flexible and able to cover more shapes.
Armero and Bayarri [7] introduced the Gauss hypergeometric distribution, with parameters \(p\), \(q, r\) and \(\delta ,\) as a generalization to the beta distribution when they studied a Bayesian queuing theory problem, with the following pdf;
where \(p > 0,\)\(q > 0,\)\(- \infty < r < \infty\), \(\delta > - 1,\) and \(F_{{\left( {2,1} \right)}}\) is the generalized hypergeometric function defined for non-negative integers n and m by;
where \(p > 0,\)\(q > 0\) and \(- \infty < s < \infty\).
Pathan et al. [9] introduced a five-parameter distribution as a generalization of the beta distribution, which they called the generalized beta distribution, with pdf given by;
where the parameters \(\alpha ,\) \(\beta ,\) \(\sigma ,\) \(\rho\) and \(\gamma\) satisfy \(\alpha > 0,\) \(\beta > 0,\) \(0 \le \sigma < 1,\) \(\rho\) and \(\gamma\) are real numbers, and \(\Phi _{1} \left( . \right)\) is the Humbert’s confluent hypergeometric function given in Srivastava and Manocha [10, p. 58, Eq. (36)]. They also derived expressions for its distribution function and moments.
Ng et al. [11] studied the properties and evaluated the prediction level of a six-parameter generalized beta distribution model with pdf given by;
where the parameters \(\alpha ,\)\(\beta ,\)\(\gamma ,\)\(\sigma ,\)\(z\) and \(\rho\) satisfy that \(\alpha > 0,\)\(\beta > 0,\)\(\gamma > 0,\)\(\sigma > 0,\)\(z < 0.5\) and \(\rho > \alpha + \beta - \gamma .\)
Ng et al. [11], who provided a nice literature review of the beta family, showed interesting advances in fitting many different types of data with the distribution whose pdf is given by (8), as did Armero and Bayarri [7], Gordy [8] and Pathan et al. [9] with theirs; however, these distributions are not easy to work with empirically. Finally, Gómez-Déniz and Sarabia [12] introduced a generalization of the standard beta distribution with bounded support, studied some of its basic properties and the behavior of its maximum likelihood estimators through simulation, and derived its multivariate version.
The rest of the paper is organized as follows. Section 2 defines the six-parameter beta distribution (SPBD). Section 3 gives some properties of this distribution: the boundaries and some limits of its pdf, a series expansion of its pdf, its mode, quantile function, reliability function, hazard function, special cases, some transformations, its scaling, exponentiation and reflection symmetry properties, the generation of its random variates, the distribution of its order statistics, moments about the origin, mean and variance, moment generating function, harmonic mean, incomplete moments, mean deviations, probability weighted moments, Renyi entropy, and Lorenz and Bonferroni curves. Section 4 introduces the estimation of its parameters by the method of maximum likelihood estimation (MLE). Section 5 presents a simulation study on six miscellaneous SPBD models to check the performance of the MLE. Section 6 uses the SPBD and other nested and related distributions to fit two different real-life data sets. Finally, Sect. 7 ends with conclusions.
2 The Six Parameters Beta Distribution
Let \(0 < a, b, \alpha , \beta ,A,B < \infty\), such that \(A < B\), and define the function \(f\) by:
where \(B\left( {\alpha , \beta } \right)\) is the beta function defined by (2). We will write \(f\left( x \right)\) instead of \(f\left( {x;a, b, \alpha , \beta ,A,B} \right)\) for simplicity. We have the following proposition;
Proposition 1
The function \(f\) defined by (9) is a pdf with its cumulative distribution function (CDF) \(F\) given by;
Since \(0 < a, b, \alpha , \beta ,A,B < \infty\), and \(aA^{{\frac{1}{b}}} < x < aB^{{\frac{1}{b}}}\), then \(A < \left( {\frac{x}{a}} \right)^{b} < B\), hence \(\left( {\frac{x}{a}} \right)^{b} - A > 0\) and also \(B - \left( {\frac{x}{a}} \right)^{b} > 0\), implying that f given in (9) is non-negative. Now;
Hence, \(\mathop \int \limits_{ - \infty }^{ + \infty } f\left( x \right)dx = 1\). It follows that, for any \(x\) such that \(aA^{{\frac{1}{b}}} < x < aB^{{\frac{1}{b}}}\);
The rv \(X\) is said to have a SPBD with parameters \(a, b, \alpha , \beta ,A\, {\text{and}}\,B\), written as \(X\sim {\text{SPBD}}(a, b, \alpha , \beta ,A,B)\), if its pdf is given by (9) or, equivalently, its CDF is given by (10) or (13).
Figure 1 shows some plots of the pdf of the SPBD for several parameter values, indicating that this distribution can take many different shapes.
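As a cross-check on this definition, Lemma 2(1) of Sect. 3.8 implies that \(Z = \left[ {\left( {X/a} \right)^{b} - A} \right]/\left( {B - A} \right)\) follows the standard beta distribution with parameters \(\alpha\) and \(\beta\). A change of variables then gives the density and CDF sketched below. This is derived here from that lemma and is not a transcription of (9) or (13); \(I\left( {z;\alpha , \beta } \right)\) is assumed to denote the regularized incomplete beta function. The result agrees with the generalized uniform special case \({\text{SPBD}}(\alpha , \beta , 1, 1, 0, 1)\) quoted in Sect. 3.7.

```latex
f(x) = \frac{b}{a}\left(\frac{x}{a}\right)^{b-1}
       \frac{\left[\left(\frac{x}{a}\right)^{b}-A\right]^{\alpha-1}
             \left[B-\left(\frac{x}{a}\right)^{b}\right]^{\beta-1}}
            {B(\alpha,\beta)\,(B-A)^{\alpha+\beta-1}},
\qquad aA^{1/b} < x < aB^{1/b},

F(x) = I\!\left(\frac{(x/a)^{b}-A}{B-A};\,\alpha,\beta\right).
```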
3 Some Characteristics of the SPBD
3.1 Boundaries and Some Limits of the pdf
Let us study the behavior of the pdf of the \({\text{SPBD}}(a, b, \alpha , \beta ,A,B)\) at certain points. At the boundary points, we have from (9), for \(0 < \beta < \infty\), that;
Since \(0 < a\) and \(aA^{{\frac{1}{b}}} < x\), then \(A < \left( {\frac{x}{a}} \right)^{b}\), that is \(\frac{A}{{\left( {\frac{x}{a}} \right)^{b} }} < 1\), we can write
Therefore, \(\frac{\partial }{\partial x}f\left( x \right) = 0,\) is equivalent to either \(f\left( x \right) = 0,\) which is discussed in Sect. 3.1 above, or
Let us discuss the real roots of (20) according to the following cases.
Case 1
If \(\alpha + \beta \ne 1\), \(b = \frac{1}{\alpha + \beta - 1}\), and \(\left( {1 - b\alpha } \right)B \ne b\left( {1 - b\beta } \right)A\), that is when \(c_{1 } = 0\) and \(c_{2 } \ne 0\), then (20) has a single root given by;
If \(b \ne \frac{1}{\alpha + \beta - 1}\), that is \(c_{1 } \ne 0\), then the real roots of (20) in terms of x, that is when \(c_{2 }^{2} - 4c_{1 } c_{3 } \ge 0\), are given by;
Since \(\frac{{\partial^{2} }}{{\partial x^{2} }}f\left( {x_{i} } \right), {\text{for }}i = 1, 2,\, {\text{and }}\,3\), is not easy to evaluate, an empirical evaluation is needed to see at which point \(x_{i}\) the pdf has a local maximum, in order to determine the mode of the SPBD.
3.4 Quantile Function
Let \(0 < p < 1\); then the quantile function, \(Q\), of the rv \(X\sim {\text{SPBD}}(a, b, \alpha , \beta ,A,B)\) is defined by;
$$Q\left( p \right) = { \inf }\left\{ {x \in {\mathbb{R}};p \le F\left( x \right)} \right\}$$
Table 1 presents the parameter values and domain ranges of some selected SPBD data sets with different shapes and domain ranges. They are used in the simulation study in Sect. 5 and for computing certain statistics of the SPBD later in this section, while Fig. 2 shows the plots of the quantile functions of these SPBD data sets.
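Because the map in Lemma 2(1) of Sect. 3.8 is increasing in \(U\), the infimum above can be written in closed form in terms of the inverse \(I^{ - 1}\) used in that lemma. The expression below is a sketch that follows from the lemma rather than a transcription of the paper's own formula; the median is the value at \(p = 0.5\).

```latex
Q(p) = a\left[A + (B-A)\,I^{-1}(p,\alpha,\beta)\right]^{1/b},
\qquad 0 < p < 1,
\qquad
Q(p) \to aA^{1/b} \ (p \to 0), \quad Q(p) \to aB^{1/b} \ (p \to 1).
```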
Table 1
Parameter values of some selected SPBD data sets

| Data set | a | b | α | β | A | B | Domain minimum | Domain maximum |
|---|---|---|---|---|---|---|---|---|
| 1 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0 | 1.8 |
| 2 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.387020171 | 1.546834102 |
| 3 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.673387689 | 1.69217121 |
| 4 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 1.17346046 | 3.143355214 |
| 5 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 1.122462048 | 3.264052108 |
| 6 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.261047095 | 2.999081861 |
3.5 Reliability Function
The reliability (survival) function of \(X\sim {\text{SPBD}}(a, b,\alpha , \beta ,A,B\)) using (13), is given by;
The hazard function, \(h\left( x \right)\), of the rv \(X\sim {\text{SPBD}}(a, b, \alpha , \beta ,A,B)\), using (9) and (13), is given for \(aA^{{\frac{1}{b}}} < x < aB^{{\frac{1}{b}}}\), by;
The \({\text{SPBD}}(\alpha , \beta , 1, 1, 0, 1)\) is the generalized uniform distribution, Tiwari et al. [18], with pdf;
$$f\left( x \right) = \frac{\beta }{\alpha } \left( {\frac{x}{\alpha }} \right)^{\beta - 1} ,\quad 0 < x < \alpha$$
We may note that the Kumaraswamy (Case 4), standard uniform (Case 5), triangular (Case 6), Kumaraswamy power function (Case 8), minimax (Case 9), Pareto (Case 10), and generalized uniform (Case 11) distributions are all special cases of the generalized beta of the first kind distribution (Case 4).
3.8 Transformations
Lemma 2
1.
Let the rv \(U\) have the standard uniform distribution, \({\text{U}}(0,1)\), and let the rv \(X\) be defined by \(X = a\left[ {A + \left( {B - A} \right)I^{ - 1} \left( {U,\alpha , \beta } \right)} \right]^{{\frac{1}{b}}}\); then \(X\sim {\text{SPBD}}(a, b, \alpha , \beta ,A,B)\).
2.
Let the rv \(X\sim {\text{SPBD}}(1, 1, 1,1, 0, 1)\); then the rv \(Y\) defined by \(Y = \left( {\frac{1 - X}{X}} \right)^{{\frac{1}{\delta }}} \cdot e^{{ - \frac{\gamma }{\delta }}}\) has a log-logistic distribution with parameters \(\delta\) and \(\gamma\), Johnson et al. [4, p. 151], with CDF given by;
3.
Let the rv \(X\sim {\text{SPBD}}(a, b, 1,c,A,B)\); then the rv \(Y\) defined by \(Y = B - \left( {\frac{X}{a}} \right)^{b}\) has the generalized uniform distribution, Tiwari et al. [18], with CDF given by;
$$F_{Y} \left( y \right) = \left[ {\frac{y}{B - A}} \right]^{c} ,\quad 0 \le y \le B - A$$
4.
Let the rv \(X\sim {\text{SPBD}}(a, b, 1,c,A,B)\); then the rv \(Y\) defined by \(Y = \left( {B - A} \right)^{ - 1} \left[ {B - \left( {\frac{X}{a}} \right)^{b} } \right]\) has the beta distribution with parameters \(c\) and 1, with CDF given by;
$$F_{Y} \left( y \right) = y^{c} ,\quad 0 \le y \le 1$$
5.
Let the rv \(X\sim {\text{SPBD}}(1, b, 1,1, 0, 1)\); then the rv \(Y\) defined by \(Y = \theta - b^{2} \log \left( X \right)\) has an exponential distribution with parameters \(\theta\) and \(b\), Johnson et al. [5, p. 494], with CDF given by;
$$F_{Y} \left( y \right) = 1 - e^{{ - \left( {\frac{y - \theta }{b}} \right)}} ,\quad y > \theta$$
6.
Let the rv \(X\sim {\text{SPBD}}(1, b, 1,1, 0, 1)\); then the rv \(Y\) defined by \(Y = \mu - \beta \log \left[ { - b\log \left( {X^{b} } \right)} \right]\) has a Gumbel (generalized extreme value type-I) distribution with parameters \(\mu\) and \(\beta\), Forbes et al. [19, p. 98], with CDF given by;
7.
Let the rv \(X\sim {\text{SPBD}}(1, b, 1,1, 0, 1)\); then the rv \(Y\) defined by \(Y = \alpha + \beta \log \left( {\frac{{X^{b} }}{{1 - X^{b} }}} \right)\) has a logistic distribution with parameters \(\alpha\) and \(\beta\), Johnson et al. [4, p. 115], with CDF given by;
8.
Let the rv \(X\sim {\text{SPBD}}(1, b, 1,1, 0, 1)\); then the rv \(Y\) defined by \(Y = \frac{k}{X}\), where \(k\) is a positive constant, has a Pareto distribution with parameters \(k\) and \(b\), Johnson et al. [4, p. 574], with CDF given by;
$$F_{Y} \left( y \right) = 1 - \left( {\frac{k}{y}} \right)^{b} ,\quad y \ge k$$
9.
Let the rv \(X\sim {\text{SPBD}}(1, b, 1,1, 0, 1)\); then the rv \(Y\) defined by \(Y = \xi + \alpha \left[ { - \log \left( {X^{b} } \right)} \right]^{{\frac{1}{\theta }}}\) has a Weibull distribution with parameters \(\xi\), \(\alpha ,\) and \(\theta\), Johnson et al. [4, p. 629], with its CDF given by;
Therefore, the rv \({\text{X}}\sim {\text{SPBD}}(a, b, \alpha , \beta ,A,B\)).
Proof of cases (2) through (9) can be shown on the same lines as the proof of (1).
We may note that the SPBDs stated in Cases 2, 5, 6, 8, and 9 are all special cases of the generalized beta of the first kind distribution (see Case 4 of Sect. 3.7).□
3.9 Scaling Property
Proposition 3
(The SPBD is closed under scaling) Let the rv \(X\sim {\text{SPBD}}(a, b, \alpha , \beta ,A,B)\) and let the rv \(Y = cX\), where \(0 < c < \infty\); then \(Y\sim {\text{SPBD}}(ca, b, \alpha , \beta ,A,B).\)
On the same lines as the proof of Proposition 3, we can prove the following Propositions 4 and 5.□
3.10 Exponentiation Property
Proposition 4
(The SPBD is closed under exponentiation) Let the rv \(X\sim {\text{SPBD}}(a, b, \alpha , \beta ,A,B)\) and let the rv \(Y = X^{c}\), where \(0 < c < \infty\); then \(Y\sim {\text{SPBD}}\left( {a^{c} ,\frac{b}{c}, \alpha , \beta ,A,B} \right).\)
3.11 Reflection Symmetry Property
Proposition 5
Let the rv \(X\sim {\text{SPBD}}(a, 1, \alpha , \beta ,A,B)\) and let the rv \(Y = a\left( {B + A} \right) - X\); then \(Y\sim {\text{SPBD}}(a,1, \beta ,\alpha ,A,B).\)
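The three closure properties of Propositions 3 to 5 can be checked pointwise through the representation of Lemma 2(1). The sketch below is only an illustration: the helper names `spbd_from_beta` and `closure_checks` are hypothetical, the parameter values are those of data set 4 in Table 1, and the reflection check relies on the standard fact that \(1 - Z\) is beta distributed with parameters \(\beta\) and \(\alpha\) when \(Z\) is beta distributed with parameters \(\alpha\) and \(\beta\).

```python
import math


def spbd_from_beta(z, a, b, A, B):
    """Map a value z in (0, 1) of a Beta(alpha, beta) variable to the SPBD
    scale, following the representation in Lemma 2(1)."""
    return a * (A + (B - A) * z) ** (1.0 / b)


def closure_checks(z, a, b, A, B, c):
    """Return three booleans: scaling, exponentiation, reflection (with b = 1)."""
    x = spbd_from_beta(z, a, b, A, B)
    # Proposition 3: cX has parameters (ca, b, alpha, beta, A, B).
    scaling = math.isclose(c * x, spbd_from_beta(z, c * a, b, A, B), rel_tol=1e-9)
    # Proposition 4: X^c has parameters (a^c, b/c, alpha, beta, A, B).
    power = math.isclose(x ** c, spbd_from_beta(z, a ** c, b / c, A, B), rel_tol=1e-9)
    # Proposition 5 (b = 1): a(A + B) - X corresponds to replacing z by 1 - z,
    # which swaps the roles of alpha and beta.
    x1 = spbd_from_beta(z, a, 1.0, A, B)
    reflection = math.isclose(
        a * (A + B) - x1, spbd_from_beta(1.0 - z, a, 1.0, A, B), rel_tol=1e-9
    )
    return scaling, power, reflection


# Data set 4 of Table 1 (a = 2, b = 1.3, A = 0.5, B = 1.8); c = 2.5 is arbitrary.
results = [closure_checks(z, 2.0, 1.3, 0.5, 1.8, c=2.5) for z in (0.1, 0.5, 0.9)]
print(results)
```

The check is algebraic rather than statistical: it confirms that each transformation maps the representation with one parameter set onto the representation with the transformed parameter set.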
3.12 Generate SPBD Random Variates
Using Lemma 2(1), we can generate \({\text{SPBD}}\left( {a, b, \alpha , \beta ,A,B} \right)\) random variates as follows;
Let \(X_{1}\), \(X_{2}\), …, \(X_{n}\) be a random sample of size \(n\) from \({\text{SPBD}}(a, b, \alpha , \beta ,A,B)\), with pdf \(f\) and CDF \(F\), and let \(X_{1:n}\), \(X_{2:n}\), …, \(X_{n:n}\) be their order statistics; then, for \(i = 1, 2, 3, \ldots ,n\), the pdf of the \(i\)-th order statistic \(X_{i:n}\), \(f_{i:n} \left( x \right),\) is given by;
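A minimal sketch of such a generator is given below. It is an assumption-laden illustration, not the authors' procedure: instead of applying \(I^{ - 1}\) to a uniform variate, it draws the beta variate directly with the Python standard library function `random.betavariate`, which yields the same distribution. The function name `spbd_variates` and the seed are hypothetical.

```python
import random


def spbd_variates(n, a, b, alpha, beta, A, B, seed=None):
    """Draw n SPBD(a, b, alpha, beta, A, B) variates via
    X = a * [A + (B - A) * Z]^(1/b) with Z ~ Beta(alpha, beta)."""
    rng = random.Random(seed)
    return [
        a * (A + (B - A) * rng.betavariate(alpha, beta)) ** (1.0 / b)
        for _ in range(n)
    ]


# Data set 4 of Table 1.
a, b, alpha, beta, A, B = 2.0, 1.3, 1.6, 3.8, 0.5, 1.8
sample = spbd_variates(1000, a, b, alpha, beta, A, B, seed=1)

# Theoretical support end points a*A^(1/b) and a*B^(1/b).
lower, upper = a * A ** (1.0 / b), a * B ** (1.0 / b)
print(lower, upper, min(sample), max(sample))
```

All generated values fall between the two support end points, which for these parameters match the domain range listed for data set 4 in Table 1.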
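For reference, the general density of the \(i\)-th order statistic of a continuous random sample has the standard form below; the SPBD case is obtained by substituting its pdf (9) and CDF (13). This is the textbook identity and not necessarily the expanded form used in the paper.

```latex
f_{i:n}(x) = \frac{n!}{(i-1)!\,(n-i)!}\,
             \left[F(x)\right]^{i-1}\left[1-F(x)\right]^{n-i} f(x),
\qquad aA^{1/b} < x < aB^{1/b}.
```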
Let \(k = 1, 2, 3, \ldots\), then the moment of the rv \(X\sim {\text{SPBD}}\left( {a, b, \alpha , \beta ,A,B} \right)\) of order k about zero is given by;
Table 2 presents the mean, median, mode and variance of the selected SPBD data sets given in Table 1.
Table 2
Mean, median, mode and variance of the selected SPBD data sets

| Data set | Minimum | Maximum | Mean | Median | Mode | Variance |
|---|---|---|---|---|---|---|
| 1 | 0 | 1.8 | 0.94632 | 0.955627 | 0.984679 | 0.096145 |
| 2 | 0.38702 | 1.546834 | 0.942554 | 0.953689 | 1.01134 | 0.069825 |
| 3 | 0.673388 | 1.692171 | 0.969019 | 0.948343 | – | 0.048677 |
| 4 | 1.17346046 | 3.143355 | 1.80987 | 1.76616 | 1.60839 | 0.135081 |
| 5 | 1.122462 | 3.264052 | 2.36608 | 2.39793 | 2.53353 | 0.215075 |
| 6 | 0.261047 | 2.999082 | 2.14465 | 2.31082 | – | 0.501776 |
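The mean and variance reported in Table 2 for data set 1 can be cross-checked in the special case \(A = 0\). Under the representation of Lemma 2(1), \(X = a\left( {BZ} \right)^{1/b}\) with \(Z\) beta distributed, so \(E\left( {X^{k} } \right) = a^{k} B^{k/b} B\left( {\alpha + k/b, \beta } \right)/B\left( {\alpha , \beta } \right)\). The sketch below covers only this special case and is not the paper's general moment formula; the function names are hypothetical.

```python
import math


def beta_fn(p, q):
    """Beta function B(p, q) computed from gamma functions."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def spbd_raw_moment_a0(k, a, b, alpha, beta, B):
    """k-th moment about zero of SPBD(a, b, alpha, beta, 0, B).

    Uses E[Z^s] = B(alpha + s, beta) / B(alpha, beta) for Z ~ Beta(alpha, beta),
    with s = k / b. Valid only for A = 0.
    """
    s = k / b
    return a ** k * B ** s * beta_fn(alpha + s, beta) / beta_fn(alpha, beta)


# Data set 1 of Table 1: a = 1.8, b = 2.3, alpha = 1.4, beta = 3.9, A = 0, B = 1.
mean = spbd_raw_moment_a0(1, 1.8, 2.3, 1.4, 3.9, 1.0)
variance = spbd_raw_moment_a0(2, 1.8, 2.3, 1.4, 3.9, 1.0) - mean ** 2
print(mean, variance)
```

For \(A \ne 0\) the power \(\left[ {A + \left( {B - A} \right)Z} \right]^{k/b}\) no longer factorizes, so a series expansion or numerical integration would be needed instead.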
3.16 The Moment Generating Function
Similarly, the moment generating function of the rv \(X\sim {\text{SPBD}}\left( {a, b, \alpha , \beta ,A,B} \right)\), \(M_{X} \left( t \right)\), can be found to be;
The mean deviation of \(X\sim {\text{SPBD}}\left( {a, b, \alpha , \beta ,A,B} \right)\) about its mean \(\mu = E\left( X \right)\), \(MD\left( \mu \right)\), is given by;
Let us compute the Renyi entropy as a measure of variation of the uncertainty of the rv \(X\sim {\text{SPBD}}(a, b, \alpha , \beta ,A,B\)). For \(\theta > 0\) such that \(\theta \ne 1\), we have for the rv \(X\sim {\text{SPBD}}(a, b, \alpha , \beta ,A,B\)) that;
For \(0 < \pi < 1\), the Lorenz curve, \(L\left( \pi \right)\), and the Bonferroni curve, \(B\left( \pi \right)\), for the rv \(X\sim {\text{SPBD}}(a, b, \alpha , \beta ,A,B)\), are given, respectively, by;
where \(Q\left( \pi \right)\) is the quantile function of the rv X at \(\pi\), and \(I\left( {z,k} \right)\) is the incomplete moment of the rv X. Therefore, using (25) and (26), we have that;
The maximum likelihood estimation (MLE) method will be used for estimating the parameters of the SPBD. Let \(x_{1} ,x_{2} , \ldots ,x_{n}\) be a random sample from \({\text{SPBD}}(a, b, \alpha , \beta ,A,B)\), as given by (9); we want to estimate the parameters \(a, b, \alpha , \beta ,A, \,{\text{and}}\,B\) by maximizing the log-likelihood function, where the likelihood function \(L = L(a, b, \alpha , \beta ,A,B;x_{1} ,x_{2} , \ldots ,x_{n} )\) can be written as;
Let us inspect the normal equations \(\frac{\partial }{\partial a}\log L = 0, \frac{\partial }{\partial b}\log L = 0, \ldots , \frac{\partial }{\partial B}\log L = 0,\) to see if they admit an explicit solution. We have that;
and, since \(aA^{{\frac{1}{b}}} < x < aB^{{\frac{1}{b}}} ,\) the MLEs of \(aA^{{\frac{1}{b}}}\) and \(aB^{{\frac{1}{b}}}\) are, respectively, \(x_{1:n}\) and \(x_{n:n}\); that is, \(\widehat{aA}^{{\frac{1}{b}}} = x_{1:n}\) and \(\widehat{aB}^{{\frac{1}{b}}} = x_{n:n}\), and hence;
Since Eqs. (28)–(33) are not easy to solve explicitly, a numerical technique such as the Newton-Raphson method or any other well-known optimization algorithm, see Shi et al. [22], may be employed, or a well-known software package, such as maxLik, Henningsen and Toomet [23], or GAMLSS, Stasinopoulos and Rigby [24], may be used to find the MLEs of the parameters of the SPBD.
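As an illustration of the objective that such a numerical optimizer would maximize, the sketch below evaluates an SPBD log-likelihood in Python. It rests on an assumption: the density is taken in the form implied by Lemma 2(1), namely a beta density in \(\left[ {\left( {x/a} \right)^{b} - A} \right]/\left( {B - A} \right)\) times the Jacobian \(\frac{b}{a}\left( {\frac{x}{a}} \right)^{b - 1} /\left( {B - A} \right)\), and it is not a transcription of Eqs. (28)–(33). The function names and the small sample are hypothetical.

```python
import math


def spbd_logpdf(x, a, b, alpha, beta, A, B):
    """Log-density of SPBD(a, b, alpha, beta, A, B) at x (assumed form,
    derived from Lemma 2(1)); returns -inf outside the support."""
    if x <= 0:
        return -math.inf
    u = (x / a) ** b
    if not (A < u < B):
        return -math.inf
    log_beta_fn = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (
        math.log(b / a)
        + (b - 1.0) * math.log(x / a)
        + (alpha - 1.0) * math.log(u - A)
        + (beta - 1.0) * math.log(B - u)
        - log_beta_fn
        - (alpha + beta - 1.0) * math.log(B - A)
    )


def spbd_loglik(params, data):
    """Sum of log-densities; params = (a, b, alpha, beta, A, B)."""
    return sum(spbd_logpdf(x, *params) for x in data)


# Hypothetical observations inside the support of data set 4 of Table 1.
params = (2.0, 1.3, 1.6, 3.8, 0.5, 1.8)
data = [1.4, 1.7, 1.9, 2.2, 2.6]
print(spbd_loglik(params, data))
```

Any candidate parameter vector whose support excludes an observation receives a log-likelihood of minus infinity, which is consistent with the boundary estimates \(x_{1:n}\) and \(x_{n:n}\) discussed above.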
5 A Simulation Study
To examine the performance of the MLE method given in Sect. 4, we perform a simulation study. The bias and the mean squared error (MSE) of the estimates are the principal measures of performance.
The statistical software R and the Absoft Pro Fortran compiler are employed for computing. The maxLik package of the statistical software R is used mainly for computing the MLEs, see Henningsen and Toomet [23] for details of this package, while the Absoft Pro Fortran is used for other needed computations.
The six miscellaneous SPBD models given in Table 1, which have different pdf shapes and variable ranges, are used to simulate data sets, and for each data set the bias and the MSE of the MLEs of the model parameters are computed for different simulated sample sizes. The sample sizes are 25, 50, 100, 300, 500, and 1000. In each situation, each parameter, \(\theta\) say, of the given SPBD model is estimated in each of 5000 replications generated from that model, and the sample mean, bias, variance, and MSE of the estimates are computed as \({\text{Mean}}\left( {\hat{\theta }} \right) = \frac{1}{5000}\mathop \sum \nolimits_{i = 1}^{5000} \hat{\theta }_{i} = \bar{\bar{\theta }}\,{\text{say}}\), \({\text{Bias}}\left( {\hat{\theta }} \right) = \bar{\bar{\theta }} - \theta\), \({\text{Var}}\left( {\hat{\theta }} \right) = \frac{1}{5000}\mathop \sum \nolimits_{i = 1}^{5000} \left( {\hat{\theta }_{i} - \bar{\bar{\theta }}} \right)^{2}\), and hence \({\text{MSE}}\left( {\hat{\theta }} \right) = {\text{Var}}\left( {\hat{\theta }} \right) + \left[ {{\text{Bias}}\left( {\hat{\theta }} \right)} \right]^{2}\). This procedure is repeated for each sample size and for each SPBD model.
Table 3 shows the bias of the estimated parameters of the different simulated SPBD data sets for each sample size, while Table 4 presents the corresponding MSE. According to Tables 3 and 4, for each of the SPBD model parameters, the bias and the MSE decrease as the sample size increases. Figure 3 shows the MSE plots of the estimated parameters for the six simulated SPBD data sets, which illustrate graphically that the MSE of each SPBD model parameter decreases as the sample size increases. Hence, since the MSE decreases as the sample size increases, we may conclude that the MLE method appears to be highly efficient when the sample size becomes large.
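The summary measures above can be sketched in a few lines of Python. The example is deliberately simplified and is not the paper's simulation: instead of the full six-parameter MLE it uses the sample maximum \(x_{n:n}\), which Sect. 4 gives as the MLE of the upper end point \(aB^{1/b}\), and it uses 200 replications of size 50 rather than 5000. The function names, the seed, and the use of `random.betavariate` are assumptions made for illustration.

```python
import random
from statistics import fmean


def spbd_sample(n, a, b, alpha, beta, A, B, rng):
    """n SPBD variates via X = a * [A + (B - A) * Z]^(1/b), Z ~ Beta(alpha, beta)."""
    return [
        a * (A + (B - A) * rng.betavariate(alpha, beta)) ** (1.0 / b)
        for _ in range(n)
    ]


def summarize(estimates, true_value):
    """Mean, bias, variance (divisor R) and MSE = variance + bias^2."""
    mean = fmean(estimates)
    bias = mean - true_value
    var = fmean([(e - mean) ** 2 for e in estimates])
    return {"mean": mean, "bias": bias, "var": var, "mse": var + bias ** 2}


rng = random.Random(2024)
a, b, alpha, beta, A, B = 2.0, 1.3, 1.6, 3.8, 0.5, 1.8  # data set 4 of Table 1
upper = a * B ** (1.0 / b)  # true upper end point a * B^(1/b)

# One estimate of the upper end point per replication: the sample maximum.
estimates = [max(spbd_sample(50, a, b, alpha, beta, A, B, rng)) for _ in range(200)]
summary = summarize(estimates, upper)
print(summary)
```

Because the variance uses the divisor \(R\) (the number of replications), the MSE computed this way equals the average squared error of the estimates around the true value.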
Table 3
The bias of the estimated parameters of the simulated SPBD data sets for each sample size n
Columns a to B give the actual parameter values; columns \(\hat{a}\) to \(\hat{B}\) give the bias.

| n | a | b | α | β | A | B | \(\hat{a}\) | \(\hat{b}\) | \(\hat{\alpha }\) | \(\hat{\beta }\) | \(\hat{A}\) | \(\hat{B}\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | −0.34878 | 0.329715 | 0.013104 | 0.399618 | 0.420087 | 0.601777 |
| 25 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.140342 | −0.0969 | 0.056847 | −0.05489 | 0.464818 | −0.48701 |
| 25 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | −0.37808 | −0.27911 | 0.315561 | −0.30242 | −0.00807 | −0.19319 |
| 25 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.039897 | −0.41322 | 0.612345 | 0.366937 | 0.149 | −0.14444 |
| 25 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | −0.18231 | −0.25249 | 0.334888 | 0.198619 | −0.11285 | 0.232156 |
| 25 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.396452 | 0.403079 | −0.34372 | 0.335451 | −0.1812 | 0.263205 |
| 50 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.512223 | −0.29606 | 0.58745 | −0.58925 | 0.451566 | −0.34638 |
| 50 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.414662 | −0.15015 | 0.687373 | −0.73152 | 0.294003 | −0.49454 |
| 50 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.070608 | 0.459072 | 0.278546 | −0.12252 | 0.102261 | 0.156337 |
| 50 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.153985 | −0.4023 | −0.08337 | 0.381255 | −0.2772 | −0.32743 |
| 50 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.111967 | 0.199792 | 0.105013 | −0.2507 | 0.080938 | 0.051812 |
| 50 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.275 | −0.28298 | −0.36704 | −0.26281 | −0.14315 | −0.26767 |
| 100 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | −0.10525 | 0.227391 | −0.09405 | −0.67146 | 0.4522 | 0.262165 |
| 100 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.019408 | −0.24609 | −0.13464 | 0.094321 | 0.033752 | −0.31785 |
| 100 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.153569 | −0.36599 | −0.40473 | −0.06738 | 0.191963 | −0.16552 |
| 100 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | −0.07852 | −0.21469 | −0.08842 | 0.090949 | 0.160529 | −0.20546 |
| 100 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.431635 | 0.23417 | 0.037447 | 0.241197 | −0.10608 | 0.343295 |
| 100 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | −0.33975 | 0.166262 | −0.02429 | −0.4521 | 0.334471 | 0.360404 |
| 300 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.274458 | 0.049549 | 0.474342 | 0.345484 | 0.368289 | −0.24068 |
| 300 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.424818 | 0.026234 | −0.15182 | 0.237266 | −0.49699 | 0.312069 |
| 300 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.091822 | 0.012653 | −0.37405 | −0.31214 | −0.41615 | 0.322652 |
| 300 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | −0.34078 | −0.25521 | 0.336573 | 0.489306 | 0.169098 | 0.536246 |
| 300 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | −0.21183 | 0.072752 | −0.12231 | −0.05324 | −0.47379 | 0.106958 |
| 300 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | −0.15788 | 0.177951 | 0.208611 | 0.352764 | 0.153525 | −0.16458 |
| 500 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | −0.37304 | 0.278684 | −0.31915 | −0.19441 | 0.432184 | 0.377181 |
| 500 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.373571 | −0.40604 | −0.38178 | −0.47487 | 0.248089 | 0.219282 |
| 500 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | −0.02872 | −0.32082 | 0.380835 | −0.06513 | −0.05995 | 0.143485 |
| 500 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.507451 | −0.11285 | 0.159177 | 0.147718 | 0.223398 | 0.356879 |
| 500 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | −0.36108 | 0.142093 | 0.102362 | −0.36187 | −0.50241 | 0.268284 |
| 500 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.094234 | 0.134663 | −0.22689 | −0.2296 | 0.042163 | −0.00386 |
| 1000 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.204721 | −0.32209 | 0.257236 | 0.110389 | 0.104925 | 0.215005 |
| 1000 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.098276 | 0.118352 | 0.104907 | 0.002979 | 0.410095 | −0.38891 |
| 1000 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | −0.14047 | 0.252312 | −0.46269 | 0.190593 | −0.14857 | 0.29962 |
| 1000 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.177988 | 0.011111 | −0.31224 | −0.04958 | 0.227256 | −0.11569 |
| 1000 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.258915 | 0.015733 | −0.3335 | −0.37711 | −0.35998 | −0.04232 |
| 1000 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.060013 | 0.171233 | 0.137993 | 0.116084 | 0.389542 | −0.13421 |
Table 4
The MSE of the estimated parameters of the simulated SPBD data sets for each sample size n
Columns a to B give the actual parameter values; columns \(\hat{a}\) to \(\hat{B}\) give the MSE.

| n | a | b | α | β | A | B | \(\hat{a}\) | \(\hat{b}\) | \(\hat{\alpha }\) | \(\hat{\beta }\) | \(\hat{A}\) | \(\hat{B}\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 1.342013 | 1.954471 | 0.932849 | 1.895358 | 2.084947 | 1.762417 |
| 25 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 1.402194 | 2.396725 | 0.793264 | 1.629038 | 1.911926 | 2.325153 |
| 25 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 1.477281 | 1.833293 | 1.406634 | 2.277118 | 1.563066 | 1.61373 |
| 25 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 1.640697 | 2.313115 | 0.900516 | 1.306511 | 1.6181 | 1.904378 |
| 25 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.944398 | 2.117512 | 1.005152 | 2.534064 | 1.15113 | 1.954144 |
| 25 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 1.342938 | 2.332513 | 1.467529 | 1.019239 | 1.239475 | 1.407573 |
| 50 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 1.170272 | 1.657704 | 0.799091 | 1.401564 | 1.755979 | 1.654412 |
| 50 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 1.132947 | 1.743292 | 0.598261 | 1.309235 | 1.681494 | 1.776936 |
| 50 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 1.324648 | 1.781486 | 1.32801 | 2.060954 | 1.019974 | 1.361083 |
| 50 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 1.31652 | 1.99784 | 0.701776 | 1.157632 | 1.311115 | 1.187063 |
| 50 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.785298 | 1.910105 | 0.862128 | 2.282315 | 1.021342 | 1.489123 |
| 50 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.993972 | 2.011772 | 1.162289 | 0.873739 | 1.157483 | 1.251442 |
| 100 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.756539 | 1.210055 | 0.607076 | 0.992914 | 1.460646 | 1.345788 |
| 100 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.88934 | 1.199509 | 0.404822 | 1.201444 | 1.357685 | 1.206554 |
| 100 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 1.117483 | 1.529878 | 1.037574 | 1.595888 | 0.829085 | 1.121212 |
| 100 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.998634 | 1.752594 | 0.537625 | 0.828906 | 1.06102 | 0.939682 |
| 100 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.565906 | 1.546855 | 0.705282 | 2.017984 | 0.717713 | 1.153898 |
| 100 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.805249 | 1.722256 | 0.777865 | 0.696402 | 0.851498 | 1.018501 |
| 300 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.547782 | 0.852198 | 0.401467 | 0.644066 | 0.773053 | 1.007341 |
| 300 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.711853 | 0.781424 | 0.269156 | 0.788936 | 0.916783 | 0.831953 |
| 300 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.910827 | 1.280392 | 0.930746 | 1.260824 | 0.410875 | 0.789677 |
| 300 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.609068 | 1.387463 | 0.38542 | 0.5643 | 0.865376 | 0.664865 |
| 300 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.412669 | 1.172685 | 0.475687 | 1.818785 | 0.500591 | 0.891926 |
| 300 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.483891 | 1.390424 | 0.570112 | 0.455555 | 0.530735 | 0.691226 |
| 500 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.427417 | 0.64704 | 0.332296 | 0.435204 | 0.200255 | 0.854437 |
| 500 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.449441 | 0.434471 | 0.155422 | 0.575885 | 0.650976 | 0.650911 |
| 500 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.7656 | 1.07478 | 0.704428 | 0.970532 | 0.325475 | 0.618804 |
| 500 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.272669 | 1.19933 | 0.241967 | 0.271555 | 0.418878 | 0.422975 |
| 500 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.278361 | 0.995957 | 0.321993 | 1.531532 | 0.367942 | 0.542543 |
| 500 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.178667 | 0.82391 | 0.340492 | 0.202494 | 0.388572 | 0.487987 |
| 1000 | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0.354872 | 0.41543 | 0.213608 | 0.304466 | 0.101616 | 0.472349 |
| 1000 | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.301129 | 0.205473 | 0.081764 | 0.27106 | 0.33213 | 0.376744 |
| 1000 | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.645736 | 0.648055 | 0.658258 | 0.585563 | 0.148738 | 0.425584 |
| 1000 | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 0.149529 | 0.628928 | 0.113052 | 0.243147 | 0.300697 | 0.311164 |
| 1000 | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 0.178081 | 0.714394 | 0.16387 | 1.281251 | 0.198019 | 0.215385 |
| 1000 | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.052826 | 0.547684 | 0.102232 | 0.057344 | 0.159444 | 0.255718 |
Table 5 shows the actual values and the MLE parameter values (as the average values for the 5000 replications) of the different simulated SPBD data sets, and Fig. 4 shows visually their corresponding pdf’s plots.
Table 5
Actual and MLE parameters values of the simulated SPBD data sets

| Data set | Value | a | b | α | β | A | B | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Actual | 1.8 | 2.3 | 1.4 | 3.9 | 0 | 1 | 0 | 1.8 |
| 1 | MLE | 1.777778 | 2.339845 | 1.373519 | 3.877778 | 0.000111 | 1.01218 | 0.036285462 | 1.787000108 |
| 2 | Actual | 1.5 | 3.1 | 0.93 | 2.65 | 0.015 | 1.1 | 0.38702 | 1.546834 |
| 2 | MLE | 1.435333 | 3.117889 | 0.923133 | 2.587889 | 0.019188 | 1.211111 | 0.403895566 | 1.526273077 |
| 3 | Actual | 1.5 | 5.75 | 0.43 | 5.65 | 0.01 | 2 | 0.673387689 | 1.69217121 |
| 3 | MLE | 1.487889 | 5.855124 | 0.411889 | 5.444889 | 0.010134 | 2.227889 | 0.679167201 | 1.706033177 |
| 4 | Actual | 2 | 1.3 | 1.6 | 3.8 | 0.5 | 1.8 | 1.17346046 | 3.143355214 |
| 4 | MLE | 1.993333 | 1.443825 | 1.473245 | 3.895556 | 0.46999 | 1.933889 | 1.18159022 | 3.147485177 |
| 5 | Actual | 2 | 1.2 | 2.3 | 1.8 | 0.5 | 1.8 | 1.122462048 | 3.264052108 |
| 5 | MLE | 1.879889 | 1.213113 | 2.351245 | 1.778986 | 0.512333 | 1.933333 | 1.083200324 | 3.236995497 |
| 6 | Actual | 2 | 0.45 | 2.15 | 0.65 | 0.4 | 1.2 | 0.261047095 | 2.999081861 |
| 6 | MLE | 2.054789 | 0.467478 | 2.035899 | 0.598789 | 0.397999 | 1.198758 | 0.286325174 | 3.028201867 |
In conclusion, the simulation indicates that the MLE method is appropriate and can be used to estimate the parameters of the SPBD models.
6 Application of Fitting SPBD Model to Real-Life Data
We consider two real-life data sets in order to show the usefulness of the proposed estimation procedure to estimate and fit the SPBD model to these real-life data sets. The data sets are;
Data Set 1
Represents the waiting period of Muslim worshipers from the time of entering the mosque until the actual start of the Alfajir prayer (the early morning and first prayer of the day) in Al-Mani Jamieh Mosque (Masjid no. 942), where Friday prayers are held and which accommodates more than two thousand worshipers, in Al-Waab town in Doha-Qatar. The data consist of 4539 observations recorded in this masjid for the period from 30th October 2017 to 15th January 2020. We refer to this data set as the main street mosque data.
Data Set 2
Represents the waiting period of Muslim worshipers from the time of entering the mosque until the actual start of the Alfajir prayer in Saeed bin Fahad Al-Dosari Mosque (Masjid no. 1031), where Friday prayers are not held and which accommodates no more than two hundred fifty worshipers, in Al-Waab town in Doha-Qatar. The data consist of 3360 observations recorded in this mosque for the period from 25th January 2015 to 20th October 2017. We refer to this data set as the within streets mosque data.
Table 6 presents some statistics of the observed mosque data sets.
Table 6
Some statistics of the observed mosque data sets

| Statistics | Observed: Main | Observed: Within |
|---|---|---|
| No. of observation | 4539 | 3360 |
| Mean | 7.0986 | 5.2372 |
| Standard error of mean | 0.08194 | 0.07554 |
| Median | 5.65258 | 4.4779 |
| Mode | 0.685519 | 0.6196016 |
| SD | 5.52032 | 4.37859 |
| Variance | 19.172 | 30.474 |
| Skewness | 0.706 | 1.162 |
| Standard error of skewness | 0.036 | 0.042 |
| Kurtosis | −0.378 | 1.161 |
| Standard error of kurtosis | 0.073 | 0.084 |
| Minimum | 0.07 | 0.08 |
| Maximum | 24.9 | 24.9 |
| Percentile 25 | 2.44012 | 1.46321 |
| Percentile 50 | 5.5 | 4.4779 |
| Percentile 75 | 10.64123 | 7.60603 |
Using both mosque data sets, the MLE method was employed to estimate the parameters of the SPBD model for each. Table 7 shows the observed and the predicted frequencies, the model parameter estimates, and the Chi square goodness of fit test for the SPBD, the gamma, the exponential, the four parameters beta, and the generalized beta of the first kind distributions, as well as the likelihood ratio test (LRT) for the nested models of the SPBD, namely the four parameters beta and the generalized beta of the first kind distributions. Figure 5 illustrates the histograms and the fitted pdfs for both the main street and the within street mosque data sets. For the main street data set, the p values of the Chi square goodness of fit test for the gamma, the exponential, the four parameters beta, and the generalized beta of the first kind distributions are all smaller than 0.05, whereas the p value of the SPBD model equals 0.9488, so the SPBD performs better than all of these distributions. For the within street mosque data set, the p value of the Chi square goodness of fit test for the generalized beta of the first kind distribution equals 0.23087, indicating that this distribution can also fit the data; nevertheless, the SPBD model performs better in this case, since its p value equals 0.96088. The p values for the gamma, the exponential, and the four parameters beta distributions are smaller than 0.05, so the SPBD performs better than these distributions as well. Finally, the p values of the LRT for the nested models of the SPBD, namely the four parameters beta and the generalized beta of the first kind distributions, are less than 0.05 for both the main street and the within street data sets, indicating that the SPBD fits significantly better than its nested special cases.
These findings indicate that the SPBD outperforms the gamma, the exponential, the four parameters beta, and the generalized beta of the first kind distributions and provides the best fit for both the main street and the within street mosque data sets.
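As a rough check on the likelihood ratio results, note that for 2 degrees of freedom the Chi square distribution is an exponential distribution with mean 2, so its upper tail probability has the closed form exp(−x/2). The following Python sketch recomputes the LRT p values from the LRT statistics reported in Table 7. It is an illustration only and not part of the original analysis; the function name `lrt_p_value_df2` and the dictionary layout are hypothetical.

```python
import math


def lrt_p_value_df2(lrt_statistic: float) -> float:
    """Upper-tail Chi square probability for 2 degrees of freedom.

    For df = 2 the Chi square distribution is exponential with mean 2,
    so P(X > x) = exp(-x / 2). This shortcut is valid for df = 2 only.
    """
    if lrt_statistic < 0:
        raise ValueError("LRT statistic must be non-negative")
    return math.exp(-lrt_statistic / 2.0)


# LRT statistics as reported in Table 7 (nested model tested against the SPBD).
reported_lrt = {
    ("main street", "four parameters beta"): 10.3345,
    ("main street", "generalized beta of the first kind"): 8.76259,
    ("within street", "four parameters beta"): 20.5047,
    ("within street", "generalized beta of the first kind"): 7.61264,
}

p_values = {key: lrt_p_value_df2(stat) for key, stat in reported_lrt.items()}

for (data_set, nested_model), p in p_values.items():
    print(f"{data_set:14s} {nested_model:36s} p = {p:.5f}")
```

The recomputed values agree with the p values in Table 7 to within about 0.0001, and all four are below 0.05.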
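Table 7
Observed and predicted frequencies, model parameter estimates and goodness of fit for the mosque data sets

Main = main streets mosque; Within = within streets mosque. All columns other than the observed ones refer to the fitted models.

Observed and predicted frequencies

| Data range | Main: observed | Main: proposed 6 parameters beta | Main: gamma | Main: exponential | Main: 4 parameters beta | Main: generalized beta of the first kind | Within: observed | Within: proposed 6 parameters beta | Within: gamma | Within: exponential | Within: 4 parameters beta | Within: generalized beta of the first kind |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0–1.0 | 599 | 589 | 259 | 596 | 699 | 533 | 523 | 517 | 380 | 583 | 611 | 536 |
| 1.1–2.0 | 405 | 412 | 417 | 518 | 451 | 426 | 430 | 428 | 465 | 482 | 416 | 430 |
| 2.1–3.0 | 373 | 362 | 460 | 450 | 381 | 375 | 372 | 374 | 441 | 398 | 346 | 367 |
| 3.1–4.0 | 323 | 330 | 453 | 391 | 335 | 339 | 330 | 329 | 388 | 329 | 297 | 317 |
| 4.1–5.0 | 310 | 306 | 423 | 339 | 300 | 310 | 285 | 288 | 329 | 272 | 257 | 276 |
| 5.1–6.0 | 291 | 284 | 382 | 295 | 272 | 285 | 250 | 251 | 273 | 224 | 224 | 240 |
| 6.1–7.0 | 260 | 265 | 337 | 256 | 247 | 263 | 215 | 217 | 223 | 185 | 196 | 208 |
| 7.1–8.0 | 243 | 247 | 294 | 222 | 226 | 242 | 182 | 186 | 181 | 153 | 171 | 179 |
| 8.1–9.0 | 219 | 229 | 253 | 193 | 207 | 223 | 160 | 158 | 145 | 127 | 148 | 154 |
| 9.1–10.0 | 221 | 212 | 215 | 168 | 189 | 205 | 129 | 133 | 116 | 105 | 128 | 131 |
| 10.1–11.0 | 188 | 195 | 182 | 146 | 172 | 188 | 110 | 111 | 92 | 86 | 110 | 111 |
| 11.1–12.0 | 185 | 178 | 153 | 126 | 157 | 172 | 86 | 91 | 73 | 71 | 94 | 93 |
| 12.1–13.0 | 151 | 162 | 128 | 110 | 142 | 156 | 75 | 73 | 58 | 59 | 80 | 77 |
| 13.1–14.0 | 143 | 145 | 107 | 95 | 128 | 141 | 51 | 58 | 45 | 49 | 67 | 62 |
| 14.1–15.0 | 135 | 128 | 89 | 83 | 115 | 126 | 49 | 45 | 35 | 40 | 55 | 50 |
| 15.1–16.0 | 108 | 112 | 74 | 72 | 102 | 112 | 36 | 34 | 28 | 33 | 44 | 39 |
| 16.1–17.0 | 99 | 96 | 61 | 62 | 90 | 99 | 26 | 25 | 22 | 27 | 35 | 30 |
| 17.1–18.0 | 71 | 81 | 50 | 54 | 78 | 85 | 18 | 17 | 17 | 23 | 27 | 22 |
| 18.1–19.0 | 68 | 66 | 41 | 47 | 67 | 72 | 13 | 11 | 13 | 19 | 20 | 16 |
| 19.1–20.0 | 49 | 52 | 34 | 41 | 56 | 60 | 8 | 7 | 10 | 15 | 14 | 10 |
| 20.1–21.0 | 44 | 38 | 28 | 36 | 45 | 48 | 5 | 4 | 8 | 13 | 10 | 6 |
| 21.1–22.0 | 24 | 26 | 23 | 31 | 35 | 36 | 3 | 2 | 6 | 11 | 6 | 3 |
| 22.1–23.0 | 18 | 16 | 19 | 27 | 25 | 25 | 2 | 1 | 5 | 9 | 3 | 2 |
| 23.1–24.0 | 8 | 7 | 15 | 23 | 15 | 14 | 1 | 0 | 4 | 7 | 1 | 1 |
| 24.1–25.0 | 4 | 1 | 42 | 158 | 5 | 4 | 1 | 0 | 3 | 40 | 0 | 0 |
| Total | 4539 | 4539 | 4539 | 4539 | 4539 | 4539 | 3360 | 3360 | 3360 | 3360 | 3360 | 3360 |

Model parameters

| Parameter | Main: proposed 6 parameters beta | Main: gamma | Main: exponential | Main: 4 parameters beta | Main: generalized beta of the first kind | Within: proposed 6 parameters beta | Within: gamma | Within: exponential | Within: 4 parameters beta | Within: generalized beta of the first kind |
|---|---|---|---|---|---|---|---|---|---|---|
| \(\hat{a}\) | 25.01238 | \(\hat{\lambda}\) = 0.213 | \(\hat{\lambda}\) = 0.141 | 1 | 24.97111 | 24.20246 | \(\hat{\lambda}\) = 0.273 | \(\hat{\lambda}\) = 0.191 | 1 | 24.511778 |
| \(\hat{b}\) | 2.023441 | \(\hat{\beta}\) = 1.643 | | 1 | 0.999999 | 1.159141 | \(\hat{\beta}\) = 1.431 | | 1 | 0.999999 |
| \(\hat{\alpha}\) | 0.381451 | | | 0.7533 | 0.89295 | 0.799451 | | | 0.80945 | 0.915556 |
| \(\hat{\beta}\) | 2.612456 | | | 2.00145 | 2.239999 | 3.901245 | | | 2.97825 | 3.31778 |
| \(\hat{A}\) | 0.000012 | | | 0 | 0 | 0.000132 | | | 0 | 0 |
| \(\hat{B}\) | 0.998912 | | | 25 | 1 | 1.038113 | | | 25 | 1 |

Goodness of fit

| Statistic | Main: proposed 6 parameters beta | Main: gamma | Main: exponential | Main: 4 parameters beta | Main: generalized beta of the first kind | Within: proposed 6 parameters beta | Within: gamma | Within: exponential | Within: 4 parameters beta | Within: generalized beta of the first kind |
|---|---|---|---|---|---|---|---|---|---|---|
| \(\chi^{2}\) | 8.7141 | 736.85 | 416.134 | 53.1875 | 48.22419 | 6.2113 | 101.441 | 112.324 | 52.4733 | 19.7677 |
| df | 17* | 21* | 23 | 20* | 18* | 14* | 21* | 23 | 17* | 16* |
| p value | 0.9488 | 0.0 | 0.0 | 0.00008 | 0.000141 | 0.96088 | 0.0 | 0.0 | 0.000017 | 0.23087 |

Likelihood ratio test (nested)**

| Statistic | Main: 4 parameters beta | Main: generalized beta of the first kind | Within: 4 parameters beta | Within: generalized beta of the first kind |
|---|---|---|---|---|
| LRT | 10.3345 | 8.76259 | 20.5047 | 7.61264 |
| df | 2 | 2 | 2 | 2 |
| p value | 0.0056 | 0.01251 | 0.00004 | 0.02222 |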
*The number of intervals was adjusted so that the expected number of observations in each interval is equal to or greater than 5, which in turn affected the number of degrees of freedom
**The 4 parameters beta distribution and the generalized beta of the first kind distribution are special cases of the SPBD, see Sect. 3.7 cases 1 and 4
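The interval adjustment described in the first footnote can be sketched as follows. This Python example pools adjacent intervals until every pooled interval has an expected count of at least 5, then computes the Pearson Chi square statistic and its degrees of freedom for the SPBD fit to the main street data, using the rounded frequencies from Table 7. The pooling rule (scan from left to right and fold any leftover tail into the last pooled interval) and the degrees of freedom formula (pooled intervals minus one minus the number of estimated parameters) are assumptions made for illustration, since the exact procedure is not spelled out in the text; the function names are hypothetical.

```python
def pool_bins(observed, expected, min_expected=5.0):
    """Merge adjacent bins so that each pooled bin has expected >= min_expected.

    Assumed rule: accumulate bins from left to right and close a pooled bin
    as soon as its expected count reaches the threshold; any leftover tail
    is folded into the last pooled bin.
    """
    pooled_obs, pooled_exp = [], []
    acc_obs = acc_exp = 0.0
    for obs, exp in zip(observed, expected):
        acc_obs += obs
        acc_exp += exp
        if acc_exp >= min_expected:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
            acc_obs = acc_exp = 0.0
    if acc_obs > 0 or acc_exp > 0:
        if pooled_exp:
            pooled_obs[-1] += acc_obs
            pooled_exp[-1] += acc_exp
        else:
            pooled_obs.append(acc_obs)
            pooled_exp.append(acc_exp)
    return pooled_obs, pooled_exp


def pearson_chi2(observed, expected, n_estimated_params):
    """Return the Pearson Chi square statistic and degrees of freedom."""
    obs, exp = pool_bins(observed, expected)
    statistic = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    df = len(obs) - 1 - n_estimated_params
    return statistic, df


# Main street mosque: observed and SPBD-predicted frequencies from Table 7.
main_observed = [599, 405, 373, 323, 310, 291, 260, 243, 219, 221, 188, 185,
                 151, 143, 135, 108, 99, 71, 68, 49, 44, 24, 18, 8, 4]
main_spbd_predicted = [589, 412, 362, 330, 306, 284, 265, 247, 229, 212, 195,
                       178, 162, 145, 128, 112, 96, 81, 66, 52, 38, 26, 16, 7, 1]

# The SPBD has six estimated parameters (a, b, alpha, beta, A, B).
statistic, df = pearson_chi2(main_observed, main_spbd_predicted,
                             n_estimated_params=6)
print(f"chi-square = {statistic:.4f}, df = {df}")
```

Under these assumptions only the last two intervals are pooled, leaving 24 intervals, and the sketch gives a statistic of about 8.714 with 17 degrees of freedom, in line with the values reported for the SPBD on the main street data.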
7 Summary
A new six parameters beta distribution is introduced, which has a more flexible shape and a wider bounded domain than the two (standard) and the four parameters beta distributions. Some of its various shapes are given to show its flexibility, and its properties are studied, including its boundaries, limits, mode, quantiles, reliability and hazard functions, Renyi entropy, and Lorenz and Bonferroni curves. This distribution is closed under scaling and exponentiation, has the reflection symmetry property, and has some well-known distributions as special cases, such as the two and four parameters beta, generalized modification of the Kumaraswamy, generalized beta of the first kind, the power function, Kumaraswamy power function, Minimax, exponentiated Pareto, and the generalized uniform distributions. Its order statistics and moment generating function are derived, together with its moments, namely the mean, variance, moments about the origin, harmonic, incomplete, and probability weighted moments, and the mean deviations. The maximum likelihood estimation method is used for estimating its parameters and is applied to six different simulated data sets of this distribution having different pdf shapes, in order to check the performance of the estimation method through the mean squared errors of the estimated parameters computed from different simulated sample sizes. These errors are shown to decrease as the sample size increases, indicating that the MLE method is appropriate and can be used to estimate the parameters of the SPBD models. Finally, two real-life data sets, representing the waiting period of Muslim worshipers from the time of entering the mosque until the actual time of starting the Alfajir prayer in two different mosques, are used to show the usefulness and the flexibility of this distribution in application to real-life data.
The MLE method was employed with these data sets to estimate the parameters of the SPBD, the gamma, the exponential, the four parameters beta, and the generalized beta of the first kind distributions. The Chi square goodness of fit test was applied to all of these distributions, and the LRT was applied to the nested models of the SPBD, namely the four parameters beta and the generalized beta of the first kind distributions. The p values of these tests indicate that the SPBD statistically outperforms the other stated distributions.
Acknowledgements
Open Access funding provided by the Qatar National Library. The publication of this article was funded by the Qatar National Library.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.