
About this book

This book describes a group of statistical distributions that have ample application to studies in statistics and probability. Understanding statistical distributions is fundamental for researchers in almost all disciplines. The informed researcher selects the statistical distribution that best fits the data in the study at hand. Some of the distributions are well known to the general researcher and are in use in a wide variety of ways. Other useful distributions are less understood and are not in common use. The book describes when and how to apply each of the distributions in research studies, with the goal of identifying the distribution that best applies to the study. The distributions cover continuous, discrete, and bivariate random variables. In most studies, the parameter values are not known a priori, and sample data is needed to estimate them. In other scenarios, no sample data is available, and the researcher seeks some insight that allows the parameter values to be estimated.

This handbook of statistical distributions provides a working knowledge of applying common and uncommon statistical distributions in research studies. The nineteen distributions are: continuous uniform, exponential, Erlang, gamma, beta, Weibull, normal, lognormal, left-truncated normal, right-truncated normal, triangular, discrete uniform, binomial, geometric, Pascal, Poisson, hyper-geometric, bivariate normal, and bivariate lognormal. Some apply to continuous data, others to discrete and bivariate data. This group of statistical distributions has ample application to studies in statistics and probability and practical use in real situations. Additionally, the book explains how to compute the cumulative probability of each distribution and how to estimate the parameter values, either with or without sample data. Examples are provided throughout to guide the reader.

Accuracy in choosing and applying statistical distributions is particularly imperative for anyone who does statistical and probability analysis, including management scientists, market researchers, engineers, mathematicians, physicists, chemists, economists, social science researchers, and students in many disciplines.

Table of contents

Frontmatter

Chapter 1. Statistical Concepts

Abstract
A statistical distribution is a mathematical function that defines how the outcomes of an experimental trial occur randomly in a probable way. The outcomes are called random variables, and their admissible region lies in a specified sample space associated with each individual distribution. Statistical distributions are mostly of two types: continuous and discrete. Continuous probability distributions apply when the random variable can fall anywhere between two limits, such as the amount of rainwater that accumulates in a five-gallon container after a rainfall. Discrete probability distributions pertain when the outcomes of the experiment are specific values, like the number of dots that appear on a roll of two dice. The distributions may also be classified as univariate or multivariate: univariate when the distribution has only one random variable, multivariate when two or more random variables are associated with the distribution. The statistical distributions in this book pertain to the commonly used univariate continuous and discrete probability distributions, and to the most frequently applied bivariate continuous statistical distributions, where bivariate distributions have two jointly related random variables.
Nick T. Thomopoulos

Chapter 2. Continuous Uniform

Abstract
The continuous uniform distribution applies when the random variable can fall anywhere, equally likely, between two limits. The distribution is often chosen when an analyst does not have definitive information on the range and shape of the random variable. For example, management may estimate the time to finish a project as equally likely between 50 and 60 hours. A baseball is hit for a home run and the officials estimate the ball traveled somewhere between 410 and 430 feet. The amount of official snowfall at a location on a wintry day is predicted between 1 and 5 inches. The chapter lists the probability density, cumulative distribution, mean, variance and standard deviation of the random variable. Also described is the α-percent-point of x, which identifies the value of x where the cumulative probability is α. When the parameter limit values are not known and sample data is available, estimates of the parameter values are obtained. Two estimates are described, one by way of the maximum-likelihood method, and the other by the method-of-moments. When sample data is not available, experts are called on to obtain the estimates. Often both of the limits are unknown and estimates of both are needed. Sometimes only the low limit is known, and on other occasions only the upper limit is known. The way to estimate the parameter values is described for all three scenarios.
Nick T. Thomopoulos
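A minimal sketch of the quantities the abstract describes, in Python. The function names are illustrative, not the book's notation; the method-of-moments estimator shown (sample mean ± √3 sample standard deviations) is the standard moment-matching form and is assumed to match the chapter's version.

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform on [a, b]: constant 1/(b-a)."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """Cumulative probability F(x) = (x - a) / (b - a) inside [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def uniform_percent_point(alpha, a, b):
    """alpha-percent-point: the x where F(x) = alpha."""
    return a + alpha * (b - a)

def uniform_mom(sample):
    """Method-of-moments estimates: limits placed sqrt(3) sample
    standard deviations either side of the sample mean."""
    n = len(sample)
    m = sum(sample) / n
    s = (sum((v - m) ** 2 for v in sample) / (n - 1)) ** 0.5
    return m - 3 ** 0.5 * s, m + 3 ** 0.5 * s

# Project-duration example from the abstract: equally likely 50-60 hours.
a, b = 50.0, 60.0
mean = (a + b) / 2.0          # midpoint of the limits
var = (b - a) ** 2 / 12.0     # variance of the uniform
```

The maximum-likelihood alternative simply takes the sample minimum and maximum as the limit estimates, which always lie inside the true range; the moment estimates can fall outside the sample.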

Chapter 3. Exponential

Abstract
The exponential distribution peaks when the random variable is zero and gradually decreases as the variable value increases. The distribution has one parameter, and its probability density and cumulative distribution are easy to compute. The distribution has a memoryless property: the probability of the next event occurring in a fixed interval is the same no matter the start time of the interval. This distribution is fundamental in queuing theory, where it is used for the time between arrivals to a system and also for the time of service. The distribution also applies in reliability studies, where it is assigned as the time to fail for an item. When the parameter value is not known, sample data is used to obtain an estimate, and when no sample data is available, an approximate measure on the distribution allows the analyst to estimate the parameter value.
Nick T. Thomopoulos
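The memoryless property and the sample-data estimate can be sketched as follows; the mean-parameterization F(x) = 1 − e^(−x/θ) is assumed here.

```python
import math

def exp_cdf(x, theta):
    """F(x) = 1 - exp(-x / theta) for the exponential with mean theta."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-x / theta)

def exp_mle(sample):
    """Maximum-likelihood estimate of theta: the sample average."""
    return sum(sample) / len(sample)

# Memoryless property: P(X > s + t | X > s) equals P(X > t) --
# the elapsed time s does not change the remaining-time distribution.
theta = 4.0
s, t = 2.0, 3.0
conditional = (1.0 - exp_cdf(s + t, theta)) / (1.0 - exp_cdf(s, theta))
unconditional = 1.0 - exp_cdf(t, theta)
```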

Chapter 4. Erlang

Abstract
The origin of the Erlang distribution is attributed to Agner Erlang, a Danish engineer for the Copenhagen Telephone Company, who in 1908 was seeking how many circuits were needed to accommodate the voice traffic on the telephone system. The distribution has two parameters, k and θ, where k represents the number of exponential variables that are summed to form the Erlang variable. The exponential variables have the same parameter, θ, as the Erlang. The Erlang has shapes that range from exponential to normal. This distribution is heavily used in the study of queuing systems, representing the time between arrivals and also the time to service a unit. The chapter shows how to compute the cumulative probability, F(x), where x is the random variable. When the parameter values are not known and sample data is available, measures from the data are used to estimate the parameter values. When the parameter values are not known and no sample data is available, approximations of some measures of the distribution, usually from an expert, are obtained, and these allow estimates of the parameter values. The Erlang is a special case of the gamma distribution, as will be seen in Chap. 5: the parameter k is a positive integer for the Erlang, and any positive number for the gamma.
Nick T. Thomopoulos
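Because the Erlang is a sum of k exponentials, its cumulative probability has a closed form as one minus a Poisson partial sum; a minimal sketch (θ taken as the mean of each exponential, matching the abstract's description):

```python
import math

def erlang_cdf(x, k, theta):
    """F(x) for the Erlang: sum of k independent exponentials with mean
    theta.  Uses F(x) = 1 - sum_{n=0}^{k-1} e^{-u} u^n / n!, u = x/theta."""
    if x <= 0:
        return 0.0
    u = x / theta
    term, total = math.exp(-u), 0.0   # term starts at the n = 0 value
    for n in range(k):
        total += term
        term *= u / (n + 1)           # advance e^{-u} u^n / n! to n + 1
    return 1.0 - total
```

With k = 1 the formula reduces to the exponential cdf, which is a quick sanity check.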

Chapter 5. Gamma

Abstract
Karl Pearson, a famous British professor, introduced the gamma distribution in 1895. The distribution, originally called the Pearson type III distribution, was renamed in the 1930s to the gamma distribution. The gamma distribution has many shapes, ranging from exponential-like to normal-like. The distribution has two parameters, k and θ, where both are larger than zero. When k is a positive integer, the distribution is the same as the Erlang. When k is less than or equal to one, the mode is zero and the distribution is exponential-like; when k is larger than one, the mode is greater than zero. As k increases, the shape approaches that of a normal distribution. There is no closed-form solution for the cumulative probability, but quantitative methods have been developed and are available. Another method is developed in this chapter and applies when k ranges from 1 to 9. When sample data is available, estimates of the parameter values are obtained. When no sample data is available, estimates of the parameter values are obtained using approximations of some distribution measures.
Nick T. Thomopoulos
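Since no closed form exists, one simple quantitative stand-in (not the chapter's own method) is to integrate the density numerically; this sketch uses Simpson's rule and assumes the shape–scale parameterization with k ≥ 1, the same range the chapter's method covers.

```python
import math

def gamma_cdf(x, k, theta, steps=2000):
    """F(x) for the gamma with shape k >= 1 and scale theta, computed by
    Simpson's-rule integration of the density (steps must be even)."""
    if x <= 0:
        return 0.0
    def pdf(t):
        if t < 0:
            return 0.0
        if t == 0:
            # density at zero: 1/theta when k == 1, zero when k > 1
            return 1.0 / theta if k == 1 else 0.0
        return t ** (k - 1) * math.exp(-t / theta) / (math.gamma(k) * theta ** k)
    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3.0
```

A check against the Erlang closed form (k = 2) confirms the integration: F(2) with k = 2, θ = 1 is 1 − 3e⁻².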

Chapter 6. Beta

Abstract
The beta distribution, introduced in 1895 by Karl Pearson, a famous British mathematician, was originally called the Pearson type I distribution. The name was changed in the 1940s to the beta distribution. Thomas Bayes also applied the distribution in 1763, as a posterior distribution for the parameter of the Bernoulli distribution. The beta distribution has many shapes that range from exponential, reverse exponential, right triangular, left triangular, skew right, skew left, normal and bathtub. The only fault is that it is a bit difficult to apply to real applications. The beta has two main parameters, k1 and k2, both larger than zero, and two location parameters, a and b, that define the limits of the admissible range. When the main parameters are both larger than one, the beta variable is skewed to the left or to the right. These are the most used shapes of the distribution, and for this reason the chapter concerns mainly these shapes. The random variable is denoted here as w, where (a ≤ w ≤ b). A related variable, called the standard beta, x, has a range of 0–1. The mathematical properties of the probability density are defined with the standard beta. This includes the beta function and the gamma function, which are not easy to calculate. An algorithm is listed, and it needs to be calculated via a computer. There is no closed-form solution for the cumulative probability, and thereby a quantitative method is shown in the chapter examples. A straightforward process is used to convert from w to x and from x to w. When sample data is available, the sample average and mode are needed to estimate the parameter values of k1 and k2. A regression fit is developed to estimate the average of x from the mode of x. When sample data is not available, best estimates of the limits (a, b) and the most-likely value of w are obtained to estimate the parameter values.
Nick T. Thomopoulos
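The standard-beta density and the w ↔ x conversion the abstract mentions can be sketched as below; the beta function is built from gamma functions, which is the route the chapter describes as needing a computer.

```python
import math

def beta_pdf(x, k1, k2):
    """Density of the standard beta on (0, 1); the beta function
    B(k1, k2) is computed from gamma functions."""
    B = math.gamma(k1) * math.gamma(k2) / math.gamma(k1 + k2)
    return x ** (k1 - 1) * (1.0 - x) ** (k2 - 1) / B

def w_to_x(w, a, b):
    """Convert a beta variable on (a, b) to the standard beta on (0, 1)."""
    return (w - a) / (b - a)

def x_to_w(x, a, b):
    """Convert the standard beta back to the (a, b) range."""
    return a + x * (b - a)
```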

Chapter 7. Weibull

Abstract
The Weibull distribution was formally introduced by Waloddi Weibull, a Swedish mathematician, in 1939. The distribution was used earlier by the French mathematician Maurice Fréchet in 1927, and applied by P. Rosin and E. Rammler in 1933. The Weibull distribution has shapes that range from exponential-like to normal-like, and the random variable, w, takes on values of γ or larger. A related distribution, the standard Weibull with variable x, has values of zero or larger. Both distributions have the same parameters (k1, k2), and these form the shape of the distribution. When k1 ≤ 1, the mode of the standard Weibull is zero and the shape is exponential-like; when k1 > 1, the mode is larger than zero; and when k1 is 3 or larger, the shape is normal-like. The mathematical equations for the probability density and the cumulative probability are shown and are easy to compute. However, the mean and variance of the distribution are not so easy to compute and require use of the gamma function. Methods to estimate the parameters γ, k1, k2 are described when sample data is available. When no data is available and an expert provides approximations of some measures of the distribution, methods are shown for estimating the parameter values.
Nick T. Thomopoulos
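A minimal sketch of the standard-Weibull cdf and the gamma-function moments; k1 is assumed here to be the shape parameter and k2 the scale, which matches the described behavior of k1 but is an assumption about the chapter's notation.

```python
import math

def weibull_cdf(x, k1, k2):
    """Standard-Weibull cumulative probability, assuming
    F(x) = 1 - exp(-(x / k2)^k1) with shape k1 and scale k2."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-((x / k2) ** k1))

def weibull_mean(k1, k2):
    """Mean requires the gamma function: k2 * Gamma(1 + 1/k1)."""
    return k2 * math.gamma(1.0 + 1.0 / k1)

def weibull_var(k1, k2):
    """Variance: k2^2 * (Gamma(1 + 2/k1) - Gamma(1 + 1/k1)^2)."""
    return k2 ** 2 * (math.gamma(1.0 + 2.0 / k1)
                      - math.gamma(1.0 + 1.0 / k1) ** 2)
```

With k1 = 1 the formulas collapse to the exponential with mean k2, the exponential-like end of the shape range.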

Chapter 8. Normal

Abstract
In 1809, Carl Friedrich Gauss introduced the method of least squares, the maximum-likelihood estimation method, and the normal distribution, which is often referred to as the Gaussian distribution. The normal distribution is the most commonly used distribution in all disciplines. The normal has a random variable x with two parameters: μ is the mean, and σ is the standard deviation. A related distribution is the standard normal, with random variable z, whose mean is zero and standard deviation is one. An easy way to convert from x to z, and also from z to x, is shown. Tables of the standard normal distribution are available in almost all statistics books. There is no closed-form solution for the cumulative probability, denoted as F(z), and thereby various quantitative methods have been developed over the years. This chapter provides the Hastings approximation formula to find F(z) from z, and also another Hastings approximation formula to find z from F(z). When sample data is available, the sample average and sample standard deviation are used to estimate the mean and standard deviation of the normal. When sample data is not available, a method is shown for estimating the mean and standard deviation of the normal from some approximate measures of the distribution.
Nick T. Thomopoulos
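The F(z)-from-z direction can be sketched with the well-known five-coefficient Hastings polynomial (error below about 7.5 × 10⁻⁸); whether this is the exact variant the chapter prints is an assumption, as Hastings published several.

```python
import math

# Hastings coefficients for the standard normal cdf approximation
_B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def std_normal_cdf(z):
    """F(z) by the Hastings polynomial approximation: for z >= 0,
    F(z) ~= 1 - phi(z) * (b1 t + ... + b5 t^5), t = 1/(1 + 0.2316419 z);
    symmetry handles negative z."""
    t = 1.0 / (1.0 + 0.2316419 * abs(z))
    poly = sum(b * t ** (i + 1) for i, b in enumerate(_B))
    phi = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    p = 1.0 - phi * poly
    return p if z >= 0 else 1.0 - p

def normal_cdf(x, mu, sigma):
    """Convert x to z = (x - mu) / sigma, then evaluate F(z)."""
    return std_normal_cdf((x - mu) / sigma)
```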

Chapter 9. Lognormal

Abstract
Two British mathematicians, Francis Galton and Donald McAlister, introduced the lognormal distribution in 1879. The lognormal distribution is sometimes referred to as the Galton distribution. The lognormal variable begins at zero; its density peaks soon after and thereafter tails down toward higher x values. The variable x is lognormally distributed when another variable, y, formed by the logarithm of x, is normally distributed. The probability density of x is listed in the chapter, while the associated cumulative distribution function, F(x), is not, since there is no closed-form solution. A method to compute the cumulative probability of any x is provided. When sample data is available, measures from the sample are used to estimate the parameters of the lognormal. In the event no sample data is available and estimates of the lognormal variable are needed, two methods are described for computing the estimates. The lognormal distribution is not as widely known as the normal, but applies easily in research studies of all kinds. It has applications in many disciplines, such as weather, engineering and economics.
Nick T. Thomopoulos
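One way to compute the cumulative probability, routing x through the underlying normal of y = ln(x), can be sketched as follows (the standard-normal cdf is evaluated here with `math.erf` rather than the book's table or approximation):

```python
import math

def lognormal_cdf(x, mu_y, sigma_y):
    """F(x) via the normal: y = ln(x) is normal with mean mu_y and
    standard deviation sigma_y, so F(x) = Phi((ln x - mu_y) / sigma_y)."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu_y) / sigma_y
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_mean(mu_y, sigma_y):
    """Mean of x itself: exp(mu_y + sigma_y^2 / 2)."""
    return math.exp(mu_y + sigma_y ** 2 / 2.0)
```

The median of x is exp(mu_y), which gives a quick check: F at that point is exactly 0.5.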

Chapter 10. Left Truncated Normal

Abstract
In an earlier book [Thomopoulos (1980), pp. 318–324], the author shows how to apply the left-truncated normal distribution to inventory control.
Nick T. Thomopoulos

Chapter 11. Right Truncated Normal

Abstract
In 2001, Arvid Johnson and Nick Thomopoulos generated tables on the right-truncated normal distribution. The right-truncated normal (RTN) takes on a variety of shapes, from normal to exponential-like. The distribution has one parameter, k, where the range includes all values of the standard normal that are less than z = k. In this way, the distribution has the shape of the standard normal on the left and is truncated on the right. The variable is denoted as t, and t is zero or negative throughout. With k specified, the following statistics are computed: the mean, standard deviation, coefficient-of-variation, and the 0.01 and 0.99 percent-points of t. The spread ratio of the RTN is also computed for each parameter value of k. A table is generated that lists all these statistics for values of k ranging from −3.0 to +3.0. When sample data is available, the analyst computes the following statistics: sample average, standard deviation, min and max. From these, the estimate of the spread ratio is computed, and this estimate is compared to the table values to locate the parameter k whose spread ratio θ is closest. With k estimated for the sample data, the analyst identifies the RTN distribution that best fits the data. From here, the high-limit δ can be estimated, and also any α-percent-point on x that may be needed. The spread-ratio test sometimes indicates the sample data is best fit by the normal distribution, and in other situations by the RTN.
Nick T. Thomopoulos
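The per-k statistics in the table can be reproduced from the closed-form moments of a standard normal truncated from above at z = k; this sketch computes the mean and standard deviation that way (the book's table is not reproduced here, and its t-variable convention may shift these by k).

```python
import math

def _phi(z):
    """Standard normal density."""
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def _Phi(z):
    """Standard normal cumulative probability via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rtn_mean_std(k):
    """Mean and standard deviation of the standard normal truncated on
    the right at z = k, from the closed-form truncated-normal moments:
    mean = -phi(k)/Phi(k), var = 1 - k*phi(k)/Phi(k) - (phi(k)/Phi(k))^2."""
    r = _phi(k) / _Phi(k)
    mean = -r
    var = 1.0 - k * r - r * r
    return mean, math.sqrt(var)
```

As k grows large, almost no mass is cut off, so the mean tends to 0 and the standard deviation to 1, recovering the untruncated standard normal.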

Chapter 12. Triangular

Abstract
When an analyst needs a continuous distribution for a project and has little information on its shape, the common practice is to employ the triangular distribution. The analyst seeks estimates of the min, max and most-likely values of the variable, and from these forms the distribution. The probability density rises linearly from the min to the mode, and falls linearly down to the max value. Another related distribution is the standard triangular, which ranges from zero to one. This latter distribution is easier to manage in a mathematical sense, and is used to define the probability distribution, cumulative probability, mean, variance and standard deviation. The conversion of all statistics and probabilities from the standard triangular to the triangular distribution is readily obtained.
Nick T. Thomopoulos
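The standard-triangular cdf and the conversion back to the (min, max) scale can be sketched as:

```python
def std_tri_cdf(x, m):
    """Cumulative probability of the standard triangular on (0, 1) with
    mode m: quadratic up to the mode, quadratic complement after it."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x <= m:
        return x * x / m
    return 1.0 - (1.0 - x) ** 2 / (1.0 - m)

def std_to_tri(x, a, b):
    """Convert the standard variable back to the (a, b) triangular."""
    return a + x * (b - a)

def tri_mean(a, b, c):
    """Mean of the triangular with min a, max b, mode c: (a + b + c)/3."""
    return (a + b + c) / 3.0
```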

Chapter 13. Discrete Uniform

Abstract
The discrete uniform distribution occurs when the random variable x can take on any integer value from a to b with equal probability. A church raffle of 1000 numbers (1–1000) is such a system, where the winning number, x, is equally likely to be any of the numbers in the admissible range. Often the parameters (a, b) are known a priori to an analyst who is seeking to apply the distribution. Other times the parameters are not known, and sample data is provided to estimate them. On still other occasions, when the parameters are not known and no sample data is available, the analyst seeks advice from an expert who provides some approximations of the range, and this information is used to estimate the parameter values.
Nick T. Thomopoulos
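The raffle example maps directly onto the probability mass function; a minimal sketch with the standard mean and variance formulas:

```python
def duniform_pmf(x, a, b):
    """P(X = x) = 1 / (b - a + 1) for each integer x in a..b."""
    return 1.0 / (b - a + 1) if a <= x <= b else 0.0

def duniform_mean_var(a, b):
    """Mean (a + b)/2 and variance (n^2 - 1)/12, n = b - a + 1."""
    n = b - a + 1
    return (a + b) / 2.0, (n * n - 1) / 12.0

# Church-raffle example from the abstract: numbers 1-1000.
p_win = duniform_pmf(7, 1, 1000)   # each ticket has probability 1/1000
```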

Chapter 14. Binomial

Abstract
Some historians credit the first use of the binomial distribution to Jakob Bernoulli, a prominent Swiss mathematician of the 1600s. The binomial distribution applies when a number of trials of an experiment are run and only two outcomes are noted on each trial, success and failure, and the probabilities of the outcomes remain the same over all of the trials. This happens, for example, when a roll of two dice is run five times and a success per run is when the number of dots is two, say. The probability of a success per run remains constant, the number of trials is five, and the probability of a success per trial is p = 1/36. The random variable, denoted as x, for this scenario is the number of successes in the five trials, which could be: x = 0, 1, 2, 3, 4, 5. The chapter lists the probability distribution of the random variable x. The mean, variance, standard deviation and mode of x are also given. When p is not known, it can be estimated using sample data, and even when no sample data is provided, an estimate of p can be obtained.
Nick T. Thomopoulos
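The dice example from the abstract, worked with the binomial mass function:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)

# Five rolls of two dice, success = two dots, so p = 1/36.
n, p = 5, 1.0 / 36.0
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]  # x = 0..5
mean = n * p                     # expected number of successes, 5/36
var = n * p * (1.0 - p)          # variance of the count
```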

Chapter 15. Geometric

Abstract
The geometric distribution applies when an experiment is run repeatedly until a successful outcome occurs, and the probability of a success is the same for all trials. The random variable could be defined as the number of fails till the first success, with a range of integers zero and above. The random variable could also be defined as the number of trials till the first success, with a range of integers one and above. Both scenarios are described in the chapter. This situation occurs, for example, when a process produces units that need to meet acceptable engineering standards, and the process is repeated until an acceptable unit is produced. When the probability of a successful outcome is not known, sample data can be used to estimate it. Sometimes no sample data is available, and a person of experience offers an approximation of the distribution, which is used to estimate the probability of a successful outcome. The chapter also describes how the geometric distribution is the only discrete distribution with a memoryless property.
Nick T. Thomopoulos
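Both conventions from the abstract, plus the memoryless property, in a minimal sketch:

```python
def geom_pmf_fails(x, p):
    """P(x fails occur before the first success), x = 0, 1, 2, ..."""
    return (1.0 - p) ** x * p

def geom_pmf_trials(n, p):
    """P(the first success arrives on trial n), n = 1, 2, 3, ..."""
    return (1.0 - p) ** (n - 1) * p

def geom_sf_fails(x, p):
    """Survivor function P(X > x) = (1 - p)^(x + 1), fails convention."""
    return (1.0 - p) ** (x + 1)
```

Memoryless check: P(X > s + t | X > s) equals P(X > t), so having already observed s fails does not change the remaining-fails distribution.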

Chapter 16. Pascal

Abstract
Blaise Pascal, a prominent French mathematician of the 1600s, was the first to formulate the Pascal distribution. The distribution is also often referred to as the negative binomial distribution. When an experiment is run whose outcome could be a success or a failure, with probabilities p and (1 − p) respectively, and the analyst is seeking k successes, the random variable is the minimum number of fails that occur to achieve the goal of k successes. This distribution is called the Pascal distribution. Some analysts working with the Pascal are interested in the case where the random variable is the minimum number of trials to achieve the k successes. An example is when a production facility needs to produce k successful units for a customer order and the probability of a successful unit is less than one. The number of fails till the k successful units becomes the random variable. The chapter describes how to compute the probabilities for each situation. When the probability of a success per trial is not known, sample data may be used to estimate it. On other occasions, no sample data is available, and an approximation of the distribution is used to estimate the probability.
Nick T. Thomopoulos
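The two Pascal (negative binomial) conventions described above, sketched side by side:

```python
from math import comb

def pascal_pmf_fails(x, k, p):
    """P(exactly x fails occur before the k-th success):
    C(x + k - 1, x) p^k (1 - p)^x."""
    return comb(x + k - 1, x) * p ** k * (1.0 - p) ** x

def pascal_pmf_trials(n, k, p):
    """P(the k-th success arrives on trial n), n = k, k + 1, ...:
    C(n - 1, k - 1) p^k (1 - p)^(n - k)."""
    return comb(n - 1, k - 1) * p ** k * (1.0 - p) ** (n - k)
```

With k = 1 the fails convention reduces to the geometric distribution, and the two conventions agree after the shift n = x + k.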

Chapter 17. Poisson

Abstract
The Poisson distribution is named after Siméon Poisson, a leading French mathematician of the early 1800s. The distribution applies when the random variable is discrete and represents the number of events that occur in a unit-of-scale, such as unit-of-time or unit-of-area. The rate of events occurring is constant, and the number of events is an integer of zero or larger. This distribution is used heavily in the study of queuing systems, and also in statistical process control. In queuing, the random variable pertains to the number of arriving units to a system in a unit-of-time, and also to the number of departing units in a unit-of-time for a continuously busy service facility. In statistical process control, the random variable pertains to the number of defectives occurring in a unit of product. The distribution is mostly described in one unit-of-scale, but easily extends to multiple units-of-scale. This distribution is often used in place of the normal when the expected number of events is relatively low.
Nick T. Thomopoulos
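The mass function and the extension to multiple units-of-scale can be sketched as:

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^{-lam} lam^x / x! for event rate lam per unit-of-scale."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def poisson_pmf_scaled(x, lam, t):
    """Extension to t units-of-scale: the rate simply becomes lam * t."""
    return poisson_pmf(x, lam * t)
```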

Chapter 18. Hyper Geometric

Abstract
The hyper geometric distribution applies when a population of size N has D marked items, and a sample of n items taken without replacement yields x marked items. This differs from the binomial distribution, where the population size is infinite and the samples are taken with replacement. The hyper geometric often applies in quality applications, where a lot of N items contains D defectives (quantity unknown) and a sample of n items is taken without replacement. The sample result of x defectives allows management to gauge the quality of the lot.
Nick T. Thomopoulos
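The sampling-without-replacement probability described above, in a minimal sketch:

```python
from math import comb

def hypergeom_pmf(x, N, D, n):
    """P(x marked items in a sample of n drawn without replacement from
    a population of N items, D of which are marked):
    C(D, x) C(N - D, n - x) / C(N, n)."""
    if x < 0 or x > min(n, D) or n - x > N - D:
        return 0.0
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

# Lot-inspection example: N = 20 items, D = 5 defectives, sample n = 4.
probs = [hypergeom_pmf(x, 20, 5, 4) for x in range(5)]
```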

Chapter 19. Bivariate Normal

Abstract
Over the years a great many scholars have contributed to the literature on the bivariate normal distribution. In 1998, Montira Jantaravareerat and N. Thomopoulos described a way to estimate the cumulative probability of the distribution. In this chapter, a new method is shown for computing the joint cumulative probability. The bivariate normal has two variables, x1 and x2, that are jointly related, and has five parameters: μ1, μ2, σ1, σ2, ρ. The marginal distributions are normally distributed, and when the value of one of the variables is known, the distribution of the other is also normally distributed. The variables are converted to a new set, z1 and z2, that are jointly related by the bivariate standard normal distribution. The latter two variables are easier to apply mathematically in the computations. An approximation method is developed here to compute the joint probability of the two variables. Table values are listed and examples are presented to demonstrate the application.
Nick T. Thomopoulos
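The chapter's own approximation method is not reproduced here, but the quantity it targets can be sketched by brute force: the bivariate standard normal density integrated numerically over the lower-left quadrant up to (z1, z2).

```python
import math

def bvn_pdf(z1, z2, rho):
    """Density of the standard bivariate normal with correlation rho."""
    c = 1.0 / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (2.0 * (1.0 - rho * rho))
    return c * math.exp(-q)

def bvn_cdf(z1, z2, rho, lo=-8.0, steps=200):
    """P(Z1 <= z1, Z2 <= z2) by midpoint double integration over
    [lo, z1] x [lo, z2]; lo = -8 truncates negligible tail mass."""
    h1 = (z1 - lo) / steps
    h2 = (z2 - lo) / steps
    total = 0.0
    for i in range(steps):
        u = lo + (i + 0.5) * h1
        for j in range(steps):
            v = lo + (j + 0.5) * h2
            total += bvn_pdf(u, v, rho)
    return total * h1 * h2
```

When ρ = 0 the variables are independent, so F(0, 0) must equal 0.5 × 0.5 = 0.25, which makes a convenient check.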

Chapter 20. Bivariate Lognormal

Abstract
In an earlier paper [Thomopoulos and Longinow (1984), pp. 3045–3049], the authors showed how to compute the cumulative probability for the bivariate lognormal distribution in a structural-engineering reliability problem. The bivariate lognormal distribution with variables x1, x2 appears at first to be difficult to maneuver, but by taking the natural log of each of the two variables, the bivariate normal distribution emerges, and this distribution is easier to handle. The five parameters of the bivariate normal distribution become the parameters of the bivariate lognormal distribution. The chapter shows how to convert the parameters from the bivariate lognormal to the bivariate normal and vice versa. Shown also is how to compute the correlation for the bivariate lognormal pair, and how to compute the joint probability of any pair (x1, x2). An example is given to aid the reader in the methodology.
Nick T. Thomopoulos
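The parameter conversions the chapter describes can be sketched with the standard lognormal-to-normal moment relations; whether the chapter uses exactly these formulas is an assumption, but they are the usual closed forms.

```python
import math

def lognormal_to_normal(mean_x, var_x):
    """Convert a lognormal mean and variance to the (mu, sigma) of the
    underlying normal y = ln(x):
    sigma^2 = ln(1 + var_x / mean_x^2), mu = ln(mean_x) - sigma^2 / 2."""
    sigma2 = math.log(1.0 + var_x / mean_x ** 2)
    return math.log(mean_x) - sigma2 / 2.0, math.sqrt(sigma2)

def lognormal_corr(rho_y, s1, s2):
    """Correlation of the lognormal pair from the bivariate-normal
    correlation rho_y and the normal sigmas s1, s2:
    (exp(rho_y s1 s2) - 1) / sqrt((exp(s1^2)-1)(exp(s2^2)-1))."""
    num = math.exp(rho_y * s1 * s2) - 1.0
    den = math.sqrt((math.exp(s1 * s1) - 1.0) * (math.exp(s2 * s2) - 1.0))
    return num / den
```

A round-trip check: a lognormal built from μ = 0, σ = 1 has mean e^0.5 and variance (e − 1)e, and converting those moments back recovers μ = 0, σ = 1.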

Backmatter
