2003 | Book

Statistics of Earth Science Data

Their Distribution in Time, Space, and Orientation

Author: Professor Graham Borradaile

Publisher: Springer Berlin Heidelberg

About this book

The Goals of Data Collection and Its Statistical Treatment in the Earth Sciences

The earth sciences are characterised by loose and complex relationships between variables, and by the need to understand the geographical distribution of observations as well as their frequency distribution. Our frequency distributions, and the looseness of the relationships, reflect the complexity and intrinsic variation of nature more than measurement error. Furthermore, earth scientists cannot design experiments according to statistical recommendations, because the availability and complexity of the data are beyond our control. Usually, the system we are studying cannot be isolated into discrete, independent variables. These factors influence the first steps of research: how and where to collect specimens or observations. Some issues are particularly troublesome and common in earth science but are rarely handled in an undergraduate statistics course. These include spatial-sampling methods, orientation data, regionalised variables, time series, identification of cyclicity and pattern, discrimination, multivariate systems, lurking variables and constant-sum data. It is remarkable that most earth-science students confront these issues without formal training or focused consideration.

Table of contents

Frontmatter
Chapter 1. Spatial Sampling
Abstract
It is self-explanatory that whatever we sample must be representative of the object of study. However, for statistical purposes the simple random sample is essential and its desirability may at first seem like an obscure and unnecessary display of erudition. For example, when sampling soils, we might be able to randomly choose sites from almost any part of the study area. Unfortunately, differences in topography, land use, history of land use, microclimate, the current weather and time of day may all bias the selection of specimens drawn from the population.
Graham Borradaile
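The abstract stresses that a simple random sample gives every candidate site an equal chance of selection. As a minimal sketch, assuming the study area can be divided into a hypothetical 10 × 10 grid of candidate cells, Python's standard `random.sample` draws such a sample without replacement. The helper `simple_random_sample` and the grid are illustrative assumptions, not the book's procedure. Randomising the choice of cells does not by itself remove the biases from topography, land use or weather described above.

```python
import random


def simple_random_sample(sites, n, seed=None):
    """Return n distinct sites drawn without replacement.

    Every possible set of n sites is equally likely to be chosen, which is the
    defining property of a simple random sample.
    """
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(sites, n)


# Hypothetical study area: a 10 x 10 grid of candidate soil-sampling cells.
grid = [(row, col) for row in range(10) for col in range(10)]
chosen = simple_random_sample(grid, 8, seed=42)
print(chosen)
```

Seeding the generator is a design choice: it lets the same sampling plan be regenerated and documented.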
Chapter 2. Central Tendency and Dispersion
Abstract
The most commonly required characterisation of any sample of measurements is a statistic representing a typical value, one that occurs commonly and lies in the central part of the range of observations. The arithmetic mean is the most important, but other useful central values are also described. The second most useful statistic describes the degree to which observations are scattered. Variance, the square of the standard deviation, is the most powerful descriptor of dispersion. The following summary explains these concepts in earth-science contexts. Descriptive measures like mean and variance may be calculated for any sample of measurements, without any theoretical knowledge of the population from which the sample is drawn. However, the nature of the population may affect their usefulness. From this chapter through Chapter 4, it is important to note that we deal with observations that are scalars: measurements encapsulated in a single quantity. Quantities like 2.1 m, 4 kg, and dimensionless values like 0.03 and 7% are scalars.
Graham Borradaile
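The scalar statistics named above are all available in Python's standard `statistics` module. Here is a minimal sketch on a hypothetical sample of bed thicknesses in metres (the values are invented for illustration). Note that `statistics.variance` and `statistics.stdev` use the sample divisor n - 1.

```python
import statistics

# Hypothetical sample of scalar measurements: bed thicknesses in metres.
thickness_m = [2.1, 2.4, 1.9, 2.4, 3.0, 2.2, 2.6]

mean = statistics.mean(thickness_m)      # arithmetic mean
median = statistics.median(thickness_m)  # middle value of the sorted sample
mode = statistics.mode(thickness_m)      # most frequent value
var = statistics.variance(thickness_m)   # sample variance, divisor n - 1
sd = statistics.stdev(thickness_m)       # standard deviation = sqrt(variance)

print(f"mean={mean:.3f} median={median} mode={mode} variance={var:.4f} sd={sd:.4f}")
```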
Chapter 3. Theoretical Distributions: Binomial, Poisson and Normal Distributions
Abstract
Although observations of natural processes and phenomena in the earth sciences may combine many complex and poorly understood factors, it is remarkable that their frequency distributions may closely follow one of a few theoretical models. Generally, a theoretical distribution may be useful as an idealisation or approximation for interpolation and for comparisons. More specifically, a theoretical model provides equations from which useful statistics, such as the mean, variance and confidence estimates, can be calculated. The theoretical probability distribution also permits statistical hypotheses to be tested.
Graham Borradaile
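A theoretical model supplies equations for probabilities, means and variances. The sketch below writes out the standard binomial, Poisson and Normal formulas with Python's `math` module. The function names are illustrative. The usage lines check numerically that the binomial probabilities sum to 1 and reproduce the theoretical mean n·p.

```python
import math


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(k events) when events occur independently at a mean rate lam per interval."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def normal_pdf(x, mu, sigma):
    """Probability density of the Normal distribution with mean mu and s.d. sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


# Theoretical moments follow from the parameters alone:
#   binomial mean = n*p, variance = n*p*(1 - p); Poisson mean = variance = lam.
n, p = 10, 0.3
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(sum(probs))                                 # total probability, 1 up to rounding
print(sum(k * pk for k, pk in enumerate(probs)))  # numerical mean, close to n*p = 3
print(poisson_pmf(0, 2.0))                        # chance of no events when lam = 2
```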
Chapter 4. Statistical Inference: Estimation and Hypothesis Tests
Abstract
Confidence intervals around the mean are of great practical value because they convey the importance of our results and usually require few assumptions about the nature of the population. Moreover, these simple statistics are readily visualised on diagrams, where they may appear as error bars or confidence ellipses around the mean value (e.g. Figs. 4.2, 7.3, 9.9, 10.8a, b, 10.12). We are reminded that “in general, though not always, estimation is more important than significance testing” (Chatfield 1978, p. 165). Thus, we first address the concept of confidence limits. This is followed by hypothesis testing, which permits us to qualify a decision at some level of confidence. For that, we must know the form of the probability distribution from which the sample was drawn. Hereafter, it is important that we deal only with a simple random sample: each observation is selected without bias and has no influence on the selection of any other observation. Those precautions require specific knowledge of the subject being studied.
Graham Borradaile
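Here is a minimal sketch of a confidence interval for the mean, using only Python's standard library. It assumes a large sample, so the standard Normal quantile z is used. For small samples, a Student's t quantile with n - 1 degrees of freedom should replace z, and the chapter may well use that form. The function name and the synthetic porosity values are hypothetical.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate two-sided confidence interval for the population mean.

    Uses the standard Normal quantile z, a large-sample approximation. For small
    samples, replace z with the Student's t quantile for n - 1 degrees of freedom.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)              # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
    return mean - z * sem, mean + z * sem


# Synthetic sample of 40 porosity values (%), generated only for illustration.
rng = random.Random(1)
porosity = [rng.gauss(12.5, 0.8) for _ in range(40)]
low, high = mean_confidence_interval(porosity)
print(f"95 % CI for the mean: {low:.2f} to {high:.2f}")
```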
Chapter 5. Comparing Frequency-Distribution Curves
Abstract
In this chapter, we consider the form of simple frequency distributions and the means by which they may be compared. In most earth-science applications, our observations will not correspond closely to the Normal distribution, nor to any other theoretical distribution, but that does not necessarily prevent description and characterisation with a few statistics. However, a direct comparison with the Normal distribution is sometimes helpful and, in a few instances, may be important, for example to satisfy the requirements for ANOVA (Chap. 4). Therefore, this chapter begins with methods that compare observations to a Normal distribution. It starts with simple arithmetical and graphical methods and then continues with standardisation, a transformation of the observations that facilitates direct comparison with the Normal distribution. The cumulative version of the frequency distribution sometimes expedites a cursory visual comparison of samples, and cumulative frequency distributions of grain size may reveal underlying processes.
Graham Borradaile
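Standardisation and cumulative comparison can be sketched in a few lines. This version converts observations to z-scores and then pairs each observation's empirical cumulative frequency with the standard Normal cumulative probability at the same z. The (i - 0.5)/n plotting position, the function names and the grain-size values are assumptions for illustration. The largest gap is only a rough numerical summary of the visual comparison, not a formal test.

```python
import statistics


def standardise(values):
    """Transform observations to z-scores: (x - mean) / standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mu) / sd for x in values]


def compare_with_normal(values):
    """Pair each sorted z-score with its empirical cumulative frequency and the
    standard Normal cumulative probability at the same z."""
    z = sorted(standardise(values))
    n = len(z)
    std_normal = statistics.NormalDist()  # mean 0, standard deviation 1
    # (i - 0.5) / n is one common plotting position for cumulative frequency.
    return [(zi, (i - 0.5) / n, std_normal.cdf(zi)) for i, zi in enumerate(z, start=1)]


# Hypothetical grain-size measurements (phi units), invented for illustration.
grain_size = [1.2, 1.8, 2.1, 2.3, 2.4, 2.6, 2.9, 3.1, 3.6, 4.4]
rows = compare_with_normal(grain_size)
largest_gap = max(abs(emp - theo) for _, emp, theo in rows)
print(f"largest gap between empirical and Normal cumulative curves: {largest_gap:.3f}")
```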
Chapter 6. Regression: Linear, Curvilinear and Multilinear
Abstract
Previously, we mainly dealt with univariate statistics, for observations that constitute a single measurement, presented in frequency-distribution graphs. The next step is to understand systems where each observation comprises measurements of two different variables, x and y, using the approaches of bivariate statistics. Bivariate data are presented on an x-y graph, sometimes referred to as a scatter plot. In the earth sciences, some x-y plots are so unstructured that “scatter” may be a deserved description, but the term usually does not carry a pejorative connotation. Some useful information can be obtained from scatter plots by contouring the density of points or by simple visual inspection. However, even quite complex geological systems sometimes show x-y data organised along a line or curve. Such a trend may be described by a mathematical relationship that can be used for predictive purposes. The simplest relationship is a straight line, and the (x, y) relationship is then said to be linear. The straight line is defined by an equation of the form y = mx + c, where m is the slope and c is the intercept of the line on the y-axis (Fig. 6.1a). Even in carefully controlled laboratory experiments, the data points may depart somewhat from a line, and observations of natural phenomena may include more noise, or spurious variation. We can reduce the influence of this uncertainty on our interpretation of the scatter plot by calculating a regression line that provides the best predictions and description. The regression of y on x is most useful when the response, y, contains all the uncertainty and the control, x, is well defined.
Graham Borradaile
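Regressing y on x by ordinary least squares minimises only the vertical deviations. That matches the case described above, where x is a well-defined control. Here is a minimal sketch with hypothetical depth and temperature values. The function name is illustrative, and the chapter may also treat other line-fitting methods.

```python
def fit_y_on_x(xs, ys):
    """Least-squares line y = m*x + c for the regression of y on x.

    Only vertical (y) deviations are minimised, which matches the case where x
    is a well-defined control and y carries the uncertainty.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx          # slope
    c = y_bar - m * x_bar  # intercept; the line passes through (x_bar, y_bar)
    return m, c


# Hypothetical example: depth (m) as control, temperature (degC) as response.
depth = [0, 100, 200, 300, 400, 500]
temp = [10.2, 12.9, 16.1, 18.8, 22.3, 24.9]
m, c = fit_y_on_x(depth, temp)
print(f"T = {m:.4f} * depth + {c:.2f}")
```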
Chapter 7. Correlation and Comparison of Variables
Abstract
Correlation concerns techniques that are mostly used to assess the association or degree of dependence between the dependent response variable (y) and the independent or control variable (x). This uses the concept of the regression line from the previous chapter, but it is rarely a mathematical model of an underlying theoretical relationship or causal association. The value of the correlation coefficient and its significance quantify the strength of the association. This may be illustrated by confidence limits on the slope and intercept of the regression line, and by confidence regions about the regression line, which are discussed below.
Graham Borradaile
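The correlation coefficient and its significance can be sketched with the standard Pearson formula and the usual t statistic for testing r against zero, which has n - 2 degrees of freedom. This assumes the familiar product-moment coefficient is the one intended. The data and function names are hypothetical. The critical t value must still be read from a table or a library such as `scipy.stats.t`.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient between paired samples."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.3, 2.9, 4.1, 4.4, 5.8, 6.1]
r = pearson_r(x, y)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, t = {r_t_statistic(r, len(x)):.2f}")
```

Here r² is the fraction of the variance in y accounted for by the least-squares line.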
Chapter 8. Sequences, Cycles and Time Series
Abstract
Most earth-science data are univariate; single values suffice to describe the observations. Univariate statistics dominate introductory courses in statistics, and simple statistics determined from frequency distributions permit their characterisation (Chaps. 2–4). The next increment of complexity was introduced by bivariate data, where each observation required an x-value and y-value for its description (Chaps. 6, 7).
Graham Borradaile
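The abstract above does not describe the chapter's methods in detail. The following is therefore only an assumed illustration of one standard tool for the identification of cyclicity mentioned in the book description: the sample autocorrelation function. For a synthetic sequence with a built-in six-step cycle, the autocorrelation is strongly positive near lag 6 and strongly negative near lag 3. The function name and the series are hypothetical.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k for lags 0..max_lag (r_0 = 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares used as the normaliser
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0 for k in range(max_lag + 1)]


# Synthetic sequence (e.g. layer thicknesses) with a cycle every 6 layers.
series = [math.sin(2 * math.pi * i / 6) for i in range(60)]
acf = autocorrelation(series, 12)
for lag, value in enumerate(acf):
    print(lag, round(value, 3))
```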
Chapter 9. Circular Orientation Data
Abstract
Useful earth-science data occur in the form of orientations of lines that constitute an orientation distribution. Examples include fault trends, paleocurrent directions and wind directions. Orientation data introduce some new challenges for presentation and for characterisation by statistics. Like constant-sum data, discussed in Chapter 7, they do not have an infinite range. Worse, the range is circular (0° to 360°, or 0 to 2π), so the concepts of an outlier or a large variance must be viewed cautiously. The most complete account of circular-orientation data is by Fisher (1993). The concepts seem to have been first introduced formally into the earth sciences by Cheeney (1983) and Till (1974), drawing on the specialist monograph by Mardia (1972), which was subsequently expanded by Mardia and Jupp (2000).
Graham Borradaile
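The circular range is why an arithmetic mean fails for orientations: the average of 350° and 10° is 180°, pointing the wrong way. Here is a minimal sketch of the standard vector-mean approach. It sums unit vectors and reports the mean direction and the mean resultant length R̄, which is close to 1 for tightly clustered data and close to 0 for dispersed data. For axial data such as fault trends, the angles are doubled before averaging and the result is halved. The function name is illustrative.

```python
import math


def circular_mean(angles_deg, axial=False):
    """Mean direction (degrees) and mean resultant length R-bar of angles.

    For axial data (e.g. fault trends, where 10 and 190 degrees are the same
    line) the angles are doubled before averaging and the mean is halved.
    """
    factor = 2 if axial else 1
    s = sum(math.sin(math.radians(factor * a)) for a in angles_deg)
    c = sum(math.cos(math.radians(factor * a)) for a in angles_deg)
    mean = (math.degrees(math.atan2(s, c)) / factor) % (360 / factor)
    r_bar = math.hypot(s, c) / len(angles_deg)  # 1 = identical, near 0 = dispersed
    return mean, r_bar


print(circular_mean([350, 10]))               # points north, unlike the arithmetic 180
print(circular_mean([10, 190], axial=True))   # both describe the same 10-degree axis
```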
Chapter 10. Spherical-Orientation Data
Abstract
The previous chapter introduced the study of orientations that are constrained to lie in a plane, hence their shorthand name, circular data. In nature, all orientations exist in three-dimensional space, and the circular-orientation data that we collect in earth science are simply special cases whose angular distribution is controlled by the orientation of some planar feature in which they lie. For example, current directions on a bedding plane, or the trends of vertical joints on a horizontal map projection, may be represented as lines on a plane without any important loss of information. However, many geological and geophysical orientation data require specification in three dimensions. Each such orientation is therefore specified by three quantities. These may be the direction cosines of an axis, or the x, y, z coordinates of the endpoint of a unit vector or, for that matter, of a true vector.
Graham Borradaile
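The three quantities can be illustrated by converting a line's trend and plunge into direction cosines and averaging the resulting unit vectors. This sketch assumes a north-east-down coordinate frame with plunge measured positive downward, which is one common geological convention. The vector-sum mean suits directions (unit vectors). It does not suit axes, whose sign is arbitrary and which usually call for other methods. The function names are hypothetical.

```python
import math


def direction_cosines(trend_deg, plunge_deg):
    """Unit vector (north, east, down) for a line with given trend and plunge."""
    t, p = math.radians(trend_deg), math.radians(plunge_deg)
    return (math.cos(p) * math.cos(t), math.cos(p) * math.sin(t), math.sin(p))


def mean_direction(lines):
    """Mean of unit vectors given as (trend, plunge) pairs in degrees.

    Returns mean trend, mean plunge and mean resultant length R-bar (0 to 1).
    """
    vectors = [direction_cosines(t, p) for t, p in lines]
    sx = sum(v[0] for v in vectors)
    sy = sum(v[1] for v in vectors)
    sz = sum(v[2] for v in vectors)
    r = math.sqrt(sx * sx + sy * sy + sz * sz)  # length of the resultant vector
    trend = math.degrees(math.atan2(sy, sx)) % 360
    # Clamp guards against rounding slightly outside [-1, 1] before asin.
    plunge = math.degrees(math.asin(max(-1.0, min(1.0, sz / r))))
    return trend, plunge, r / len(vectors)


print(mean_direction([(10, 20), (350, 20), (5, 25)]))
```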
Chapter 11. Spherical Orientation Data: Tensors
Abstract
This text first introduced us to the study of univariate observations, those documented by a single magnitude value. Fortunately, in their initial training, geologists mostly deal with measurements that are univariate and scalar, such as rock density or a chemical abundance. Our discussions progressed to bivariate observations, described by two scalar values (x-y), and then to multivariate observations, each of which required three or more values for its description. From Chapter 9 onward, complexity was incremented with a new dimension, quite literally: orientation became an issue in sampling. Although it is disregarded in almost all introductory statistics courses, even the novice earth scientist must become adept at managing orientation data, usually without formal study. Our introduction to orientation data concerned axes, directions and unit vectors. Orientation distributions replaced our introductory obsession with frequency distributions along a line (Chaps. 2–8). Although true vectors possess an associated magnitude, the arguments mostly concern the orientation distribution.
Graham Borradaile
Chapter 12. Appendix
Abstract
If we measure a with an error of observation Δa, and b with an error of observation Δb, what will be the effect of these errors on a quantity Q, where Q is some simple expression of the observed values, for example Q = a + b? This topic is described as the confounding or propagation of errors and is discussed in many textbooks concerning applied statistics and experimental measurements (e.g. Topping 1965). First, let us consider how errors propagate through simple arithmetic operations involving values with errors of observation or measurement. Note that the fractional error in a is given by f = Δa/a. Subsequently, more general situations will be mentioned, including the manner in which the variances of samples of observations influence the variance of some derived quantity.
Graham Borradaile
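Here is a minimal sketch of error propagation for a sum and a product. It assumes the errors Δa and Δb are independent and random (standard deviations). In that case, absolute errors combine in quadrature for Q = a + b, and fractional errors f = Δa/a and Δb/b combine in quadrature for Q = a·b. A cruder worst-case rule instead adds the errors linearly (ΔQ = Δa + Δb). The abstract does not say which convention the appendix adopts, and the function names are illustrative.

```python
import math


def propagate_sum(a, da, b, db):
    """Q = a + b with independent random errors: absolute errors add in quadrature."""
    return a + b, math.hypot(da, db)


def propagate_product(a, da, b, db):
    """Q = a * b with independent random errors: fractional errors add in quadrature.

    a and b must be non-zero, because fractional errors divide by them.
    """
    q = a * b
    f_q = math.hypot(da / a, db / b)  # fractional error of Q from Δa/a and Δb/b
    return q, abs(q) * f_q


print(propagate_sum(3.0, 0.3, 4.0, 0.4))        # combined error of about 0.5
print(propagate_product(2.0, 0.02, 5.0, 0.05))  # two 1 % errors give about 1.4 % of 10
```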
Chapter 13. References
Graham Borradaile
Backmatter
Metadata
Title
Statistics of Earth Science Data
Author
Professor Graham Borradaile
Copyright year
2003
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-662-05223-5
Print ISBN
978-3-642-07815-6
DOI
https://doi.org/10.1007/978-3-662-05223-5