
## About this book

Statistics provides tools and strategies for the analysis of data. While much has been written about the methodology, sometimes without reference to data, little has been said about the data. In this volume we present sets of data obtained from many situations without any direct reference to a particular type of analysis. Our view of the usefulness of bringing together a broad collection of sets of data has been shared by many friends and contributors. Students of statistics need to gain facility with their art by applying their knowledge to many sets of data. Textbook examples tend to be small and selected primarily to illustrate a particular technique, thus failing to demonstrate the questioning, iterative nature of statistical analysis. The situations which gave rise to the more extensive sets of data given in this volume are colourful and interesting, and can be readily understood by laymen, students and research workers with diverse interests. These sets were often chosen for their perverse reluctance to yield under the naive application of standard procedures. They do not have correct solutions. They describe situations where the statistician can develop skills and learn the limitations of statistical methods.

## Table of contents

### Introduction

The purpose of statistical analysis is to extract and assess the information contained in data which has arisen in many diverse situations. Thus, the methodology of statistics has been developed to address a wide variety of problems, while the theory of statistics organizes and studies this methodology. The evolution of both the theory and the methods is influenced by new situations and new forms of measurement.

D. F. Andrews, A. M. Herzberg

### 1. Iris Data

The data given in Table 1.1, taken from Fisher (1936), are the measurements of the sepal length and width and petal length and width in centimetres of fifty plants for each of three types of iris: Iris setosa, Iris versicolor and Iris virginica. These data are commonly referred to as the “Fisher Iris Data”. Although the data were collected by Dr. Edgar Anderson, R. A. Fisher published the data on Iris setosa and Iris versicolor to demonstrate the use of discriminant functions. The Iris virginica data are used to extend Fisher’s technique and to test Randolph’s (1934) hypothesis that Iris versicolor is a polyploid hybrid of the two other species. This hypothesis is related to the fact that Iris setosa is a diploid species with 38 chromosomes, Iris virginica is a tetraploid, and Iris versicolor, with 108 chromosomes, is a hexaploid.

D. F. Andrews, A. M. Herzberg

### 2. Darwin’s Data on Growth Rates of Plants

Since their use by R.A. Fisher (Fisher, 1966, Chapter 3), this small set of data, chosen from more than eighty such sets discussed in Charles Darwin’s book (Darwin, 1878), has perhaps been subjected to more analysis by statisticians than any other such set. While Fisher quotes at length from Darwin’s book, the amount of background detail he gives is less than proportionate to the detailed statistical discussion that has been given to the data. Darwin was interested in the question of cross-fertilization versus self-fertilization of plants. Darwin stated:

D. F. Andrews, A. M. Herzberg

The numbers of Canadian lynx trapped in the MacKenzie River District of Northwest Canada have been used to illustrate a number of methods associated with time series analysis. Many of these methods are associated with autoregressive or periodic models. Table 3.1 presents these data for the years 1821 to 1934. Table 3.2 presents lynx pelt sales of the Hudson’s Bay Company together with the price paid for the period 1857 to 1911. These large numbers reflect trappings over a wider area of North America. Some of the data in this table appear to be in error. It is of interest whether variation in MacKenzie River District trappings is shared by all the areas contributing to Table 3.2. Is there a relation between price of pelts and size of catch? Any relation would affect the interpretation of an autoregressive model. Note that there may be a delay between the trapping and sale of a pelt.

D. F. Andrews, A. M. Herzberg

### 4. The Number of Deaths by Horsekicks in the Prussian Army

The number of men killed by horsekicks in the Prussian Army introduces the Poisson distribution to many students. The illustration was initially given by Bortkewitsch (1898). Winsor (1947) has reproduced the original tables with some discussion. Bishop, Fienberg and Holland (1975) also discuss the data.

D. F. Andrews, A. M. Herzberg

### 5. The Yields of Wheat on the Broadbalk Fields at Rothamsted

In 1919, Sir Ronald A. Fisher went to Rothamsted Experimental Station. He was hired to bring “modern statistical methods” to the task of analyzing a large amount of data collected on agricultural field trials over many years. Data had been collected on the Broadbalk field since 1844. The first experimental crop had been sown in the autumn of 1843 and harvested in 1844. Each year since, wheat has been sown and harvested on all or part of the field. The data for the yields of grain and straw for the “classical” period 1852–1925 are given in Table 5.1. From 1843–1851, many plot-treatments were varied from year to year; after 1925, parts of all plots were fallowed each year. The ten-year yields were given by Garner and Dyke (1969). The annual yields given in Table 5.1 were kindly supplied by G.V. Dyke and the late J.H.A. Dunwoody. Fisher (1921) discussed and analyzed the yield of grain for these data and Fisher (1924) discussed the influence of the rainfall. Figure 5.1 shows a recent plan of the Broadbalk field. Exhibit 5.1 gives the organic manures and inorganic fertilizers that were applied in various combinations to the plots.

D. F. Andrews, A. M. Herzberg

### 6. Uniformity Trials: Variation and Correlation in the Yield of Wheat

There has been much interest in examining the effects of possibly correlated errors among plots in agricultural field trials. Many uniformity trials have been conducted to investigate this aspect. In these trials, every plot receives the same treatment. Two well-known experiments of this kind are those of Mercer and Hall (1911) and Wiebe (1935). Their data have been used extensively in other investigations; see, for example, Barbacki and Fisher (1936), Besag (1974) and Wilkinson, Eckert, Hancock and Mayo (1983). Further, Fairfield Smith (1938) discussed these and other uniformity trials and determined an empirical relationship between the variance of yield, as a function of the plot size of field experiments, and plot-to-plot correlation. Papadakis (1937), Bartlett (1938, 1978, 1981) and Wilkinson et al. (1983) discuss methods for adjusting the analysis of field experiments by using the yields of neighbouring plots.

D. F. Andrews, A. M. Herzberg

### 7. Stanford Heart Transplant Data

The Stanford Heart Transplantation Program began in October 1967. Patients are admitted to the program after review by a committee, and then they wait for donor hearts to become available. While waiting, some may die or be transferred out of the program, but most receive a transplant. The cut-off date for the data presented in Table 7.1 was in February 1980, and by that time 184 patients had received a transplant.

D. F. Andrews, A. M. Herzberg

### 8. Coal-Mining Disasters

The data on intervals between coal-mining disasters in Britain involving 10 or more men killed (Maguire, Pearson and Wynn, 1952) have been used by a number of authors to illustrate various techniques that can be applied to point processes; see, for example, Barnard (1953), Cox and Lewis (1966) and Boneva, Kendall and Stefanov (1971). See also Maguire, Pearson and Wynn (1953).

D. F. Andrews, A. M. Herzberg

### 9. Recoiling of Guns

The Lord Brouncker was commanded by The Royal Society of London to make some experiments on the recoiling of guns. A gun was fixed to a triangular frame which could then be fastened at one or more places (Fig. 9.1).

D. F. Andrews, A. M. Herzberg

### 10. Old Maps

This problem is concerned with the classification of old maps of the Great Lakes area. The data are taken from the eleven maps listed in Table 10.2. These maps are believed to be representative of the period of time commencing with the widespread knowledge that five major lakes existed in the interior of North America, and ending when relatively large scale hydrographic surveys of the lakes’ shorelines were being done.

D. F. Andrews, A. M. Herzberg

### 11. Monthly Mean Sunspot Numbers

Daily relative sunspot numbers are based upon counts of spots and group entities of spots on the sun’s surface at some time each day. Wolf devised a relative number with the intent of reducing the spot counts of different observers and telescopes to a common basis. The relative number is k(f + 10g) where, for a given day, g is the number of groups, irrespective of the number of component spots, f is the total number of component spots which can be counted in these groups and may range from 1 to 50 or more in the case of complex groups, and k is a scale factor depending on the estimated efficiency of the observer and his telescope. The daily sunspot numbers are evaluated on the basis of more than fifty observing stations around the world and related to the observations of a reference station (which changed in 1982). The series provided, Table 11.1, is that of the simple mean of the daily values for each month.
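The definition above can be sketched in a few lines of Python. This is only an illustration of the formula k(f + 10g) and of the simple monthly mean; the function names and the sample counts are hypothetical and are not taken from Table 11.1.

```python
from statistics import mean


def relative_sunspot_number(groups, spots, k=1.0):
    """Wolf's relative number k(f + 10g) for a single day.

    groups: g, the number of spot groups
    spots:  f, the total number of component spots in those groups
    k:      observer/telescope scale factor (1.0 is an assumed default)
    """
    return k * (spots + 10 * groups)


def monthly_mean(daily_numbers):
    """Simple mean of the daily relative numbers for one month."""
    return mean(daily_numbers)


# Hypothetical (g, f) counts for three days, for illustration only.
observations = [(3, 12), (2, 5), (4, 20)]
daily = [relative_sunspot_number(g, f) for g, f in observations]
print(daily, monthly_mean(daily))
```

Because each group contributes ten units regardless of how many spots it contains, a day with many small groups can score higher than a day with one complex group.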

D. F. Andrews, A. M. Herzberg

### 12. Ozone Column

The data given in Table 12.1 are monthly mean thickness in Dobson units, one milli-centimetre ozone at standard temperature and pressure, of the ozone column at Arosa, Switzerland. They show unusual seasonal behaviour in that the variance as well as the mean varies seasonally, and not in a way that can be removed by a transformation.

D. F. Andrews, A. M. Herzberg

### 13. Downwind Effects from the Arizona Cloud Seeding Experiments

Randomized silver-iodide seeding of summer convective clouds was carried out over the Santa Catalina Mountains in southern Arizona in 2 programs: the first from 1957 through 1960, and the second in 1961, 1962 and 1964. Experimental days were taken in pairs; the decision to seed or not to seed was made after a day was considered as seedable. The second day of the pair was the opposite of the first. Each 2-day set was considered independently. If the second and third days were not considered seedable, the set was omitted. Seedability depended primarily on the available moisture in the morning. Seeding was from aircraft upwind from the Santa Catalinas.

D. F. Andrews, A. M. Herzberg

### 14. Half-Hourly Precipitation and Streamflow, River Hirnant, Wales, U.K., November and December, 1972

The Hirnant subcatchment of the River Dee has an area of 33.9 km² and is situated west of Bala Lake in North Wales. Impervious rocks, providing very little storage for rainfall, combine with steep slopes to give a fast streamflow response to rainfall. The river is gauged at Plas Rhiwaedog at a natural river section, to provide discrete time streamflow data at half-hourly intervals. The precipitation data consist of estimates of areal rainfall derived from six recording rain-gauges situated in and around the catchment. The present set of data, Table 14.1, is part of several years’ data on the Hirnant and other subcatchments of the Dee collected and available at the Welsh Water Authority and at the Institute of Hydrology, Wallingford, U.K. The hydraulic control is a natural rock outcrop and is subject to minor changes from year to year. A real-time flow forecasting system has been operational on the River Dee since 1975 as part of extensive water supply and flood control schemes for the catchment.

D. F. Andrews, A. M. Herzberg

### 15. The Rainfall at Adelaide, South Australia

The rainfall at Adelaide is about 1530 mm each year, with about 70–80% of this falling in the winter months between April and October. The data used by Dr. E.A. Cornish consisted of 61 six-day totals for each year, with December 31 of the previous year being included in non-leap years. With the help of the Bureau of Meteorology, the data have been extended to cover the period 1839 to 1977 inclusive. The data are given in Table 15.1.

D. F. Andrews, A. M. Herzberg

### 16. Particle Size Distribution of Soil Profiles

The experiment to determine particle size distribution of soil profiles was conducted at the West Side Field Station of the University of California, located in Fresno County, forty miles southwest of Fresno. Fresno County is in the southernmost quarter of the central valley of California, which is an elongated trough paralleling the eastern and western boundaries of the state. The valley is 500 miles long in a north-south direction and averages about 40 miles in width. The valley is surrounded by mountains except for the outlet into San Francisco Bay through which the valley rivers drain.

D. F. Andrews, A. M. Herzberg

### 17. Identifying Groundwater Populations

The U.S. Department of Energy is directing a National Uranium Resource Evaluation Program in which an estimate of the U.S. uranium reserves will be obtained. A fundamental problem is identifying the populations in a group of samples so that background samples may be separated from samples anomalous in uranium and other elements. These samples represent a typical group of 127 groundwater samples, each having 12 measurements. The data are given in Table 17.1. An initial classification of the populations is given by the Groundwater Producing Horizon Code.

D. F. Andrews, A. M. Herzberg

### 18. Soil Data from the Province of Murcia, Spain

The objective of this study was to compare and to classify soil mapping units on the basis of their lateral variabilities for a single property, or for several properties. The area investigated consisted of a sequence of eight contiguous sites extending over the crest and flank of a low rise in a valley plain underlain by marl near Albudeite in the province of Murcia, Spain; see Figure 18.1. This relatively simple situation was considered appropriate for this study’s objective. The geomorphological sites were the primary mapping units adopted and were small areas of ground surface of uniform shape internally and delimited by relative discontinuities externally. Following the delimitation of the sites, as described in Wright and Wilson (1979), soil samples were obtained in each site at 11 random points within a 10m × 10m area centred on the mid-point of the site. All samples were taken from the same depth, chosen after due consideration of various factors affecting the area and detailed by Wright and Wilson (1979). The soil properties considered appropriate for the study’s objective were the silt content and the clay content, expressed as percentages of the total silt, clay and sand content. The data are given in Table 18.1.

D. F. Andrews, A. M. Herzberg

### 19. Lamoka Lake Site Determinations

A radiocarbon age (RC) determination is usually presented by the radiocarbon dating laboratory in the form A ± E, where A is the estimate of the radiocarbon age, bp, and E is the standard deviation due to counting error. The counting error is taken to be normally distributed. If one is making inferences in real time from different samples then it is necessary to take into account the conversion of conventional radiocarbon dates (5568 half-life), denoted bp, to calendar or tree-ring dates, denoted BP. However, if one has two or more radiocarbon ages bp from the same sample then to check their agreement with each other it is not necessary to consider this conversion. Such considerations are often ignored in the evaluation of a series of dates. For example, Long and Rippeteau (1974) considered the 8 dates presented in Table 19.1. This gives us seven dates, one for each of the seven different samples. Thus one can check whether these estimates may be dating the same event in real time, after taking into account the calibration error due to the need to convert from years bp to BP. Ward and Wilson (1978) have re-analyzed these dates, under the assumption that the calibration error, in the years bp, is normally distributed and independent of the counting error. They found no statistical evidence to doubt the consistency of C-288 and M-26, and therefore combined these two dates. After analyzing the seven samples, and making the assumption concerning the calibration error, they found some borderline evidence that C-367 may be aberrant.
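One common way to check whether several determinations A ± E from the same sample agree is inverse-variance pooling with a chi-square-type statistic. The sketch below assumes normally distributed, independent counting errors, as stated above; it is an illustration of that general approach and not necessarily the exact procedure used in the chapter. The function name and the two example determinations are hypothetical, not values from Table 19.1.

```python
import math


def pool_radiocarbon_ages(ages, errors):
    """Pool determinations A_i +/- E_i assumed to date the same sample.

    Returns (pooled_age, pooled_error, T), where T is the sum of squared
    standardized deviations from the pooled age. Under the assumption of
    agreement, T is approximately chi-square with len(ages) - 1 degrees
    of freedom.
    """
    weights = [1.0 / e ** 2 for e in errors]
    total_weight = sum(weights)
    pooled_age = sum(w * a for w, a in zip(weights, ages)) / total_weight
    pooled_error = math.sqrt(1.0 / total_weight)
    t_stat = sum(w * (a - pooled_age) ** 2 for w, a in zip(weights, ages))
    return pooled_age, pooled_error, t_stat


# Hypothetical determinations in years bp, for illustration only.
age, err, t_stat = pool_radiocarbon_ages([3000.0, 3100.0], [100.0, 100.0])
print(age, round(err, 2), t_stat)
```

A large value of T relative to the chi-square reference distribution would suggest that the determinations are not consistent with a single radiocarbon age.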

D. F. Andrews, A. M. Herzberg

### 20. Variations in the Earth’s Rotation Rate, 1820–1970

Variations in the rate of rotation of the earth have been measured for many years with increasing precision as new instruments became available. The quality of the primary sources of the observations used to compile these data varies considerably over the 151 year period from 1820–1970. Luo, Liang, Ye, Yan and Li (1977) take their data from Brouwer (1952) and in order to produce a plot of homogeneous appearance, different sets of smoothing have been applied over the periods 1820–1922 and 1923–1956. The data are reproduced in Table 20.1.

D. F. Andrews, A. M. Herzberg

### 21. Association of Pairs of Radio Sources with Peculiar Galaxies

Arp (1967) argued that there exists a definite association between radio sources and peculiar galaxies of his earlier paper (Arp, 1966). He advocated that for two radio sources lying relatively close to one another, and often possessing similar radio brightness or flux strength, there exists a peculiar galaxy, approximately equidistant from each and unusually close to the line joining the pair, forming an angle of approximately 180°. This is true for particular peculiar galaxies, but can be supported as a general theory only by randomly selecting a sample area of the sky and studying the objects therein. The area was centred on the celestial equator between right ascension (“longitude”) 10h and 16h, and between declination (“latitude”) -16° and +16°, thus avoiding any curvature effect of the lines of right ascension. All peculiar galaxies and radio sources within the boundaries were enumerated and the distances, r₁, from each galaxy to the nearest source, and r₂, to the second nearest source, are recorded in degrees in Table 21.1. The radio brightnesses, s₁ and s₂, of the closest and second closest radio sources, respectively, are also given. Object numbers are denoted by N, ranging from 1 to 33 for peculiar galaxies, and from 34 to 56 for radio sources.

D. F. Andrews, A. M. Herzberg

### 22. Motions of Stars in a Star Cluster, M92

Observational tests of theories of stellar evolution are typically based on star clusters, where many stars of essentially equal ages and at approximately the same distance from the earth can be studied and compared. In such studies, however, one must remove the contamination of foreground and background stars which are unrelated to the cluster. This is done well via stellar motions, since cluster members should move together whereas the non-members should show a wide variety of motions. The stars of the globular cluster, M92, are among the oldest stars in our Galaxy. The globular cluster M92, also referred to as NGC 6341, located at 17h 15.6m, +43° 12′ (1950), is one of the most metal-poor objects in our Galaxy. As such it has been the subject of numerous investigations.

D. F. Andrews, A. M. Herzberg

### 23. Motions and Distances of Planetary Nebulae

Planetary nebulae are symmetrical clouds of gas surrounding very hot stars. They probably represent a late stage in the evolution of many stars like the sun. The distances of these objects from us, and hence their intrinsic luminosities, are rather poorly estimated. Cudworth measured the angular motions of these objects across the sky and adopted other data from Perek and Kohoutek (1967). Application of the technique known in astronomy as statistical parallax, essentially the method of least squares, yielded distances for the nebulae. Table 23.1 presents the angular motions, μα and μδ, together with their mean errors and the sources for the proper motions of each nebula. Table 23.1 also lists whether the nebula is optically thick, TK, or thin, TN, the extinction parameter C, a measure of distance d and the class obtained from Greig (1971, 1972).

D. F. Andrews, A. M. Herzberg

### 24. Quality Control Data in Clinical Chemistry

Every two weeks, a specimen from a large homogeneous pool of serum is sent out to a large number of laboratories, which perform up to 15 separate analyses. The data consist of ten sets of results from some 400–500 laboratories; not all laboratories perform all analyses, and not every laboratory provided ten reports. Different laboratories used different methods for a given analysis. These methods have been grouped together into method-groups, there being 4 to 12 method-groups for a particular analysis. A laboratory did not necessarily stick to a particular method-group for a given analysis over all ten occasions. The data consist of ten sets of 500 records giving the laboratory number, the fifteen results and their method-groups, not all different, for each laboratory. Tables 24.1 and 24.2 present the data for the first two analyses by 100 laboratories for each of the ten occasions.

D. F. Andrews, A. M. Herzberg

### 25. Calcium Assay

The data relate to a chemical assay of calcium discussed in Brown, Healy and Kearns (1981). A set of standard solutions is prepared and these and the unknowns are read on a spectrophotometer in arbitrary units. A straight-line response curve is fitted to the standards and the values of the unknowns are read off from this. The preparation of the standard and unknown solutions involves a fair amount of laboratory manipulation, and the actual concentrations of the standards may differ slightly from their target values, the very precise instrumentation being capable of detecting this. The target values are 2.0, 2.0, 2.5, 3.0, 3.0 mmol. per litre; the ‘duplicates’ are made up independently. The sequence of reading the standards and unknowns is repeated four times. Two specimens of each unknown are included in each assay, and the four sequences of readings are done twice, first with the flame conditions in the instrument optimized and then with a slightly weaker flame.

D. F. Andrews, A. M. Herzberg

### 26. Determination of Dunham Coefficients for the Ground State of D2

The spectrum of any atomic system is determined by its energy levels. For a diatomic molecule in a given electronic state these energy levels may be expressed by

$$G(v) + F_{v}(J) = \sum_{ik} Y_{ik} \left( v + 1/2 \right)^{i} J^{k} \left( J + 1 \right)^{k}, \tag{1}$$

where G is the vibrational energy, v = 0,1,2,… the vibrational quantum number, $F_{v}(J)$ the rotational energy in the vibrational level v, and J = 0,1,2,… the rotational quantum number (i,k = 0,1,…). For the example of the D2, deuterium, molecule the observed energies $G(v) + F_{v}(J)$ are listed in Table 26.1.
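As a rough illustration of how expansion (1) is evaluated, the following Python sketch sums the Dunham series for given quantum numbers. The function name and the coefficient values are hypothetical placeholders chosen only to make the arithmetic easy to follow; they are not fitted coefficients for D2.

```python
def dunham_energy(Y, v, J):
    """Evaluate G(v) + F_v(J) = sum_{ik} Y[i][k] (v + 1/2)^i J^k (J + 1)^k.

    Y is a rectangular list of lists, with Y[i][k] the Dunham coefficient
    multiplying (v + 1/2)^i [J (J + 1)]^k.
    """
    vib = v + 0.5
    rot = J * (J + 1)  # J^k (J + 1)^k equals [J (J + 1)]^k
    total = 0.0
    for i, row in enumerate(Y):
        for k, y_ik in enumerate(row):
            total += y_ik * vib ** i * rot ** k
    return total


# Hypothetical coefficients, for illustration only (not values for D2).
Y_example = [
    [0.0, 2.0],   # Y_00, Y_01
    [10.0, 0.0],  # Y_10, Y_11
]
print(dunham_energy(Y_example, v=0, J=1))
```

The model is linear in the coefficients, so determining them from observed energies such as those in Table 26.1 could in principle be treated as a linear least-squares problem; which terms to retain is a matter of judgment.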

D. F. Andrews, A. M. Herzberg

### 27. Clock Intercomparisons

These data on clock intercomparisons give the time difference between secondary cesium clocks in 0.01 microseconds. Tables 27.1 to 27.3 present the data. Each line in the tables represents the data for one week; no observations were taken on week-ends. The last two columns indicate on what days the clock output jumped and by what amount. All successive data should be corrected for this. These jumps are caused by spurious pulses in divider chains and are, therefore, always integral amounts of 0.2 microseconds. Missing data during a weekday indicate that no measurements were taken because of failure in equipment.
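The jump correction described above might be sketched as follows. This assumes that a recorded jump of +d on a given day means that this reading and all later ones are d too large; the tables may use the opposite sign convention, so this is an assumption to verify against the data. The function name and the sample readings are hypothetical; readings and jumps are in units of 0.01 microseconds, so a jump of 0.2 microseconds is 20 units.

```python
def correct_for_jumps(readings, jumps):
    """Remove accumulated clock jumps from a sequence of readings.

    readings: list of time differences (units of 0.01 microseconds);
              None marks a missing observation.
    jumps:    dict mapping the index of the day on which a jump occurred
              to its size in the same units (assumed sign convention:
              positive means later readings are too large).
    """
    corrected = []
    offset = 0
    for index, value in enumerate(readings):
        offset += jumps.get(index, 0)  # the jump applies from this day on
        corrected.append(None if value is None else value - offset)
    return corrected


# Hypothetical week-long series with one jump of 0.2 microseconds (20 units).
raw = [100, 102, 123, 125, None, 127]
print(correct_for_jumps(raw, {2: 20}))
```

Missing weekday observations are passed through unchanged so that the corrected series keeps the same day-by-day alignment as the original tables.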

D. F. Andrews, A. M. Herzberg

### 28. An Evaluation of Cryogenic Flow Meters

Liquid nitrogen, liquid oxygen and other liquefied gases are sold widely. At the time of the present experiment there were no satisfactory meters available for measuring the flow of such fluids. Also there was no national reference standard available by which the meters that were used could be evaluated. The United States National Bureau of Standards had designed and built a cryogenic flow test facility but its stability and accuracy were not known. An experiment was designed to test the accuracy of the meters and the flow velocity. Thus an important difference of this experiment from many is that it was necessary to not only evaluate the object being measured but also the measuring device. The new facility was a complex apparatus with a variety of components each of which might be subject to some error. It was necessary to evaluate accuracy as well as precision.

D. F. Andrews, A. M. Herzberg

### 29. Stress-Rupture Life of Kevlar 49/Epoxy Spherical Pressure Vessels

A study of the lifetimes of Kevlar 49/epoxy spherical pressure vessels that are subjected to a constant sustained pressure until vessel failure, commonly known as static fatigue or stress-rupture, has been made. The NASA space shuttle uses Kevlar/epoxy spherical pressure vessels in a sustained pressure mode throughout the usage life of the vessel, and several commercial applications, such as fire-fighters’ air-breathing apparatus, are also subject to this service condition. The study was done to generate baseline data on vessel life under pressure and to predict vessel life and design reliability.

D. F. Andrews, A. M. Herzberg

### 30. A Chemical Reaction

Consider a chemical reaction of the type A + B → C followed by A + C → D in which two reactants A and B form a mixture of C and D. The object was to obtain the maximum yield of C subject to the condition that the yield of D should not exceed 20%, since more than this amount would cause difficulty in purification. The quantity of B used was kept constant throughout, the factors varied being temperature, T, in degrees Centigrade, the percent concentration of A, and the time in hours of the reaction.

D. F. Andrews, A. M. Herzberg

### 31. Product Preferences

D. F. Andrews, A. M. Herzberg

### 32. Incidence of Malignant Melanoma After Peaks of Sunspot Activity

The aetiology of melanoma is complex and may include the influences of trauma, heredity and hormonal activity (Lee, 1975). In particular, exposure to solar radiation may be involved in the pathogenesis of melanoma. Melanoma is more common in fair-skinned individuals (Lancaster and Nelson, 1957) and most frequent in skin sites exposed to the sun (Davis, Herron and McLeod, 1966). In white populations melanoma is more common in areas closer to the equator where the intensity of solar radiation is higher (Elwood, Lee, Walter, Mo and Green, 1974). Data from various parts of the world suggest that the incidence of melanoma is increasing (Burbank, 1971; Lee and Carter, 1970; Houghton, Flannery and Viola, 1980).

D. F. Andrews, A. M. Herzberg

### 33. Supplemental Ascorbate, Vitamin C, in the Supportive Treatment of Cancer

A study was made of the survival times of 100 terminal cancer patients who were given supplemental ascorbate, Vitamin C, as part of their routine management and 1000 matched controls, similar patients who had received the same treatment except for the ascorbate. The object of the investigation was to determine whether supplemental ascorbate prolongs the survival times of patients with terminal human cancer.

D. F. Andrews, A. M. Herzberg

### 34. Incidence of Byssinosis: A Cross-Sectional Occupational Health Study

In 1973 a large cotton textile company made a study to investigate the prevalence of byssinosis, a form of pneumoconiosis to which workers exposed to cotton dust are subject. It was desired to determine the extent to which byssinosis is explained by such variables as sex, race, length of employment, smoking habit and dustiness of work place.

D. F. Andrews, A. M. Herzberg

### 35. Inter and Intra Individual Variation of Blood Glucose Levels

The most commonly used diagnostic aid for early diabetes mellitus is the oral glucose tolerance test. The test, however, is subject to considerable variation due to differences in individual rates of gastrointestinal absorption of the glucose challenge. This problem is accentuated in the pregnant woman; yet pregnancy is the ideal period during which to identify the potentially diabetic female. The additional variation during pregnancy has been considered by some, see Burt (1962), to be reason to exclude use of the test.

D. F. Andrews, A. M. Herzberg

### 36. Chemical and Overt Diabetes

Reaven and Miller (1979) examined the relationship between chemical subclinical and overt nonketotic diabetes in 145 non-obese adult subjects.

D. F. Andrews, A. M. Herzberg

### 37. The Maternal Age Distribution of Patients with Down’s Syndrome

Several authors have observed a bimodality in the age distribution of mothers of patients with Down’s Syndrome suggesting that the distribution is the mixture of two different distributions with different etiologies.

D. F. Andrews, A. M. Herzberg

### 38. Procedures for the Detection of Muscular Dystrophy Carriers

Duchenne Muscular Dystrophy, DMD, is a genetically transmitted disease, passed from a mother to her children. Affected female offspring usually suffer no apparent symptoms and may unknowingly carry the disease. Male offspring with the disease die at a young age. Not all cases of the disease come from an affected mother. A fraction, perhaps one third, of the cases arise spontaneously, to be genetically-transmitted by an affected female. This is the most widely held view at present. The incidence of DMD is about 1 in 10,000 male births. The population risk that a woman is a DMD carrier is about 1 in 3,300.

D. F. Andrews, A. M. Herzberg

### 39. Lengths of Remissions for Children with Acute Leukemia

Prior to the publication of Freireich et al. (1963), the existence of an effective therapy for acute leukemia had helped to hamper the evaluation of new and potentially more effective therapeutic agents. The data, given in Table 39.1, on preferences for 6-mercaptopurine (6-MP) in 21 pairs of patients, established the usefulness of this drug in prolonging the duration of complete remissions in childhood acute leukemia. The design of the study involved a paired comparison of 6-MP versus placebo treatment, both administered in randomized and blinded fashion, the pairing being based on the institution at which the patient was treated and completeness of remission, complete or partial. The observation recorded for each patient was length of remission and the analysis was based on preferences among the pairs of patients, as determined by the difference in the lengths of remission between patients in each pair. The study was a sequential one designed according to an Armitage restricted sequential procedure (Armitage, 1957) and the study was stopped after 21 pairs of patients, 18 preferences favouring 6-MP and 3 favouring the placebo.

D. F. Andrews, A. M. Herzberg

### 40. Testis Tumours in Japan

The death rate from malignant tumours of the testis in Japan has risen from 1.53 per million per year in 1947–49 to 3.81 in 1966–70. This rise has been most marked in young adults and children, where the death rate is now greater than in the U.S. white population. Fatal testicular tumours in Japanese boys occur at a younger age than in U.S. white boys. The increase in the Japanese mortality rate cannot be associated with particular years in the total period studied, but rather appears to be related to increased lifetime risks with successively later years of birth.

D. F. Andrews, A. M. Herzberg

### 41. Consecutive Measurements of Plasma Citrate Concentrations

In order to study the variation of plasma citrate concentrations during the day, an experiment including ten subjects was performed.

D. F. Andrews, A. M. Herzberg

### 42. Time to Death and Type of Death in Mice Receiving Various Doses of Red Dye No. 40

A lifetime feeding experiment involving 400 mice was undertaken in 1976 to assess the carcinogenicity of FD&C Red No. 40 (Red 40), a colour additive widely used in foods in the U.S. For each sex, fifty mice were allocated to each of four groups: a control group and three dose-level groups of Red 40.

D. F. Andrews, A. M. Herzberg

### 43. Cariogenic Effects of Diet

One hundred and twenty rats were randomly assigned to one of eight diets to see if Treatments A and B would reduce the cariogenic effects of Diet 2.

D. F. Andrews, A. M. Herzberg

### 44. Physical Characteristics of Urines With and Without Crystals

The 79 urine specimens, given in Table 44.1, were analyzed in an effort to determine if certain physical characteristics of the urine might be related to the formation of calcium oxalate crystals.

D. F. Andrews, A. M. Herzberg

### 45. Multiple Tumour Recurrence Data for Patients with Bladder Cancer

These data were obtained in a randomized clinical trial conducted by the Veterans Administration Co-operative Urological Research Group (VACURG). All patients had superficial bladder tumours when they entered the trial. These tumours were removed transurethrally and patients were assigned randomly to one of three treatments: placebo pills, pyridoxine (vitamin B6) pills, or periodic instillation of a chemotherapeutic agent, thiotepa, into the bladder. The rationale for the latter two treatments is given in Byar, Blackard and VACURG (1977). At subsequent follow-up visits any tumours noticed were removed and the treatment was continued. The goal of the analysis should be to determine the effect of treatment on the frequency of tumour recurrence. For the purpose of this analysis, the word recurrence will refer to a visit at which one or more tumours are found in the bladder regardless of whether these are thought to be recurrences or new tumours. This term is not to be confused with the number of tumours present at any single visit because the tumours are often multiple. The data, Table 45.1, consist of the number of recurrences experienced by each of 118 patients, the number of tumours present initially at the time of randomization in the trial and the diameter of the largest of these, the months from the beginning of the study until each recurrence, the number of tumours present at each recurrence, and the diameter of the largest of these. An analysis of perhaps secondary importance would compare treatments with respect to numbers and size of tumours.

D. F. Andrews, A. M. Herzberg

### 46. Prognostic Variables for Survival in a Randomized Comparison of Treatments for Prostatic Cancer

These data were obtained from a randomized clinical trial comparing four treatments for patients with prostatic cancer in Stages 3 and 4. Stage 3 represents local extension of the disease without evidence of distant metastasis and Stage 4 represents distant metastasis as evidenced by elevated acid phosphatase, x-ray evidence, or both. The trial was double-blinded and the treatments were placebo pill, 0.2 mg of diethylstilbestrol (DES), 1.0 mg of DES, or 5.0 mg of DES, all drugs administered daily by mouth. Patients were not required to remain indefinitely on their assigned treatment, but were allowed to have their treatment changed at the discretion of the physician if signs of tumour progression or symptoms appeared. This provision was required for ethical reasons. Patients were followed according to a standard protocol at 6-month intervals or more frequently if required.

D. F. Andrews, A. M. Herzberg

### 47. Visual Pattern Recognition

One of the major problems in the understanding of visual perception concerns how spatially structured stimuli are visually encoded and processed by the nervous system. Over the last decade two main theories of form perception and pattern recognition have emerged. The one supposes that spatially structured stimuli are represented internally by the visual system as unstructured, approximately point-for-point ‘images’ of the stimulus. Judgements on the similarity of stimuli are made by subjecting these internal pattern representations to families of internal transformations that serve to bring the representations as closely as possible into coincidence. The amount of overlap between the transformed representations determines the judged similarity of the patterns. The other theory supposes that spatially structured stimuli are internally represented in an essentially symbolic fashion, listing local features of the stimulus pattern, for example, blobs, edges and corners, and the spatial relations that exist between these local features, for example, left of, joined to and far from. Judgements on the similarity of stimuli are achieved by an internal matching of their symbolic descriptions; see, for example, Barlow, Narasimhan and Rosenfeld (1972), Sutherland (1973) and Kahn and Foster (1981).

D. F. Andrews, A. M. Herzberg

In a study of the effects of environment on two species of mammals, a large body of data was recorded at Princeton University in the late 1960s under the direction of Dr. C. S. Pittendrigh. The data consist of temperature and activity recordings taken at either two or five minute intervals for up to 150 days. The animals were isolated in cages and were subjected to a variety of controlled photoperiods.

D. F. Andrews, A. M. Herzberg

### 49. Number of Species in the Galápagos Islands

The Galápagos Islands are a territory of the Republic of Ecuador. And though they are officially listed by that government as El Archipélago de Colón, most local inhabitants and many of the Ecuadorian nationals who visit the islands persist in calling them Las Islas de los Galápagos, the Islands of the Tortoises. Elsewhere in the world, of course, the name Galápagos has long been current.

D. F. Andrews, A. M. Herzberg

### 50. The Spatio-Temporal Spread of Fox Rabies

A retrospective study was conducted on the features of a wildlife epizootic that spread into a study area in South Germany during 1963, spread southeast during the next few years and started to move out of the region in 1971.

D. F. Andrews, A. M. Herzberg

### 51. Measuring the Avoidance of Super-Parasitism: Are Balls Scattered Non-Randomly into Boxes?

Various species of parasites which reproduce by laying eggs in hosts, in which at most one egg per host can develop to the adult stage, exhibit avoidance of super-parasitism; that is, they tend to avoid laying more than one egg in a host, thereby reducing egg wastage. Laboratory experiments can be devised (i) to demonstrate that the phenomenon is indeed a real one, and, when so, (ii) to measure the extent of avoidance. While the former question is akin to showing that eggs (balls) are not distributed randomly among the hosts (boxes), the latter question requires some modelling of the behaviour of the ovipository events in the experiment.
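The balls-in-boxes null model can be made concrete with a small sketch. Assuming each egg is placed independently and uniformly at random among the hosts, the expected number of hosts left without an egg is n(1 − 1/n)^m for m eggs and n hosts; an observed count well below this would point towards avoidance. The function names and the figures of 10 hosts and 10 eggs are illustrative assumptions, not values from the data set.

```python
import random


def expected_empty(n_boxes: int, n_balls: int) -> float:
    """Expected number of empty boxes under uniform random allocation."""
    return n_boxes * (1 - 1 / n_boxes) ** n_balls


def simulate_empty(n_boxes: int, n_balls: int,
                   reps: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean number of empty boxes."""
    rng = random.Random(seed)
    total_empty = 0
    for _ in range(reps):
        # The set keeps only the distinct boxes that received a ball.
        occupied = {rng.randrange(n_boxes) for _ in range(n_balls)}
        total_empty += n_boxes - len(occupied)
    return total_empty / reps


# Hypothetical experiment: 10 hosts, 10 eggs laid at random.
print(expected_empty(10, 10))   # closed-form expectation
print(simulate_empty(10, 10))   # simulation should be close to it
```

A parasite that avoids super-parasitism perfectly would leave no host empty in this hypothetical setting, so the gap between the observed count and the random-allocation expectation gives a first, model-free indication of avoidance.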

D. F. Andrews, A. M. Herzberg

### 52. Effect of Chemicals on Earthworm Populations

In an experiment to investigate the effects of several chemicals on earthworm populations, a five by eight array of two metre square plots with one metre buffers was laid out in a field of winter wheat according to the systematic layout given in Fig. 52.1. The four treatments were: A: water only (control); B: 0.5 kg/ha, Benlate; C: 0.6 kg/ha, Bevistin; D: 1.4 kg/ha, Cercobin; all diluted to 1000 l/ha for application. Samples were collected on three occasions during the period April to September 1977. Temporal variation in the earthworm population over this period is known to be considerable but data-collection for each sample was completed within two days. The first sample was collected prior to the application of any treatments. The sampling technique was to apply an irritant solution of formaldehyde, which causes worms to rise to the surface, to 50 cm square sub-plots as indicated in Fig. 52.2.

D. F. Andrews, A. M. Herzberg

### 53. The Classification of Three Historical Specimens of Grey Kangaroos

In 1803 a French research vessel captured 19 live specimens of Macropus fuliginosus from Kangaroo Island. Despite a long and arduous voyage, and a diet consisting sometimes of rum and damper, some live specimens reached France, including the type specimen still held at Paris. During the nineteenth century, only three preserved specimens of fuliginosus were extant in Europe, and a great deal of taxonomic confusion arose between fuliginosus and a large male Tasmanian specimen of the eastern grey kangaroo, Macropus giganteus, held at the British Museum.

D. F. Andrews, A. M. Herzberg

### 54. Social Grooming in North American River Otters

Social grooming, i.e. one animal grooming another, is a common interaction among many group-living animals. The behaviour is often regarded, rather uncritically, as the “social cement” of animal groups. Although thought to play an important role in the bonding of group members, social grooming has been the subject of few detailed studies. The data presented in Table 54.1, on social grooming in North American river otters, Lutra canadensis, were taken from a larger study of the species’ social behaviour and were obtained from observations of five groups of captive otters. All animals within a group were observed simultaneously.

D. F. Andrews, A. M. Herzberg

### 55. Species Composition in a Complex of Woodlands

The Northcliffe/Heaton complex of woodlands lies two miles northwest of Bradford city centre in two deep ice-cut valleys which run approximately west to east and join at their eastern extremities. They form a continuous, roughly semi-circular arc of woodland approximately three kilometres in length.

D. F. Andrews, A. M. Herzberg

### 56. Distribution Patterns of Plant Species

Cain and Evans (1952) mapped in detail an old-field grasslands community in southeastern Michigan, plotting the occurrence of three plant species: Lespedeza capitata, Liatris aspera and Solidago rigida. From these, Evans (1952) prepared quadrat coverages of 16, 8, 4, 2, 1, 1/2, 1/4, 1/8 and 1/16 square metres, recording the frequencies with which each of the species appeared in the quadrats. For Solidago rigida, golden rod, the frequency distributions for the three largest quadrat sizes are given in Table 56.1.

D. F. Andrews, A. M. Herzberg

### 57. The Garrison Bay Project, Stock Assessment and Dynamics of the Littleneck Clam, Protothaca staminea

Garrison Bay is a small bay in Washington State, U.S.A. The marine fauna is diversified, with especially large numbers of soft-substrate benthic organisms such as polychaetes and bivalves (Scherba and Gallucci, 1976). One of the popular recreational activities in the bay is clam digging. During the past five years this harvest has been monitored and data collected on the species harvested, the total weight of each digger’s harvest, the size distribution of each species and the time needed to dig each catch. In addition, a periodic survey by stratified random sampling determines the abundance and size distributions of the unharvested standing stock of each species. These latter sampling data are presented for the littleneck clam, Protothaca staminea, the species most commonly taken.

D. F. Andrews, A. M. Herzberg

### 58. Maize Fertilizer Experiments on the Islands of St. Vincent and Antigua

A series of experiments was designed to study the effect of concentration of three components of fertilizer on the growth of maize. All experiments contained 36 plots in four blocks of nine plots each. Half of the blocks contained the treatments 000, 022, 202, 220, 111 (twice), 113, 131, 311 and the other half 002, 020, 200, 222, 111 (twice), 113, 131, 311, where the three digits represent levels of nitrogen, phosphorus and potassium, respectively, in the fertilizer applications. The treatments 000, 002, 020, 022, 200, 202, 220, 222 and 111 were included to search the usual range of fertilizer applications for an optimum; 113, 131 and 311 were added to provide clues in case the optimum lay outside the expected range (Springer, 1972). If this design is regarded as an incomplete block design, there is no great difficulty obtaining a reasonable partition of the treatment sum of squares.
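One way to read the two treatment lists is that the eight corner treatments (levels 0 and 2 of each component) are split by the parity of the number of components at level 2, with the centre point and the three outlying points added to every block. The sketch below reconstructs both block types on that reading; the variable names are illustrative, and the parity interpretation is an observation about the listed codes rather than a statement from the source.

```python
from itertools import product

# All eight corner treatments: each of N, P, K at level 0 or 2.
corners = ["".join(levels) for levels in product("02", repeat=3)]

# Split the corners by how many components sit at level 2.
even_half = sorted(c for c in corners if c.count("2") % 2 == 0)
odd_half = sorted(c for c in corners if c.count("2") % 2 == 1)

# Centre point (twice) plus the three points outside the usual range.
common = ["111", "111", "113", "131", "311"]

block_type_a = even_half + common  # 000 022 202 220 + common treatments
block_type_b = odd_half + common   # 002 020 200 222 + common treatments

print(block_type_a)
print(block_type_b)
```

Each block type has nine plots, which matches the four blocks of nine plots described above.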

D. F. Andrews, A. M. Herzberg

### 59. Disorder and Mineral Content in Apples

Apple fruits are subjected to chemical analysis for their mineral element content and to examination for disorders such as bitter pit, breakdown, scald and fungal rots. The relationships among bitter pit incidence, calcium deficiency and mean fruit weight per tree are illustrated by using data obtained on Jonathan apples from potted trees.

D. F. Andrews, A. M. Herzberg

### 60. A Classical Apple Experiment

A commercial apple tree consists of two parts grafted together. The upper, the scion, determines the main characteristics of the fruit and leaves, while the lower, the root-stock, largely determines the size and development of the tree. At the beginning of the century it was generally accepted that a root-stock propagated asexually, for example by cuttings or from a stool-bed, gave a dwarf tree, whereas one propagated sexually, that is from seed, gave a large tree.

D. F. Andrews, A. M. Herzberg

### 61. Mastitis Control by Penicillin and Novobiocin

In a study of the treatment of mastitis, cows from sixteen Southwestern Virginia Holstein dairy farms were used, including both milking parlour and stanchion barn herds. These herds also reflected a range of subjective management scores of poor, fair, good and excellent, as judged by the technicians who visited the farms on a weekly basis. Cows were assigned sequentially as they were identified, rather than randomly, to the treatments.

G. G. Koch

### 62. United Kingdom Pig Production 1967 – 1978

A description of the state of pig production is given by five indicators, measured quarterly over the twelve-year period 1967–1978. An approximate three-year cycle is evident in much of the data, and interest lies partly in explaining this by models which represent the interaction between the series. The quality of the data leaves something to be desired, being based on sampling schemes which change slightly over the twelve years, but is typical of what could reasonably be expected in many econometric situations.

D. F. Andrews, A. M. Herzberg

### 63. Comparison of Family Sizes: A Problem from the 1941 Canadian Census

Most of the numerous studies of differential fertility are based either on census or vital registration records which are substantially complete or on large portions of a population which are considered as samples. Before the nature of sampling error was understood it seemed dangerous to base conclusions on a small number of cases, but this need no longer be the case. Small samples may make possible investigations which avoid the limitations of census tables in which only a few variables can be cross-tabulated. In any serious attempt to study the effect of a particular variable, freed from the effects of other relevant variables, at least ten directions of simultaneous cross-classification are required rather than the three or four which are the maximum usually given in a census.

D. F. Andrews, A. M. Herzberg

### 64. Canadian Unemployment Data 1956 – 1975

The problem of shifts in seasonal behaviour is one which turns up in many time-series, but particularly in the statistics of unemployment. Standard procedures for the seasonal adjustment of economic time-series assume that the amplitude of the seasonal variation either varies in proportion as the level of the series changes (multiplicative seasonality) or is independent of the level (additive seasonality). Economic theory provides little guidance, and some preliminary analysis of the data is usually necessary.
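The distinction between the two kinds of seasonality can be checked with a simple preliminary diagnostic: under multiplicative seasonality the within-year range grows with the level of the series, while the range of the logged series stays roughly constant. The sketch below uses a purely synthetic series (yearly levels of 100, 200 and 400 with a 20% sinusoidal seasonal factor) to show the idea; none of these numbers come from the unemployment data, and `seasonal_range_by_year` is a hypothetical helper.

```python
import math


def seasonal_range_by_year(series, period=12):
    """Max minus min within each consecutive block of `period` values."""
    return [
        max(series[i:i + period]) - min(series[i:i + period])
        for i in range(0, len(series) - period + 1, period)
    ]


# Synthetic monthly series: level doubles each year, seasonal factor
# multiplies the level (multiplicative seasonality by construction).
levels = [100, 200, 400]
series = [
    level * (1 + 0.2 * math.sin(2 * math.pi * month / 12))
    for level in levels
    for month in range(12)
]

raw_ranges = seasonal_range_by_year(series)
log_ranges = seasonal_range_by_year([math.log(x) for x in series])

print(raw_ranges)  # grows with the level of the series
print(log_ranges)  # roughly constant after taking logs
```

If the raw ranges were instead roughly constant across years, an additive model would be the more natural starting point.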

D. F. Andrews, A. M. Herzberg

### 65. United States of America Unemployment Data, 1948–1981

Statistics on the labour force status of the United States civilian noninstitutional population are derived from a monthly sample survey of 60,000 households, the Current Population Survey, conducted by the Bureau of the Census for the Bureau of Labor Statistics. Respondents are interviewed to obtain information on the labour force status of each member of the household 16 years of age and over. The inquiry relates to activity or status during the calendar week that includes the 12th of the month.

D. F. Andrews, A. M. Herzberg

### 66. Interorganizational Resource Links in Towertown, U.S.A.

As part of a more elaborate study of social organization, sociologists from the University of Chicago gathered information on the formal organizations in a midwestern U.S. community of 32,000 persons, referred to here by its pseudonym, Towertown. This city lies 60 miles from a large metropolitan centre, and contains a large state university with an enrollment of almost 24,000 students. A total of 109 formal organizations were identified; for more details, see Galaskiewicz and Marsden (1978) and Galaskiewicz (1979). These organizations included all manufacturing firms having more than 20 employees, banks, savings and loans, law firms, business associations, service clubs, labour unions, city offices and departments, political organizations, mass media organizations, health institutions, public welfare institutions, educational institutions and churches.

D. F. Andrews, A. M. Herzberg

### 67. Insurance Availability in Chicago

In a study of insurance availability in Chicago, the U.S. Commission on Civil Rights attempted to examine charges by several community organizations that insurance companies were redlining their neighbourhoods, i.e. cancelling policies, refusing to insure or to renew, etc. Data were obtained from a variety of sources. First, the Illinois Department of Insurance provided the number of cancellations, nonrenewals, new policies, and renewals of homeowners and residential fire insurance policies by ZIP code for the months of December 1977 through February 1978. The companies that provided this information to the Department account for more than 70 percent of the homeowners insurance policies written in the city of Chicago. The Department also supplied the number of FAIR Plan policies written and renewed in Chicago, by ZIP code, for the months of December 1977 through May 1978. Since most FAIR Plan policyholders secure such coverage only after they have been rejected by the voluntary market, rather than as a result of a preference for that type of insurance, the distribution of FAIR Plan policies is another measure of insurance availability in the voluntary market.

D. F. Andrews, A. M. Herzberg

### 68. Factors Influencing Motor Insurance Rates

In most countries, motor car insurance is obligatory. The problems concerning this type of insurance are the same, although the technical solution to these problems may vary. The data given in Table 68.1 present Swedish third party motor insurance for 1977 for one of seven geographical zones.

D. F. Andrews, A. M. Herzberg

### 69. Disputed Authorship: The Federalist Papers

The Federalist papers were written in 1787–1788 by Alexander Hamilton, John Jay and James Madison to persuade the citizens of the State of New York to ratify the American Constitution. As was common in those days, these short essays, about 900–3500 words in length, appeared in newspapers signed with a pseudonym, in this instance, ‘Publius’. Seventy-seven essays first appeared in several different newspapers, and then Hamilton wrote an additional eight essays designed to complete the job.

D. F. Andrews, A. M. Herzberg

### 70. Platonic Prose Rhythm

Prose rhythm may be characterized by the occurrence of five-syllable sequences in passages of text. This characterization may be used to assess the similarity among passages. The data presented in Table 70.1 come from thirty-three passages representing ten Platonic texts. Each passage was divided into sentences, lists of words ending with a colon, question mark or period. Syllables within each sentence were classified as long or short. Each sequence of five syllables was identified as being one of the thirty-two possible groupings of five long or short syllables. Thus each sentence of length N greater than 5 contributed N-4 such identifications. Table 70.1 gives the percent of each of the thirty-two five-syllable sequences for each of ten books: Timaeus, Critias, Laws, Republic, Phaedrus, Symposium, Sophistes, Philebus, Seventh Epistle and Politicus.
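The counting scheme described above amounts to sliding a window of five syllables along each sentence. The following sketch shows one way it might be implemented, assuming each sentence has already been encoded as a string of `L` (long) and `S` (short) syllables; that encoding and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def five_syllable_counts(sentence: str) -> Counter:
    """Count every window of five consecutive syllables in one sentence.

    A sentence of N >= 5 syllables yields N - 4 windows; shorter
    sentences yield none.
    """
    return Counter(sentence[i:i + 5] for i in range(len(sentence) - 4))


def pattern_percentages(sentences):
    """Percentage of each of the 32 long/short patterns in a passage."""
    totals = Counter()
    for sentence in sentences:
        totals.update(five_syllable_counts(sentence))
    n_windows = sum(totals.values())
    if n_windows == 0:
        raise ValueError("no sentence has five or more syllables")
    patterns = ["".join(p) for p in product("SL", repeat=5)]
    return {p: 100 * totals[p] / n_windows for p in patterns}


# Hypothetical passage of two short sentences.
passage = ["LSLSLSL", "SSLLSS"]
for pattern, pct in pattern_percentages(passage).items():
    if pct > 0:
        print(pattern, round(pct, 1))
```

Windows never span sentence boundaries in this sketch, which follows the description of sentences as the unit within which sequences are identified.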

D. F. Andrews, A. M. Herzberg

### 71. Relationships between Birthday and Deathday

In Phillips (1972), an investigation is made of the relationship between date of birth and date of death in a sample of famous Americans. One conclusion is that famous people are less likely to die in the month preceding their birth month than at any other time. The data in Tables 71.1 and 71.2 allow comparisons to be made among various types of famous people and ordinary people.

D. F. Andrews, A. M. Herzberg

### Backmatter
