
2014 | Book

Statistical Methods for Ranking Data

About this book

This book introduces advanced undergraduates, graduate students, and practitioners to statistical methods for ranking data. An important aspect of nonparametric statistics is oriented towards the use of ranking data. Rank correlation is defined through the notion of distance functions, and the notion of compatibility is introduced to deal with incomplete data. Ranking data are also modeled using a variety of modern tools such as CART, MCMC, the EM algorithm, and factor analysis.

This book deals with statistical methods used for analyzing such data and provides a novel and unifying approach to hypothesis testing. The techniques described in the book are illustrated with examples, and the statistical software is provided on the authors’ website.

Table of contents

Frontmatter
Chapter 1. Introduction
Abstract
This book was motivated by a desire to make available in a single volume many of the results on ranking methods developed by the authors and their collaborators that have appeared in the literature over a period of several years. In many instances, the presentations have a geometric flavor to them. There is also a concerted effort to introduce real applications in order to exhibit the wide scope of ranking methods.
Mayer Alvo, Philip L. H. Yu
Chapter 2. Exploratory Analysis of Ranking Data
Abstract
Descriptive statistics present an overall picture of ranking data. Not only do they provide a summary of the ranking data, but they are also often suggestive of the appropriate direction to analyze the data. Therefore, it is suggested that researchers consider descriptive analysis prior to any sophisticated data analysis.
Mayer Alvo, Philip L. H. Yu
Chapter 3. Correlation Analysis of Paired Ranking Data
Abstract
A ranking represents the order of preference one has with respect to a set of t objects. If we label the objects by the integers 1 to t, a ranking can then be thought of as a permutation of the integers \((1,2,\ldots,t)\). We may denote such a permutation by μ = (μ(1), μ(2), …, μ(t))′, which may also be conceptualized as a point in t-dimensional space. It is natural to measure the spread between two individual permutations μ, ν by means of a distance function.
Mayer Alvo, Philip L. H. Yu
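The idea of a distance between two permutations in the Chapter 3 abstract can be illustrated with a minimal sketch. The two distances below (a Spearman-type sum of squared rank differences and a Kendall-type count of discordant pairs) are common choices and are assumptions here, not necessarily the exact definitions or scalings used in the book; the function names are hypothetical.

```python
from itertools import combinations


def spearman_distance(mu, nu):
    """Sum of squared differences between the ranks in mu and nu.

    Assumption: no scaling factor is applied (some authors use 1/2).
    """
    return sum((a - b) ** 2 for a, b in zip(mu, nu))


def kendall_distance(mu, nu):
    """Number of object pairs that the two rankings order differently."""
    return sum(
        1
        for i, j in combinations(range(len(mu)), 2)
        if (mu[i] - mu[j]) * (nu[i] - nu[j]) < 0
    )


# mu[i] is the rank given to object i + 1 (illustrative data only).
mu = (1, 2, 3, 4)
nu = (2, 1, 4, 3)
print(spearman_distance(mu, nu), kendall_distance(mu, nu))
```

Both functions treat a ranking as a point in t-dimensional space, so identical rankings are at distance zero and a full reversal gives the largest value.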
Chapter 4. Testing for Randomness, Agreement, and Interaction
Abstract
Suppose that n judges are asked to rank t contestants in accordance with some predetermined criterion. One immediate question that comes to mind is: are the judges ranking the contestants by selecting a ranking at random or is there some specific pattern for their choices? Placing this problem in a geometric setting, we may represent each ranking as a point in a t-dimensional space. If indeed the judges act in accordance with some specific nonrandom manner, the points would tend to cluster close together in one or more groups. Intuitively then, a test of randomness could be based on the average pairwise distance between points with large values of that statistic displaying evidence of the random pattern of the points.
Mayer Alvo, Philip L. H. Yu
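The average pairwise distance described in the Chapter 4 abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' test: it uses an unscaled Spearman-type distance and a Monte Carlo reference distribution of uniformly random rankings, and the function names, `n_sim`, and `seed` are hypothetical. Because large average distances are what randomness produces, the sketch reports how often random rankings are at least as tightly clustered as the observed ones.

```python
import random
from itertools import combinations


def spearman_distance(mu, nu):
    """Sum of squared rank differences (assumed, unscaled form)."""
    return sum((a - b) ** 2 for a, b in zip(mu, nu))


def average_pairwise_distance(rankings):
    """Mean distance over all pairs of judges (needs at least two judges)."""
    pairs = list(combinations(rankings, 2))
    return sum(spearman_distance(a, b) for a, b in pairs) / len(pairs)


def randomness_p_value(rankings, n_sim=2000, seed=0):
    """Monte Carlo p-value: small values suggest the judges agree."""
    rng = random.Random(seed)
    n, t = len(rankings), len(rankings[0])
    observed = average_pairwise_distance(rankings)
    base = list(range(1, t + 1))
    hits = 0
    for _ in range(n_sim):
        # Each simulated judge picks one of the t! rankings uniformly.
        sample = [rng.sample(base, t) for _ in range(n)]
        if average_pairwise_distance(sample) <= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Illustrative data only: six judges who all give the same ranking.
judges = [(1, 2, 3, 4)] * 6
print(average_pairwise_distance(judges), randomness_p_value(judges))
```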
Chapter 5. Block Designs
Abstract
In the previous chapter, we were concerned with the study of complete randomized block designs. In biological studies involving animals, however, it is not always possible to compare several treatments within litters since the size of the litter will be a function of the particular species used. In such cases, it is then necessary to consider various types of incomplete experimental designs. The methodology presented here rests on the concept of compatibility and the extended notion of distance between rankings. This approach provides a natural extension of the well-known Friedman and Durbin statistics to some partially balanced incomplete designs. The tests developed are also applicable to general block designs with ties and multiple observations per cell.
Mayer Alvo, Philip L. H. Yu
Chapter 6. General Theory of Hypothesis Testing
Abstract
The notion of distance was fruitfully utilized in previous chapters in order to develop tests of hypotheses for both complete and incomplete rankings. In this chapter we consider a more general framework for constructing tests of hypotheses. We begin by defining two sets of rankings: one set consists of all the rankings which are most in agreement with the observed ranking while the second set contains all the rankings which are most in agreement with the alternative hypothesis. A distance function is then defined between those two sets of rankings. The notion of distance between sets is well known in mathematics and is often taken to be the minimum distance between pairs of elements, one from each set. In the present statistical context however, the more workable definition of distance is chosen to be the average of all pairwise distances between pairs of rankings, one from each set.
Mayer Alvo, Philip L. H. Yu
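The two notions of distance between sets mentioned in the Chapter 6 abstract (minimum versus average over all cross pairs) can be contrasted in a short sketch. The Spearman-type distance and the example sets are assumptions chosen purely for illustration; how the two sets of rankings are actually constructed is described in the book and is not reproduced here.

```python
from itertools import product


def spearman_distance(mu, nu):
    """Sum of squared rank differences (assumed, unscaled form)."""
    return sum((a - b) ** 2 for a, b in zip(mu, nu))


def min_set_distance(set_a, set_b):
    """Classical set distance: the closest pair, one ranking from each set."""
    return min(spearman_distance(a, b) for a, b in product(set_a, set_b))


def average_set_distance(set_a, set_b):
    """Average of all pairwise distances, one ranking from each set."""
    total = sum(spearman_distance(a, b) for a, b in product(set_a, set_b))
    return total / (len(set_a) * len(set_b))


# Hypothetical sets of rankings of three objects.
set_a = [(1, 2, 3), (1, 3, 2)]
set_b = [(3, 2, 1), (2, 3, 1)]
print(min_set_distance(set_a, set_b), average_set_distance(set_a, set_b))
```

The minimum depends only on the single closest pair, whereas the average uses every ranking in both sets.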
Chapter 7. Testing for Ordered Alternatives
Abstract
In this chapter, we shall consider a randomized block experiment given by the model
$$\displaystyle{X_{ij} = b_{i} +\tau _{j} + e_{ij},\quad i = 1,\ldots,n,\ j = 1,\ldots,t,}$$
where \(b_{i}\) is the ith block effect, \(\tau _{j}\) is the jth treatment effect, and the \(e_{ij}\) are independent identically distributed error terms having a continuous distribution. We wish to test the hypothesis of no treatment effect
$$\displaystyle{H_{0}:\tau _{1} =\tau _{2} =\ldots =\tau _{t}}$$
against the ordered alternative
$$\displaystyle{H_{1}:\tau _{1} \leq \tau _{2} \leq \ldots \leq \tau _{t}}$$
with at least one inequality strict. This problem has been considered in the literature by Page (1963) and by Jonckheere (1954) who proposed different test statistics in the case where there is complete data in each block.
Mayer Alvo, Philip L. H. Yu
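For the complete-data case, the statistic usually attributed to Page (1963) can be sketched as below. This is a minimal illustration, not the authors' software: it assumes no ties within a block (consistent with a continuous error distribution), uses the standard large-sample normal approximation for the null mean and variance, and the function names are hypothetical.

```python
import math


def within_block_ranks(row):
    """Rank the observations in one block from 1 (smallest) to t (largest)."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0] * len(row)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def page_statistic(data):
    """Return Page's L and its normal-approximation z-score.

    data[i][j] is the observation X_ij for block i and treatment j.
    """
    n, t = len(data), len(data[0])
    rank_sums = [0] * t
    for row in data:
        for j, rank in enumerate(within_block_ranks(row)):
            rank_sums[j] += rank
    # Weight each treatment's rank sum by its hypothesized position.
    L = sum((j + 1) * rank_sums[j] for j in range(t))
    mean = n * t * (t + 1) ** 2 / 4
    var = n * t ** 2 * (t + 1) ** 2 * (t - 1) / 144
    return L, (L - mean) / math.sqrt(var)


# Illustrative data only: two blocks, three treatments, perfectly ordered.
print(page_statistic([[0.1, 0.5, 0.9], [1.2, 1.4, 2.0]]))
```

Large positive z-scores point towards the ordered alternative; for small n and t, exact tables are normally preferred to the normal approximation.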
Chapter 8. Probability Models for Ranking Data
Abstract
Probability modeling for ranking data is an efficient way to understand people’s perception and preference on different objects. Various probability models for ranking data have been developed, particularly in the last decade where many new problems involving a large number of objects emerged. In their review paper on probability models for ranking data, Critchlow et al. (1991) broadly categorized these models into four classes: (1) order statistics models, (2) paired comparison models, (3) distance-based models, and (4) multistage models. Since their publication in 1991, variants of these models and new models have been developed. In this chapter, we will introduce these four classes of models and describe their properties.
Mayer Alvo, Philip L. H. Yu
Chapter 9. Probit Models for Ranking Data
Abstract
In 1980, the American Psychological Association (APA) conducted an election in which five candidates (A, B, C, D, and E) were running for president and voters were asked to rank order all of the candidates. Candidates A and B are research psychologists, C is a community psychologist, and D and E are clinical psychologists. Among those voters, 5738 gave complete rankings. These complete rankings are considered here (Diaconis 1988). Note that a lower rank indicates a more favorable candidate. The average ranks received by candidates A, B, C, D, and E are 2.84, 3.16, 2.92, 3.09, and 2.99, respectively. This means that voters generally prefer candidate A the most, candidate C the second, etc. However, in order to make inferences on the preferences of the candidates, modeling of the ranking data is needed. In Sect. 9.1 we consider a model for this data which takes into account covariates.
Mayer Alvo, Philip L. H. Yu
Chapter 10. Decision Tree Models for Ranking Data
Abstract
A number of models for ranking data were introduced in Chaps. 8 and 9. However, not all of these models are designed to incorporate individual/object-specific covariates. Distance-based models discussed in Sect. 8.3 are typical examples of ranking models that are not presently designed to incorporate covariates. As these models generally assume a homogeneous population of individuals, they always give the same predicted ranking. Order statistics models discussed in Sect. 8.3 and Chap. 9 are typical examples of models that are able to incorporate covariates in a “linear model” form. However, there are only a few diagnostic procedures available to determine whether a satisfactory model is found. For instance, is it necessary to transform some of the covariates? Which variables or interaction terms should be included in the model?
Mayer Alvo, Philip L. H. Yu
Chapter 11. Extension of Distance-Based Models for Ranking Data
Abstract
Recall from Sect. 8.3 that distance-based models assume that the probability of observing a ranking \(\boldsymbol{\pi }\) is inversely proportional to its distance from a modal ranking \(\boldsymbol{\pi }_{0}\). The closer to the modal ranking \(\boldsymbol{\pi }_{0}\), the more likely the ranking \(\boldsymbol{\pi }\) is observed. There are different measures of distances between two rankings as previously noted. Recently, Lee and Yu (2012) proposed new distance-based models by using weighted distance measures to allow different weights for different ranks. In this way, the properties of distance can be retained while at the same time enhancing the model flexibility.
Mayer Alvo, Philip L. H. Yu
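A distance-based model of the kind recalled in the Chapter 11 abstract can be sketched for a small number of objects. The sketch assumes the common exponential form, in which the probability of a ranking is proportional to exp(-λ d(π, π0)), together with an unweighted Kendall-type distance; the weighted distances of Lee and Yu (2012) are not implemented, and the function names and the parameter `lam` are hypothetical.

```python
import math
from itertools import combinations, permutations


def kendall_distance(mu, nu):
    """Number of object pairs that the two rankings order differently."""
    return sum(
        1
        for i, j in combinations(range(len(mu)), 2)
        if (mu[i] - mu[j]) * (nu[i] - nu[j]) < 0
    )


def distance_based_model(modal, lam):
    """Probability of every ranking under P(pi) ∝ exp(-lam * d(pi, modal)).

    Enumerates all t! rankings, so it is only practical for small t.
    """
    t = len(modal)
    rankings = list(permutations(range(1, t + 1)))
    weights = [math.exp(-lam * kendall_distance(r, modal)) for r in rankings]
    total = sum(weights)
    return {r: w / total for r, w in zip(rankings, weights)}


# Illustrative modal ranking and dispersion parameter.
probs = distance_based_model((1, 2, 3), lam=1.0)
for ranking, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(ranking, round(p, 3))
```

With `lam = 0` every ranking is equally likely, and larger values concentrate the probability on rankings close to the modal ranking.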
Backmatter
Metadata
Title
Statistical Methods for Ranking Data
Authors
Mayer Alvo
Philip L.H. Yu
Copyright year
2014
Publisher
Springer New York
Electronic ISBN
978-1-4939-1471-5
Print ISBN
978-1-4939-1470-8
DOI
https://doi.org/10.1007/978-1-4939-1471-5