Nonparametric Tests of Differences in Medians: Comparison of the Wilcoxon–Mann–Whitney and Robust Rank-Order Tests

Feltovich, Nick

doi:10.1023/A:1026273319211


Published: November 2003

Volume 6, pages 273–297, (2003)

Experimental Economics

Nick Feltovich¹


Abstract

The nonparametric Wilcoxon–Mann–Whitney test is commonly used by experimental economists for detecting differences in central tendency between two samples. This test is only theoretically appropriate under certain assumptions concerning the population distributions from which the samples are drawn, and is often used in cases where it is unclear whether these assumptions hold, and even when they clearly do not hold. Fligner and Pollicello's (1981, Journal of the American Statistical Association. 76, 162–168) robust rank-order test is a modification of the Wilcoxon–Mann–Whitney test, designed to be appropriate in more situations than Wilcoxon–Mann–Whitney. This paper uses simulations to compare the performance of the two tests under a variety of distributional assumptions. The results are mixed. The robust rank-order test tends to yield too many false positive results for medium-sized samples, but this liberalness is relatively invariant across distributional assumptions, and seems to be due to a deficiency of the normal approximation to its test statistic's distribution, rather than the test itself. The performance of the Wilcoxon–Mann–Whitney test varies hugely, depending on the distributional assumptions; in some cases, it is conservative, in others, extremely liberal. The tests have roughly similar power. Overall, the robust rank-order test performs better than Wilcoxon–Mann–Whitney, though when critical values for the robust rank-order test are not available, so that the normal approximation must be used, their relative performance depends on the underlying distributions, the sample sizes, and the level of significance used.
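The comparison described in the abstract can be illustrated with a minimal sketch, which is not the paper's own simulation design. It computes the Wilcoxon–Mann–Whitney z statistic and the robust rank-order statistic from placement counts, then estimates how often each test rejects at the 5% level when both populations have the same median. The function names, the choice of normal populations, the sample sizes, the standard deviations, and the use of the normal approximation for both tests are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def placements(a, b):
    """For each value in a, count the values in b below it (a tie counts 1/2)."""
    return [sum(1.0 if bj < ai else 0.5 if bj == ai else 0.0 for bj in b)
            for ai in a]


def wmw_z(x, y):
    """Normal-approximation z for the Mann-Whitney U statistic (no tie correction)."""
    m, n = len(x), len(y)
    u = sum(placements(x, y))
    return (u - m * n / 2) / math.sqrt(m * n * (m + n + 1) / 12)


def rro_stat(x, y):
    """Robust rank-order statistic built from the placements of each sample."""
    p, q = placements(x, y), placements(y, x)
    p_bar, q_bar = sum(p) / len(p), sum(q) / len(q)
    v_x = sum((pi - p_bar) ** 2 for pi in p)
    v_y = sum((qj - q_bar) ** 2 for qj in q)
    num = sum(p) - sum(q)
    den = 2 * math.sqrt(v_x + v_y + p_bar * q_bar)
    if den == 0:  # samples do not overlap at all
        return math.copysign(math.inf, num)
    return num / den


def two_sided_p(z):
    """Two-sided p-value under the standard normal approximation."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rates(m, n, sd_x, sd_y, reps=2000, alpha=0.05, seed=0):
    """Estimated rejection rates (WMW, robust rank-order) when both medians are 0.

    Illustrative assumption: both populations are normal with mean 0 and
    possibly different standard deviations.
    """
    rng = random.Random(seed)
    hits_wmw = hits_rro = 0
    for _ in range(reps):
        x = [rng.gauss(0, sd_x) for _ in range(m)]
        y = [rng.gauss(0, sd_y) for _ in range(n)]
        hits_wmw += two_sided_p(wmw_z(x, y)) < alpha
        hits_rro += two_sided_p(rro_stat(x, y)) < alpha
    return hits_wmw / reps, hits_rro / reps
```

Calling, for example, `rejection_rates(10, 30, 4.0, 1.0)` returns the two estimated false positive rates for a small, high-variance sample against a larger, low-variance one, which can then be compared with the nominal 5% level. Exact critical values for the robust rank-order test are not used here, so the sketch reflects only the normal-approximation case mentioned in the abstract.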


References

Boos, D.D. and Brownie, C. (1988). “Bootstrap p-Values for Tests of Nonparametric Hypotheses.” Institute of Statistics Mimeo Series No. 1919, North Carolina State University.
Cohen, J. (1969). Statistical Power Analysis for the Behavioral Sciences. New York and London: Academic Press.
Duffy, J. and Feltovich, N. (1999). “Does Observation of Others Affect Learning in Strategic Environments? An Experimental Study.” International Journal of Game Theory. 28, 131–152.
Feltovich, N. (2003). “Critical Values for the Robust Rank-Order Test.” Working paper, University of Houston. Available at www.uh.edu/~nfelt/papers/rrovals.pdf
Fligner, M.A. and Pollicello, G.E. III. (1981). “Robust Rank Procedures for the Behrens-Fisher Problem.” Journal of the American Statistical Association. 76, 162–168.
Mann, H.B. and Whitney, D.R. (1947). “On a Test of Whether One of Two Random Variables is Stochastically Larger Than the Other.” Annals of Mathematical Statistics. 18, 50–60.
Siegel, S. and Castellan, N.J., Jr. (1988). Nonparametric Statistics for the Behavioral Sciences. New York: McGraw-Hill.
Tatsuoka, M. (1993). “Effect Size.” In G. Keren and C. Lewis (eds.), A Handbook for Data Analysis in the Behavioral Sciences: Methodological Issues. Hillsdale, NJ: Erlbaum, pp. 461–479.
Wilcoxon, F. (1945). “Individual Comparisons by Ranking Methods.” Biometrics Bulletin. 1, 80–83.
Zimmerman, D.W. (1987). “Comparative Power of Student T Test and Mann-Whitney U Test for Unequal Sample Sizes and Variances.” Journal of Experimental Education. 55, 171–174.
Zimmerman, D.W. and Zumbo, B.D. (1993a). “Rank Transformations and the Power of the Student t Test and Welch t Test for Non-Normal Populations with Unequal Variances.” Canadian Journal of Experimental Psychology. 47, 523–539.
Zimmerman, D.W. and Zumbo, B.D. (1993b). “The Relative Power of Parametric and Nonparametric Statistical Methods.” In G. Keren and C. Lewis (eds.), A Handbook for Data Analysis in the Behavioral Sciences: Methodological Issues. Hillsdale, NJ: Erlbaum, pp. 481–517.
Zumbo, B.D. and Coulombe, D. (1997). “Investigation of the Robust Rank-Order Test for Non-Normal Populations with Unequal Variances: The Case of Reaction Time.” Canadian Journal of Experimental Psychology. 51, 139–149.


Author information

Authors and Affiliations

Department of Economics, University of Houston, Houston, TX, 77204-5019, USA
Nick Feltovich



About this article

Cite this article

Feltovich, N. Nonparametric Tests of Differences in Medians: Comparison of the Wilcoxon–Mann–Whitney and Robust Rank-Order Tests. Experimental Economics 6, 273–297 (2003). https://doi.org/10.1023/A:1026273319211


Issue Date: November 2003
DOI: https://doi.org/10.1023/A:1026273319211
