Unimodal regression via prefix isotonic regression

https://doi.org/10.1016/j.csda.2008.08.005

Abstract

This paper gives algorithms for determining real-valued univariate unimodal regressions, that is, for determining the optimal regression which is increasing and then decreasing. Such regressions arise in a wide variety of applications. They are shape-constrained nonparametric regressions, closely related to isotonic regression. For unimodal regression on n weighted points our algorithm for the L2 metric requires only Θ(n) time, while for the L1 metric it requires Θ(n log n) time. For unweighted points our algorithm for the L∞ metric requires only Θ(n) time. All of these times are optimal. Previous algorithms were for the L2 metric and required Ω(n²) time. All previous algorithms used multiple calls to isotonic regression, and our major contribution is to organize these into a prefix isotonic regression, determining the regression on all initial segments. The prefix approach reduces the total time required by utilizing the solution for one initial segment to solve the next.

Introduction

Given n univariate real data values $(x_i, y_i, w_i)$ with nonnegative real weights $w_i$, $i = 1, \ldots, n$, where $x_1 < \cdots < x_n$, and given $p \in [1, \infty]$, the $L_p$ isotonic regression of the data is the set $\{(x_i, \hat{y}_i) : i = 1, \ldots, n\}$ that minimizes
$$
\begin{cases} \left( \sum_{i=1}^{n} w_i \, |y_i - \hat{y}_i|^p \right)^{1/p} & \text{if } 1 \le p < \infty \\[4pt] \max_{i=1,\ldots,n} \, w_i \, |y_i - \hat{y}_i| & \text{if } p = \infty \end{cases} \tag{1}
$$
subject to the increasing isotonic constraint that $\hat{y}_1 \le \hat{y}_2 \le \cdots \le \hat{y}_n$. Note that the values are merely required to be nondecreasing, rather than strictly increasing. The $L_p$ unimodal regression of the data is the set $\{(x_i, \hat{y}_i) : i = 1, \ldots, n\}$ that minimizes Eq. (1) subject to the unimodal constraint that there is an $m \in \{1, \ldots, n\}$ such that $\hat{y}_1 \le \cdots \le \hat{y}_m \ge \hat{y}_{m+1} \ge \cdots \ge \hat{y}_n$, i.e., such that $\{\hat{y}_i\}$ is increasing on $1, \ldots, m$ and decreasing on $m, \ldots, n$. The unimodal constraint is also called an umbrella ordering, and isotonic regression is often called monotonic regression, though in some application areas this term means the values might be decreasing.
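
To make the definitions concrete, the following small sketch evaluates the objective of Eq. (1) for a candidate regression and checks the two shape constraints. It is purely illustrative; the names lp_regression_error, is_isotonic, and is_unimodal, and the list-based data representation, are ours rather than the paper's.

```python
import math

def lp_regression_error(y, y_hat, w, p):
    """Objective of Eq. (1): weighted Lp distance between the data y and a
    candidate regression y_hat, for any p in [1, infinity]."""
    if math.isinf(p):
        return max(wi * abs(yi - yhi) for yi, yhi, wi in zip(y, y_hat, w))
    return sum(wi * abs(yi - yhi) ** p for yi, yhi, wi in zip(y, y_hat, w)) ** (1.0 / p)

def is_isotonic(y_hat):
    """Increasing isotonic constraint: the fitted values are nondecreasing."""
    return all(a <= b for a, b in zip(y_hat, y_hat[1:]))

def is_unimodal(y_hat):
    """Unimodal constraint: nondecreasing up to some mode m, nonincreasing after it."""
    return any(is_isotonic(y_hat[:m]) and is_isotonic(y_hat[m - 1:][::-1])
               for m in range(1, len(y_hat) + 1))
```

For instance, is_unimodal([1, 2.5, 2.5, 5, 4, 2, 1]) is True, while is_isotonic of the same list is False.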

Isotonic regression does not yield a smooth curve, but rather a collection of level sets where the regression is constant. Fig. 1 gives an example of an isotonic regression of a set of data with equal weights, where circles represent data points and lines represent level sets, with a filled circle representing a data point which is also a level set. Fig. 2 shows a unimodal regression.

By the error of a regression we mean the quantity in Eq. (1). In the algorithms the value called error is actually the pth power of this quantity in order to simplify calculations.

Both isotonic regression and unimodal regression are examples of nonparametric shape-constrained regression. Our interest in efficient unimodal regression was motivated by its repeated use in dose-response problems with competing failure modes (Hardwick and Stout, 2000, Pan, 1996). For such problems, as the dose increases the efficacy increases but the toxicity increases as well. The goal is to find the dose that maximizes the probability of being efficacious and non-toxic, and it is usually assumed that this probability distribution is unimodal. More generally such regressions are of use in a wide range of applications when there is prior knowledge about the shape of a response function but no assumption of a parametric form. See, for example, the references to water-level time-series data in Haiminen and Gionis (2004) and to tree growth in Turner and Wollan (1997). The latter is another example of competing failure modes, where as trees in a newly planted grove grow, their “vigor” initially increases as they increase in size, but eventually starts decreasing as they compete for nutrients and light.

In Section 2 we examine previous work on the problem of determining unimodal regression. In Section 3 we introduce the notion of prefix isotonic regression, and in Sections 3.1, 3.2, and 3.3 we develop algorithms for the L2, L1, and unweighted L∞ versions of this problem, taking time Θ(n), Θ(n log n), and Θ(n), respectively. These then yield unimodal algorithms of the same time complexity. All of these algorithms, both isotonic and unimodal, are optimal, and, except for the isotonic L2 problem, all are the first optimal solutions to their problems. In Section 3.4 we examine the slightly different problem of determining the value at xi of the isotonic regression on the first m values. Section 4 contains an immediate corollary of the results on prefix isotonic regression, namely that unimodal regression can be computed in the same time bounds. Section 5 concludes with some final remarks.

Throughout, we assume that the data is given in order of increasing xi values. If the data is not so ordered, then an initial sorting step, taking Θ(n log n) time, is needed. Since the values of the xi are irrelevant we simplify notation by assuming xi = i.

Previous work

It is well-known that the L2 increasing isotonic regression can be determined in Θ(n) time. Apparently all published algorithms use the “pair adjacent violators” (PAV) approach (Ayer et al., 1955). In this approach, initially each data value is viewed as a level set. At each step, if there are two adjacent level sets that are out of order (i.e., the left level set is above the right one) then the sets are combined and the weighted L2 mean of the data values becomes the value of the new level set.
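
As a concrete illustration of the PAV approach just described, here is a minimal sketch for the weighted L2 case; the function name pav_l2 and the list-of-blocks representation are ours, not the paper's, and positive weights are assumed.

```python
def pav_l2(y, w):
    """Weighted L2 increasing isotonic regression by pair-adjacent violators.

    y and w are parallel lists of data values and positive weights; returns the
    fitted (nondecreasing) values.  Each block below is one level set, stored as
    [weighted mean, total weight, number of points]."""
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # While the two rightmost level sets are out of order, merge them into a
        # single level set whose value is the weighted mean of their data.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * v1 + w2 * v2) / wt, wt, n1 + n2])
    # Expand the level sets back into one fitted value per data point.
    fit = []
    for value, _, count in blocks:
        fit.extend([value] * count)
    return fit
```

For example, pav_l2([1, 3, 2, 4], [1, 1, 1, 1]) returns [1, 2.5, 2.5, 4]. Since each merge removes a level set, at most n − 1 merges occur in total and the running time is Θ(n).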

Prefix isotonic regression

By determining an isotonic regression we mean determining the error of the regression and the extents and regression values of the level sets. Given n real-valued weighted data values $\{(x_i, y_i, w_i) : 1 \le i \le n\}$ with nonnegative real weights $w_i$, and given a metric $\mu$ on the reals, let $\mathrm{Iso}_m$ denote the $\mu$ isotonic regression on $\{(x_i, y_i, w_i) : 1 \le i \le m\}$. The $\mu$ prefix isotonic regression problem is to determine $\mathrm{Iso}_m$ for all $1 \le m \le n$.
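
A naive way to solve the prefix problem is simply to rerun an isotonic regression on every prefix, which takes Ω(n²) total time even with a linear-time fit; the following is a minimal sketch of that baseline, assuming some isotonic_fit routine such as the PAV sketch above.

```python
def prefix_isotonic_naive(y, w, isotonic_fit):
    """Iso_m for every prefix length m by refitting from scratch: Theta(n^2) total.
    isotonic_fit(y, w) must return the fitted nondecreasing values, e.g. the
    PAV sketch shown earlier."""
    return [isotonic_fit(y[:m], w[:m]) for m in range(1, len(y) + 1)]
```

This refit-from-scratch baseline is essentially what the earlier unimodal algorithms did through repeated isotonic regression calls; the prefix algorithms of Sections 3.1, 3.2, 3.3 instead update one prefix's solution to obtain the next.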

Note that prefix isotonic regression determines exactly the set of increasing isotonic

Unimodal regression

It is straightforward to modify the algorithm in Fig. 3 to utilize prefix isotonic regression. The error_l values are calculated via a standard prefix increasing isotonic regression, and the error_r values are calculated via a prefix increasing isotonic regression that processes the data in right-to-left order. The time complexity is straightforward to analyze, since the total time is dominated by the time to perform the isotonic regressions.
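
To illustrate how the two prefix passes combine, here is a minimal sketch for the weighted L2 case with positive weights. The names prefix_l2_errors and unimodal_l2, and the per-level-set error bookkeeping, are ours rather than the paper's pseudocode; the routine keeps a running total of the squared error while PAV merges level sets, so all prefix errors come out of one left-to-right sweep.

```python
def prefix_l2_errors(y, w):
    """errs[m] = squared error of the best nondecreasing (weighted L2) fit to
    the first m points, for m = 0, ..., n; computed by pair-adjacent violators
    in amortized Theta(n) total time.  Assumes positive weights."""
    blocks = []            # one entry per level set: [sum(w*y), sum(w*y*y), sum(w)]
    total = 0.0            # current total squared error over all level sets
    errs = [0.0]
    for yi, wi in zip(y, w):
        s1, s2, sw = wi * yi, wi * yi * yi, wi
        err = 0.0          # a singleton level set has zero error
        # Merge while the previous level set sits above the current one.
        while blocks and blocks[-1][0] * sw > s1 * blocks[-1][2]:
            p1, p2, pw = blocks.pop()
            total -= p2 - p1 * p1 / pw          # remove the old level set's error
            s1, s2, sw = s1 + p1, s2 + p2, sw + pw
            err = s2 - s1 * s1 / sw             # error of the merged level set
        blocks.append([s1, s2, sw])
        total += err
        errs.append(total)
    return errs

def unimodal_l2(y, w):
    """Optimal L2 unimodal fit: for every split m, charge the increasing part
    with left[m] and the decreasing part with right[n - m], and keep the best."""
    n = len(y)
    left = prefix_l2_errors(y, w)                # left[m]: increasing fit on y[:m]
    right = prefix_l2_errors(y[::-1], w[::-1])   # right[k]: decreasing fit on y[n-k:]
    best = min(range(n + 1), key=lambda m: left[m] + right[n - m])
    return best, left[best] + right[n - best]
```

For example, with unit weights, unimodal_l2([1, 3, 2, 5, 4, 2, 1], [1] * 7) reports an optimal squared error of 0.5; the only violation that must be smoothed is the out-of-order pair (3, 2) on the increasing side. Since each prefix pass is amortized Θ(n), the whole computation is Θ(n), matching the L2 bound stated in the abstract.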

Theorem 2

Given weighted data $\{(x_i, y_i, w_i) : i = 1, \ldots, n\}$

Final comments

It has been shown that the problem of determining the unimodal regression of a set of data can be optimally solved by using an approach based on prefix isotonic regression. This approach is quite similar to that in Frisén (1980), Geng and Shi (1990), Mureika et al. (1992), Pehrsson and Frisén (1983) and Turner (1998), but achieves greater efficiency by organizing the regression calculations into a systematic prefix calculation. The prefix approach not only reduces the asymptotic time of

Acknowledgements

This work was supported in part by National Science Foundation grant DMS–0072910. A preliminary version of portions of this paper appeared in Stout (2000).

The author thanks the reviewers for their helpful comments.
