Abstract
This paper gives algorithms for determining isotonic regressions for weighted data at a set of points P in multidimensional space with the standard componentwise ordering. The approach is based on an order-preserving embedding of P into a slightly larger directed acyclic graph (dag) G, in which the transitive closure of the ordering on P is represented by paths of length 2 in G. Compared to previous results, the algorithms given here improve the time by a factor of \(\tilde{\varTheta}(|P|)\) for the \(L_1\), \(L_2\), and \(L_\infty\) metrics. A yet faster algorithm is given for \(L_1\) isotonic regression with unweighted data. \(L_\infty\) isotonic regression is not unique, and algorithms are given for finding \(L_\infty\) regressions with desirable properties, such as minimizing the number of large regression errors.
References
Angelov, S., Harb, B., Kannan, S., Wang, L.-S.: Weighted isotonic regression under the \(L_1\) norm. In: Symposium on Discrete Algorithms (SODA), pp. 783–791 (2006)
Ayer, M., Brunk, H.D., Ewing, G.M., Reid, W.T., Silverman, E.: An empirical distribution function for sampling with incomplete information. Ann. Math. Stat. 26, 641–647 (1955)
Barlow, R.E., Bartholomew, D.J., Bremner, J.M., Brunk, H.D.: Statistical Inference Under Order Restrictions: the Theory and Application of Isotonic Regression. Wiley, New York (1972)
Barlow, R.E., Brunk, H.D.: The isotonic regression problem and its dual. J. Am. Stat. Assoc. 67, 140–147 (1972)
Beran, R., Dümbgen, L.: Least squares and shrinkage estimation under bimonotonicity constraints. Stat. Comput. 20, 177–189 (2010)
Bril, G., Dykstra, R., Pillars, C., Robertson, T.: Algorithm AS 206: isotonic regression in two independent variables. J. R. Stat. Soc., Ser. C, Appl. Stat. 33, 352–357 (1984)
Burdakov, O., Grimvall, A., Hussian, M.: A generalized PAV algorithm for monotonic regression in several variables. In: Proceedings of COMPSTAT’2004 (2004)
Chandrasekaran, R., Ryu, Y.U., Jacob, V.S., Hong, S.: Isotonic separation. INFORMS J. Comput. 17, 462–474 (2005)
Dembczynski, K., Greco, S., Kotlowski, W., Slowinski, R.: Statistical model for rough set approach to multicriteria classification. In: PKDD 2007: 11th European Conference on Principles and Practice of Knowledge Discovery in Databases. Springer Lecture Notes in Computer Science, vol. 4702, pp. 164–175 (2007)
Dette, H., Scheder, R.: Strictly monotone and smooth nonparametric regression for two or more variables. Can. J. Stat. 34, 535–561 (2006)
Dykstra, R., Hewett, J., Robertson, T.: Nonparametric, isotonic discriminant procedures. Biometrika 86, 429–438 (1999)
Dykstra, R.L., Robertson, T.: An algorithm for isotonic regression of two or more independent variables. Ann. Stat. 10, 708–716 (1982)
Gamarnik, D.: Efficient learning of monotone concepts via quadratic optimization. In: Proceedings of Computational Learning Theory (COLT), pp. 134–143 (1998)
Gebhardt, F.: An algorithm for monotone regression with one or more independent variables. Biometrika 57, 263–271 (1970)
Gudmundsson, J., Klein, O., Knauer, C., Smid, M.: Small Manhattan networks and algorithmic applications for the earth mover’s distance. In: Proceedings 23rd European Workshop on Computational Geometry, pp. 174–177 (2007)
Hanson, D.L., Pledger, G., Wright, F.T.: On consistency in monotonic regression. Ann. Stat. 1, 401–421 (1973)
Hershberger, J., Suri, S.: Applications of a semi-dynamic convex hull algorithm. BIT Numer. Math. 32, 249–267 (1992)
Jackson, D.: Note on the median of a set of numbers. Bull. Am. Math. Soc., pp. 160–164 (1921)
Kalai, A.T., Sastry, R.: The isotron algorithm: high-dimensional isotonic regression. In: Proceedings of Computational Learning Theory (COLT) (2009)
Kaufman, Y., Tamir, A.: Locating service centers with precedence constraints. Discrete Appl. Math. 47, 251–261 (1993)
Legg, D., Townsend, D.: Best monotone approximation in \(L_\infty[0,1]\). J. Approx. Theory 42, 30–35 (1984)
Luss, R., Rosset, S., Shahar, M.: Decomposing isotonic regression for efficiently solving large problems. In: Proceedings of Neural Information Processing Systems (2010)
Maxwell, W.L., Muckstadt, J.A.: Establishing consistent and realistic reorder intervals in production-distribution systems. Oper. Res. 33, 1316–1341 (1985)
Megiddo, N.: Linear-time algorithms for linear programming in \(\Re^3\) and related problems. SIAM J. Comput. 12, 759–776 (1983)
Meyer, M.: Inference for multiple isotonic regression (2010). www.stat.colostate.edu/research/TechnicalReports/2010/2010_2.pdf
Moon, T., Smola, A., Chang, Y., Zheng, Z.: IntervalRank—isotonic regression with listwise and pairwise constraints. In: Proceedings Web Search and Data Mining, pp. 151–160 (2010)
Mukerjee, H., Stern, S.: Feasible nonparametric estimation of multiargument monotone functions. J. Am. Stat. Assoc. 89, 77–80 (1994)
Pólya, G.: Sur un algorithme toujours convergent pour obtenir les polynômes de meilleure approximation de Tchebycheff pour une fonction continue quelconque. Comptes Rendus 157, 840–843 (1913)
Punera, K., Ghosh, J.: Enhanced hierarchical classification via isotonic smoothing. In: Proceedings International Conference on the World Wide Web, pp. 151–160 (2008)
Qian, S., Eddy, W.F.: An algorithm for isotonic regression on ordered rectangular grids. J. Comput. Graph. Stat. 5, 225–235 (1996)
Robertson, T., Wright, F.T.: Multiple isotonic median regression. Ann. Stat. 1, 422–432 (1973)
Robertson, T., Wright, F.T., Dykstra, R.L.: Order Restricted Statistical Inference. Wiley, New York (1988)
Saarela, O., Arjas, E.: A method for Bayesian monotonic multiple regression. Scand. J. Stat. 38, 499–513 (2011)
Salanti, G., Ulm, K.: Multidimensional isotonic regression and estimation of the threshold value. Discussion paper 234. Institut für Statistik, Ludwig-Maximilians Universität, München (2001)
Sasabuchi, S., Inutsuka, M., Kulatunga, D.D.S.: A multivariate version of isotonic regression. Biometrika 70, 465–472 (1983)
Spouge, J., Wan, H., Wilbur, W.J.: Least squares isotonic regression in two dimensions. J. Optim. Theory Appl. 117, 585–605 (2003)
Stout, Q.F.: Strict \(L_\infty\) isotonic regression. J. Optim. Theory Appl. 152, 121–135 (2012)
Stout, Q.F.: Weighted \(L_\infty\) isotonic regression, submitted (2012). Available at www.eecs.umich.edu/~qstout/pap/LinfinityIso.pdf
Stout, Q.F.: Isotonic regression via partitioning. Algorithmica 66, 93–112 (2013)
Sysoev, O., Burdakov, O., Grimvall, A.: A segmentation-based algorithm for large-scale partially ordered monotonic regression. Comput. Stat. Data Anal. 55, 2463–2476 (2011)
Ubhaya, V.A.: Isotone optimization, I, II. J. Approx. Theory 12, 146–159, 315–331 (1974)
Velikova, M., Daniels, H.: Monotone Prediction Models in Data Mining. VDM Verlag (2008)
Acknowledgements
Research partially supported by NSF grant CDI-1027192, DOE grant DE-FC52-08NA28616 and King Abdulaziz University. The author thanks Günter Rote for pointing out the paper by Gudmundsson et al. [15], and the reviewer for helpful comments.
Appendix: Optimal Size of Order-Embeddings
For points in 2-dimensional space the condensed rendezvous dag has \(\varTheta(n \log n)\) edges, and here we show that there are sets of points in 2-space for which \(\varOmega(n \log n)\) edges are needed. One example is the bit-reversal permutation: given an integer b>0, let V be the set of 2-dimensional points \(\{(i_b i_{b-1} \cdots i_1,\; i_1 i_2 \cdots i_b) : i_j \in \{0,1\}\}\); i.e., a point is in V if its b-bit second coordinate, in binary, is the reversal of its first coordinate.
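As a concrete illustration of this definition (a sketch added here, not part of the original text; the function name `bit_reversal_points` is ours), the point set can be generated directly from the bit-reversal map:

```python
def bit_reversal_points(b):
    """Generate the b-bit bit-reversal point set V: for each i in
    [0, 2^b), the point (rev(i), i), where rev reverses the low b
    bits of i."""
    def rev(i):
        r = 0
        for _ in range(b):
            r = (r << 1) | (i & 1)  # shift the low bit of i onto r
            i >>= 1
        return r
    return [(rev(i), i) for i in range(1 << b)]

# For b = 2, second coordinate 1 = 01 pairs with first coordinate
# 10 = 2, and vice versa:
print(bit_reversal_points(2))  # [(0, 0), (2, 1), (1, 2), (3, 3)]
```

Listing the points by second coordinate, as the proof below does, makes the even/odd split into \(V_0\) and \(V_1\) easy to see.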
Proposition 8
Let V be the b-bit bit-reversal permutation with the 2-dimensional componentwise ordering. Then for any dag G, if there is an order-embedding of V into G, then G has at least \(\frac{1}{2}|V| \log_{2} |V|\) edges.
Proof
Note that G is not restricted to the 2-dimensional ordering; it may be an arbitrary dag.
To prove this, let G=(V′,E′) be a dag and π an order-embedding of V into G. Let \(k^r\) denote the bit reversal of k. Then V={p(i) : 0≤i≤n−1}, where p(i) is the 2-dimensional point \((i^r, i)\) and \(n=2^b\); i.e., p(0),p(1),…,p(n−1) are indexed by their second coordinate. Let \(V_0=\{p(2i) : 0\le i<n/2\}\) and \(V_1=\{p(2i+1) : 0\le i<n/2\}\); i.e., \(V_x\) is the set of points for which the highest bit of the first coordinate is x. Elements of \(V_0\) can precede elements of \(V_1\), but not vice versa. In particular, for any i, j with 0≤i,j≤n−1, p(i)≺p(j) only if i<j and either \(p(j) \not\in V_{0}\) or \(p(i) \not\in V_{1}\).
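The necessary condition at the end of this paragraph can be confirmed by brute force for small b (a verification sketch added here, not from the paper; the helper names are illustrative):

```python
def bit_rev(i, b):
    """Reverse the low b bits of i."""
    r = 0
    for _ in range(b):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def check_necessary_condition(b):
    """Whenever p(i) = (i^r, i) is componentwise dominated by p(j),
    then i < j, and it is never the case that p(i) is in V_1
    (i odd) while p(j) is in V_0 (j even)."""
    n = 1 << b
    for i in range(n):
        for j in range(n):
            if i != j and bit_rev(i, b) <= bit_rev(j, b) and i <= j:
                # p(i) strictly precedes p(j) in the 2-dim ordering
                assert i < j
                assert not (i % 2 == 1 and j % 2 == 0)
    return True

print(all(check_necessary_condition(b) for b in range(1, 7)))  # True
```

The parity test encodes membership in \(V_0\) and \(V_1\): the highest bit of the first coordinate \(i^r\) equals the lowest bit of i.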
Vertex \(p(n-2)\in V_0\) is dominated by \(p(n-1)\in V_1\), and thus there is a path in G from π(p(n−2)) to π(p(n−1)). Let e(1) denote the first edge on this path. Since no other point dominates p(n−2), removing e(1) from E′ does not alter the fact that π preserves the 2-dimensional ordering, with the possible exception that there may be elements \(p(j)\in V_0\) for which p(j)≺p(n−1) but π(p(j)) no longer precedes π(p(n−1)).
Let \(U_i=\{p(n-2j+1) : 1\le j\le i\}\subset V_1\). By induction, for 1≤i<n/2, suppose one has found i edges \(E_i=\{e(j) : 1\le j\le i\}\) such that, for any p,q∈V, if \(p \not\in V_{0}\) or \(q \not\in U_{i}\), then p≺q in V if and only if π(p)≺π(q) in \((V', E'-E_i)\). To find edge e(i+1), note that \(p(n-2i-2)\in V_0\) is dominated by \(p(n-2i-1)\in V_1-U_i\), so there is a path τ in \((V', E'-E_i)\) from π(p(n−2i−2)) to π(p(n−2i−1)). Let v∈V′ be the first vertex on τ such that there is no path from v to any vertex in \(\pi(V_0)\). Since the endpoint π(p(n−2i−1)) has this property, v must exist, and it cannot be π(p(n−2i−2)), since there must be a path from that vertex to \(\pi(p(n-2))\in\pi(V_0)\). Let e(i+1) denote the incoming edge at v in τ. This edge cannot be on any path leading to \(\pi(V_1-U_{i+1})\), since otherwise π(p(n−2i−2)) would be dominated by such a point, violating the order-embedding property because p(n−2i−2) is not dominated by any point in \(V_1-U_{i+1}\). Similarly, it cannot be on any path that starts in \(\pi(V_1)\), since otherwise such a point would have a path to the predecessor of v in τ, and hence a path to a vertex of \(\pi(V_0)\), which is impossible. Therefore e(i+1) lies only on paths from \(\pi(V_0)\) to \(\pi(U_{i+1})\). Thus \(E_{i+1}=E_i+e(i+1)\) has the requisite properties for the induction.
Once \(E_{n/2}\) has been determined, let \(E''=E'-E_{n/2}\); i.e., n/2 edges have been removed. The dag \(G''=(V',E'')\) has the property that if p≺q and p,q∈V_0 or p,q∈V_1, then π(p)≺π(q) in G″. If we further remove all edges in E″ which are not on any path from π(p) to π(q) for some p,q∈V_0 or p,q∈V_1, then the resulting dag retains this property. Further, it has connected components, one of which contains \(\pi(V_0)\) and another of which contains \(\pi(V_1)\). \(V_0\) and \(V_1\) are both isomorphic to the (b−1)-bit bit-reversal permutation, and thus we can recursively remove edges from their components. The process can be repeated b times, removing n/2 edges at each stage, so there must be at least \(bn/2 = \frac{1}{2}|V| \log_{2} |V|\) edges. □
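The final edge count can be sanity-checked with a small recursion (an illustrative sketch added here, not from the paper): n/2 edges are removed at each of the \(\log_2 n\) levels, which resolves to exactly \(\frac{1}{2}|V|\log_2 |V|\).

```python
def removed_edges(n):
    """Total edges removed by the recursive argument: n/2 at the
    top level, then recurse into the two half-size bit-reversal
    permutations (n a power of 2)."""
    if n <= 1:
        return 0
    return n // 2 + 2 * removed_edges(n // 2)

# Recursion total vs. the closed form (1/2) * n * log2(n):
for b in range(1, 5):
    n = 1 << b
    print(n, removed_edges(n), n * b // 2)
```

Each level removes n/2 edges in total because the two components of size n/2 each contribute (n/2)/2 at the next stage, and so on for b stages.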
A somewhat similar proof appears in [15] when there is no partial ordering on the points and G is required to be a Manhattan network, i.e., the endpoints of edges differ in exactly one coordinate.
Stout, Q.F. Isotonic Regression for Multiple Independent Variables. Algorithmica 71, 450–470 (2015). https://doi.org/10.1007/s00453-013-9814-z