
Mixtures of regressions with changepoints

Statistics and Computing

Abstract

We introduce an extension of the mixture of linear regressions model in which changepoints are present. Such a model offers greater flexibility than a standard changepoint regression model when the data are believed not only to contain changepoints but also to belong to two or more unobservable categories. It can also provide additional insight into data that are already modeled with mixtures of regressions but whose possible changepoints have not yet been investigated. After discussing the mixture of regressions with changepoints model, we develop an Expectation/Conditional Maximization (ECM) algorithm for maximum likelihood estimation. Two simulation studies illustrate the performance of our ECM algorithm, and we also analyze a real dataset.
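The abstract does not give the model equations or the algorithm's details, so the following Python sketch only illustrates the general idea under our own assumptions. These are not necessarily the paper's parameterization or the mixtools API. We assume:

- each component j is a continuous broken-line (segmented) regression with a single changepoint tau_j, that is y = b_j0 + b_j1*x + b_j2*max(x - tau_j, 0) + e, with e ~ N(0, sigma_j^2);
- the changepoints are restricted to a user-supplied grid.

The E-step computes posterior component memberships. The CM-steps then update, in turn:

1. the mixing proportions;
2. (beta_j, tau_j), by minimizing the weighted residual sum of squares over the grid;
3. sigma_j.

All function names (`ecm`, `wls`, `e_step`, `design_row`) and the simulated data are hypothetical.

```python
import math
import random


def design_row(x, tau):
    # Assumed broken-line basis: intercept, slope, extra slope after tau
    return [1.0, x, max(x - tau, 0.0)]


def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for q in range(c, n + 1):
                m[r][q] -= f * m[c][q]
    out = [0.0] * n
    for r in range(n - 1, -1, -1):
        out[r] = (m[r][n] - sum(m[r][q] * out[q] for q in range(r + 1, n))) / m[r][r]
    return out


def mean(x, beta, tau):
    return sum(b * d for b, d in zip(beta, design_row(x, tau)))


def wls(xs, ys, ws, tau):
    # Weighted least squares at a fixed changepoint (tiny ridge for stability)
    xtwx = [[1e-9 if i == j else 0.0 for j in range(3)] for i in range(3)]
    xtwy = [0.0] * 3
    for x, y, w in zip(xs, ys, ws):
        d = design_row(x, tau)
        for i in range(3):
            xtwy[i] += w * d[i] * y
            for j in range(3):
                xtwx[i][j] += w * d[i] * d[j]
    beta = solve(xtwx, xtwy)
    rss = sum(w * (y - mean(x, beta, tau)) ** 2 for x, y, w in zip(xs, ys, ws))
    return beta, rss


def e_step(xs, ys, lam, beta, tau, sigma):
    # Posterior membership probabilities and observed-data log-likelihood
    W, loglik = [], 0.0
    for x, y in zip(xs, ys):
        logs = [math.log(lam[j]) - math.log(sigma[j] * math.sqrt(2 * math.pi))
                - 0.5 * ((y - mean(x, beta[j], tau[j])) / sigma[j]) ** 2
                for j in range(len(lam))]
        top = max(logs)  # log-sum-exp trick avoids underflow
        tot = sum(math.exp(v - top) for v in logs)
        loglik += top + math.log(tot)
        W.append([math.exp(v - top) / tot for v in logs])
    return W, loglik


def ecm(xs, ys, lam, beta, tau, sigma, grid, iters=100, tol=1e-6):
    lam, beta, tau, sigma = list(lam), [list(b) for b in beta], list(tau), list(sigma)
    n, k, history = len(xs), len(lam), []
    for _ in range(iters):
        W, loglik = e_step(xs, ys, lam, beta, tau, sigma)
        converged = bool(history) and loglik - history[-1] < tol
        history.append(loglik)
        if converged:
            break
        # CM-step 1: mixing proportions
        lam = [sum(W[i][j] for i in range(n)) / n for j in range(k)]
        for j in range(k):
            ws = [W[i][j] for i in range(n)]
            # CM-step 2: (beta_j, tau_j) with sigma_j fixed = min weighted RSS on grid
            fits = [(wls(xs, ys, ws, t), t) for t in grid]
            (beta[j], rss), tau[j] = min(fits, key=lambda f: f[0][1])
            # CM-step 3: sigma_j given the new beta_j and tau_j
            sigma[j] = math.sqrt(rss / sum(ws))
    else:
        # Iteration cap reached: record the log-likelihood of the final estimates
        history.append(e_step(xs, ys, lam, beta, tau, sigma)[1])
    return lam, beta, tau, sigma, history


# Hypothetical simulated data: two components, one changepoint each
rng = random.Random(0)
xs, ys = [], []
for _ in range(200):
    x = rng.uniform(0, 10)
    if rng.random() < 0.5:
        y = 1 + 2 * x - 3 * max(x - 5, 0) + rng.gauss(0, 0.5)
    else:
        y = 15 - 0.5 * x + 2 * max(x - 3, 0) + rng.gauss(0, 0.5)
    xs.append(x)
    ys.append(y)

grid = [1.0 + 0.25 * i for i in range(33)]  # candidate changepoints 1.0 .. 9.0
lam, beta, tau, sigma, history = ecm(
    xs, ys, lam=[0.5, 0.5], beta=[[0, 2, -2], [14, 0, 1]],
    tau=[4.0, 4.0], sigma=[1.0, 1.0], grid=grid)
print("lambda:", [round(v, 3) for v in lam], "tau:", tau)
```

The starting changepoints lie on the grid, and every update is chosen from the grid. Each iteration therefore should not decrease the observed-data log-likelihood (up to the negligible ridge term), which the `history` list lets you check. A finer grid, several changepoints per component, or a continuous search over tau_j would be natural refinements of this sketch.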


Figures 1–9 (images not included)



Acknowledgements

We are grateful to two anonymous referees and an Associate Editor for numerous helpful comments during the preparation of this article.

Author information

Corresponding author: Derek S. Young

Additional information

Disclaimer: This report is released to inform interested parties of research and to encourage discussion. The views expressed are those of the author and not necessarily those of the U.S. Census Bureau.


About this article

Cite this article

Young, D.S. Mixtures of regressions with changepoints. Stat Comput 24, 265–281 (2014). https://doi.org/10.1007/s11222-012-9369-x


  • DOI: https://doi.org/10.1007/s11222-012-9369-x
