Published in: Journal of Classification, Issue 2/2021

2 July 2020

Model-based Clustering of Count Processes

Authors: Tin Lok James Ng, Thomas Brendan Murphy


Abstract

A model-based clustering method built on a Gaussian Cox process is proposed to address the problem of clustering count process data. The model allows nonparametric estimation of the intensity functions of Poisson processes while simultaneously clustering the count process observations. A logistic Gaussian process transformation is imposed on the intensity functions to enforce smoothness. Maximum likelihood parameter estimation is carried out via the EM algorithm, and model selection is addressed using a cross-validated likelihood approach. The proposed model and methodology are applied to two datasets.
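The abstract does not give implementation details, so the following Python sketch is only a rough, simplified illustration of two ingredients it names, not the authors' code. It assumes the intensity of cluster k is written as λ_k(t) = Λ_k exp(g_k(t)) / ∫ exp(g_k(s)) ds (a logistic transformation of an unconstrained function g_k scaled by a total rate Λ_k), and it computes E-step membership probabilities from the standard Poisson process log-likelihood Σ_j log λ(t_j) − ∫ λ(t) dt. Further assumptions: the observation window is [0, 1] discretized on a grid, integrals use the trapezoid rule, λ is linearly interpolated between grid points, and g is drawn approximately from a squared-exponential Gaussian process via random Fourier features. All function names (`sample_gp_rff`, `logistic_intensity`, `e_step`, and so on) and the kernel choice are hypothetical.

```python
import bisect
import math
import random


def trapezoid(values, grid):
    """Trapezoid-rule approximation of the integral of `values` over `grid`."""
    return sum(0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))


def sample_gp_rff(grid, rng, variance=1.0, lengthscale=0.2, n_features=200):
    """Hypothetical helper: approximate draw of g ~ GP(0, k) with the assumed
    squared-exponential kernel k(s, t) = variance * exp(-(s-t)^2 / (2 l^2)),
    using random Fourier features."""
    omegas = [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    amps = [rng.gauss(0.0, math.sqrt(variance)) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)
    return [scale * sum(a * math.cos(w * t + b)
                        for a, w, b in zip(amps, omegas, phases))
            for t in grid]


def logistic_intensity(g_values, grid, total_rate):
    """lambda(t) = total_rate * exp(g(t)) / integral of exp(g(s)) ds.
    The result integrates (by the trapezoid rule) to total_rate."""
    m = max(g_values)  # shift for numerical stability; cancels in the ratio
    w = [math.exp(v - m) for v in g_values]
    z = trapezoid(w, grid)
    return [total_rate * wi / z for wi in w]


def interp(t, grid, values):
    """Linear interpolation of grid values at time t (grid sorted ascending)."""
    j = min(max(bisect.bisect_right(grid, t) - 1, 0), len(grid) - 2)
    frac = (t - grid[j]) / (grid[j + 1] - grid[j])
    return values[j] + frac * (values[j + 1] - values[j])


def poisson_process_loglik(events, grid, intensity):
    """log L = sum_j log lambda(t_j) - integral of lambda over the window."""
    return (sum(math.log(interp(t, grid, intensity)) for t in events)
            - trapezoid(intensity, grid))


def e_step(observations, grid, intensities, weights):
    """Posterior cluster membership probabilities for each observed process."""
    resp = []
    for events in observations:
        logs = [math.log(p) + poisson_process_loglik(events, grid, lam)
                for p, lam in zip(weights, intensities)]
        top = max(logs)  # log-sum-exp for numerical stability
        norm = top + math.log(sum(math.exp(v - top) for v in logs))
        resp.append([math.exp(v - norm) for v in logs])
    return resp


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [i / 100 for i in range(101)]  # observation window [0, 1]
    shapes = [sample_gp_rff(grid, rng) for _ in range(2)]
    intensities = [logistic_intensity(g, grid, total_rate=20.0) for g in shapes]
    observations = [[0.05, 0.1, 0.15, 0.2], [0.8, 0.85, 0.9, 0.95]]
    for row in e_step(observations, grid, intensities, weights=[0.5, 0.5]):
        print([round(p, 3) for p in row])
```

Random Fourier features are used here only to keep the sketch free of external dependencies; an exact draw would factorize the kernel matrix instead (for example with a Cholesky decomposition). The M-step, which would update the mixing weights and the smoothed intensity of each cluster, is deliberately omitted.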


Metadata
Title
Model-based Clustering of Count Processes
Authors
Tin Lok James Ng
Thomas Brendan Murphy
Publication date
2 July 2020
Publisher
Springer US
Published in
Journal of Classification / Issue 2/2021
Print ISSN: 0176-4268
Electronic ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-020-09363-4
