14-12-2022

A Semi-parametric Density Estimation with Application in Clustering

Authors: Mahdi Salehi, Andriette Bekker, Mohammad Arashi

Published in: Journal of Classification | Issue 1/2023


Abstract

The idea behind density-based clustering is to associate groups with the connected components of the level sets of the density underlying the data, where that density is estimated by a nonparametric method. This approach claims some advantages over both distance-based and model-based clustering. Some researchers have developed the technique by proposing a graph-theoretic method for identifying the local modes of the underlying density, which is estimated by the well-known kernel density estimation (KDE) with normal and t kernels. The present work proposes a semi-parametric KDE with a more flexible family of kernels that includes the skew-normal (SN) and skew-t (ST). We show that the proposed estimator not only reduces boundary bias but is also closer to the true density than the usual estimator with a Gaussian kernel. Another main result of this paper is the optimal bandwidth for the one-dimensional and multidimensional cases under these asymmetric kernels; it is smaller than the bandwidth obtained under the normal assumption. Finally, through a comprehensive numerical study, we illustrate the application of the proposed semi-parametric KDE to density-based clustering using simulated and real data sets.
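To make the two ingredients named in the abstract concrete, the following is a minimal one-dimensional sketch: a plain KDE that uses a skew-normal kernel, followed by a scan for the connected components of one level set on a grid. It is not the authors' semi-parametric estimator, their optimal bandwidth, or their graph-based mode search, none of which are specified in the abstract. The function names, the toy data, the bandwidth `h`, the shape parameter `alpha`, and the threshold `level` are all assumptions chosen for illustration.

```python
import math


def skew_normal_pdf(z, alpha):
    """Standard skew-normal density 2*phi(z)*Phi(alpha*z); alpha = 0 gives the normal."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 * phi * Phi


def kde(x, data, h, alpha=0.0):
    """Kernel density estimate at x: (1 / (n*h)) * sum_i K((x - x_i) / h)."""
    return sum(skew_normal_pdf((x - xi) / h, alpha) for xi in data) / (len(data) * h)


def level_set_components(data, h, alpha, level, grid):
    """Maximal runs of consecutive grid points where the estimate exceeds `level`.

    Each run is returned as (left_end, right_end); in one dimension these runs
    are the connected components of the level set, up to grid resolution.
    """
    components, current = [], []
    for g in grid:
        if kde(g, data, h, alpha) > level:
            current.append(g)
        elif current:
            components.append((current[0], current[-1]))
            current = []
    if current:
        components.append((current[0], current[-1]))
    return components


# Toy data (made up for illustration): two well-separated groups.
data = [0.1, 0.3, 0.4, 0.6, 0.9, 5.0, 5.2, 5.5, 5.7, 6.1]
grid = [i * 0.05 for i in range(-40, 181)]  # evenly spaced points from -2.0 to 9.0

# h, alpha and level are illustrative assumptions, not values from the paper.
components = level_set_components(data, h=0.4, alpha=2.0, level=0.05, grid=grid)
print(components)
```

With these illustrative settings the scan returns one interval per group, and each interval would correspond to one cluster. Setting `alpha=0.0` recovers the usual Gaussian-kernel estimate, which makes it easy to compare the two kernels on the same data.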

Metadata
Title
A Semi-parametric Density Estimation with Application in Clustering
Authors
Mahdi Salehi
Andriette Bekker
Mohammad Arashi
Publication date
14-12-2022
Publisher
Springer US
Published in
Journal of Classification / Issue 1/2023
Print ISSN: 0176-4268
Electronic ISSN: 1432-1343
DOI
https://doi.org/10.1007/s00357-022-09425-9
