Skip to main content
Top
Published in: Soft Computing 12/2020

23-10-2019 | Methodologies and Application

A novel heuristic algorithm to solve penalized regression-based clustering model

Authors: Shadi Hasanzadeh Tavakkoli, Yahya Forghani, Reza Sheibani

Published in: Soft Computing | Issue 12/2020

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

Penalized regression-based clustering model (PRClust) is an extension of “sum-of-norms” clustering model. Three previously proposed heuristic algorithms for solving PRClust are: (1) DC-CD, which combines the difference of convex programming (DC) and a coordinate-wise descent algorithm (CD), (2) DC-ADMM, which combines DC with the alternating direction method of multipliers (ADMM), and (3) ALT, which uses alternate optimization. DC-CD uses \( p \times \left( {n \times \left( {n - 1} \right)} \right)/2 \) scalar slack variables to solve PRClust, where n is the number of data and p is the number of their features. In each iteration of DC-CD, these slack variables and cluster centers are updated using a second-order cone programming (SOCP). DC-ADMM uses \( p \times n \times \left( {n - 1} \right) \) scalar slack variables. In each iteration of DC-ADMM, these slack variables and cluster centers are updated with a standard ADMM. In this paper, first, PRClust is reformulated into an equivalent model. Then, a novel heuristic algorithm is proposed to solve the reformulated model. Our proposed algorithm needs only \( \left( {n \times \left( {n - 1} \right)} \right)/2 \) scalar slack variables which are much less than those of DC-CD and DC-ADMM, and updates them using a simple equation in each iteration of the algorithm. Therefore, updating slack variables in our proposed algorithm is less time-consuming than that of DC-CD and DC-ADMM. Our proposed algorithm updates only cluster centers using an unconstrained convex quadratic problem. Therefore, our proposed unconstrained convex quadratic problem is much smaller than the SOCP of DC-CD which is used to update both cluster centers and slack variables. Meanwhile, ALT updates cluster centers using a SOCP, while our proposed algorithm updates cluster centers using an unconstrained convex quadratic problem with the same number of variables. Solving an unconstrained convex quadratic problem is less time-consuming than a SOCP with the same number of variables. Our experimental results on 12 datasets confirm that the runtime of our proposed algorithm is better than that of DC-ADMM and DC-CD.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Literature
go back to reference Arthur D, Vassilvitskii S (2007) K-means ++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 1027–1035 Arthur D, Vassilvitskii S (2007) K-means ++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 1027–1035
go back to reference Barati M, Jalali M, Forghani YJES (2019) Alternating optimization to solve penalized regression‐based clustering model. p e12462 Barati M, Jalali M, Forghani YJES (2019) Alternating optimization to solve penalized regression‐based clustering model. p e12462
go back to reference Bryant A, Cios K (2018) RNN-DBSCAN: a density-based clustering algorithm using reverse nearest neighbor density estimates. IEEE Trans Knowl Data Eng 30(6):1109–1121CrossRef Bryant A, Cios K (2018) RNN-DBSCAN: a density-based clustering algorithm using reverse nearest neighbor density estimates. IEEE Trans Knowl Data Eng 30(6):1109–1121CrossRef
go back to reference Capó M, Pérez A, Lozano JA (2017) An efficient approximation to the k-means clustering for massive data. Knowl-Based Syst 117:56–69CrossRef Capó M, Pérez A, Lozano JA (2017) An efficient approximation to the k-means clustering for massive data. Knowl-Based Syst 117:56–69CrossRef
go back to reference Chen GK, Chi EC, Ranola JMO, Lange K (2015) Convex clustering: an attractive alternative to hierarchical clustering. PLoS Comput Biol 11(5):e1004228CrossRef Chen GK, Chi EC, Ranola JMO, Lange K (2015) Convex clustering: an attractive alternative to hierarchical clustering. PLoS Comput Biol 11(5):e1004228CrossRef
go back to reference Cheng W, Zhang X, Pan F, Wang W (2016) HICC: an entropy splitting-based framework for hierarchical co-clustering. Knowl Inf Syst 46(2):343–367CrossRef Cheng W, Zhang X, Pan F, Wang W (2016) HICC: an entropy splitting-based framework for hierarchical co-clustering. Knowl Inf Syst 46(2):343–367CrossRef
go back to reference Gan G, Ng MK-P (2017) K-means clustering with outlier removal. Pattern Recognit Lett 90:8–14CrossRef Gan G, Ng MK-P (2017) K-means clustering with outlier removal. Pattern Recognit Lett 90:8–14CrossRef
go back to reference Han D, Agrawal A, Liao W-K, Choudhary A (2018) A fast DBSCAN algorithm with spark implementation. In: Big data in engineering applications. Springer, Berlin, pp 173–192 Han D, Agrawal A, Liao W-K, Choudhary A (2018) A fast DBSCAN algorithm with spark implementation. In: Big data in engineering applications. Springer, Berlin, pp 173–192
go back to reference Hocking TD, Joulin A, Bach F, Vert J-P (2011) Clusterpath an algorithm for clustering using convex fusion penalties. In: 28th international conference on machine learning, p 1 Hocking TD, Joulin A, Bach F, Vert J-P (2011) Clusterpath an algorithm for clustering using convex fusion penalties. In: 28th international conference on machine learning, p 1
go back to reference Ienco D, Bordogna G (2018) Fuzzy extensions of the DBScan clustering algorithm. Soft Comput 22(5):1719–1730MATHCrossRef Ienco D, Bordogna G (2018) Fuzzy extensions of the DBScan clustering algorithm. Soft Comput 22(5):1719–1730MATHCrossRef
go back to reference Le Thi Hoai A, Tao PD (1997) Solving a class of linearly constrained indefinite quadratic problems by DC algorithms. J Glob Optim 11(3):253–285MATHCrossRef Le Thi Hoai A, Tao PD (1997) Solving a class of linearly constrained indefinite quadratic problems by DC algorithms. J Glob Optim 11(3):253–285MATHCrossRef
go back to reference Lindsten F, Ohlsson H, Ljung L (2011) Clustering using sum-of-norms regularization: with application to particle filter output computation. In: Statistical signal processing workshop (SSP), 2011 IEEE, pp 201–204. IEEE Lindsten F, Ohlsson H, Ljung L (2011) Clustering using sum-of-norms regularization: with application to particle filter output computation. In: Statistical signal processing workshop (SSP), 2011 IEEE, pp 201–204. IEEE
go back to reference MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol. 1, no. 14. Oakland, CA, USA, pp 281–297 MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol. 1, no. 14. Oakland, CA, USA, pp 281–297
go back to reference Malsiner-Walli G, Frühwirth-Schnatter S, Grün B (2016) Model-based clustering based on sparse finite Gaussian mixtures. Stat Comput 26(1):303–324MathSciNetMATHCrossRef Malsiner-Walli G, Frühwirth-Schnatter S, Grün B (2016) Model-based clustering based on sparse finite Gaussian mixtures. Stat Comput 26(1):303–324MathSciNetMATHCrossRef
go back to reference Pan W, Shen X, Liu B (2013) Cluster analysis: unsupervised learning via supervised learning with a non-convex penalty. J Mach Learn Res 14(1):1865–1889MathSciNetMATH Pan W, Shen X, Liu B (2013) Cluster analysis: unsupervised learning via supervised learning with a non-convex penalty. J Mach Learn Res 14(1):1865–1889MathSciNetMATH
go back to reference Panahi A, Dubhashi D, Johansson FD, Bhattacharyya C (2017) Clustering by sum of norms: stochastic incremental algorithm, convergence and cluster recovery. In: International conference on machine learning, pp 2769–2777 Panahi A, Dubhashi D, Johansson FD, Bhattacharyya C (2017) Clustering by sum of norms: stochastic incremental algorithm, convergence and cluster recovery. In: International conference on machine learning, pp 2769–2777
go back to reference Pelckmans K, De Brabanter J, Suykens J, De Moor B (2005) Convex clustering shrinkage. In: PASCAL workshop on statistics and optimization of clustering workshop Pelckmans K, De Brabanter J, Suykens J, De Moor B (2005) Convex clustering shrinkage. In: PASCAL workshop on statistics and optimization of clustering workshop
go back to reference Pham T, Dang H, Le T, Le TH (2017) Fast support vector clustering. Vietnam J Comput Sci 4(1):13–21CrossRef Pham T, Dang H, Le T, Le TH (2017) Fast support vector clustering. Vietnam J Comput Sci 4(1):13–21CrossRef
go back to reference Seidpisheh M, Mohammadpour A (2018) Hierarchical clustering of heavy-tailed data using a new similarity measure. Intell Data Anal 22(3):569–579CrossRef Seidpisheh M, Mohammadpour A (2018) Hierarchical clustering of heavy-tailed data using a new similarity measure. Intell Data Anal 22(3):569–579CrossRef
go back to reference Wu C, Kwon S, Shen X, Pan W (2016) A new algorithm and theory for penalized regression-based clustering. J Mach Learn Res 17(188):1–25MathSciNetMATH Wu C, Kwon S, Shen X, Pan W (2016) A new algorithm and theory for penalized regression-based clustering. J Mach Learn Res 17(188):1–25MathSciNetMATH
Metadata
Title
A novel heuristic algorithm to solve penalized regression-based clustering model
Authors
Shadi Hasanzadeh Tavakkoli
Yahya Forghani
Reza Sheibani
Publication date
23-10-2019
Publisher
Springer Berlin Heidelberg
Published in
Soft Computing / Issue 12/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI
https://doi.org/10.1007/s00500-019-04448-8

Other articles of this Issue 12/2020

Soft Computing 12/2020 Go to the issue

Premium Partner