Top

Soft Computing

Published in:

23-10-2019 | Methodologies and Application

A novel heuristic algorithm to solve penalized regression-based clustering model

Authors: Shadi Hasanzadeh Tavakkoli, Yahya Forghani, Reza Sheibani

Published in: Soft Computing | Issue 12/2020

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Penalized regression-based clustering model (PRClust) is an extension of “sum-of-norms” clustering model. Three previously proposed heuristic algorithms for solving PRClust are: (1) DC-CD, which combines the difference of convex programming (DC) and a coordinate-wise descent algorithm (CD), (2) DC-ADMM, which combines DC with the alternating direction method of multipliers (ADMM), and (3) ALT, which uses alternate optimization. DC-CD uses \( p \times \left( {n \times \left( {n - 1} \right)} \right)/2 \) scalar slack variables to solve PRClust, where n is the number of data and p is the number of their features. In each iteration of DC-CD, these slack variables and cluster centers are updated using a second-order cone programming (SOCP). DC-ADMM uses \( p \times n \times \left( {n - 1} \right) \) scalar slack variables. In each iteration of DC-ADMM, these slack variables and cluster centers are updated with a standard ADMM. In this paper, first, PRClust is reformulated into an equivalent model. Then, a novel heuristic algorithm is proposed to solve the reformulated model. Our proposed algorithm needs only \( \left( {n \times \left( {n - 1} \right)} \right)/2 \) scalar slack variables which are much less than those of DC-CD and DC-ADMM, and updates them using a simple equation in each iteration of the algorithm. Therefore, updating slack variables in our proposed algorithm is less time-consuming than that of DC-CD and DC-ADMM. Our proposed algorithm updates only cluster centers using an unconstrained convex quadratic problem. Therefore, our proposed unconstrained convex quadratic problem is much smaller than the SOCP of DC-CD which is used to update both cluster centers and slack variables. Meanwhile, ALT updates cluster centers using a SOCP, while our proposed algorithm updates cluster centers using an unconstrained convex quadratic problem with the same number of variables. Solving an unconstrained convex quadratic problem is less time-consuming than a SOCP with the same number of variables. Our experimental results on 12 datasets confirm that the runtime of our proposed algorithm is better than that of DC-ADMM and DC-CD.

previous article On possible outputs of group decision making with interval uncertainties based on simulation techniques

next article A cluster validity evaluation method for dynamically determining the near-optimal number of clusters

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Arthur D, Vassilvitskii S (2007) K-means ++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 1027–1035

Barati M, Jalali M, Forghani YJES (2019) Alternating optimization to solve penalized regression‐based clustering model. p e12462

Bryant A, Cios K (2018) RNN-DBSCAN: a density-based clustering algorithm using reverse nearest neighbor density estimates. IEEE Trans Knowl Data Eng 30(6):1109–1121CrossRef

Capó M, Pérez A, Lozano JA (2017) An efficient approximation to the k-means clustering for massive data. Knowl-Based Syst 117:56–69CrossRef

Chen GK, Chi EC, Ranola JMO, Lange K (2015) Convex clustering: an attractive alternative to hierarchical clustering. PLoS Comput Biol 11(5):e1004228CrossRef

Cheng W, Zhang X, Pan F, Wang W (2016) HICC: an entropy splitting-based framework for hierarchical co-clustering. Knowl Inf Syst 46(2):343–367CrossRef

Chi EC, Lange K (2015) Splitting methods for convex clustering. J Comput Graph Stat 24(4):994–1013MathSciNetCrossRef

de Amorim RC (2015) Feature relevance in ward’s hierarchical clustering using the L p norm. J Classif 32(1):46–62MathSciNetMATHCrossRef

Friedman J, Hastie T, Höfling H, Tibshirani R (2007) Pathwise coordinate optimization. Ann Appl Stat 1(2):302–332MathSciNetMATHCrossRef

Gan G, Ng MK-P (2017) K-means clustering with outlier removal. Pattern Recognit Lett 90:8–14CrossRef

Han D, Agrawal A, Liao W-K, Choudhary A (2018) A fast DBSCAN algorithm with spark implementation. In: Big data in engineering applications. Springer, Berlin, pp 173–192

Hocking TD, Joulin A, Bach F, Vert J-P (2011) Clusterpath an algorithm for clustering using convex fusion penalties. In: 28th international conference on machine learning, p 1

Ienco D, Bordogna G (2018) Fuzzy extensions of the DBScan clustering algorithm. Soft Comput 22(5):1719–1730MATHCrossRef

Le Thi Hoai A, Tao PD (1997) Solving a class of linearly constrained indefinite quadratic problems by DC algorithms. J Glob Optim 11(3):253–285MATHCrossRef

Lindsten F, Ohlsson H, Ljung L (2011) Clustering using sum-of-norms regularization: with application to particle filter output computation. In: Statistical signal processing workshop (SSP), 2011 IEEE, pp 201–204. IEEE

MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol. 1, no. 14. Oakland, CA, USA, pp 281–297

Malsiner-Walli G, Frühwirth-Schnatter S, Grün B (2016) Model-based clustering based on sparse finite Gaussian mixtures. Stat Comput 26(1):303–324MathSciNetMATHCrossRef

Pan W, Shen X, Liu B (2013) Cluster analysis: unsupervised learning via supervised learning with a non-convex penalty. J Mach Learn Res 14(1):1865–1889MathSciNetMATH

Panahi A, Dubhashi D, Johansson FD, Bhattacharyya C (2017) Clustering by sum of norms: stochastic incremental algorithm, convergence and cluster recovery. In: International conference on machine learning, pp 2769–2777

Pelckmans K, De Brabanter J, Suykens J, De Moor B (2005) Convex clustering shrinkage. In: PASCAL workshop on statistics and optimization of clustering workshop

Pham T, Dang H, Le T, Le TH (2017) Fast support vector clustering. Vietnam J Comput Sci 4(1):13–21CrossRef

Seidpisheh M, Mohammadpour A (2018) Hierarchical clustering of heavy-tailed data using a new similarity measure. Intell Data Anal 22(3):569–579CrossRef

Wu TT, Lange K (2008) Coordinate descent algorithms for lasso penalized regression. Ann Appl Stat 2(1):224–244MathSciNetMATHCrossRef

Wu C, Kwon S, Shen X, Pan W (2016) A new algorithm and theory for penalized regression-based clustering. J Mach Learn Res 17(188):1–25MathSciNetMATH

Title: A novel heuristic algorithm to solve penalized regression-based clustering model
Authors: Shadi Hasanzadeh Tavakkoli
Yahya Forghani
Reza Sheibani
Publication date: 23-10-2019
Publisher: Springer Berlin Heidelberg
Published in: Soft Computing / Issue 12/2020
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-019-04448-8

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Wirtschaft"

Springer Professional "Technik"

Other articles of this Issue 12/2020

Correction to: An intelligent personalized web blog searching technique using fuzzy-based feedback recurrent neural network

Residuated lattice of L-fuzzy ideals of a ring

Tackling longevity risk by means of financial compensation

On typical hesitant fuzzy automata

Human body flexibility fitness test based on image edge detection and feature point extraction

Influence measures in subnetworks using vertex centrality

Premium Partner