Skip to main content
Top
Published in: Pattern Analysis and Applications 3/2018

24-02-2017 | Theoretical Advances

Scalability of correlation clustering

Authors: Mamata Samal, V. Vijaya Saradhi, Sukumar Nandi

Published in: Pattern Analysis and Applications | Issue 3/2018

Log in

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The problem of scalability of correlation clustering (CC) is addressed by reducing the number of variables involved in the SDP formulation. A nonlinear programming formulation is obtained instead of SDP formulation, which reduces the number of variables. The new formulation is solved through limited memory Broyden Fletcher Goldfarb Shanno method. We demonstrate the potential of the nonlinear formulation on large graph datasets having more than ten thousand vertices and nine million edges. The proposed scalable formulation is experimentally shown not to compromise on quality of the obtained clusters. We compare the scalable formulation results with those of the original CC formulation. We compare the scalable formulation results with the original CC formulation, a constrained spectral clustering method which uses edge labels of the graph and differs only in the way clusters are obtained by defining the cut on the given graph and with a variant of constraint spectral clustering known as self-taught spectral clustering.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Footnotes
1
In maximization problem a \(\lambda\)-approximation algorithm A for the problem with cost function c is an algorithm such that, for every input I, the solution A(I) to satisfy \(c(A(I))\ge \lambda \mathrm{max}_S c(S)\) for some \(\lambda \le 1\).
 
2
Approximation value(\(\lambda\)) in the case of a maximization problem is the ratio between the cost of an optimal solution to the cost of the solution produced by the approximation algorithm, in maximization problem \(\lambda \le 1\) and in minimization problem \(\lambda \ge 1.\)
 
3
By large dataset we mean graph with large number of nodes as well as edges.
 
Literature
1.
go back to reference Arasu A, Ré C, Suciu D (2009) Large-scale deduplication with constraints using dedupalog. In: International conference on data engineering, pp. 952–963 Arasu A, Ré C, Suciu D (2009) Large-scale deduplication with constraints using dedupalog. In: International conference on data engineering, pp. 952–963
2.
go back to reference Bansal N, Blum A, Chawla S (2002) Correlation clustering. Found Comput Sci (FOCS) 2002:238–247MATH Bansal N, Blum A, Chawla S (2002) Correlation clustering. Found Comput Sci (FOCS) 2002:238–247MATH
3.
go back to reference Ben-David S, Long PM, Mansour Y (2001) Agnostic boosting. In: Proceedings of the 14th annual conference on computational learning theory and 5th European conference on computational learning theory, COLT ’01/EuroCOLT ’01. Springer, pp. 507–516 Ben-David S, Long PM, Mansour Y (2001) Agnostic boosting. In: Proceedings of the 14th annual conference on computational learning theory and 5th European conference on computational learning theory, COLT ’01/EuroCOLT ’01. Springer, pp. 507–516
4.
go back to reference Burer S, Monteiro R (2003) A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math Program 95(2):329–357MathSciNetCrossRefMATH Burer S, Monteiro R (2003) A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math Program 95(2):329–357MathSciNetCrossRefMATH
5.
go back to reference Cesa-Bianchi N, Gentile C, Vitale F, Zappella G (2012) A correlation clustering approach to link classification in signed networks. J Mach Learn. Res. 23:1–34 Cesa-Bianchi N, Gentile C, Vitale F, Zappella G (2012) A correlation clustering approach to link classification in signed networks. J Mach Learn. Res. 23:1–34
7.
go back to reference Chen Y, Sanghavi S, Xu H (2012) Clustering sparse graphs. Adv Neural Inf Process Syst 25:2204–2212 Chen Y, Sanghavi S, Xu H (2012) Clustering sparse graphs. Adv Neural Inf Process Syst 25:2204–2212
8.
go back to reference Chierichetti F, Dalvi N, Kumar R (2014) Correlation clustering in mapreduce. In: KDD ’14 proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining Chierichetti F, Dalvi N, Kumar R (2014) Correlation clustering in mapreduce. In: KDD ’14 proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining
9.
go back to reference Cohn D, Caruana R, Mccallum A (2007) Constrained clustering: advances in algorithms, chap. Semi-supervised clustering with user feedback. Data mining and knowledge discovery series, pp. 17–31 Cohn D, Caruana R, Mccallum A (2007) Constrained clustering: advances in algorithms, chap. Semi-supervised clustering with user feedback. Data mining and knowledge discovery series, pp. 17–31
10.
go back to reference Giotis I, Guruswami V (2006) Correlation clustering with a fixed number of clusters. Theory Comput Open Access J 2:249–266MathSciNetMATH Giotis I, Guruswami V (2006) Correlation clustering with a fixed number of clusters. Theory Comput Open Access J 2:249–266MathSciNetMATH
11.
go back to reference Goemans MX, Williamson DP (1995) Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J Assoc Comput Mach 42(6):1115–1145MathSciNetCrossRefMATH Goemans MX, Williamson DP (1995) Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J Assoc Comput Mach 42(6):1115–1145MathSciNetCrossRefMATH
12.
go back to reference Immorlica N, Wirth A (2007) Constrained clustering: advances in algorithms, theory and applications, chap. correlation clustering. In: Wagstaff KL, Davidson I, Basu S (eds) Data mining and knowledge discovery series. Chapman & Hall, pp. 313–327 Immorlica N, Wirth A (2007) Constrained clustering: advances in algorithms, theory and applications, chap. correlation clustering. In: Wagstaff KL, Davidson I, Basu S (eds) Data mining and knowledge discovery series. Chapman & Hall, pp. 313–327
13.
go back to reference Kamvar SD, Klein D, Manning CD (2003) Spectral learning. In: Proceedings of the 18th international joint conference on artificial intelligence, IJCAI’03. Morgan Kaufmann Publishers Inc, pp. 561–566 Kamvar SD, Klein D, Manning CD (2003) Spectral learning. In: Proceedings of the 18th international joint conference on artificial intelligence, IJCAI’03. Morgan Kaufmann Publishers Inc, pp. 561–566
14.
go back to reference Kearns MJ, Schapire RE, Sellie LM (1992) Toward efficient agnostic learning. In: Proceedings of the fifth annual workshop on Computational learning theory, COLT ’92. ACM, pp. 341–352 Kearns MJ, Schapire RE, Sellie LM (1992) Toward efficient agnostic learning. In: Proceedings of the fifth annual workshop on Computational learning theory, COLT ’92. ACM, pp. 341–352
16.
go back to reference Lu Z, Leen TK (2007) Constrained clustering: advances in algorithms, chap. pairwise constraints as priors in probabilistic clustering. In: Wagstaff KL, Davidson I, Basu S (eds) Data mining and knowledge discovery series. Chapman & Hall, pp. 59–90 Lu Z, Leen TK (2007) Constrained clustering: advances in algorithms, chap. pairwise constraints as priors in probabilistic clustering. In: Wagstaff KL, Davidson I, Basu S (eds) Data mining and knowledge discovery series. Chapman & Hall, pp. 59–90
17.
go back to reference Pan X, Papailiopoulos D, Oymak S, Recht B, Ramchandran K, Jordan MI (2015) Parallel correlation clustering on big graphs. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R (eds) Advances in Neural Information Processing Systems 28. Curran Associates, Inc, pp. 82–90 Pan X, Papailiopoulos D, Oymak S, Recht B, Ramchandran K, Jordan MI (2015) Parallel correlation clustering on big graphs. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R (eds) Advances in Neural Information Processing Systems 28. Curran Associates, Inc, pp. 82–90
18.
go back to reference Pensa R, Robardet C, Boulicaut JF (2007) Constrained clustering: advances in algorithms, chap. constraint-driven co-clustering of 0/1 Data. In: Wagstaff KL, Davidson I, Basu S (eds) Data mining and knowledge discovery series. Chapman & Hall, pp. 123–148 Pensa R, Robardet C, Boulicaut JF (2007) Constrained clustering: advances in algorithms, chap. constraint-driven co-clustering of 0/1 Data. In: Wagstaff KL, Davidson I, Basu S (eds) Data mining and knowledge discovery series. Chapman & Hall, pp. 123–148
20.
go back to reference Rand WM (1971) Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336):846–850CrossRef Rand WM (1971) Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336):846–850CrossRef
21.
go back to reference Shanno DF, Phua KH (1980) Minimization of unconstrained multivariate functions. ACM Trans Math Softw 6:618–622CrossRefMATH Shanno DF, Phua KH (1980) Minimization of unconstrained multivariate functions. ACM Trans Math Softw 6:618–622CrossRefMATH
22.
go back to reference Shental N, Bar-Hillel A, Hertz T, Weinshal D (2004) Computing Gaussian mixture models with EM using equivalence constraints. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems 16. MIT press, Cambridge Shental N, Bar-Hillel A, Hertz T, Weinshal D (2004) Computing Gaussian mixture models with EM using equivalence constraints. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems 16. MIT press, Cambridge
23.
go back to reference Swamy C (2004) Correlation clustering: maximizing agreements via semidefinite programming. In: Proceedings of the fifteenth annual ACM-SIAM symposium on discrete algorithms, pp. 526–527 Swamy C (2004) Correlation clustering: maximizing agreements via semidefinite programming. In: Proceedings of the fifteenth annual ACM-SIAM symposium on discrete algorithms, pp. 526–527
24.
go back to reference Tang W, Zhong S (2008) Computational methods of feature selection, chap. pairwise constraints-guided dimensionality reduction. In: Liu H, Motoda H (eds) Data mining and knowledge discovery series. Taylor & Francis Group CRC, pp. 295–312 Tang W, Zhong S (2008) Computational methods of feature selection, chap. pairwise constraints-guided dimensionality reduction. In: Liu H, Motoda H (eds) Data mining and knowledge discovery series. Taylor & Francis Group CRC, pp. 295–312
26.
go back to reference Wacquet G, Caillault EP, Hamad D, HéBert PA (2013) Constrained spectral embedding for K-way data clustering. Pattern Recognit Lett 34:1009–1017CrossRef Wacquet G, Caillault EP, Hamad D, HéBert PA (2013) Constrained spectral embedding for K-way data clustering. Pattern Recognit Lett 34:1009–1017CrossRef
27.
go back to reference Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of the eighteenth international conference on machine learning, ICML ’01. Morgan Kaufmann Publishers Inc, pp. 577–584 Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of the eighteenth international conference on machine learning, ICML ’01. Morgan Kaufmann Publishers Inc, pp. 577–584
28.
go back to reference Wang X, Davidson I (2010) Flexible constrained spectral clustering. In: KDD ’10: proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 563–572 Wang X, Davidson I (2010) Flexible constrained spectral clustering. In: KDD ’10: proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 563–572
29.
30.
go back to reference Wang X, Wang J, Qian B, Wang F, Davidson I (2014) Self-taught spectral clustering via constraint augmentation. In: Proceedings of the 2014 SIAM international conference on data mining, Philadelphia, Pennsylvania, USA, April 24–26, pp. 416–424 Wang X, Wang J, Qian B, Wang F, Davidson I (2014) Self-taught spectral clustering via constraint augmentation. In: Proceedings of the 2014 SIAM international conference on data mining, Philadelphia, Pennsylvania, USA, April 24–26, pp. 416–424
31.
32.
go back to reference Xu C, Tao D, Xu C (2015) Multi-view self-paced learning for clustering. In: Proceedings of the 24th international conference on artificial intelligence, IJCAI’15. AAAI Press, pp. 3974–3980 Xu C, Tao D, Xu C (2015) Multi-view self-paced learning for clustering. In: Proceedings of the 24th international conference on artificial intelligence, IJCAI’15. AAAI Press, pp. 3974–3980
33.
go back to reference Xu Q, Desjardins M (2005) Constrained spectral clustering under a local proximity structure assumption. In: Proceedings of the 18th international conference of the Florida artificial intelligence research society (FLAIRS-05). AAAI Press Xu Q, Desjardins M (2005) Constrained spectral clustering under a local proximity structure assumption. In: Proceedings of the 18th international conference of the Florida artificial intelligence research society (FLAIRS-05). AAAI Press
34.
go back to reference Zhang XL (2014) Convex discriminative multitask clustering. IEEE Trans Pattern Anal Mach Intell 37:28–40CrossRef Zhang XL (2014) Convex discriminative multitask clustering. IEEE Trans Pattern Anal Mach Intell 37:28–40CrossRef
Metadata
Title
Scalability of correlation clustering
Authors
Mamata Samal
V. Vijaya Saradhi
Sukumar Nandi
Publication date
24-02-2017
Publisher
Springer London
Published in
Pattern Analysis and Applications / Issue 3/2018
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-017-0598-7

Other articles of this Issue 3/2018

Pattern Analysis and Applications 3/2018 Go to the issue

Premium Partner