Skip to main content
Erschienen in: Pattern Analysis and Applications 3/2018

24.02.2017 | Theoretical Advances

Scalability of correlation clustering

verfasst von: Mamata Samal, V. Vijaya Saradhi, Sukumar Nandi

Erschienen in: Pattern Analysis and Applications | Ausgabe 3/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The problem of scalability of correlation clustering (CC) is addressed by reducing the number of variables involved in the SDP formulation. A nonlinear programming formulation is obtained instead of SDP formulation, which reduces the number of variables. The new formulation is solved through limited memory Broyden Fletcher Goldfarb Shanno method. We demonstrate the potential of the nonlinear formulation on large graph datasets having more than ten thousand vertices and nine million edges. The proposed scalable formulation is experimentally shown not to compromise on quality of the obtained clusters. We compare the scalable formulation results with those of the original CC formulation. We compare the scalable formulation results with the original CC formulation, a constrained spectral clustering method which uses edge labels of the graph and differs only in the way clusters are obtained by defining the cut on the given graph and with a variant of constraint spectral clustering known as self-taught spectral clustering.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
In maximization problem a \(\lambda\)-approximation algorithm A for the problem with cost function c is an algorithm such that, for every input I, the solution A(I) to satisfy \(c(A(I))\ge \lambda \mathrm{max}_S c(S)\) for some \(\lambda \le 1\).
 
2
Approximation value(\(\lambda\)) in the case of a maximization problem is the ratio between the cost of an optimal solution to the cost of the solution produced by the approximation algorithm, in maximization problem \(\lambda \le 1\) and in minimization problem \(\lambda \ge 1.\)
 
3
By large dataset we mean graph with large number of nodes as well as edges.
 
Literatur
1.
Zurück zum Zitat Arasu A, Ré C, Suciu D (2009) Large-scale deduplication with constraints using dedupalog. In: International conference on data engineering, pp. 952–963 Arasu A, Ré C, Suciu D (2009) Large-scale deduplication with constraints using dedupalog. In: International conference on data engineering, pp. 952–963
2.
Zurück zum Zitat Bansal N, Blum A, Chawla S (2002) Correlation clustering. Found Comput Sci (FOCS) 2002:238–247MATH Bansal N, Blum A, Chawla S (2002) Correlation clustering. Found Comput Sci (FOCS) 2002:238–247MATH
3.
Zurück zum Zitat Ben-David S, Long PM, Mansour Y (2001) Agnostic boosting. In: Proceedings of the 14th annual conference on computational learning theory and 5th European conference on computational learning theory, COLT ’01/EuroCOLT ’01. Springer, pp. 507–516 Ben-David S, Long PM, Mansour Y (2001) Agnostic boosting. In: Proceedings of the 14th annual conference on computational learning theory and 5th European conference on computational learning theory, COLT ’01/EuroCOLT ’01. Springer, pp. 507–516
4.
Zurück zum Zitat Burer S, Monteiro R (2003) A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math Program 95(2):329–357MathSciNetCrossRefMATH Burer S, Monteiro R (2003) A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math Program 95(2):329–357MathSciNetCrossRefMATH
5.
Zurück zum Zitat Cesa-Bianchi N, Gentile C, Vitale F, Zappella G (2012) A correlation clustering approach to link classification in signed networks. J Mach Learn. Res. 23:1–34 Cesa-Bianchi N, Gentile C, Vitale F, Zappella G (2012) A correlation clustering approach to link classification in signed networks. J Mach Learn. Res. 23:1–34
7.
Zurück zum Zitat Chen Y, Sanghavi S, Xu H (2012) Clustering sparse graphs. Adv Neural Inf Process Syst 25:2204–2212 Chen Y, Sanghavi S, Xu H (2012) Clustering sparse graphs. Adv Neural Inf Process Syst 25:2204–2212
8.
Zurück zum Zitat Chierichetti F, Dalvi N, Kumar R (2014) Correlation clustering in mapreduce. In: KDD ’14 proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining Chierichetti F, Dalvi N, Kumar R (2014) Correlation clustering in mapreduce. In: KDD ’14 proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining
9.
Zurück zum Zitat Cohn D, Caruana R, Mccallum A (2007) Constrained clustering: advances in algorithms, chap. Semi-supervised clustering with user feedback. Data mining and knowledge discovery series, pp. 17–31 Cohn D, Caruana R, Mccallum A (2007) Constrained clustering: advances in algorithms, chap. Semi-supervised clustering with user feedback. Data mining and knowledge discovery series, pp. 17–31
10.
Zurück zum Zitat Giotis I, Guruswami V (2006) Correlation clustering with a fixed number of clusters. Theory Comput Open Access J 2:249–266MathSciNetMATH Giotis I, Guruswami V (2006) Correlation clustering with a fixed number of clusters. Theory Comput Open Access J 2:249–266MathSciNetMATH
11.
Zurück zum Zitat Goemans MX, Williamson DP (1995) Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J Assoc Comput Mach 42(6):1115–1145MathSciNetCrossRefMATH Goemans MX, Williamson DP (1995) Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J Assoc Comput Mach 42(6):1115–1145MathSciNetCrossRefMATH
12.
Zurück zum Zitat Immorlica N, Wirth A (2007) Constrained clustering: advances in algorithms, theory and applications, chap. correlation clustering. In: Wagstaff KL, Davidson I, Basu S (eds) Data mining and knowledge discovery series. Chapman & Hall, pp. 313–327 Immorlica N, Wirth A (2007) Constrained clustering: advances in algorithms, theory and applications, chap. correlation clustering. In: Wagstaff KL, Davidson I, Basu S (eds) Data mining and knowledge discovery series. Chapman & Hall, pp. 313–327
13.
Zurück zum Zitat Kamvar SD, Klein D, Manning CD (2003) Spectral learning. In: Proceedings of the 18th international joint conference on artificial intelligence, IJCAI’03. Morgan Kaufmann Publishers Inc, pp. 561–566 Kamvar SD, Klein D, Manning CD (2003) Spectral learning. In: Proceedings of the 18th international joint conference on artificial intelligence, IJCAI’03. Morgan Kaufmann Publishers Inc, pp. 561–566
14.
Zurück zum Zitat Kearns MJ, Schapire RE, Sellie LM (1992) Toward efficient agnostic learning. In: Proceedings of the fifth annual workshop on Computational learning theory, COLT ’92. ACM, pp. 341–352 Kearns MJ, Schapire RE, Sellie LM (1992) Toward efficient agnostic learning. In: Proceedings of the fifth annual workshop on Computational learning theory, COLT ’92. ACM, pp. 341–352
15.
16.
Zurück zum Zitat Lu Z, Leen TK (2007) Constrained clustering: advances in algorithms, chap. pairwise constraints as priors in probabilistic clustering. In: Wagstaff KL, Davidson I, Basu S (eds) Data mining and knowledge discovery series. Chapman & Hall, pp. 59–90 Lu Z, Leen TK (2007) Constrained clustering: advances in algorithms, chap. pairwise constraints as priors in probabilistic clustering. In: Wagstaff KL, Davidson I, Basu S (eds) Data mining and knowledge discovery series. Chapman & Hall, pp. 59–90
17.
Zurück zum Zitat Pan X, Papailiopoulos D, Oymak S, Recht B, Ramchandran K, Jordan MI (2015) Parallel correlation clustering on big graphs. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R (eds) Advances in Neural Information Processing Systems 28. Curran Associates, Inc, pp. 82–90 Pan X, Papailiopoulos D, Oymak S, Recht B, Ramchandran K, Jordan MI (2015) Parallel correlation clustering on big graphs. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R (eds) Advances in Neural Information Processing Systems 28. Curran Associates, Inc, pp. 82–90
18.
Zurück zum Zitat Pensa R, Robardet C, Boulicaut JF (2007) Constrained clustering: advances in algorithms, chap. constraint-driven co-clustering of 0/1 Data. In: Wagstaff KL, Davidson I, Basu S (eds) Data mining and knowledge discovery series. Chapman & Hall, pp. 123–148 Pensa R, Robardet C, Boulicaut JF (2007) Constrained clustering: advances in algorithms, chap. constraint-driven co-clustering of 0/1 Data. In: Wagstaff KL, Davidson I, Basu S (eds) Data mining and knowledge discovery series. Chapman & Hall, pp. 123–148
20.
Zurück zum Zitat Rand WM (1971) Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336):846–850CrossRef Rand WM (1971) Objective criteria for the evaluation of clustering methods. J. Am. Stat. Assoc. 66(336):846–850CrossRef
21.
Zurück zum Zitat Shanno DF, Phua KH (1980) Minimization of unconstrained multivariate functions. ACM Trans Math Softw 6:618–622CrossRefMATH Shanno DF, Phua KH (1980) Minimization of unconstrained multivariate functions. ACM Trans Math Softw 6:618–622CrossRefMATH
22.
Zurück zum Zitat Shental N, Bar-Hillel A, Hertz T, Weinshal D (2004) Computing Gaussian mixture models with EM using equivalence constraints. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems 16. MIT press, Cambridge Shental N, Bar-Hillel A, Hertz T, Weinshal D (2004) Computing Gaussian mixture models with EM using equivalence constraints. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems 16. MIT press, Cambridge
23.
Zurück zum Zitat Swamy C (2004) Correlation clustering: maximizing agreements via semidefinite programming. In: Proceedings of the fifteenth annual ACM-SIAM symposium on discrete algorithms, pp. 526–527 Swamy C (2004) Correlation clustering: maximizing agreements via semidefinite programming. In: Proceedings of the fifteenth annual ACM-SIAM symposium on discrete algorithms, pp. 526–527
24.
Zurück zum Zitat Tang W, Zhong S (2008) Computational methods of feature selection, chap. pairwise constraints-guided dimensionality reduction. In: Liu H, Motoda H (eds) Data mining and knowledge discovery series. Taylor & Francis Group CRC, pp. 295–312 Tang W, Zhong S (2008) Computational methods of feature selection, chap. pairwise constraints-guided dimensionality reduction. In: Liu H, Motoda H (eds) Data mining and knowledge discovery series. Taylor & Francis Group CRC, pp. 295–312
26.
Zurück zum Zitat Wacquet G, Caillault EP, Hamad D, HéBert PA (2013) Constrained spectral embedding for K-way data clustering. Pattern Recognit Lett 34:1009–1017CrossRef Wacquet G, Caillault EP, Hamad D, HéBert PA (2013) Constrained spectral embedding for K-way data clustering. Pattern Recognit Lett 34:1009–1017CrossRef
27.
Zurück zum Zitat Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of the eighteenth international conference on machine learning, ICML ’01. Morgan Kaufmann Publishers Inc, pp. 577–584 Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained k-means clustering with background knowledge. In: Proceedings of the eighteenth international conference on machine learning, ICML ’01. Morgan Kaufmann Publishers Inc, pp. 577–584
28.
Zurück zum Zitat Wang X, Davidson I (2010) Flexible constrained spectral clustering. In: KDD ’10: proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 563–572 Wang X, Davidson I (2010) Flexible constrained spectral clustering. In: KDD ’10: proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 563–572
29.
Zurück zum Zitat Wang X, Qian B, Davidson I (2012) On constrained spectral clustering and its applications. Data min knowl discov 28(1):1–30MathSciNetCrossRefMATH Wang X, Qian B, Davidson I (2012) On constrained spectral clustering and its applications. Data min knowl discov 28(1):1–30MathSciNetCrossRefMATH
30.
Zurück zum Zitat Wang X, Wang J, Qian B, Wang F, Davidson I (2014) Self-taught spectral clustering via constraint augmentation. In: Proceedings of the 2014 SIAM international conference on data mining, Philadelphia, Pennsylvania, USA, April 24–26, pp. 416–424 Wang X, Wang J, Qian B, Wang F, Davidson I (2014) Self-taught spectral clustering via constraint augmentation. In: Proceedings of the 2014 SIAM international conference on data mining, Philadelphia, Pennsylvania, USA, April 24–26, pp. 416–424
31.
32.
Zurück zum Zitat Xu C, Tao D, Xu C (2015) Multi-view self-paced learning for clustering. In: Proceedings of the 24th international conference on artificial intelligence, IJCAI’15. AAAI Press, pp. 3974–3980 Xu C, Tao D, Xu C (2015) Multi-view self-paced learning for clustering. In: Proceedings of the 24th international conference on artificial intelligence, IJCAI’15. AAAI Press, pp. 3974–3980
33.
Zurück zum Zitat Xu Q, Desjardins M (2005) Constrained spectral clustering under a local proximity structure assumption. In: Proceedings of the 18th international conference of the Florida artificial intelligence research society (FLAIRS-05). AAAI Press Xu Q, Desjardins M (2005) Constrained spectral clustering under a local proximity structure assumption. In: Proceedings of the 18th international conference of the Florida artificial intelligence research society (FLAIRS-05). AAAI Press
34.
Zurück zum Zitat Zhang XL (2014) Convex discriminative multitask clustering. IEEE Trans Pattern Anal Mach Intell 37:28–40CrossRef Zhang XL (2014) Convex discriminative multitask clustering. IEEE Trans Pattern Anal Mach Intell 37:28–40CrossRef
Metadaten
Titel
Scalability of correlation clustering
verfasst von
Mamata Samal
V. Vijaya Saradhi
Sukumar Nandi
Publikationsdatum
24.02.2017
Verlag
Springer London
Erschienen in
Pattern Analysis and Applications / Ausgabe 3/2018
Print ISSN: 1433-7541
Elektronische ISSN: 1433-755X
DOI
https://doi.org/10.1007/s10044-017-0598-7

Weitere Artikel der Ausgabe 3/2018

Pattern Analysis and Applications 3/2018 Zur Ausgabe