Skip to main content

2018 | OriginalPaper | Buchkapitel

Approximate Correlation Clustering Using Same-Cluster Queries

verfasst von : Nir Ailon, Anup Bhattacharya, Ragesh Jaiswal

Erschienen in: LATIN 2018: Theoretical Informatics

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Ashtiani et al. (NIPS 2016) introduced a semi-supervised framework for clustering (SSAC) where a learner is allowed to make same-cluster queries. More specifically, in their model, there is a query oracle that answers queries of the form “given any two vertices, do they belong to the same optimal cluster?”. In many clustering contexts, this kind of oracle queries are feasible. Ashtiani et al. showed the usefulness of such a query framework by giving a polynomial time algorithm for the k-means clustering problem where the input dataset satisfies some separation condition. Ailon et al. extended the above work to the approximation setting by giving an efficient \((1+\varepsilon )\)-approximation algorithm for k-means for any small \(\varepsilon > 0\) and any dataset within the SSAC framework. In this work, we extend this line of study to the correlation clustering problem. Correlation clustering is a graph clustering problem where pairwise similarity (or dissimilarity) information is given for every pair of vertices and the objective is to partition the vertices into clusters that minimise the disagreement (or maximises agreement) with the pairwise information given as input. These problems are popularly known as \(\mathsf {MinDisAgree}\) and \(\mathsf {MaxAgree}\) problems, and \(\mathsf {MinDisAgree}[k]\) and \(\mathsf {MaxAgree}[k]\) are versions of these problems where the number of optimal clusters is at most k. There exist Polynomial Time Approximation Schemes (PTAS) for \(\mathsf {MinDisAgree}[k]\) and \(\mathsf {MaxAgree}[k]\) where the approximation guarantee is \((1+\varepsilon )\) for any small \(\varepsilon \) and the running time is polynomial in the input parameters but exponential in k and \(1/\varepsilon \). We get a significant running time improvement within the SSAC framework at the cost of making a small number of same-cluster queries. We obtain an \((1+ \varepsilon )\)-approximation algorithm for any small \(\varepsilon \) with running time that is polynomial in the input parameters and also in k and \(1/\varepsilon \). We also give non-trivial upper and lower bounds on the number of same-cluster queries, the lower bound being based on the Exponential Time Hypothesis (ETH). Note that the existence of an efficient algorithm for \(\mathsf {MinDisAgree}[k]\) in the SSAC setting exhibits the power of same-cluster queries since such polynomial time algorithm (polynomial even in k and \(1/\varepsilon \)) is not possible in the classical (non-query) setting due to our conditional lower bounds. Our conditional lower bound is particularly interesting as it not only establishes a lower bound on the number of same cluster queries in the SSAC framework but also establishes a conditional lower bound on the running time of any \((1+\varepsilon )\)-approximation algorithm for \(\mathsf {MinDisAgree}[k]\).

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
Every clause in an \(\mathsf {E}3\)-\(\mathsf {SAT}\) formula has exactly 3 literals.
 
2
Readers familiar with [12] will realise that the statement of the theorem is slightly different from statement of the similar theorem (Theorem 13) in [12]. More specifically, the claim is about the function call with \(\varepsilon /4\) as a parameter rather than \(\varepsilon \). This is done to allow the recursive call in step (9) to be made with same value of precision parameter as the initial call. This does not change the approximation analysis but is crucial for our running time analysis.
 
Literatur
1.
Zurück zum Zitat Ailon, N., Bhattacharya, A., Jaiswal, R., Kumar, A.: Approximate clustering with same-cluster queries (2017). CoRR, abs/1704.01862. To Appear in ITCS 2018 Ailon, N., Bhattacharya, A., Jaiswal, R., Kumar, A.: Approximate clustering with same-cluster queries (2017). CoRR, abs/1704.01862. To Appear in ITCS 2018
2.
Zurück zum Zitat Angelidakis, H., Makarychev, K., Makarychev, Y.: Algorithms for stable and perturbation-resilient problems. In: STOC, pp. 438–451 (2017) Angelidakis, H., Makarychev, K., Makarychev, Y.: Algorithms for stable and perturbation-resilient problems. In: STOC, pp. 438–451 (2017)
3.
Zurück zum Zitat Ashtiani, H., Kushagra, S., Ben-David, S.: Clustering with same-cluster queries. In: NIPS, pp. 3216–3224 (2016) Ashtiani, H., Kushagra, S., Ben-David, S.: Clustering with same-cluster queries. In: NIPS, pp. 3216–3224 (2016)
4.
Zurück zum Zitat Awasthi, P., Balcan, M.-F, Voevodski, K.: Local algorithms for interactive clustering. In: ICML, pp. 550–558 (2014) Awasthi, P., Balcan, M.-F, Voevodski, K.: Local algorithms for interactive clustering. In: ICML, pp. 550–558 (2014)
9.
Zurück zum Zitat Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information. J. Comput. Syst. Sci. 71(3), 360–383 (2005)MathSciNetCrossRefMATH Charikar, M., Guruswami, V., Wirth, A.: Clustering with qualitative information. J. Comput. Syst. Sci. 71(3), 360–383 (2005)MathSciNetCrossRefMATH
11.
Zurück zum Zitat Fomin, F.V., Kratsch, S., Pilipczuk, M., Pilipczuk, M., Villanger, Y.: Tight bounds for parameterized complexity of cluster editing with a small number of clusters. J. Comput. Syst. Sci. 80(7), 1430–1447 (2014)MathSciNetCrossRefMATH Fomin, F.V., Kratsch, S., Pilipczuk, M., Pilipczuk, M., Villanger, Y.: Tight bounds for parameterized complexity of cluster editing with a small number of clusters. J. Comput. Syst. Sci. 80(7), 1430–1447 (2014)MathSciNetCrossRefMATH
12.
Zurück zum Zitat Giotis, I., Guruswami, V.: Correlation clustering with a fixed number of clusters. In: SODA, pp. 1167–1176 (2006) Giotis, I., Guruswami, V.: Correlation clustering with a fixed number of clusters. In: SODA, pp. 1167–1176 (2006)
14.
Zurück zum Zitat Impagliazzo, R., Paturi, R., Zane, F.: Which problems have strongly exponential complexity? J. Comput. Syst. Sci. 63(4), 512–530 (2001)MathSciNetCrossRefMATH Impagliazzo, R., Paturi, R., Zane, F.: Which problems have strongly exponential complexity? J. Comput. Syst. Sci. 63(4), 512–530 (2001)MathSciNetCrossRefMATH
15.
Zurück zum Zitat Makarychev, K., Makarychev, Y., Vijayaraghavan, A.: Correlation clustering with noisy partial information. In: COLT, pp. 1321–1342 (2015) Makarychev, K., Makarychev, Y., Vijayaraghavan, A.: Correlation clustering with noisy partial information. In: COLT, pp. 1321–1342 (2015)
16.
Zurück zum Zitat Manurangsi, P.: Almost-polynomial ratio ETH-hardness of approximating densest \(k\)-subgraph. CoRR, abs/1611.05991 (2016) Manurangsi, P.: Almost-polynomial ratio ETH-hardness of approximating densest \(k\)-subgraph. CoRR, abs/1611.05991 (2016)
17.
Zurück zum Zitat Mathieu, C., Schudy, W.: Correlation clustering with noisy input. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 712–728 (2010) Mathieu, C., Schudy, W.: Correlation clustering with noisy input. In: ACM-SIAM Symposium on Discrete Algorithms, pp. 712–728 (2010)
19.
Zurück zum Zitat Voevodski, K., Balcan, M.-F., Röglin, H., Teng, S.-H., Xia, Y.: Efficient clustering with limited distance information. In: Conference on Uncertainty in Artificial Intelligence, pp. 632–640 (2010) Voevodski, K., Balcan, M.-F., Röglin, H., Teng, S.-H., Xia, Y.: Efficient clustering with limited distance information. In: Conference on Uncertainty in Artificial Intelligence, pp. 632–640 (2010)
Metadaten
Titel
Approximate Correlation Clustering Using Same-Cluster Queries
verfasst von
Nir Ailon
Anup Bhattacharya
Ragesh Jaiswal
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-319-77404-6_2

Premium Partner