Skip to main content

2017 | OriginalPaper | Buchkapitel

Accurate Low-Space Approximation of Metric k-Median for Insertion-Only Streams

verfasst von : Vladimir Braverman, Harry Lang, Keith Levin

Erschienen in: Algorithms and Discrete Applied Mathematics

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We present a low-constant approximation for metric k-median on an insertion-only stream of n points using \(O(\epsilon ^{-3} k \log n)\) space. In particular, we present a streaming \((O(\epsilon ^{-3} k \log n), 2 + \epsilon )\)-bicriterion solution that reports cluster weights. It is well-known that running an offline algorithm on this bicriterion solution yields a \((17.66 + \epsilon )\)-approximation.
Previously, there have been two lines of research that trade off between space and accuracy in the streaming k-median problem. To date, the best-known \((k,\epsilon )\)-coreset construction requires \(O(\epsilon ^{-2} k \log ^4 n)\) space [8], while the best-known \(O(k \log n)\)-space algorithm provides only a \((O(k \log n), 1063)\)-bicriterion [3]. Our work narrows this gap significantly, matching the best-known space while significantly improving the accuracy from 1063 to \(2+\epsilon \). We also provide a matching lower bound, showing that any \({\text {polylog}}(n)\)-space streaming algorithm that maintains an \((\alpha ,\beta )\)-bicriterion must have \(\beta \ge 2\).
Our technique breaks the stream into segments defined by jumps in the optimal clustering cost, which increases monotonically as the stream progresses. By a storing an accurate summary of recent segments and a lower-space summary of older segments, our algorithm maintains a \((O(\epsilon ^{-3} k \log n), 2 + \epsilon )\)-bicriterion solution for the entire input.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Bādoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the Thiry-Fourth Annual ACM Symposium on Theory of Computing, STOC 2002, pp. 250–257. ACM, New York (2002) Bādoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the Thiry-Fourth Annual ACM Symposium on Theory of Computing, STOC 2002, pp. 250–257. ACM, New York (2002)
2.
Zurück zum Zitat Bentley, J.L., Saxe, J.B.: Decomposable searching problems I. Static-to-dynamic transformation. J. Algorithms 1(4), 301–358 (1980)MathSciNetCrossRefMATH Bentley, J.L., Saxe, J.B.: Decomposable searching problems I. Static-to-dynamic transformation. J. Algorithms 1(4), 301–358 (1980)MathSciNetCrossRefMATH
3.
Zurück zum Zitat Braverman, V., Meyerson, A., Ostrovsky, R., Roytman, A., Shindler, M., Tagiku, B.: Streaming k-means on well-clusterable data. In: Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, pp. 26–40. SIAM (2011) Braverman, V., Meyerson, A., Ostrovsky, R., Roytman, A., Shindler, M., Tagiku, B.: Streaming k-means on well-clusterable data. In: Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, pp. 26–40. SIAM (2011)
4.
Zurück zum Zitat Bury, M., Schwiegelshohn, C.: Random projections for k-means: maintaining coresets beyond merge & reduce. CoRR, abs/1504.01584 (2015) Bury, M., Schwiegelshohn, C.: Random projections for k-means: maintaining coresets beyond merge & reduce. CoRR, abs/1504.01584 (2015)
5.
Zurück zum Zitat Byrka, J., Pensyl, T., Rybicki, B., Srinivasan, A., Trinh, K.: An improved approximation for k-median, and positive correlation in budgeted optimization. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pp. 737–756. SIAM (2015) Byrka, J., Pensyl, T., Rybicki, B., Srinivasan, A., Trinh, K.: An improved approximation for k-median, and positive correlation in budgeted optimization. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pp. 737–756. SIAM (2015)
6.
Zurück zum Zitat Charikar, M., O’Callaghan, L., Panigrahy, R.: Better streaming algorithms for clustering problems. In: Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC 2003, pp. 30–39. ACM, New York (2003) Charikar, M., O’Callaghan, L., Panigrahy, R.: Better streaming algorithms for clustering problems. In: Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC 2003, pp. 30–39. ACM, New York (2003)
7.
Zurück zum Zitat Chen, K.: On coresets for \(k\)-median and \(k\)-means clustering in metric and euclidean spaces and their applications. SIAM J. Comput. 39(3), 923–947 (2009)MathSciNetCrossRefMATH Chen, K.: On coresets for \(k\)-median and \(k\)-means clustering in metric and euclidean spaces and their applications. SIAM J. Comput. 39(3), 923–947 (2009)MathSciNetCrossRefMATH
8.
Zurück zum Zitat Feldman, D., Langberg, M.: A unified framework for approximating and clustering data. In: Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, STOC 2011, pp. 569–578. ACM, New York (2011) Feldman, D., Langberg, M.: A unified framework for approximating and clustering data. In: Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, STOC 2011, pp. 569–578. ACM, New York (2011)
9.
Zurück zum Zitat Fichtenberger, H., Gillé, M., Schmidt, M., Schwiegelshohn, C., Sohler, C.: BICO: BIRCH meets coresets for k-means clustering. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 481–492. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40450-4_41 CrossRef Fichtenberger, H., Gillé, M., Schmidt, M., Schwiegelshohn, C., Sohler, C.: BICO: BIRCH meets coresets for k-means clustering. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 481–492. Springer, Heidelberg (2013). doi:10.​1007/​978-3-642-40450-4_​41 CrossRef
10.
Zurück zum Zitat Guha, S.: Tight results for clustering and summarizing data streams. In: Proceedings of the 12th International Conference on Database Theory, ICDT 2009, pp. 268–275. ACM, New York (2009) Guha, S.: Tight results for clustering and summarizing data streams. In: Proceedings of the 12th International Conference on Database Theory, ICDT 2009, pp. 268–275. ACM, New York (2009)
11.
Zurück zum Zitat Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams: theory and practice. IEEE Trans. Knowl. Data Eng. 15(3), 515–528 (2003)CrossRef Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams: theory and practice. IEEE Trans. Knowl. Data Eng. 15(3), 515–528 (2003)CrossRef
12.
13.
Zurück zum Zitat Har-Peled, S., Mazumdar, S.: Coresets for \(k\)-means and \(k\)-median clustering and their applications. In: STOC 2004, pp. 291–300 (2004) Har-Peled, S., Mazumdar, S.: Coresets for \(k\)-means and \(k\)-median clustering and their applications. In: STOC 2004, pp. 291–300 (2004)
14.
Zurück zum Zitat Meyerson, A.: Online facility location. In: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, FOCS 2001, p. 426. IEEE Computer Society, Washington, DC (2001) Meyerson, A.: Online facility location. In: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, FOCS 2001, p. 426. IEEE Computer Society, Washington, DC (2001)
15.
Zurück zum Zitat Shindler, M., Wong, A., Meyerson, A.W.: Fast and accurate k-means for large datasets. In: Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K. (eds.) Advances in Neural Information Processing Systems 24, pp. 2375–2383. Curran Associates Inc., Red Hook (2011) Shindler, M., Wong, A., Meyerson, A.W.: Fast and accurate k-means for large datasets. In: Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K. (eds.) Advances in Neural Information Processing Systems 24, pp. 2375–2383. Curran Associates Inc., Red Hook (2011)
Metadaten
Titel
Accurate Low-Space Approximation of Metric k-Median for Insertion-Only Streams
verfasst von
Vladimir Braverman
Harry Lang
Keith Levin
Copyright-Jahr
2017
DOI
https://doi.org/10.1007/978-3-319-53007-9_7