Skip to main content

2018 | OriginalPaper | Buchkapitel

Randomized Algorithms for Some Clustering Problems

verfasst von : Alexander Kel’manov, Vladimir Khandeev, Anna Panasenko

Erschienen in: Optimization Problems and Their Applications

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We consider two strongly NP-hard problems of clustering a finite set of points in Euclidean Space. Both problems have applications, in particular, in data analysis, data mining, pattern recognition, and machine learning. In the first problem, an input set is given and we need to find a cluster (i.e., a subset) of a given size which minimizes the sum of squared distances between the elements of this cluster and its centroid (the geometric center). Every point outside this cluster is considered as singleton cluster. In the second problem, we need to partition a finite set into two clusters minimizing the sum over both clusters of the weighted intracluster sums of the squared distances between the elements of the clusters and their centers. The center of the first cluster is unknown and determined as the centroid, while the center of the second one is the origin. The weight factors for both intracluster sums are the given clusters sizes. In this paper, we present parameterized randomized algorithms for these problems. For given upper bounds of the relative error and failure probability, the parameter value is defined for which both our algorithms find approximate solutions in a polynomial time. This running time is linear on the space dimension and on the input set size. The conditions are found under which these algorithms are asymptotically exact and have the time complexity that is linear on the space dimension and quadratic on the size of the input set.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat de Waal, T., Pannekoek, J., Scholtus, S.: Handbook of Statistical Data Editing and Imputation. Wiley, Hoboken (2011)CrossRef de Waal, T., Pannekoek, J., Scholtus, S.: Handbook of Statistical Data Editing and Imputation. Wiley, Hoboken (2011)CrossRef
2.
Zurück zum Zitat Osborne, J.W.: Best Practices in Data Cleaning: A Complete Guide to Everything You Need to Do Before and After Collecting Your Data, 1st edn. SAGE Publication, Inc., Los Angeles (2013)CrossRef Osborne, J.W.: Best Practices in Data Cleaning: A Complete Guide to Everything You Need to Do Before and After Collecting Your Data, 1st edn. SAGE Publication, Inc., Los Angeles (2013)CrossRef
3.
Zurück zum Zitat Greco, L.: Robust Methods for Data Reduction Alessio Farcomeni. Chapman and Hall/CRC, Boca Raton (2015) Greco, L.: Robust Methods for Data Reduction Alessio Farcomeni. Chapman and Hall/CRC, Boca Raton (2015)
4.
Zurück zum Zitat Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)MATH Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006)MATH
8.
Zurück zum Zitat Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning (Adaptive Computation and Machine Learning Series). The MIT Press, Cambridge (2017)MATH Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning (Adaptive Computation and Machine Learning Series). The MIT Press, Cambridge (2017)MATH
9.
Zurück zum Zitat Kel’manov, A.V., Pyatkin, A.V.: NP-completeness of some problems of choosing a vector subset. J. Appl. Ind. Math. 5(3), 352–357 (2011)MathSciNetCrossRef Kel’manov, A.V., Pyatkin, A.V.: NP-completeness of some problems of choosing a vector subset. J. Appl. Ind. Math. 5(3), 352–357 (2011)MathSciNetCrossRef
10.
Zurück zum Zitat Kel’manov, A.V., Romanchenko, S.M.: Pseudopolynomial algorithms for certain computationally hard vector subset and cluster analysis problems. Autom. Remote Control 73(2), 349–354 (2012)MathSciNetCrossRef Kel’manov, A.V., Romanchenko, S.M.: Pseudopolynomial algorithms for certain computationally hard vector subset and cluster analysis problems. Autom. Remote Control 73(2), 349–354 (2012)MathSciNetCrossRef
11.
Zurück zum Zitat Kel’manov, A.V., Romanchenko, S.M.: An approximation algorithm for solving a problem of search for a vector subset. J. Appl. Ind. Math. 6(1), 90–96 (2012)MathSciNetCrossRef Kel’manov, A.V., Romanchenko, S.M.: An approximation algorithm for solving a problem of search for a vector subset. J. Appl. Ind. Math. 6(1), 90–96 (2012)MathSciNetCrossRef
12.
Zurück zum Zitat Kel’manov, A.V., Romanchenko, S.M.: An FPTAS for a vector subset search problem. J. Appl. Ind. Math. 8(3), 329–336 (2014)MathSciNetCrossRef Kel’manov, A.V., Romanchenko, S.M.: An FPTAS for a vector subset search problem. J. Appl. Ind. Math. 8(3), 329–336 (2014)MathSciNetCrossRef
13.
Zurück zum Zitat Kel’manov, A., Motkova, A., Shenmaier, V.: An approximation scheme for a weighted two-cluster partition problem. In: van der Aalst, W.M.P., Ignatov, D.I., Khachay, M., Kuznetsov, S.O., Lempitsky, V., Lomazova, I.A., Loukachevitch, N., Napoli, A., Panchenko, A., Pardalos, P.M., Savchenko, A.V., Wasserman, S. (eds.) AIST 2017. LNCS, vol. 10716, pp. 323–333. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73013-4_30CrossRef Kel’manov, A., Motkova, A., Shenmaier, V.: An approximation scheme for a weighted two-cluster partition problem. In: van der Aalst, W.M.P., Ignatov, D.I., Khachay, M., Kuznetsov, S.O., Lempitsky, V., Lomazova, I.A., Loukachevitch, N., Napoli, A., Panchenko, A., Pardalos, P.M., Savchenko, A.V., Wasserman, S. (eds.) AIST 2017. LNCS, vol. 10716, pp. 323–333. Springer, Cham (2018). https://​doi.​org/​10.​1007/​978-3-319-73013-4_​30CrossRef
14.
Zurück zum Zitat Kel’manov, A.V., Pyatkin, A.V.: NP-hardness of some quadratic Euclidean 2-clustering problems. Doklady Math. 92(2), 634–637 (2015)MathSciNetCrossRef Kel’manov, A.V., Pyatkin, A.V.: NP-hardness of some quadratic Euclidean 2-clustering problems. Doklady Math. 92(2), 634–637 (2015)MathSciNetCrossRef
15.
Zurück zum Zitat Kel’manov, A.V., Pyatkin, A.V.: On the complexity of some quadratic Euclidean 2-clustering problems. Comput. Math. Math. Phys. 56(3), 491–497 (2016)MathSciNetCrossRef Kel’manov, A.V., Pyatkin, A.V.: On the complexity of some quadratic Euclidean 2-clustering problems. Comput. Math. Math. Phys. 56(3), 491–497 (2016)MathSciNetCrossRef
16.
Zurück zum Zitat Kel’manov, A.V., Motkova, A.V.: Exact pseudopolynomial algorithms for a balanced 2-clustering problem. J. Appl. Ind. Math. 10(3), 349–355 (2016)MathSciNetCrossRef Kel’manov, A.V., Motkova, A.V.: Exact pseudopolynomial algorithms for a balanced 2-clustering problem. J. Appl. Ind. Math. 10(3), 349–355 (2016)MathSciNetCrossRef
17.
Zurück zum Zitat Kel’manov, A.V., Motkova, A.V.: Polynomial-time approximation algorithm for the problem of cardinality-weighted variance-based 2-clustering with a given center. Comp. Math. Math. Phys. 58(1), 130–136 (2018)MathSciNetCrossRef Kel’manov, A.V., Motkova, A.V.: Polynomial-time approximation algorithm for the problem of cardinality-weighted variance-based 2-clustering with a given center. Comp. Math. Math. Phys. 58(1), 130–136 (2018)MathSciNetCrossRef
18.
19.
Zurück zum Zitat Aggarwal, H., Imai, N., Katoh, N., Suri, S.: Finding k points with minimum diameter and related problems. J. Algorithms 12(1), 38–56 (1991)MathSciNetCrossRef Aggarwal, H., Imai, N., Katoh, N., Suri, S.: Finding k points with minimum diameter and related problems. J. Algorithms 12(1), 38–56 (1991)MathSciNetCrossRef
20.
Zurück zum Zitat Shenmaier, V.V.: Solving some vector subset problems by Voronoi diagrams. J. Appl. Ind. Math. 10(4), 560–566 (2016)MathSciNetCrossRef Shenmaier, V.V.: Solving some vector subset problems by Voronoi diagrams. J. Appl. Ind. Math. 10(4), 560–566 (2016)MathSciNetCrossRef
21.
Zurück zum Zitat Shenmaier, V.V.: An approximation scheme for a problem of search for a vector subset. J. Appl. Ind. Math. 6(3), 381–386 (2012)MathSciNetCrossRef Shenmaier, V.V.: An approximation scheme for a problem of search for a vector subset. J. Appl. Ind. Math. 6(3), 381–386 (2012)MathSciNetCrossRef
24.
Zurück zum Zitat Indyk, P.: A sublinear time approximation scheme for clustering in metric space. In: Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 154–159 (1999) Indyk, P.: A sublinear time approximation scheme for clustering in metric space. In: Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pp. 154–159 (1999)
25.
Zurück zum Zitat de la Vega, F., Karpinski, M., Kenyon, C., Rabani, Y.: Polynomial time approximation schemes for metric min-sum clustering. Electronic Colloquium on Computational Complexity (ECCC). Report no. 25 (2002) de la Vega, F., Karpinski, M., Kenyon, C., Rabani, Y.: Polynomial time approximation schemes for metric min-sum clustering. Electronic Colloquium on Computational Complexity (ECCC). Report no. 25 (2002)
26.
Zurück zum Zitat Kel’manov, A.V., Khandeev, V.I.: A randomized algorithm for two-cluster partition of a set of vectors. Comput. Math. Math. Phys. 55(2), 330–339 (2015)MathSciNetCrossRef Kel’manov, A.V., Khandeev, V.I.: A randomized algorithm for two-cluster partition of a set of vectors. Comput. Math. Math. Phys. 55(2), 330–339 (2015)MathSciNetCrossRef
27.
Zurück zum Zitat Wirth, N.: Algorithms + Data Structures = Programs. Prentice Hall, Englewood Cliffs (1976)MATH Wirth, N.: Algorithms + Data Structures = Programs. Prentice Hall, Englewood Cliffs (1976)MATH
Metadaten
Titel
Randomized Algorithms for Some Clustering Problems
verfasst von
Alexander Kel’manov
Vladimir Khandeev
Anna Panasenko
Copyright-Jahr
2018
DOI
https://doi.org/10.1007/978-3-319-93800-4_9