2015 | OriginalPaper | Book chapter

Finding the k in K-means Clustering: A Comparative Analysis Approach

Authors: Markus Lumpe, Quoc Bao Vo

Published in: AI 2015: Advances in Artificial Intelligence

Publisher: Springer International Publishing

Abstract

This paper explores the application of inequality indices, a concept successfully applied in comparative software analysis and many other application domains, to finding the optimal value of k for k-means when clustering road traffic data. We demonstrate that traditional methods for identifying the optimal value of k (such as the gap statistic and Pham et al.'s method) are unable to produce meaningful values for k when applied to a real-world road traffic dataset. In contrast, a method based on inequality indices shows significant promise in producing much more sensible values for the number k of clusters to be used in k-means clustering of the same road network traffic dataset.
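
The abstract does not state which inequality index is used or how it is applied to choose k. As a minimal sketch only, the following Python functions compute two standard inequality indices that appear in the chapter's reference list, the Gini coefficient and the Theil index, for a list of non-negative values. The function names `gini` and `theil` are illustrative assumptions and are not taken from the paper.

```python
import math


def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality).

    Illustrative helper, not the authors' implementation.
    Uses the sorted-rank formula G = 2*sum(i*x_i) / (n*sum(x)) - (n+1)/n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0  # convention: an empty or all-zero sample has no inequality
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def theil(values):
    """Theil index T = (1/n) * sum((x_i/mean) * ln(x_i/mean)).

    Terms with x_i == 0 contribute 0 (the limit of t*ln(t) as t -> 0).
    """
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    if mean == 0:
        return 0.0
    return sum((x / mean) * math.log(x / mean) for x in values if x > 0) / n


if __name__ == "__main__":
    equal = [5, 5, 5, 5]   # perfectly even distribution
    skewed = [0, 0, 0, 1]  # all mass concentrated in one element
    print(gini(equal), theil(equal))
    print(gini(skewed), theil(skewed))
```

Both indices are zero for a perfectly even distribution and grow as the values become more concentrated. How the chapter maps such an index onto candidate values of k is described in the full text, not in this abstract.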

Footnotes
2
Here \(k_{max}\) has to be a reasonably large upper bound reflecting the specific characteristics of the dataset [13].
 
References
1. Ben-David, S., von Luxburg, U., Pál, D.: A sober look at clustering stability. In: Lugosi, G., Simon, H.U. (eds.) COLT 2006. LNCS (LNAI), vol. 4005, pp. 5–19. Springer, Heidelberg (2006)
2. Cowell, F.A., Jenkins, S.P.: How much inequality can we explain? A methodology and an application to the United States. Econ. J. 105(429), 412–430 (1995)
3. Färber, I., Günnemann, S., Kriegel, H.P., Kröger, P., Müller, E., Schubert, E., Seidl, T., Zimek, A.: On using class-labels in evaluation of clusterings. In: MultiClust: 1st International Workshop on Discovering, Summarizing and Using Multiple Clusterings held in conjunction with KDD, p. 1 (2010)
4. Goloshchapova, O., Lumpe, M.: On the application of inequality indices in comparative software analysis. In: Proceedings of 22nd Australian Software Engineering Conference (ASWEC 2013), pp. 117–126. IEEE Computer Society, Melbourne, June 2013
5. Hamerly, G., Elkan, C.: Learning the \(k\) in \(k\)-means. In: Thrun, S., Saul, L.K., Schölkopf, B. (eds.) Advances in Neural Information Processing Systems 16, pp. 281–288. The MIT Press, Cambridge (2004)
6. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd edn. Springer, New York (2013)
7. Kasturi, J., Acharya, R., Ramanathan, M.: An information theoretic approach for analyzing temporal patterns of gene expression. Bioinformatics 19, 449–458 (2003)
8. Le, T., Vu, H.L., Nazarathy, Y., Vo, Q.B., Hoogendoorn, S.: Linear-quadratic model predictive control for urban traffic networks. J. Transp. Res. Part C: Emerg. Technol. 36, 498–512 (2013)
9. van Leeuwaarden, J.S.H., Lefeber, E., Nazarathy, Y., Rooda, J.E.: Model predictive control for the acquisition queue and related queueing networks. In: Proceedings of 5th International Conference on Queueing Theory and Network Applications (QTNA 2010), pp. 193–200. ACM, New York, July 2010
10. Lumpe, M.: Partition refinement of Component Interaction Automata. Sci. Comput. Program. 78, 27–45 (2012)
11. von Luxburg, U.: Clustering stability: an overview. Found. Trends Mach. Learn. 2(3), 235–274 (2010)
12. Murphy, K.P.: Machine Learning: A Probabilistic Perspective. The MIT Press, Cambridge (2013)
13. Pham, D.T., Dimov, S.S., Nguyen, C.D.: Selection of K in K-means clustering. J. Mech. Eng. Sci. 219(Part C), 103–119 (2005)
14. Sen, A.K.: On Economic Inequality. Oxford University Press, Oxford (1973)
15. Serebrenik, A., van den Brand, M.: Theil index for aggregation of software metrics values. In: Proceedings of 26th IEEE International Conference on Software Maintenance (ICSM 2010), pp. 1–9. IEEE Computer Society, Timişoara, September 2010
16. Shannon, C.E.: A mathematical theory of communication. SIGMOBILE Mob. Comput. Commun. Rev. 5(1), 3–55 (2001)
17. Theil, H.: Economics and Information Theory. North-Holland Publishing Company, Amsterdam (1967)
18. Vasa, R., Lumpe, M., Branch, P., Nierstrasz, O.: Comparative analysis of evolving software systems using the Gini coefficient. In: Proceedings of 25th IEEE International Conference on Software Maintenance (ICSM 2009), pp. 179–188. IEEE Computer Society, Edmonton, September 2009
Metadata
Title
Finding the k in K-means Clustering: A Comparative Analysis Approach
Authors
Markus Lumpe
Quoc Bao Vo
Copyright year
2015
DOI
https://doi.org/10.1007/978-3-319-26350-2_31