Published in: International Journal of Data Science and Analytics 4/2017

29.04.2017 | Regular Paper

\(L^{2}\)-norm transformation for improving k-means clustering

Finding a suitable model by range transformation for novel data analysis

Authors: Piyush Kumar Sharma, Gary Holness


Abstract

In an age of increasingly pervasive sensing applications, measuring previously unknown phenomena produces novel data, which makes it hard to select appropriate modeling tools. Because there is no rich history of domain knowledge, one can easily commit early to poor modeling choices. Data transformation, which modifies the data's geometry, can make important regularities clearer, but the wrong transformation can damage the very pattern information one seeks to identify. In contrast to data transformation, we contribute an alternative method, range transformation, which focuses on altering the measurement tool instead. As a function, a model maps data inputs to a range. By focusing on transformations of the model's range, we can find a generally applicable way to alter the model's properties to best suit the data. Every modification to a function class, which we call editing the function, results in a change to the original function's range. This work contributes a method for modifying a broad class of models to suit novel data through range transformation. We investigate range transformation for a class of information-theoretic transformations and evaluate its impact on classification and clustering. We also develop an optimization-based framework that employs range transformation based on desired geometric properties, and we use it to improve a widely used model, k-means clustering.
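The distinction the abstract draws between transforming the data and transforming the model's range can be illustrated with a minimal Python sketch. This is not the authors' method: the helper names (`sq_l2`, `data_transformed`, `range_transformed`, `assign`) and the choice of `math.sqrt` as the range map are assumptions made only for illustration, with the squared \(L^{2}\) distance of the k-means assignment step standing in for "the model".

```python
import math

def sq_l2(x, c):
    """Squared Euclidean (L2) distance between two equal-length points."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def data_transformed(dist, g):
    """Data transformation: alter the inputs, leave the model's output as is."""
    return lambda x, c: dist(g(x), g(c))

def range_transformed(dist, h):
    """Range transformation: leave the inputs as is, alter the model's output."""
    return lambda x, c: h(dist(x, c))

def assign(points, centroids, dist):
    """k-means assignment step: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
        for p in points
    ]

# Hypothetical toy data, chosen only for illustration.
points = [(0, 0), (1, 0), (9, 9), (10, 10)]
centroids = [(0, 0), (10, 10)]

# Hypothetical range map h = sqrt: turns squared L2 into plain L2 distance.
l2 = range_transformed(sq_l2, math.sqrt)

# Hypothetical data map g = log1p applied to every coordinate.
log_sq_l2 = data_transformed(sq_l2, lambda p: tuple(math.log1p(v) for v in p))

labels_plain = assign(points, centroids, sq_l2)
labels_range = assign(points, centroids, l2)
```

Note that a monotone range map such as the square root leaves nearest-centroid assignments unchanged and only rescales the distance values; the paper's optimization-based framework chooses its transformation from desired geometric properties, which this sketch does not attempt to reproduce.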


Metadata
Title: \(L^{2}\)-norm transformation for improving k-means clustering
Subtitle: Finding a suitable model by range transformation for novel data analysis
Authors: Piyush Kumar Sharma, Gary Holness
Publication date: 29.04.2017
Publisher: Springer International Publishing
Published in: International Journal of Data Science and Analytics, Issue 4/2017
Print ISSN: 2364-415X
Electronic ISSN: 2364-4168
DOI: https://doi.org/10.1007/s41060-017-0054-1
