Skip to main content

2015 | OriginalPaper | Buchkapitel

Efficient Cluster Detection by Ordered Neighborhoods

verfasst von : Emin Aksehirli, Bart Goethals, Emmanuel Müller

Erschienen in: Big Data Analytics and Knowledge Discovery

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Detecting cluster structures seems to be a simple task, i.e. separating similar from dissimilar objects. However, given today’s complex data, (dis-)similarity measures and traditional clustering algorithms are not reliable in separating clusters from each other. For example, when too many dimensions are considered simultaneously, objects become unique and (dis-)similarity does not provide meaningful information to detect clusters anymore. While the (dis-)similarity measures might be meaningful for individual dimensions, algorithms fail to combine this information for cluster detection. In particular, it is the severe issue of a combinatorial search space that results in inefficient algorithms.
In this paper we propose a cluster detection method based on the ordered neighborhoods. By considering such ordered neighborhoods in each dimension individually, we derive properties that allow us to detect clustered objects in dimensions in linear time. Our algorithm exploits the ordered neighborhoods in order to find both the similar objects and the dimensions in which these objects show high similarity. Evaluation results show that our method is scalable with both database size and dimensionality and enhances cluster detection w.r.t. state-of-the-art clustering techniques.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
Zurück zum Zitat Aggarwal, C.C., Wolf, J.L., Yu, P.S., Procopiuc, C., Park, J.S.: Fast algorithms for projected clustering. SIGMOD Rec. 28(2), 61–72 (1999)CrossRef Aggarwal, C.C., Wolf, J.L., Yu, P.S., Procopiuc, C., Park, J.S.: Fast algorithms for projected clustering. SIGMOD Rec. 28(2), 61–72 (1999)CrossRef
2.
Zurück zum Zitat Aksehirli, E., Goethals, B., Müller, E., Vreeken, J.: Cartification: a neighborhood preserving transformation for mining high dimensional data. In: ICDM, pp. 937–942, December 2013 Aksehirli, E., Goethals, B., Müller, E., Vreeken, J.: Cartification: a neighborhood preserving transformation for mining high dimensional data. In: ICDM, pp. 937–942, December 2013
3.
Zurück zum Zitat Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. PNAS 96(12), 6745–6750 (1999)CrossRef Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. PNAS 96(12), 6745–6750 (1999)CrossRef
4.
Zurück zum Zitat Assent, I., Krieger, R., Müller, E., Seidl, T.: INSCY: indexing subspace clusters with in-process-removal of redundancy. In: ICDM, pp. 719–724, December 2008 Assent, I., Krieger, R., Müller, E., Seidl, T.: INSCY: indexing subspace clusters with in-process-removal of redundancy. In: ICDM, pp. 719–724, December 2008
5.
Zurück zum Zitat Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is nearest neighbor meaningful? In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 217–235. Springer, Heidelberg (1998) CrossRef Beyer, K., Goldstein, J., Ramakrishnan, R., Shaft, U.: When is nearest neighbor meaningful? In: Beeri, C., Bruneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 217–235. Springer, Heidelberg (1998) CrossRef
6.
Zurück zum Zitat Emrich, T., Kriegel, H.-P., Mamoulis, N., Niedermayer, J., Renz, M., Züfle, A.: Reverse-nearest neighbor queries on uncertain moving object trajectories. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds.) DASFAA 2014, Part II. LNCS, vol. 8422, pp. 92–107. Springer, Heidelberg (2014) CrossRef Emrich, T., Kriegel, H.-P., Mamoulis, N., Niedermayer, J., Renz, M., Züfle, A.: Reverse-nearest neighbor queries on uncertain moving object trajectories. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds.) DASFAA 2014, Part II. LNCS, vol. 8422, pp. 92–107. Springer, Heidelberg (2014) CrossRef
7.
Zurück zum Zitat Goldstein, M.: \(k_n\)-nearest neighbor classification. IEEE TIT 18(5), 627–630 (1972)MATH Goldstein, M.: \(k_n\)-nearest neighbor classification. IEEE TIT 18(5), 627–630 (1972)MATH
8.
Zurück zum Zitat Jarvis, R., Patrick, E.: Clustering using a similarity measure based on shared near neighbors. IEEE TC C–22(11), 1025–1034 (1973) Jarvis, R., Patrick, E.: Clustering using a similarity measure based on shared near neighbors. IEEE TC C–22(11), 1025–1034 (1973)
9.
Zurück zum Zitat Kailing, K., Kriegel, H.P., Kröger, P.: Density-connected subspace clustering for high-dimensional data. In: SDM, vol. 4. SIAM (2004) Kailing, K., Kriegel, H.P., Kröger, P.: Density-connected subspace clustering for high-dimensional data. In: SDM, vol. 4. SIAM (2004)
10.
Zurück zum Zitat Kriegel, H.P., Kröger, P., Renz, M., Wurst, S.: A generic framework for efficient subspace clustering of high-dimensional data. In: ICDM, pp. 8, November 2005 Kriegel, H.P., Kröger, P., Renz, M., Wurst, S.: A generic framework for efficient subspace clustering of high-dimensional data. In: ICDM, pp. 8, November 2005
11.
Zurück zum Zitat McCann, S., Lowe, D.: Local naive bayes nearest neighbor for image classification. In: CVPR, pp. 3650–3656, June 2012 McCann, S., Lowe, D.: Local naive bayes nearest neighbor for image classification. In: CVPR, pp. 3650–3656, June 2012
12.
Zurück zum Zitat Moise, G., Sander, J.: Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering. In: KDD, KDD 2008, pp. 533–541. ACM, New York (2008) Moise, G., Sander, J.: Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering. In: KDD, KDD 2008, pp. 533–541. ACM, New York (2008)
13.
Zurück zum Zitat Müller, E., Assent, I., Günnemann, S., Krieger, R., Seidl, T.: Relevant subspace clustering: Mining the most interesting non-redundant concepts in high dimensional data. In: ICDM, pp. 377–386, December 2009 Müller, E., Assent, I., Günnemann, S., Krieger, R., Seidl, T.: Relevant subspace clustering: Mining the most interesting non-redundant concepts in high dimensional data. In: ICDM, pp. 377–386, December 2009
14.
Zurück zum Zitat Müller, E., Günnemann, S., Assent, I., Seidl, T.: Evaluating clustering in subspace projections of high dimensional data. PVLDB 2(1), 1270–1281 (2009) Müller, E., Günnemann, S., Assent, I., Seidl, T.: Evaluating clustering in subspace projections of high dimensional data. PVLDB 2(1), 1270–1281 (2009)
15.
Zurück zum Zitat Nutt, C.L., Mani, D.R., Betensky, R.A., Tamayo, P., Cairncross, J.G., Ladd, C., Pohl, U., Hartmann, C., McLaughlin, M.E., Batchelor, T.T., Black, P.M., Deimling, A.V., Pomeroy, S.L., Golub, T.R., Louis, D.N.: Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res. 63(7), 1602–1607 (2003) Nutt, C.L., Mani, D.R., Betensky, R.A., Tamayo, P., Cairncross, J.G., Ladd, C., Pohl, U., Hartmann, C., McLaughlin, M.E., Batchelor, T.T., Black, P.M., Deimling, A.V., Pomeroy, S.L., Golub, T.R., Louis, D.N.: Gene expression-based classification of malignant gliomas correlates better with survival than histological classification. Cancer Res. 63(7), 1602–1607 (2003)
16.
Zurück zum Zitat Park, Y., Park, S., Jung, W., Lee, S.G.: Reversed CF: a fast collaborative filtering algorithm using a k-nearest neighbor graph. Expert Syst. Appl. 42(8), 4022–4028 (2015)MathSciNetCrossRef Park, Y., Park, S., Jung, W., Lee, S.G.: Reversed CF: a fast collaborative filtering algorithm using a k-nearest neighbor graph. Expert Syst. Appl. 42(8), 4022–4028 (2015)MathSciNetCrossRef
17.
Zurück zum Zitat Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Sci. 344(6191), 1492–1496 (2014)CrossRef Rodriguez, A., Laio, A.: Clustering by fast search and find of density peaks. Sci. 344(6191), 1492–1496 (2014)CrossRef
18.
Zurück zum Zitat Schneider, J., Vlachos, M.: Fast parameterless density-based clustering via random projections. In: CIKM, CIKM 2013, pp. 861–866. ACM, New York (2013) Schneider, J., Vlachos, M.: Fast parameterless density-based clustering via random projections. In: CIKM, CIKM 2013, pp. 861–866. ACM, New York (2013)
19.
Zurück zum Zitat Sequeira, K., Zaki, M.: SCHISM: A new approach for interesting subspace mining. In: ICDM, vol. 0, pp. 186–193. IEEE Computer Society (2004) Sequeira, K., Zaki, M.: SCHISM: A new approach for interesting subspace mining. In: ICDM, vol. 0, pp. 186–193. IEEE Computer Society (2004)
20.
Zurück zum Zitat Sim, K., Gopalkrishnan, V., Zimek, A., Cong, G.: A survey on enhanced subspace clustering. Data Min. Knowl. Disc. 26(2), 332–397 (2013)MathSciNetCrossRefMATH Sim, K., Gopalkrishnan, V., Zimek, A., Cong, G.: A survey on enhanced subspace clustering. Data Min. Knowl. Disc. 26(2), 332–397 (2013)MathSciNetCrossRefMATH
21.
Zurück zum Zitat Weinberger, K., Saul, L.: Unsupervised learning of image manifolds by semidefinite programming. Int. J. Comput. Vis. 70(1), 77–90 (2006)CrossRef Weinberger, K., Saul, L.: Unsupervised learning of image manifolds by semidefinite programming. Int. J. Comput. Vis. 70(1), 77–90 (2006)CrossRef
Metadaten
Titel
Efficient Cluster Detection by Ordered Neighborhoods
verfasst von
Emin Aksehirli
Bart Goethals
Emmanuel Müller
Copyright-Jahr
2015
DOI
https://doi.org/10.1007/978-3-319-22729-0_2