Top

Pattern Analysis and Applications

Published in:

16-12-2015 | Theoretical Advances

A novel supervised cluster adjustment method using a fast exact nearest neighbor search algorithm

Authors: Ali Zaghian, Fakhroddin Noorbehbahani

Published in: Pattern Analysis and Applications | Issue 3/2017

Activate our intelligent search to find suitable subject content or patents.

search-config

AI-assisted search

Off

Abstract

Supervised clustering is a new research area that aims to improve unsupervised clustering algorithms exploiting supervised information. Today, there are several clustering algorithms, but the effective supervised cluster adjustment method which is able to adjust the resulting clusters, regardless of applied clustering algorithm has not been presented yet. In this paper, we propose a new supervised cluster adjustment method which can be applied to any clustering algorithm. Since the adjustment method is based on finding the nearest neighbors, a novel exact nearest neighbor search algorithm is also introduced which is significantly faster than the classic one. Several datasets and clustering evaluation metrics are employed to examine the effectiveness of the proposed cluster adjustment method and the proposed fast exact nearest neighbor algorithm comprehensively. The experimental results show that the proposed algorithms are significantly effective in improving clusters and accelerating nearest neighbor searches.

previous article A new validity index for crisp clusters

next article Class-specific image representation for image classification using multiple scale-invariant region detectors

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

inform now

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

inform now

Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41:578–588CrossRefMATH

Grira N, Crucianu M, Boujemaa N (2004) Unsupervised and semi-supervised clustering: a brief survey. In: A review of machine learning techniques for processing multimedia content, report of the MUSCLE European network of excellence (6th Framework Programme), pp 1–12. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.59.4074

Basu S, Bilenko M, Mooney RJ (2004) A probabilistic framework for semi-supervised clustering. In: Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 59–68

Eick CF, Zeidat NM, Zhao Z (2004) Supervised clustering—algorithms and benefits. In: ICTAI, pp 774–776

Vilalta R, Achari M, Eick CF (2004) Piece-wise model fitting using local data patterns. In: Proceedings of the 16th European conference on artificial intelligence, ECAI’2004, Vol 16, p 559

Eick CF, Rouhana A, Bagherjeiran A, Vilalta R (2006) Using clustering to learn distance functions for supervised similarity assessment. Eng Appl Artif Intell 19(4):395–401CrossRef

Basu S, Bilenko M, Mooney M (2003) Comparing and unifying search-based and similarity-based approaches to semi-supervised clustering. In: Proceedings of the ICML-2003 workshop on the continuum from labeled to unlabeled data, pp 42–49

Cohn D, Caruana R, McCallum A (2003) Semi-supervised clustering with user feedback. Constrained Clust Adv Algorithms Theory Appl 4(1):17–32MATH

Klein D, Kamvar SD, Manning CD (2002) From instance-level constraints to space-level constraints: making the most of prior knowledge in data clustering. In: International conference machine learning, pp 307–314

10.

Xing EP, Ng AY, Jordan MI, Russell S (2003) Distance metric learning, with application to clustering with side-information. In: Advances in neural information processing systems 15, pp 505–512

11.

Bilenko M, Mooney RJ (2003) Adaptive duplicate detection using learnable string similarity measures. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. Washington, DC, pp 39–48. doi:10.1145/956755.956759

12.

Basu S, Banerjee A, Mooney R (2002) Semi-supervised clustering by seeding. In: Proceedings of the 19th international conference on machine learning (ICML-2002), pp 19–26

13.

Demiriz A, Bennett K, Embrechts MJ (1999) Semi-supervised clustering using genetic algorithms. In: Artificial neural networks in engineering (ANNIE-99), pp 809–814

14.

Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of 17th international conference machine learning (ICML 2000), pp 1103–1110

15.

Wu X, Kumar V, Ross Quinlan J, Ghosh J, Yang Q, Motoda H, McLachlan G, Ng A, Liu B, Yu P, Zhou Z-H, Steinbach M, Hand D, Steinberg D (2008) Top 10 algorithms in data mining. Knowl Inf Syst 14(1):1–37CrossRef

16.

Kolahdouzan MR, Shahabi C (2004) Voronoi-based K nearest neighbor search for spatial network databases. In: VLDB, pp 840–851

17.

Toussaint G (2002) Proximity graphs for nearest neighbor decision rules: recent progress. In: Proceedings of the 34th symposium on the INTERFACE, pp 17–20

18.

Liu T, Moore AW, Gray AG (2003) Efficient exact k-NN and nonparametric classification in high dimensions. BT—advances in neural information processing systems 16, neural information processing systems, NIPS 2003, December 8–13, 2003, Vancouver and Whistler, British Columbia, Canada, pp 265–272. Retrieved from http://papers.nips.cc/paper/2469-new-algorithms-for-efficient-high-dimensional-non-parametric-classification

19.

de Berg M, Cheong O, van Kreveld M, Overmars M (2008) Computational geometry algorithms and applications, 3rd edn. Springer, HeidelbergMATH

20.

Ciaccia P, Patella M, Zezula P (1997) M-tree: an efficient access method for similarity search in metric spaces. In: International conference on very large data bases (VLDB), pp 426–435

21.

Guttmann A (1984) R-trees: a dynamic index structure for spatial searching. In: Proceedings of ACM-SIGMOD, pp 47–57

22.

Kim YJ, Patel JM (2010) Performance comparison of the R*-tree and the quadtree for knn and distance join queries. IEEE Trans Knowl Data Eng 22(7):1014–1027CrossRef

23.

Gionis A, Indyk P, Motwani R (1999) Similarity search in high dimensions via hashing. In: VLDB’99 proceedings of the 25th international conference on very large data bases, pp 518–529

24.

Wang X (2011) A fast exact k-nearest neighbors algorithm for high dimensional search using k-means clustering and triangle inequality. In: IJCNN, pp 1293–1299

25.

Bache K, Lichman M (2013) UCI machine learning repository (Online). Accessed 4 Apr 2013. http://archive.ics.uci.edu/ml

26.

Amigó E, Gonzalo J, Artiles J, Verdejo F (2009) A comparison of extrinsic clustering evaluation metrics based on formal constraints. Inf Retr Boston 12(4):461–486CrossRef

27.

Zhao Y (2001) Criterion functions for document clustering: experiments and analysis (technical report), Department of Computer Science, University of Minnesota, pp 1–30

28.

Steinbach M, Karypis G, Kumar V (2000) A comparison of document clustering techniques (00-34), Technical report, University of Minnesota

29.

Meila M (2003) Comparing clusterings by the variation of information. Learning theory and kernel machines. Springer, Washington, pp 173–187CrossRef

30.

Bagga A, Baldwin B (1998) Entity-based cross-document coreferencing using the vector space model. In: Proceedings of the 17th international conference on computational linguistics, vol 1, pp 79–85

31.

Gennari JH, Langley P, Fisher DH (1989) Models of incremental concept formation. Artif Intell 40(1–3):11–61CrossRef

32.

MacQueen JB (1966) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp 281–297

33.

Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning. Springer, New YorkCrossRefMATH

34.

Arthur D, Vassilvitskii S (2007) K-means++: the advantages of careful seeding. In: Proceedings of the 18th annual ACM-SIAM symposium on discrete algorithms, pp 1027–1035

35.

Kohonen T (1982) Self-organized formation of topologically correct feature maps. Biol Cybern 43(1):59–69MathSciNetCrossRefMATH

36.

Rousseeuw L, Kaufman L (1987) Clustering by means of medoids. Statistical data analysis based L1-norm related methods. First international conference, 405, 405–416

37.

Powers DMW (2011) Evaluation: from precision, recall and f-measure to ROC, informedness, markedness and correlation. J Mach Learn Technol 2(1):37–63MathSciNet

Title: A novel supervised cluster adjustment method using a fast exact nearest neighbor search algorithm
Authors: Ali Zaghian
Fakhroddin Noorbehbahani
Publication date: 16-12-2015
Publisher: Springer London
Published in: Pattern Analysis and Applications / Issue 3/2017
Print ISSN: 1433-7541
Electronic ISSN: 1433-755X
DOI: https://doi.org/10.1007/s10044-015-0527-6

Springer Professional

Abstract

Please log in to get access to your license.

Dont have a licence yet? Then find out more about our products and how to get one now:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Other articles of this Issue 3/2017

Approximation of fruit skin patterns using spherical Voronoi diagrams

Decision tree induction based on minority entropy for the class imbalance problem

Nonlinear regression applied to interval-valued data

Adaptive binarization method for degraded document images based on surface contrast variation

Discriminant error correcting output codes based on spectral clustering

Brain tumor classification from multi-modality MRI using wavelets and machine learning

Premium Partner