Tree-based localized fuzzy twin support vector clustering with square loss function

Abstract

Twin support vector machine (TWSVM) is an efficient supervised learning algorithm proposed for classification problems. Motivated by its success, we propose Tree-based localized fuzzy twin support vector clustering (Tree-TWSVC). Tree-TWSVC is a novel clustering algorithm that builds the cluster model as a binary tree, where each node comprises the proposed TWSVM-based classifier, termed localized fuzzy TWSVM (LF-TWSVM). Tree-TWSVC achieves efficient learning time owing to the tree structure and a formulation that reduces training to solving a series of systems of linear equations. It delivers good clustering accuracy because of the square loss function and a nearest-neighbour-graph-based initialization method. The proposed algorithm restricts the cluster hyperplane from extending indefinitely by using a cluster prototype, which further improves its accuracy. It can efficiently handle large datasets and outperforms other TWSVM-based clustering methods. In this work, we propose two implementations of Tree-TWSVC: Binary Tree-TWSVC and One-against-all Tree-TWSVC. To prove the efficacy of the proposed method, experiments are performed on a number of benchmark UCI datasets. We also demonstrate an application of Tree-TWSVC as an image segmentation tool.
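Tree-TWSVC grows the cluster model as a binary tree whose internal nodes each solve an LF-TWSVM subproblem on the data reaching that node. Purely to illustrate that recursive tree-growing structure, the following minimal Python sketch splits nodes until the desired number of clusters is reached. It is not the authors' implementation: the `lf_twsvm_split` helper is a hypothetical placeholder (a simple principal-direction split), and the splitting order (largest node first) is an assumption.

```python
import numpy as np

def lf_twsvm_split(X):
    """Hypothetical stand-in for the node-level LF-TWSVM solve.

    In the paper each node fits nonparallel cluster hyperplanes by solving
    systems of linear equations; here we simply split on the sign of the
    projection onto the leading principal direction so the skeleton runs.
    """
    centred = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return (centred @ vt[0] >= 0).astype(int)  # 0/1 child assignment

def tree_twsvc(X, n_clusters, min_size=2):
    """Recursively split the data until n_clusters leaves are obtained."""
    leaves = [np.arange(len(X))]
    while len(leaves) < n_clusters:
        # Split the largest current leaf, mirroring a binary cluster tree.
        idx = max(range(len(leaves)), key=lambda i: len(leaves[i]))
        node = leaves.pop(idx)
        if len(node) < 2 * min_size:
            leaves.append(node)
            break
        side = lf_twsvm_split(X[node])
        leaves.extend([node[side == 0], node[side == 1]])
    assignment = np.empty(len(X), dtype=int)
    for c, node in enumerate(leaves):
        assignment[node] = c
    return assignment

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(m, 0.3, size=(40, 2)) for m in (0, 3, 6)])
    print(tree_twsvc(X, n_clusters=3))
```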

References

1. Vapnik VN (1999) An overview of statistical learning theory. IEEE Trans Neural Networks 10(5):988–999
2. Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910
3. Khemchandani R (2008) Mathematical programming applications in machine learning. PhD dissertation, Indian Institute of Technology Delhi, New Delhi, India
4. Kumar MA, Gopal M (2009) Least squares twin support vector machines for pattern classification. Expert Syst Appl 36(4):7535–7543
5. Sartakhti JS, Ghadiri N, Afrabandpey H (2015) Fuzzy least squares twin support vector machines. arXiv:1505.05451
6. Tanveer M, Khan MA, Ho SS (2016) Robust energy-based least squares twin support vector machines. Appl Intell:1–13
7. Shao YH, Wang Z, Chen WJ, Deng NY (2013) Least squares twin parametric-margin support vector machine for classification. Appl Intell 39(3):451–464
8. Khemchandani R, Pal A (2016) Multi-category Laplacian least squares twin support vector machine. Appl Intell 45(2):458–474
9. Hastie T, Tibshirani R, Friedman J (2009) Unsupervised learning. Springer
10. Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recogn Lett 31(8):651–666
11. Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall Inc.
12. Ng AY, Jordan MI, Weiss Y (2002) On spectral clustering: analysis and an algorithm. Adv Neural Inf Proces Syst:849–856
13. Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
14. Wu W, Xiong H, Shekhar S (eds) (2013) Clustering and information retrieval (Vol. 11). Springer Science and Business Media
15. Moon TK (1996) The expectation-maximization algorithm. IEEE Signal Process Mag 13(6):47–60
16. Al-Harbi SH, Rayward-Smith VJ (2006) Adapting k-means for supervised clustering. Appl Intell 24(3):219–226
17. Bradley PS, Mangasarian OL (2000) k-plane clustering. J Global Optim 16(1):23–32
18. Yang ZM, Guo YR, Li CN, Shao YH (2015) Local k-proximal plane clustering. Neural Comput & Applic 26(1):199–211
19. Xu L, Neufeld J, Larson B, Schuurmans D (2004) Maximum margin clustering. Adv Neural Inf Proces Syst 17:1537–1544
20. Valizadegan H, Jin R (2006) Generalized maximum margin clustering and unsupervised kernel learning. Adv Neural Inf Proces Syst:1417–1424
21. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press
22. Lobo MS, Vandenberghe L, Boyd S (1998) Applications of second-order cone programming. Linear Algebra Appl 284(1):193–228
23. Zhang K, Tsang IW, Kwok JT (2009) Maximum margin clustering made practical. IEEE Trans Neural Networks 20(4):583–596
24. Wang Z, Shao YH, Bai L, Deng NY (2015) Twin support vector machine for clustering. IEEE Trans Neural Netw Learn Syst 26(10):2583–2588
25. Hsu CW, Lin CJ (2002) A comparison of methods for multiclass support vector machines. IEEE Trans Neural Networks 13(2):415–425
26. Khemchandani R, Pal A (2016) Fuzzy least squares twin support vector clustering. Neural Computing and Applications (accepted)
27. Mangasarian OL (1993) Nonlinear programming. SIAM 10
28. Yuille AL, Rangarajan A (2003) The concave-convex procedure. Neural Comput 15(4):915–936
29. Smola AJ, Schölkopf B (1998) Learning with kernels. Citeseer
30. Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300
31. Gunn SR (1998) Support vector machines for classification and regression. ISIS Technical Report 14
32. Larose DT (2005) k-nearest neighbor algorithm. In: Discovering knowledge in data: an introduction to data mining, pp 90–106
33. Huttenlocher DP, Klanderman GA, Rucklidge WJ (1993) Comparing images using the Hausdorff distance. IEEE Trans Pattern Anal Mach Intell 15(9):850–863
34. Cormen TH (2009) Introduction to algorithms. MIT Press
35. Wang X, Wang Y, Wang L (2004) Improving fuzzy c-means clustering based on feature-weight learning. Pattern Recogn Lett 25(10):1123–1132
36. Blake C, Merz CJ (1998) UCI repository of machine learning databases. Available: www.ics.uci.edu
37. Hsu CW, Chang CC, Lin CJ (2003) A practical guide to support vector classification
38. Tan PN, Steinbach M, Kumar V (2005) Introduction to data mining. Addison-Wesley
39. Alzate C, Suykens JAK (2010) Multiway spectral clustering with out-of-sample extensions through weighted kernel PCA. IEEE Trans Pattern Anal Mach Intell 32(2):335–347
40. Alzate C, Suykens JAK (2011) Out-of-sample eigenvectors in kernel spectral clustering. In: The 2011 International Joint Conference on Neural Networks (IJCNN). IEEE, pp 2349–2356
41. Arbelaez P, Fowlkes C, Martin D (2007) The Berkeley segmentation dataset and benchmark. http://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds
42. Khan JF, Adhami RR, Bhuiyan SM (2009) A customized Gabor filter for unsupervised color image segmentation. Image Vis Comput 27(4):489–501
43. Mehrkanoon S, Alzate C, Mall R, Langone R, Suykens JAK (2015) Multiclass semi-supervised learning based upon kernel spectral clustering. IEEE Trans Neural Netw Learn Syst 26(4):720–733

Acknowledgments

We would like to take this opportunity to thank Dr. Suresh Chandra for his guidance and constant encouragement throughout the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Reshma Rastogi.

Appendix A: Loss function of TWSVM

TWSVM uses the hinge loss function, which is given by

\(L_{h}=\left\{\begin{array}{ll} 0, & y_{i}f_{i} \geq 1 \\ 1-y_{i}f_{i}, & \text{otherwise} \end{array}\right.\)

For any SVM- or TWSVM-based clustering method, the clustering error and the hyperplanes change little after the initial labelling and during subsequent iterations. This is caused by the hinge loss function: as shown in Fig. 6a, the classifier tries to push \(y_{i}f_{i}\) beyond \(y_{i}f_{i}=1\) (towards the right) [23]. In the figure, the solid line shows the loss with the initial labels and the dotted line shows the loss after the labels are flipped. As observed from the empirical margin distribution, most of the patterns have margins \(y_{i}f_{i} \gg 1\). If the label of such a pattern is changed, its loss becomes very large, so the classifier is unwilling to flip class labels; the procedure gets stuck in a local optimum and adheres to the initial label estimates. To prevent this premature convergence of the iterative procedure, the loss function is changed to the square loss, \(L_{s}=(1-y_{i}f_{i})^{2}\). This loss function is symmetric around \(y_{i}f_{i}=1\), as shown in Fig. 6b, and penalizes wrong predictions made from the preliminary label estimates. It therefore permits flipping of labels when needed and leads to a significant improvement in clustering performance.
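To make the contrast concrete, the short Python snippet below (a minimal numerical sketch, not code from the paper) evaluates both losses for a few well-classified points before and after their labels are flipped; the margin values are illustrative assumptions.

```python
import numpy as np

def hinge_loss(margins):
    """Hinge loss L_h = max(0, 1 - y_i * f_i) for each margin value."""
    return np.maximum(0.0, 1.0 - margins)

def square_loss(margins):
    """Square loss L_s = (1 - y_i * f_i)^2 for each margin value."""
    return (1.0 - margins) ** 2

# Margins y_i * f_i of a few confidently classified points (illustrative).
margins = np.array([2.0, 3.0, 4.0])

# Flipping a label negates its margin: y_i * f_i becomes -(y_i * f_i).
flipped = -margins

print("hinge  before/after flip:", hinge_loss(margins), hinge_loss(flipped))
# -> [0. 0. 0.] versus [3. 4. 5.]: with zero loss at large margins, the
#    hinge loss makes every flip look expensive, so the iterative
#    procedure clings to its initial labels.

print("square before/after flip:", square_loss(margins), square_loss(flipped))
# -> [1. 4. 9.] versus [9. 16. 25.]: the square loss already penalizes
#    y_i * f_i >> 1 under the initial labels, which keeps margins near 1
#    during training and leaves label flips affordable when needed.
```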

Fig. 6 Flipping of labels: (a) hinge loss function; (b) square loss function

Cite this article

Rastogi, R., Saigal, P. Tree-based localized fuzzy twin support vector clustering with square loss function. Appl Intell 47, 96–113 (2017). https://doi.org/10.1007/s10489-016-0886-8
