Abstract
This paper proposes a new large margin classifier—the structured large margin machine (SLMM)—that is sensitive to the structure of the data distribution. The SLMM approach combines the merits of "structured" learning models, such as radial basis function networks and Gaussian mixture models, with the advantages of "unstructured" large margin learning schemes, such as support vector machines and maxi-min margin machines. We derive the SLMM model from the concepts of "structured degree" and "homospace", based on an analysis of existing structured and unstructured learning models. Then, by using Ward's agglomerative hierarchical clustering on the input data (or on data mappings in the kernel space) to extract the underlying data structure, we formulate SLMM training as a sequence of second-order cone programming problems. Many promising features of the SLMM approach are illustrated, including its accuracy, scalability, extensibility, and noise tolerance. We also demonstrate the theoretical importance of the SLMM model by showing that it generalizes existing approaches, such as SVMs and M4s, provides novel insight into learning models, and lays a foundation for conceiving other "structured" classifiers.
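The structure-extraction step above relies on Ward's agglomerative criterion: at each step, merge the pair of clusters whose union yields the smallest increase in within-cluster sum of squares. A minimal pure-Python sketch of that criterion is given below; it is an illustration of Ward's method only, not the paper's implementation, which also applies the clustering to data mappings in the kernel space.

```python
def ward_cluster(points, k):
    """Agglomerative clustering with Ward's minimum-variance criterion.

    Merges clusters pairwise until k remain, each step choosing the
    merge with the smallest increase in within-cluster sum of squares:
        delta = n_i * n_j / (n_i + n_j) * ||centroid_i - centroid_j||^2
    """
    clusters = [[p] for p in points]  # start with singleton clusters

    def centroid(cluster):
        dim = len(cluster[0])
        return [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]

    while len(clusters) > k:
        best = None  # (delta, i, j) for the cheapest merge found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci, cj = centroid(clusters[i]), centroid(clusters[j])
                ni, nj = len(clusters[i]), len(clusters[j])
                dist2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
                delta = ni * nj / (ni + nj) * dist2
                if best is None or delta < best[0]:
                    best = (delta, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters
```

For example, six 2-D points forming two well-separated groups are recovered as two clusters of three points each. A production version would use the Lance–Williams update to avoid recomputing centroids from scratch at every step.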
References
Andersen, E. D., & Andersen, K. D. (2001). The MOSEK interior point optimizer for linear programming: An implementation of the homogeneous algorithm. In High performance optimization (pp. 197–232). Dordrecht: Kluwer Academic.
Burges, C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2, 121–167.
Christodoulou, C., & Pattichis, C. (1999). Unsupervised pattern recognition for the classification of EMG signals. IEEE Transactions on Biomedical Engineering, 46(2), 169–178.
Devroye, L., Gyorfi, L., & Lugosi, G. (1996). A probabilistic theory of pattern recognition. Berlin: Springer.
Duda, R., Hart, P., & Stork, D. G. (2001). Pattern classification (2nd edn.). New York: Wiley.
El-Hamdouchi, A., & Willett, P. (1989). Comparison of hierarchic agglomerative clustering methods for document retrieval. The Computer Journal, 32(3), 220–227.
Everitt, B. S., Landau, S., & Leese, M. (2001). Cluster analysis. London: Hodder.
Fisher, R. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179–188.
Girolami, M. (2002). Mercer kernel-based clustering in feature space. IEEE Transactions on Neural Networks, 13(3), 780–784.
Haykin, S. (1999). Neural networks: a comprehensive foundation. New Jersey: Prentice Hall.
Hsu, C. W., & Lin, C. J. (2002). A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, 13(2), 415–425.
Huang, K., Yang, H., King, I., & Lyu, M. R. (2004a). Learning large margin classifiers locally and globally. In: Proceedings of the 21st international conference on machine learning (ICML)’04, Banff, Canada.
Huang, K., Yang, H., King, I., Lyu, M. R., & Chan, L. (2004b). Minimum error minimax probability machine. Journal of Machine Learning Research, 5, 1253–1286.
Jain, A. K., & Dubes, R. (1988). Algorithms for clustering data. New Jersey: Prentice Hall.
Lanckriet, G. R. G., Ghaoui, L. E., Bhattacharyya, C., & Jordan, M. I. (2002). A robust minimax approach to classification. Journal of Machine Learning Research, 3, 555–582.
Lobo, M., Vandenberghe, L., Boyd, S., & Lebret, H. (1998). Applications of second-order cone programming. Linear Algebra and its Applications, 284, 193–228.
Mukherjee, B., Heberlein, L., & Levitt, K. (1994). Network intrusion detection. IEEE Network, 8(3), 26–41.
Osuna, E., Freund, R., & Girosi, F. (1997). Training support vector machines: an application to face detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR)’97 (pp. 130–136).
Platt, J., Cristianini, N., & Shawe-Taylor, J. (2000). Large margin DAGs for multiclass classification. Advances in Neural Information Processing Systems, 12, 547–553.
Rätsch, G., Onoda, T., & Müller, K. R. (2001). Soft margins for AdaBoost. Machine Learning, 42(3), 287–320.
Salvador, S., & Chan, P. (2004). Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms. In Proceedings of the 16th IEEE international conference on tools with AI (pp. 576–584).
Schölkopf, B., Mika, S., Burges, C., Knirsch, P., Müller, K.-R., Rätsch, G., & Smola, A. (1999). Input space vs. feature space in kernel-based methods. IEEE Transactions on Neural Networks, 10(5), 1000–1017.
Schölkopf, B., Smola, A., & Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, 1299–1319.
Smola, A., Bartlett, P., Schölkopf, B., & Schuurmans, D. (2000). Advances in large margin classifiers. Cambridge: MIT.
Sturm, J. (1999). Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. Optimization Methods and Software, 11, 625–653.
Vapnik, V. N. (1999). The nature of statistical learning theory. New York: Springer.
Veeramachaneni, S., & Nagy, G. (2005). Style context with second-order statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(1), 14–22.
Wang, D., Yeung, D. S., & Tsang, E. (2005). Sample reduction for SVMs via data structure analysis. In Proceedings of the IEEE international conference on system, man, and cybernetics SMC’05 (pp. 1030–1035), Hawaii, USA.
Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.
Editor: Dale Schuurmans.
This work was supported by the Hong Kong Research Grant Council under Grants G-T891 and B-Q519.
Yeung, D.S., Wang, D., Ng, W.W.Y. et al. Structured large margin machines: sensitive to data distributions. Mach Learn 68, 171–200 (2007). https://doi.org/10.1007/s10994-007-5015-9