Skip to main content
Erschienen in:
Buchtitelbild

2015 | OriginalPaper | Buchkapitel

Random Local SVMs for Classifying Large Datasets

verfasst von : Thanh-Nghi Do, François Poulet

Erschienen in: Future Data and Security Engineering

Verlag: Springer International Publishing

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

We propose a new parallel ensemble learning algorithm of random local support vector machines, called krSVM for the effectively non-linear classification of large datasets. The random local SVM in the krSVM learning strategy uses kmeans algorithm to partition the data into k clusters, followed which it constructs a non-linear SVM in each cluster to classify the data locally in the parallel way on multi-core computers. The krSVM algorithm is faster than the standard SVM in the non-linear classification of large datasets while maintaining the classification correctness. The numerical test results on 4 datasets from UCI repository and 3 benchmarks of handwritten letters recognition showed that our proposed algorithm is efficient compared to the standard SVM.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Fußnoten
1
It must be noted that the complexity of the kSVM approach does not include the kmeans clustering used to partition the full dataset. But this step requires insignificant time compared with the quadratic programming solution.
 
2
Two classifiers are diverse if they make different errors on new data points [18].
 
Literatur
1.
3.
Zurück zum Zitat MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press 1, pp. 281–297, January 1967 MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press 1, pp. 281–297, January 1967
4.
Zurück zum Zitat Asuncion, A., Newman, D.: UCI repository of machine learning databases (2007) Asuncion, A., Newman, D.: UCI repository of machine learning databases (2007)
5.
Zurück zum Zitat LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., Jackel, L.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)CrossRef LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., Jackel, L.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)CrossRef
6.
Zurück zum Zitat LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)CrossRef LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)CrossRef
8.
Zurück zum Zitat Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, New York (2000)CrossRefMATH Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, New York (2000)CrossRefMATH
9.
Zurück zum Zitat Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods Support Vector Learning, pp. 185–208. MTT Press, Cambridge (1999) Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods Support Vector Learning, pp. 185–208. MTT Press, Cambridge (1999)
10.
Zurück zum Zitat Do, T.-N.: Non-linear classification of massive datasets with a parallel algorithm of local support vector machines. In: Le Thi, H.A., Nguyen, N.T., Do, T.V. (eds.) Advanced Computational Methods for Knowledge Engineering. AISC, vol. 358, pp. 231–241. Springer, Heidelberg (2015) CrossRef Do, T.-N.: Non-linear classification of massive datasets with a parallel algorithm of local support vector machines. In: Le Thi, H.A., Nguyen, N.T., Do, T.V. (eds.) Advanced Computational Methods for Knowledge Engineering. AISC, vol. 358, pp. 231–241. Springer, Heidelberg (2015) CrossRef
11.
Zurück zum Zitat Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., Zhou, Z.H., Steinbach, M., Hand, D.J., Steinberg, D.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37 (2007)CrossRef Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., Zhou, Z.H., Steinbach, M., Hand, D.J., Steinberg, D.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37 (2007)CrossRef
12.
Zurück zum Zitat Chang, C.C., Lin, C.J.: LIBSVM : a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(27), 1–27 (2011)CrossRef Chang, C.C., Lin, C.J.: LIBSVM : a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(27), 1–27 (2011)CrossRef
13.
Zurück zum Zitat Vapnik, V.: Principles of risk minimization for learning theory. In: Advances in Neural Information Processing Systems 4, (NIPS Conference, Denver, Colorado, USA, December 2–5, 1991), pp. 831–838 (1991) Vapnik, V.: Principles of risk minimization for learning theory. In: Advances in Neural Information Processing Systems 4, (NIPS Conference, Denver, Colorado, USA, December 2–5, 1991), pp. 831–838 (1991)
14.
Zurück zum Zitat Bottou, L., Vapnik, V.: Local learning algorithms. Neural Comput. 4(6), 888–900 (1992)CrossRef Bottou, L., Vapnik, V.: Local learning algorithms. Neural Comput. 4(6), 888–900 (1992)CrossRef
15.
Zurück zum Zitat Vapnik, V., Bottou, L.: Local algorithms for pattern recognition and dependencies estimation. Neural Comput. 5(6), 893–909 (1993)CrossRef Vapnik, V., Bottou, L.: Local algorithms for pattern recognition and dependencies estimation. Neural Comput. 5(6), 893–909 (1993)CrossRef
16.
Zurück zum Zitat Board, OpenMP Architecture Review: OpenMP application program interface version 3.0 (2008) Board, OpenMP Architecture Review: OpenMP application program interface version 3.0 (2008)
18.
Zurück zum Zitat Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000) CrossRef Dietterich, T.G.: Ensemble methods in machine learning. In: Kittler, J., Roli, F. (eds.) MCS 2000. LNCS, vol. 1857, pp. 1–15. Springer, Heidelberg (2000) CrossRef
19.
Zurück zum Zitat Lin, C.: A practical guide to support vector classification (2003) Lin, C.: A practical guide to support vector classification (2003)
20.
Zurück zum Zitat Yu, H., Yang, J., Han, J.: Classifying large data sets using svms with hierarchical clusters. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 306–315. ACM (2003) Yu, H., Yang, J., Han, J.: Classifying large data sets using svms with hierarchical clusters. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 306–315. ACM (2003)
21.
Zurück zum Zitat Do, T.N., Poulet, F.: Towards high dimensional data mining with boosting of psvm and visualization tools. In: Proceedings of 6th International Conference on Entreprise Information Systems, pp. 36–41 (2004) Do, T.N., Poulet, F.: Towards high dimensional data mining with boosting of psvm and visualization tools. In: Proceedings of 6th International Conference on Entreprise Information Systems, pp. 36–41 (2004)
22.
Zurück zum Zitat Boser, B., Guyon, I., Vapnik, V.: An training algorithm for optimal margin classifiers. In: Proceedings of 5th ACM Annual Workshop on Computational Learning Theoryof 5th ACM Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992) Boser, B., Guyon, I., Vapnik, V.: An training algorithm for optimal margin classifiers. In: Proceedings of 5th ACM Annual Workshop on Computational Learning Theoryof 5th ACM Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992)
23.
Zurück zum Zitat Osuna, E., Freund, R., Girosi, F.: An improved training algorithm for support vector machines. In: Principe, J., Gile, L., Morgan, N., Wilson, E. (eds.) Neural Networks for Signal Processing VII, pp. 276–285 (1997) Osuna, E., Freund, R., Girosi, F.: An improved training algorithm for support vector machines. In: Principe, J., Gile, L., Morgan, N., Wilson, E. (eds.) Neural Networks for Signal Processing VII, pp. 276–285 (1997)
24.
Zurück zum Zitat Mangasarian, O., Musicant, D.: Lagrangian support vector machines. J. Mach. Learn. Res. 1, 161–177 (2001)MathSciNetMATH Mangasarian, O., Musicant, D.: Lagrangian support vector machines. J. Mach. Learn. Res. 1, 161–177 (2001)MathSciNetMATH
25.
Zurück zum Zitat Fung, G., Mangasarian, O.: Proximal support vector classifiers. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 77–86. ACM (2001) Fung, G., Mangasarian, O.: Proximal support vector classifiers. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 77–86. ACM (2001)
26.
Zurück zum Zitat Mangasarian, O.: A finite newton method for classification problems. Technical report 01–11, Data Mining Institute, Computer Sciences Department, University of Wisconsin (2001) Mangasarian, O.: A finite newton method for classification problems. Technical report 01–11, Data Mining Institute, Computer Sciences Department, University of Wisconsin (2001)
27.
Zurück zum Zitat Suykens, J., Vandewalle, J.: Least squares support vector machines classifiers. Neural Process. Lett. 9(3), 293–300 (1999)CrossRefMATH Suykens, J., Vandewalle, J.: Least squares support vector machines classifiers. Neural Process. Lett. 9(3), 293–300 (1999)CrossRefMATH
28.
Zurück zum Zitat Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of the Twenty-Fourth International Conference Machine Learning, pp. 807–814. ACM (2007) Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of the Twenty-Fourth International Conference Machine Learning, pp. 807–814. ACM (2007)
29.
Zurück zum Zitat Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. In: Platt, J., Koller, D., Singer, Y., Roweis, S. (eds.) Advances in Neural Information Processing Systems, vol. 20, pp. 161–168. NIPS Foundation (2008). http://books.nips.cc Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. In: Platt, J., Koller, D., Singer, Y., Roweis, S. (eds.) Advances in Neural Information Processing Systems, vol. 20, pp. 161–168. NIPS Foundation (2008). http://​books.​nips.​cc
30.
Zurück zum Zitat Do, T.N., Poulet, F.: Incremental svm and visualization tools for bio-medical data mining. In: Proceedings of Workshop on Data Mining and Text Mining in Bioinformatics, pp. 14–19 (2003) Do, T.N., Poulet, F.: Incremental svm and visualization tools for bio-medical data mining. In: Proceedings of Workshop on Data Mining and Text Mining in Bioinformatics, pp. 14–19 (2003)
31.
Zurück zum Zitat Do, T.N., Poulet, F.: Classifying one billion data with a new distributed svm algorithm. In: Proceedings of 4th IEEE International Conference on Computer Science, Research, Innovation and Vision for the Future, pp. 59–66. IEEE Press (2006) Do, T.N., Poulet, F.: Classifying one billion data with a new distributed svm algorithm. In: Proceedings of 4th IEEE International Conference on Computer Science, Research, Innovation and Vision for the Future, pp. 59–66. IEEE Press (2006)
32.
Zurück zum Zitat Fung, G., Mangasarian, O.: Incremental support vector machine classification. In: Proceedings of the 2nd SIAM International Conference on Data Mining (2002) Fung, G., Mangasarian, O.: Incremental support vector machine classification. In: Proceedings of the 2nd SIAM International Conference on Data Mining (2002)
33.
Zurück zum Zitat Poulet, F., Do, T.N.: Mining very large datasets with support vector machine algorithms. In: Camp, O., Filipe, J., Hammoudi, S., Piattini, M. (eds.) Enterprise Information Systems V, pp. 177–184 (2004) Poulet, F., Do, T.N.: Mining very large datasets with support vector machine algorithms. In: Camp, O., Filipe, J., Hammoudi, S., Piattini, M. (eds.) Enterprise Information Systems V, pp. 177–184 (2004)
34.
Zurück zum Zitat Do, T.: Parallel multiclass stochastic gradient descent algorithms for classifying million images with very-high-dimensional signatures into thousands classes. Vietnam J. Comput. Sci. 1(2), 107–115 (2014)CrossRef Do, T.: Parallel multiclass stochastic gradient descent algorithms for classifying million images with very-high-dimensional signatures into thousands classes. Vietnam J. Comput. Sci. 1(2), 107–115 (2014)CrossRef
35.
Zurück zum Zitat Do, T.-N., Nguyen, V.-H., Poulet, F.: Speed Up SVM algorithm for massive classification tasks. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds.) ADMA 2008. LNCS (LNAI), vol. 5139, pp. 147–157. Springer, Heidelberg (2008) CrossRef Do, T.-N., Nguyen, V.-H., Poulet, F.: Speed Up SVM algorithm for massive classification tasks. In: Tang, C., Ling, C.X., Zhou, X., Cercone, N.J., Li, X. (eds.) ADMA 2008. LNCS (LNAI), vol. 5139, pp. 147–157. Springer, Heidelberg (2008) CrossRef
36.
Zurück zum Zitat Do, T.N., Poulet, F.: Mining very large datasets with svm and visualization. In: Proceedings of 7th International Conference on Entreprise Information Systems, pp. 127–134 (2005) Do, T.N., Poulet, F.: Mining very large datasets with svm and visualization. In: Proceedings of 7th International Conference on Entreprise Information Systems, pp. 127–134 (2005)
37.
Zurück zum Zitat Boley, D., Cao, D.: Training support vector machines using adaptive clustering. In: Berry, M.W., Dayal, U., Kamath, C., Skillicorn, D.B. (eds.) Proceedings of the Fourth SIAM International Conference on Data Mining, Lake Buena Vista, Florida, USA, April 22–24, 2004, SIAM, pp. 126–137 (2004) Boley, D., Cao, D.: Training support vector machines using adaptive clustering. In: Berry, M.W., Dayal, U., Kamath, C., Skillicorn, D.B. (eds.) Proceedings of the Fourth SIAM International Conference on Data Mining, Lake Buena Vista, Florida, USA, April 22–24, 2004, SIAM, pp. 126–137 (2004)
38.
Zurück zum Zitat Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. In: Proceedings of the 17th International Conference on Machine Learning, pp. 999–1006. ACM (2000) Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. In: Proceedings of the 17th International Conference on Machine Learning, pp. 999–1006. ACM (2000)
39.
Zurück zum Zitat Pavlov, D., Mao, J., Dom, B.: Scaling-up support vector machines using boosting algorithm. In: 15th International Conference on Pattern Recognition, vol. 2, pp. 219–222 (2000) Pavlov, D., Mao, J., Dom, B.: Scaling-up support vector machines using boosting algorithm. In: 15th International Conference on Pattern Recognition, vol. 2, pp. 219–222 (2000)
40.
Zurück zum Zitat Do, T.N., Le-Thi, H.A.: Classifying large datasets with svm. In: Proceedings of 4th International Conference on Computational Management Science (2007) Do, T.N., Le-Thi, H.A.: Classifying large datasets with svm. In: Proceedings of 4th International Conference on Computational Management Science (2007)
41.
Zurück zum Zitat Do, T.N., Fekete, J.D.: Large scale classification with support vector machine algorithms. In: Wani, M.A., Kantardzic, M.M., Li, T., Liu, Y., Kurgan, L.A., Ye, J., Ogihara, M., Sagiroglu, S., Chen, X.-W., Peterson, L.E., Hafeez, K. (eds.) The Sixth International Conference on Machine Learning and Applications, ICMLA 2007, Cincinnati, Ohio, USA, 13–15 December 2007, pp. 7–12. IEEE Computer Society (2007) Do, T.N., Fekete, J.D.: Large scale classification with support vector machine algorithms. In: Wani, M.A., Kantardzic, M.M., Li, T., Liu, Y., Kurgan, L.A., Ye, J., Ogihara, M., Sagiroglu, S., Chen, X.-W., Peterson, L.E., Hafeez, K. (eds.) The Sixth International Conference on Machine Learning and Applications, ICMLA 2007, Cincinnati, Ohio, USA, 13–15 December 2007, pp. 7–12. IEEE Computer Society (2007)
42.
Zurück zum Zitat Freund, Y., Schapire, R.: A short introduction to boosting. J. Jpn. Soc. Artif. Intell. 14(5), 771–780 (1999) Freund, Y., Schapire, R.: A short introduction to boosting. J. Jpn. Soc. Artif. Intell. 14(5), 771–780 (1999)
44.
Zurück zum Zitat Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3(1), 79–87 (1991)CrossRef Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive mixtures of local experts. Neural Comput. 3(1), 79–87 (1991)CrossRef
45.
Zurück zum Zitat Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the em algorithm. J. Roy. Stat. Soc.:Ser B 39(1), 1–38 (1977)MathSciNetMATH Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the em algorithm. J. Roy. Stat. Soc.:Ser B 39(1), 1–38 (1977)MathSciNetMATH
46.
Zurück zum Zitat Vincent, P., Bengio, Y.: K-local hyperplane and convex distance nearest neighbor algorithms. In: Advances in Neural Information Processing Systems, pp. 985–992. The MIT Press (2001) Vincent, P., Bengio, Y.: K-local hyperplane and convex distance nearest neighbor algorithms. In: Advances in Neural Information Processing Systems, pp. 985–992. The MIT Press (2001)
47.
Zurück zum Zitat Zhang, H., Berg, A., Maire, M., Malik, J.: SVM-KNN: discriminative nearest neighbor classification for visual category recognition. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2126–2136 (2006) Zhang, H., Berg, A., Maire, M., Malik, J.: SVM-KNN: discriminative nearest neighbor classification for visual category recognition. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 2126–2136 (2006)
48.
Zurück zum Zitat Yang, T., Kecman, V.: Adaptive local hyperplane classification. Neurocomputing 71(13–15), 3001–3004 (2008)CrossRef Yang, T., Kecman, V.: Adaptive local hyperplane classification. Neurocomputing 71(13–15), 3001–3004 (2008)CrossRef
49.
Zurück zum Zitat Segata, N., Blanzieri, E.: Fast and scalable local kernel machines. J. Mach. Learn. Res. 11, 1883–1926 (2010)MathSciNetMATH Segata, N., Blanzieri, E.: Fast and scalable local kernel machines. J. Mach. Learn. Res. 11, 1883–1926 (2010)MathSciNetMATH
50.
Zurück zum Zitat Cheng, H., Tan, P.N., Jin, R.: Efficient algorithm for localized support vector machine. IEEE Trans. Knowl. Data Eng. 22(4), 537–549 (2010)CrossRef Cheng, H., Tan, P.N., Jin, R.: Efficient algorithm for localized support vector machine. IEEE Trans. Knowl. Data Eng. 22(4), 537–549 (2010)CrossRef
51.
Zurück zum Zitat Kecman, V., Brooks, J.: Locally linear support vector machines and other local models. In: The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1–6 (2010) Kecman, V., Brooks, J.: Locally linear support vector machines and other local models. In: The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1–6 (2010)
52.
Zurück zum Zitat Ladicky, L., Torr, P.H.S.: Locally linear support vector machines. In: Getoor, L., Scheffer, T., (eds.) Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 – July 2, 2011, pp. 985–992. Omnipress (2011) Ladicky, L., Torr, P.H.S.: Locally linear support vector machines. In: Getoor, L., Scheffer, T., (eds.) Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28 – July 2, 2011, pp. 985–992. Omnipress (2011)
53.
Zurück zum Zitat Gu, Q., Han, J.: Clustered support vector machines. In: Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2013, Scottsdale, AZ, USA, April 29 – May 1, 2013, JMLR Proceedings, vol. 31, pp. 307–315 (2013) Gu, Q., Han, J.: Clustered support vector machines. In: Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, AISTATS 2013, Scottsdale, AZ, USA, April 29 – May 1, 2013, JMLR Proceedings, vol. 31, pp. 307–315 (2013)
Metadaten
Titel
Random Local SVMs for Classifying Large Datasets
verfasst von
Thanh-Nghi Do
François Poulet
Copyright-Jahr
2015
DOI
https://doi.org/10.1007/978-3-319-26135-5_1