Skip to main content

2015 | OriginalPaper | Buchkapitel

Massive Classification with Support Vector Machines

verfasst von : Thanh Nghi Do, Hoai An Le Thi

Erschienen in: Transactions on Computational Collective Intelligence XVIII

Verlag: Springer Berlin Heidelberg

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

The new boosting of Least-Squares SVM (LS-SVM), Proximal SVM (PSVM), Newton SVM (NSVM) algorithms aim at classifying very large datasets on standard personal computers (PCs). We extend the PSVM, LS-SVM and NSVM in several ways to efficiently classify large datasets. We developed a row incremental version for datasets with billions of data points. By adding a Tikhonov regularization term and using the Sherman-Morrison-Woodbury formula, we developed new algorihms to process datasets with a small number of data points but very high dimensionality. Finally, by applying boosting including AdaBoost and Arcx4 to these algorithms, we developed classification algorithms for massive, very-high-dimensional datasets. Numerical test results on large datasets from the UCI repository showed that our algorithms are often significantly faster and/or more accurate than state-of-the-art algorithms LibSVM, CB-SVM, SVM-perf and LIBLINEAR.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
1.
3.
Zurück zum Zitat Fayyad, U., Piatetsky-Shapiro, G., Uthurusamy, R.: Summary from the kdd-03 panel - data mining: the next 10 years. SIGKDD Explor. 5(2), 191–196 (2004)CrossRef Fayyad, U., Piatetsky-Shapiro, G., Uthurusamy, R.: Summary from the kdd-03 panel - data mining: the next 10 years. SIGKDD Explor. 5(2), 191–196 (2004)CrossRef
5.
Zurück zum Zitat Boser, B., Guyon, I., Vapnik, V.: An training algorithm for optimal margin classifiers. In: Proceedings of 5th ACM Annual Workshop on Computational Learning Theoryof 5th ACM Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992) Boser, B., Guyon, I., Vapnik, V.: An training algorithm for optimal margin classifiers. In: Proceedings of 5th ACM Annual Workshop on Computational Learning Theoryof 5th ACM Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992)
6.
Zurück zum Zitat Osuna, E., Freund, R., Girosi, F.: An improved training algorithm for support vector machines. In: Principe, J., Gile, L., Morgan, N., Wilson, E., (eds.) Neural Networks for Signal Processing VII, pp. 276–285 (1997) Osuna, E., Freund, R., Girosi, F.: An improved training algorithm for support vector machines. In: Principe, J., Gile, L., Morgan, N., Wilson, E., (eds.) Neural Networks for Signal Processing VII, pp. 276–285 (1997)
7.
Zurück zum Zitat Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods - Support Vector Learning, pp. 185–208. MIT Press, Cambridge (1999) Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods - Support Vector Learning, pp. 185–208. MIT Press, Cambridge (1999)
8.
Zurück zum Zitat Cauwenberghs, G., Poggio, T.: Incremental and decremental support vector machine learning. Adv. Neural Inf. Process. Syst. 13, 409–415 (2001) Cauwenberghs, G., Poggio, T.: Incremental and decremental support vector machine learning. Adv. Neural Inf. Process. Syst. 13, 409–415 (2001)
9.
Zurück zum Zitat Do, T.N., Poulet, F.: Incremental svm and visualization tools for bio-medical data mining. In: Proceedings of Workshop on Data Mining and Text Mining in Bioinformatics, pp. 14–19 (2003) Do, T.N., Poulet, F.: Incremental svm and visualization tools for bio-medical data mining. In: Proceedings of Workshop on Data Mining and Text Mining in Bioinformatics, pp. 14–19 (2003)
10.
Zurück zum Zitat Do, T.N., Poulet, F.: Towards high dimensional data mining with boosting of psvm and visualization tools. In: Proceedings of 6th International Conference on Entreprise Information Systems, pp. 36–41 (2004) Do, T.N., Poulet, F.: Towards high dimensional data mining with boosting of psvm and visualization tools. In: Proceedings of 6th International Conference on Entreprise Information Systems, pp. 36–41 (2004)
11.
Zurück zum Zitat Do, T.N., Poulet, F.: Classifying one billion data with a new distributed svm algorithm. In: Proceedings of 4th IEEE International Conference on Computer Science, Research, Innovation and Vision for the Future, pp. 59–66. IEEE Press (2006) Do, T.N., Poulet, F.: Classifying one billion data with a new distributed svm algorithm. In: Proceedings of 4th IEEE International Conference on Computer Science, Research, Innovation and Vision for the Future, pp. 59–66. IEEE Press (2006)
12.
Zurück zum Zitat Fung, G., Mangasarian, O.: Incremental support vector machine classification. In: Proceedings of the 2nd SIAM International Conference on Data Mining (2002) Fung, G., Mangasarian, O.: Incremental support vector machine classification. In: Proceedings of the 2nd SIAM International Conference on Data Mining (2002)
13.
Zurück zum Zitat Poulet, F., Do, T.N.: Mining very large datasets with support vector machine algorithms. In: Camp, O., Filipe, J., Hammoudi, S., Piattini, M., et al. (eds.) Enterprise Information Systems V, pp. 177–184. Kluwer Academic Publishers, Dordrecht (2004) Poulet, F., Do, T.N.: Mining very large datasets with support vector machine algorithms. In: Camp, O., Filipe, J., Hammoudi, S., Piattini, M., et al. (eds.) Enterprise Information Systems V, pp. 177–184. Kluwer Academic Publishers, Dordrecht (2004)
14.
Zurück zum Zitat Do, T.N., Le-Thi, H.A.: Classifying large datasets with svm. In: Proceedings of 4th International Conference on Computational Management Science (2007) Do, T.N., Le-Thi, H.A.: Classifying large datasets with svm. In: Proceedings of 4th International Conference on Computational Management Science (2007)
15.
Zurück zum Zitat Syed, N., Liu, H., Sung, K.: Incremental learning with support vector machines. In: Proceedings of the ACM SIGKDD International Conference on KDD. ACM (1999) Syed, N., Liu, H., Sung, K.: Incremental learning with support vector machines. In: Proceedings of the ACM SIGKDD International Conference on KDD. ACM (1999)
16.
Zurück zum Zitat Do, T.N., Poulet, F.: Mining very large datasets with svm and visualization. In: Proceedings of 7th International Conference on Entreprise Information Systems, pp. 127–134 (2005) Do, T.N., Poulet, F.: Mining very large datasets with svm and visualization. In: Proceedings of 7th International Conference on Entreprise Information Systems, pp. 127–134 (2005)
17.
Zurück zum Zitat Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. In: Proceedings of the 17th International Conference on Machine Learning, pp. 999–1006. ACM (2000) Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. In: Proceedings of the 17th International Conference on Machine Learning, pp. 999–1006. ACM (2000)
18.
Zurück zum Zitat Suykens, J., Vandewalle, J.: Least squares support vector machines classifiers. Neural Process. Lett. 9(3), 293–300 (1999)MathSciNetCrossRef Suykens, J., Vandewalle, J.: Least squares support vector machines classifiers. Neural Process. Lett. 9(3), 293–300 (1999)MathSciNetCrossRef
19.
Zurück zum Zitat Fung, G., Mangasarian, O.: Proximal support vector classifiers. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 77–86. ACM (2001) Fung, G., Mangasarian, O.: Proximal support vector classifiers. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 77–86. ACM (2001)
20.
Zurück zum Zitat Mangasarian, O.: A finite newton method for classification problems. Technical report 01–11, Data Mining Institute, Computer Sciences Department, University of Wisconsin (2001) Mangasarian, O.: A finite newton method for classification problems. Technical report 01–11, Data Mining Institute, Computer Sciences Department, University of Wisconsin (2001)
21.
Zurück zum Zitat Tikhonov, A.N.: On the stability of inverse problems. Dokl Akad. Nauk SSSR 39(5), 195–198 (1943) Tikhonov, A.N.: On the stability of inverse problems. Dokl Akad. Nauk SSSR 39(5), 195–198 (1943)
22.
Zurück zum Zitat Golub, G., Loan, C.V.: Matrix Computations, 3rd edn. John Hopkins University Press, Baltimore (1996)MATH Golub, G., Loan, C.V.: Matrix Computations, 3rd edn. John Hopkins University Press, Baltimore (1996)MATH
23.
Zurück zum Zitat Rao, C.: Linear Statistical Inference and Its applications. Wiley, New York (1965)MATH Rao, C.: Linear Statistical Inference and Its applications. Wiley, New York (1965)MATH
24.
Zurück zum Zitat Freund, Y., Schapire, R.: A short introduction to boosting. J. Japan. Soc. Artif. Intell. 14(5), 771–780 (1999) Freund, Y., Schapire, R.: A short introduction to boosting. J. Japan. Soc. Artif. Intell. 14(5), 771–780 (1999)
26.
Zurück zum Zitat Mangasarian, O., Musicant, D.: Lagrangian support vector machines. J. Mach. Learn. Res. 1, 161–177 (2001)MathSciNetMATH Mangasarian, O., Musicant, D.: Lagrangian support vector machines. J. Mach. Learn. Res. 1, 161–177 (2001)MathSciNetMATH
30.
Zurück zum Zitat Fan, R., Chang, K., Hsieh, C., Wang, X., Lin, C.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9(4), 1871–1874 (2008)MATH Fan, R., Chang, K., Hsieh, C., Wang, X., Lin, C.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9(4), 1871–1874 (2008)MATH
31.
Zurück zum Zitat Joachims, T.: Training linear svms in linear time. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 217–226. ACM (2006) Joachims, T.: Training linear svms in linear time. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 217–226. ACM (2006)
32.
Zurück zum Zitat Yu, H., Yang, J., Han, J.: Classifying large data sets using svms with hierarchical clusters. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 306–315. ACM (2003) Yu, H., Yang, J., Han, J.: Classifying large data sets using svms with hierarchical clusters. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 306–315. ACM (2003)
33.
Zurück zum Zitat Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge (2000)CrossRef Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge (2000)CrossRef
34.
Zurück zum Zitat Reyzin, L., Schapire, R.: How boosting the margin can also boost classifier complexity. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 753–760. ACM (2006) Reyzin, L., Schapire, R.: How boosting the margin can also boost classifier complexity. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 753–760. ACM (2006)
35.
Zurück zum Zitat Dongarra, J., Pozo, R., Walker, D.: LAPACK++: a design overview of object-oriented extensions for high performance linear algebra. In: Proceedings of Supercomputing, pp. 162–171 (1993) Dongarra, J., Pozo, R., Walker, D.: LAPACK++: a design overview of object-oriented extensions for high performance linear algebra. In: Proceedings of Supercomputing, pp. 162–171 (1993)
Metadaten
Titel
Massive Classification with Support Vector Machines
verfasst von
Thanh Nghi Do
Hoai An Le Thi
Copyright-Jahr
2015
Verlag
Springer Berlin Heidelberg
DOI
https://doi.org/10.1007/978-3-662-48145-5_8