Soft Computing

08-03-2021 | Methodologies and Application

A distributed ensemble of relevance vector machines for large-scale data sets on Spark

Authors: Wangchen Qin, Fang Liu, Mi Tong, Zhengying Li

Published in: Soft Computing | Issue 10/2021

Abstract

The relevance vector machine (RVM) is a machine learning algorithm based on sparse Bayesian theory that shows good classification performance on small-scale data sets. However, because the RVM has a runtime complexity of \(O(n^{3})\) and a space complexity of \(O(n^{2})\), it is difficult to train models on medium-sized or large-scale data sets. Therefore, a distributed ensemble of relevance vector machines on the Spark framework (DE-RVM) is proposed. In this approach, a data set is divided into a number of disjoint subsets, and on each subset a set of RVM classifiers is trained using AdaBoostRVM based on sample type (STAB-RVM), following the concept of ensemble learning. A final classifier is generated by a combination method that uses a diversity measure for the RVM classifiers. The smallest empirical loss of the combined classifier is obtained by solving a quadratic programming problem. The algorithm was applied to both artificial and real data sets. The experimental results show that the proposed method offers good classification performance and effectively improves the ability of the RVM to process large-scale data sets.
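The partition-then-combine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' DE-RVM: it runs on one machine in plain Python rather than on Spark, it uses a nearest-centroid rule as a stand-in base learner instead of an RVM trained with STAB-RVM, and it combines the models by unweighted majority vote rather than by the diversity-based quadratic program. All function names are hypothetical. The motivation is simple arithmetic: if training cost stays cubic in the subset size, then k equal subsets cost roughly \(k (n/k)^{3} = n^{3}/k^{2}\) in total, before any parallel speed-up.

```python
import random
from collections import Counter


def partition(indices, k, seed=0):
    """Shuffle the indices and deal them into k disjoint subsets."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def train_centroids(X, y, subset):
    """Stand-in base learner (NOT an RVM): per-class mean over one subset."""
    sums, counts = {}, Counter()
    for i in subset:
        label = y[i]
        counts[label] += 1
        acc = sums.setdefault(label, [0.0] * len(X[i]))
        for d, value in enumerate(X[i]):
            acc[d] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_one(model, x):
    """Return the label whose centroid is closest to x."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def ensemble_predict(models, x):
    """Unweighted majority vote (assumption: the paper weights classifiers instead)."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated Gaussian blobs, 200 points each.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
X += [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(200)]
y = [0] * 200 + [1] * 200

subsets = partition(range(len(X)), k=4)               # disjoint subsets
models = [train_centroids(X, y, s) for s in subsets]  # one model per subset
print(ensemble_predict(models, [0.2, -0.1]), ensemble_predict(models, [5.8, 6.3]))
```

In a distributed setting, each call to `train_centroids` would run on a different worker, since the subsets share no samples and the per-subset models are independent until the combination step.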

Title: A distributed ensemble of relevance vector machines for large-scale data sets on Spark
Authors: Wangchen Qin, Fang Liu, Mi Tong, Zhengying Li
Publication date: 08-03-2021
Publisher: Springer Berlin Heidelberg
Published in: Soft Computing / Issue 10/2021
Print ISSN: 1432-7643
Electronic ISSN: 1433-7479
DOI: https://doi.org/10.1007/s00500-021-05671-y