nach oben

Data Mining and Knowledge Discovery

Erschienen in:

16.09.2017

An improvement of the parameterized frequent directions algorithm

verfasst von: Deena P. Francis, Kumudha Raimond

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 2/2018

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Matrix sketching is a technique used to create summaries of large matrices. Frequent directions (FD) and its parameterized variant, \(\alpha \)-FD are deterministic sketching techniques that have theoretical guarantees and also work well in practice. An algorithm called the iterative singular value decomposition (iSVD) has been shown to have better performance than FD and \(\alpha \)-FD in several datasets, despite the lack of theoretical guarantees. However, in datasets with major and sudden drift, iSVD performs poorly when compared to the other algorithms. The \(\alpha \)-FD algorithm has better error guarantees and empirical performance when compared to FD. However, it has two limitations: the restriction on the effective values of its parameter \(\alpha \) due to its dependence on sketch size and its constant factor reduction from selected squared singular values, both of which result in reduced empirical performance. In this paper, we present a modified parameterized FD algorithm, \(\beta \)-FD in order to overcome the limitations of \(\alpha \)-FD, while maintaining similar error guarantees to that of \(\alpha \)-FD. Empirical results on datasets with sudden and major drift and those with gradual and minor or no drift indicate that there is a trade-off between the errors in both kinds of data for different parameter values, and for \(\beta \approx 28\), our algorithm has overall better error performance than \(\alpha \)-FD.

Vorheriger Artikel Bounding the difference between RankRC and RankSVM and application to multi-level rare class kernel ranking

Nächster Artikel Robust graph regularized nonnegative matrix factorization for clustering

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Achlioptas D, McSherry F (2007) Fast computation of low-rank matrix approximations. J. ACM (JACM) 54(2):9MathSciNetCrossRefMATH

Anguita D, Ghio A, Oneto L, Parra X, Reyes-Ortiz JL (2013) A public domain dataset for human activity recognition using smartphones. In: ESANN

Boutsidis C, Mahoney MW, Drineas P (2009) An improved approximation algorithm for the column subset selection problem. In: Proceedings of the twentieth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 968–977

Brand M (2002) Incremental singular value decomposition of uncertain data with missing values. In: European conference on computer vision. Springer, Berlin, pp 707–720

Buss S (2016) Connectus data set Florida sparse matrix collection. http://www.cise.ufl.edu/research/sparse/matrices/Buss/connectus.html

Clarkson KL, Woodruff DP (2013) Low rank approximation and regression in input sparsity time. In: Proceedings of the forty-fifth annual ACM symposium on theory of computing. ACM, New York, pp 81–90

Cuturi M (2011) Fast global alignment kernels. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp 929–936

Desai A, Ghashami M, Phillips JM (2016) Improved practical matrix sketching with guarantees. IEEE Trans Knowl Data Eng 28(7):1678–1690CrossRefMATH

Drineas P, Kannan R, Mahoney MW (2006) Fast monte carlo algorithms for matrices II: computing a low-rank approximation to a matrix. SIAM J Comput 36(1):158–183MathSciNetCrossRefMATH

Ghashami M, Phillips JM (2014) Relative errors for deterministic low-rank matrix approximations. In: Proceedings of the twenty-fifth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, pp 707–717

Ghashami M, Desai A, Phillips JM (2014) Improved practical matrix sketching with guarantees. In: European symposium on algorithms. Springer, Berlin, pp 467–479

Ghashami M, Liberty E, Phillips JM, Woodruff DP (2016) Frequent directions: simple and deterministic matrix sketching. SIAM J Comput 45(5):1762–1792MathSciNetCrossRefMATH

Hall PM, Marshall AD, Martin RR (1998) Incremental eigenanalysis for classification. In: BMVC, vol 98. Citeseer, pp 286–295

Har-Peled S (2014) Low rank matrix approximation in linear time. arXiv preprint arXiv:1410.8802

Hoens TR, Chawla NV, Polikar R (2011) Heuristic updatable weighted random subspaces for non-stationary environments. In: 2011 IEEE 11th international conference on data mining (ICDM). IEEE, Washington, pp 241–250

Katakis I, Tsoumakas G, Vlahavas IP (2008) An ensemble of classifiers for coping with recurring contexts in data streams. In: ECAI, pp 763–764

Katakis I, Tsoumakas G, Vlahavas I (2010) Tracking recurring contexts using ensemble classifiers: an application to email filtering. Knowl Inf Syst 22(3):371–391CrossRef

Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86MathSciNetCrossRefMATH

Lecun Y, Cortes C (2009) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/

Levey A, Lindenbaum M (2000) Sequential Karhunen–Loeve basis extraction and its application to images. IEEE Trans Image Process 9(8):1371–1374CrossRefMATH

Liberty E (2013) Simple and deterministic matrix sketching. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, London, pp 581–588

Mahoney MW (2011) Randomized algorithms for matrices and data. Found Trends Mach Learn 3(2):123–224MATH

Nelson J, Nguyên HL (2013) Osnap: faster numerical linear algebra algorithms via sparser subspace embeddings. In: 2013 IEEE 54th annual symposium on Foundations of Computer Science (FOCS). IEEE, Washington, pp 117–126

Sarlós T (2006) Improved approximation algorithms for large matrices via random projections. In: 2006 47th annual IEEE symposium on foundations of computer science (FOCS’06). IEEE, Washington, pp 143–152

Schlimmer JC, Granger RH (1986) Incremental learning from noisy data. Mach Learn 1(3):317–354

Tsymbal A (2004) The problem of concept drift: definitions and related work. Computer Science Department, Trinity College Dublin, Dublin 106(2)

Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The Caltech-UCSD birds-200-2011 dataset. Tech. rep, California Institute of Technology

Webb GI, Hyde R, Cao H, Nguyen HL, Petitjean F (2016) Characterizing concept drift. Data Min Knowl Discov 4(30):964–994MathSciNetCrossRef

Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1):69–101

Woodruff DP et al (2014) Sketching as a tool for numerical linear algebra. Found Trends Theor Comput Sci 10(1–2):1–157MathSciNetCrossRefMATH

Titel: An improvement of the parameterized frequent directions algorithm
verfasst von: Deena P. Francis
Kumudha Raimond
Publikationsdatum: 16.09.2017
Verlag: Springer US
Erschienen in: Data Mining and Knowledge Discovery / Ausgabe 2/2018
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-017-0542-x

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 2/2018

Efficient geometric-based computation of the string subsequence kernel

Fast analytical methods for finding significant labeled graph motifs

Exploiting the roles of aspects in personalized POI recommender systems

Do your friends make you buy this brand?

Robust graph regularized nonnegative matrix factorization for clustering

Bounding the difference between RankRC and RankSVM and application to multi-level rare class kernel ranking

Premium Partner