nach oben

Data Mining and Knowledge Discovery

Erschienen in:

01.11.2015

Learning sequential classifiers from long and noisy discrete-event sequences efficiently

verfasst von: Gessé Dafé, Adriano Veloso, Mohammed Zaki, Wagner Meira Jr.

Erschienen in: Data Mining and Knowledge Discovery | Ausgabe 6/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

A variety of applications, such as information extraction, intrusion detection and protein fold recognition, can be expressed as sequences of discrete events or elements (rather than unordered sets of features), that is, there is an order dependence among the elements composing each data instance. These applications may be modeled as classification problems, and in this case the classifier should exploit sequential interactions among the elements, so that the ordering relationship among them is properly captured. Dominant approaches to this problem include: (i) learning Hidden Markov Models, (ii) exploiting frequent sequences extracted from the data and (iii) computing string kernels. Such approaches, however, are computationally hard and vulnerable to noise, especially if the data shows long range dependencies (i.e., long subsequences are necessary in order to model the data). In this paper we provide simple algorithms that build highly effective sequential classifiers. Our algorithms are based on enumerating approximately contiguous subsequences from the training set on a demand-driven basis, exploiting a lightweight and flexible subsequence matching function and an innovative subsequence enumeration strategy called pattern silhouettes, making our learning algorithms fast and the corresponding classifiers robust to noisy data. Our empirical results on a variety of datasets indicate that the best trade-off between accuracy and learning time is usually obtained by limiting the length of the subsequences by a factor of \(\log {n}\), which leads to a \(O(n\log {n})\) learning cost (where \(n\) is the length of the sequence being classified). Finally, we show that, in most of the cases, our classifiers are faster than existing solutions (sometimes, by orders of magnitude), also providing significant accuracy improvements in most of the evaluated cases.

Vorheriger Artikel Quadratic regularization projected Barzilai–Borwein method for nonnegative matrix factorization

Nächster Artikel A general framework for predictive tensor modeling with domain knowledge

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

A similar trend is observed for SC-SC algorithm.

The reason we include the LAC algorithm (which does not exploit adjacency information) as a baseline, is to evaluate the possible benefits of exploiting adjacency information while producing the classifier.

Agrawal R, Srikant R (1995) Mining sequential patterns. In: ICDE, pp 3–14

Bannister W (2007) Associative and sequential classification with adaptive constrained regression methods. PhD thesis, Tempe, AZ, USA

Baum L, Petrie T (1966) Statistical inference for probabilistic functions of finite state Markov chains. Ann Math Stat 37(6):1554–1563MathSciNetCrossRefMATH

Baum L, Petrie T, Soules G, Weiss N (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Ann Math Stat 41:164MathSciNetCrossRefMATH

Bicego M, Murino V, Figueiredo M (2003a) A sequential pruning strategy for the selection of the number of states in hidden Markov models. Pattern Recognit Lett 24(9–10):1395–1407CrossRefMATH

Bicego M, Murino V, Figueiredo M (2004) Similarity-based classification of sequences using hidden Markov models. Pattern Recognit 37(12):2281–2291CrossRef

Bicego M, Murino V, Pelillo M, Torsello A (2006) Similarity-based pattern recognition. Pattern Recognit 39(10):1813–1814CrossRef

Bühlmann P, Wyner A (1999) Variable length Markov chains. Ann Stat 27(2):480–513CrossRefMATH

Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27, software available at http://www.csie.ntu.edu.tw/cjlin/libsvm

Davis A, Veloso A, da Silva A, Laender A, Meira W Jr (2012) Named entity disambiguation in streaming data. In: ACL, pp 815–824

Deshpande M, Karypis G (2004) Selective Markov models for predicting web page accesses. ACM Trans Internet Technol 4(2):163–184CrossRef

Durbin R, Eddy AKS, Mitchison G (1998) Biological sequence analysis. Cambridge University Press

Durbin R, Eddy S, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press

Du Preez J (1998) Efficient training of high-order hidden Markov models using first-order representations. Comput Speech Lang 12(1):23–39CrossRef

Eddy S (1998) Profile hidden Markov models. Bioinformatics 14(9):755–763CrossRef

Galassi U, Giordana A, Saitta L (2007) Incremental construction of structured hidden Markov models. In: IJCAI, pp 798–803

Golding A, Roth D (1996) Applying winnow to context-sensitive spelling correction. CoRR

Han H, Giles C, Zha H, Li C, Tsioutsiouliklis K (2004) Two supervised learning approaches for name disambiguation in author citations. In: JCDL, pp 296–305

Han H, Zha H, Giles C (2005) Name disambiguation in author citations using a k-way spectral clustering method. In: JCDL, pp 334–343

Haussler D (1999) Convolution kernels on discrete structures. Tech. rep,Technical report, UC Santa Cruz

Hu J, Brown M, Turin W (1996) Hmm based on-line handwriting recognition. IEEE Trans Pattern Anal Mach Intell 18(10):1039–1045CrossRef

Hughey R, Krogh A (1996) Hidden Markov models for sequence analysis: extension and analysis of the basic method. Comput Appl Biosci 12(2):95–107

Kriouile A, Mari J, Haon J (1990) Some improvements in speech recognition algorithms based on HMM. In: ICASSP, pp 545–548

Kuksa P, Huang PH, Pavlovic V (2008) A fast, large-scale learning method for protein sequence classification. In: 8th international workshop on data mining in bioinformatics, pp 29–37

Lane T, Brodley C (1999) Temporal sequence learning and data reduction for anomaly detection. ACM Trans Inf Syst Secur 2(3):295–331CrossRef

Law H, Chan C (1996) N-th order ergodic multigram hmm for modeling of languages without marked word boundaries. In: COLING, pp 204–209

Lesh N, Zaki M, Ogihara M (1999) Mining features for sequence classification. In: KDD, pp 342–346

Leslie C, Kuang R (2003) Fast kernels for inexact string matching. In: COLT, pp 114–128

Leslie C, Kuang R (2004) Fast string kernels using inexact matching for protein sequences. J Mach Learn Res 5:1435–1455MathSciNetMATH

Leslie C, Eskin E, Noble WS (2002a) The spectrum kernel: a string kernel for SVM protein classification. Pac Symp Biocomput 7:566–575

Leslie C, Eskin E, Weston J, Noble W (2002b) Mismatch string kernels for SVM protein classification. In: NIPS, pp 1417–1424

Leslie C, Eskin E, Cohen A, Weston J, Noble W (2004) Mismatch string kernels for discriminative protein classification. Bioinformatics 20(4):467–476CrossRef

Lin M, Hsueh S, Chen M, Hsu H (2009) Mining sequential patterns for image classification in ubiquitous multimedia systems. In: IIH-MSP, pp 303–306

Lodhi H, Saunders C, Shawe-Taylor J, Cristianini N, Watkins C (2002) Text classification using string kernels. J Mach Learn Res 2:419–444MATH

Lodhi H, Muggleton S, Sternberg M (2009) Multi-class protein fold recognition using large margin logic based divide and conquer learning. SIGKDD Explor 11(2):117–122CrossRef

Malik H, Kender J (2008) Classifying high-dimensional text and web data using very short patterns. In: ICDM, pp 923–928

Müller S, Eickeler S, Rigoll G (2000) Crane gesture recognition using pseudo 3-d hidden Markov models. In: FG (Conf. on Automatic Face and Gesture Recognition), pp 398–402

Murzin A, Brenner S, Hubbard T, Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247(4):536–540

Pitkow J, Pirolli P (1999) Mining longest repeating subsequences to predict world wide web surfing. In: USENIX symposium on Internet technologies and systems

Rabiner L (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE 77(2):257–286CrossRef

Rieck K, Laskov P (2008) Linear-time computation of similarity measures for sequential data. J Mach Learn Res 9:23–48MATH

Rousu J, Shawe-Taylor J (2005) Efficient computation of gapped substring kernels on large alphabets. J Mach Learn Res 6(1):1323–1344MathSciNetMATH

Saul L, Jordan M (1999) Mixed memory Markov models: decomposing complex stochastic processes as mixtures of simpler ones. Mach Learn 37(1):75–87CrossRefMATH

Schwardt L, Preez JD (2000) Efficient mixed-order hidden Markov model inference. In: ICSLP, pp 238–241

Sha F, Saul L (2006) Large margin hidden Markov models for automatic speech recognition. In: NIPS, pp 1249–1256

Silva I, Gomide J, Veloso A, Meira Jr W, Ferreira R (2011) Effective sentiment stream analysis with self-augmenting training and demand-driven projection. In: SIGIR, pp 475–484

Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: EDBT, pp 3–17

Srivatsan L, Sastry P, Unnikrishnan K (2005) Discovering frequent episodes and learning hidden Markov models: a formal connection. IEEE TKDE 17:1505–1517

Syed Z, Indyk P, Guttag J (2009) Learning approximate sequential patterns for classification. J Mach Learn Res 10:1913–1936MathSciNetMATH

Szymanski B (2004) Recursive data mining for masquerade detection and author identification. Workshop on Information Assurance, pp 424–431

Tseng V, Lee C (2005) CBS: a new classification method by using sequential patterns. In: SDM

Vapnik V (1979) Estimation of dependences based on empirical data (in Russian). Nauka

Veloso A, Meira W Jr (2011) Demand-driven associative classification. Springer

Wang Y, Zhou L, Feng J, Wang J, Liu Z (2006) Mining complex time-series data by learning Markovian models. In: ICDM, pp 1136–1140

Watkins C (1999) Dynamic alignment kernels. Advances in neural information processing systems, pp 39–50

Ye L, Keogh E (2009) Time series shapelets: a new primitive for data mining. In: KDD, pp 947–956

Ye L, Keogh E (2011) Time series shapelets: a novel technique that allows accurate, interpretable and fast classification. Data Min Knowl Discov 22(1–2):149–182MathSciNetCrossRefMATH

Zaki M (2000) Sequence mining in categorical domains: Incorporating constraints. In: CIKM, pp 422–429

Zaki M, Carothers C, Szymanski B (2010) Vogue: a variable order hidden Markov model with duration based on frequent sequence mining. TKDD 4(1):1–31

Titel: Learning sequential classifiers from long and noisy discrete-event sequences efficiently
verfasst von: Gessé Dafé
Adriano Veloso
Mohammed Zaki
Wagner Meira Jr.
Publikationsdatum: 01.11.2015
Verlag: Springer US
Erschienen in: Data Mining and Knowledge Discovery / Ausgabe 6/2015
Print ISSN: 1384-5810
Elektronische ISSN: 1573-756X
DOI: https://doi.org/10.1007/s10618-014-0391-9

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 6/2015

Properties of the sample estimators used for statistical normalization of feature vectors

Quadratic regularization projected Barzilai–Borwein method for nonnegative matrix factorization

Weakly supervised nonnegative matrix factorization for user-driven clustering

Cluster validity functions for categorical data: a solution-space perspective

A general framework for predictive tensor modeling with domain knowledge

Size matters: choosing the most informative set of window lengths for mining patterns in event sequences

Premium Partner