nach oben

Journal of Intelligent Information Systems

Erschienen in:

01.12.2015

Evaluation of an associative classifier based on position-constrained frequent/closed subtree mining

verfasst von: Dang Bach Bui, Fedja Hadzic, Andrea Tagarelli, Michael Hecker

Erschienen in: Journal of Intelligent Information Systems | Ausgabe 3/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Tree-structured data are popular in many domains making structural classification an important task. In this paper, an associative classification method is introduced based on a structure preserving flat representation of trees. A major difference to traditional tree mining techniques is that subtrees are constrained by the position in the original trees, leading to a drastic reduction in the number of rules generated, especially with data having great structural variation among tree instances. This characteristic would be desired in the current status of frequent pattern mining, where excessive patterns hinder the practical use of results. However the question remains whether this reduction comes at a high cost in accuracy and coverage rate reduction. We explore this aspect and compare the approach with a state-of-the-art structural classifier based on same subtree type, but not positional constrained in any way. We investigate the effect of using different types of frequent pattern (frequent or closed), or subtree types (induced, embedded or embedded-plus-disconnected subtrees) to the performance of the two classifiers. Different rule strength measures such as confidence, weighted confidence and likelihood are also examined in our study. The experiments on three real-world data sets reveal important similarities and differences between the methods.

Vorheriger Artikel Optimizing text classification through efficient feature selection based on quality metric

Nächster Artikel Mining coverage patterns from transactional databases

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In International conference on very large data bases (VLDB’94) (pp. 487–499).

Arimura, H., & Uno, T. (2005). An output-polynomial time algorithm for mining frequent closed attribute trees. In International conference on inductive logic programming (ILP’05) (pp. 1–19).

Asai, T., Abe, K., Kawasoe, S., Arimura, H., Satamoto, H., Arikawa, S. (2002). Efficient substructure discovery from large semi-structured data. In SIAM international conference on data mining (SIAM’02) (pp. 158–174).

Bose, R.P.J.C., & van der Aalst, W.M.P. (2011). Analysis of patient treatment procedures. In Business process management workshops (BPM’12) (pp. 165–166).

Bringmann, B., & Zimmermann, A. (2005). Tree²: decision trees for tree structured data. In European conference on principles and practice of knowledge discovery in databases (PKDD’05) (pp. 46–58).

Bui, D.B., Hadzic, F., Potdar, V. (2012). A framework for application of tree-Structured data mining to process log analysis. In Intelligent data engineering and automated learning (IDEAL’12) (pp. 423–434).

Chehreghani, M.H., Lucas, C., Rahgozar, M., Ghadimi, E. (2009). Efficient rule based structural algorithms for classification of tree structured data. Intelligent Data Analysis Journal, 13(1), 165–188.

Cheng, H., Yan, X., Han, J., Yu, P.S. (2008). Direct discriminative pattern mining for effective classification. In International conference on data engineering (ICDE’08) (pp. 169–178).

Chi, Y., Xia, Y., Yang, Y., Muntz, R. (2005a). Mining closed and maximal frequent subtrees from databases of labeled rooted trees. IEEE Transactions on Knowledge and Data Engineering, 17(2), 190–202.CrossRef

Chi, Y., Muntz, R.R., Nijssen, S., Kok, J.N. (2005b). Freequent subtree mining—an overview. Fundamenta Informaticae, 66(1–2), 161–198.MATHMathSciNet

Costa, G., Ortale, R., Ritacco, E. (2013). X-class: associative classification of Xml documents by structure. ACM Transactions on Information Systems, 31(1), 3:1–3:40.CrossRef

De Knijf, J. (2006). FAT-CAT: frequent attributes tree based classification. In International conference on the initiative for the evaluation of XML retrieval (INEX’06) (pp. 485–496).

Dong, G., Zhang, X., Wong, L., Li, J. (1999). CAEP: classification by aggregating emerging patterns. In International conference on discovery science (DS’99) (pp. 30–42).

Garboni, C., Masseglia, F., Trousse, B. (2006). Sequential pattern mining for structure-Based XML document classification. In International conference on initiative for the evaluation of XML retrieval (INEX’06) (pp. 458–468).

Geamsakul, W., Yoshida, T., Ohara, K., Motoda, H., Yokoi, H., Takabayashi, K. (2005). Constructing a decision tree for graph-structured data and its applications. Fundamenta Informaticae, 66(1–2), 131–160.MATHMathSciNet

Geng, L., & Hamilton, H.J. (2006). Interestingness measures for data mining: a survey. ACM Computing Surveys, 38(3).

Hadzic, F. (2012). A structure preserving flat data format representation for tree-structured data. In PAKDD workshop on quality issues, measure of interestingness, and evaluation of data mining models (pp. 221–233).

Hadzic, F., & Bui, D.B. (2013). CRM data set and experimental results from CSLOG data sets. http://cse.hcmut.edu.vn/thuanle/dangbui. Accessed 29 Nov 2013.

Han, J., Pei, J., Yin, Y. (2000). Mining frequent patterns without candidate generation. In ACM SIGMOD International Conference on Management of Data (SIGMOD’00) (pp. 1–12).

Han, J., Cheng, H., Xin, D., Yan, X. (2007). Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery, 15(1), 55–86.MathSciNetCrossRef

Helmer, S., Augsten, N., Bohlen, M. (2012). Measuring structural similarity of semistructured data based on information-theoretic approaches. VLDB Journal, 21(5), 677–702.CrossRef

Kim, H., Kim, S., Weninger, T., Han, J., Abdelzaher, T. (2010). NDPMine: efficiently mining discriminative numerical features for pattern-based classification. In European conference on machine learning and knowledge discovery in databases (ECML PKDD’10) (pp. 35–50).

Le Bras, Y., Lenca, P., Lallich, S. (2011). Mining classification rules without support: an anti-monotone property of Jaccard measure. In International conference on discovery science (DS’11) (pp. 179–193).

Lenca, P., Meyer, P., Vaillant, B., Lallich, S. (2008). On selecting interestingness measures for association rules: user oriented description and multiple criteria decision aid. European Journal of Operational Research, 184(2), 610–626.MATHCrossRef

Li, W., Han, J., Pei, J. (2001). CMAR: accurate and efficient classification based on multiple class-association rules. In International conference on data mining (ICDM’01) (pp. 369–376).

Liu, B., Hsu, W., Ma, Y. (1998). Integrating classification and association rule mining. In International conference on knowledge discovery and data mining (KDD’98) (pp. 80–86).

Quinlan, J. R., & Cameron-Jones, R. M. (1993). FOIL: a midterm report. In European conference on machine learning (ECML’93) (pp. 3–20).

Shasha, D., Wang, J.T.L., Shan, H., Zhang, K. (2002). ATreeGrep: approximate searching in unordered trees. In International conference on scientific and statistical database management (SSDBM’02) (pp. 89–98).

Termier, A., Rousset, M.C., Sebag, M. (2002). TreeFinder: a first step towards XML data mining. In IEEE international conference on data mining (ICDM’02) (pp. 450–457).

Thabtah, F. (2007). A review of associative classification mining. Knowledge Engineering Review, 22(1), 37–65.CrossRef

Thabtah, F., Cowling, P., Peng, Y. (2004). MMAC: a new multi-class, multi-label associative classification approach. In International conference on data mining (ICDM’04) (pp. 217–224).

Veloso, A., Meira, W., Zaki, M.J. (2006). Lazy associative classification. In International conference on data mining (ICDM’06) (pp. 645–654).

Wang, J., & Karypis, G. (2006). On mining instance-centric classification rules. IEEE Transactions on Knowledge and Data Engineering, 18(11), 1497–1511.CrossRef

Wang, K., Zhou, S., He, Y. (2000). Growing decision tree on support-less association rules. In ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD’00) (pp. 265–269).

Wang, K., He, Y., Cheung, D.W. (2001). Mining confident rules without support requirement. In International conference on information and knowledge management (CIKM’01) (pp. 89–96).

Yin, X., & Han, J. (2003). CPAR: classification based on predictive association rule. In SIAM international conference on data mining (SIAM’03) (pp. 369–376).

Zaki, M.J. (2005). Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 17(8), 1021–1035.CrossRef

Zaki, M.J., & Aggarwal, C.C. (2006). XRules: an effective algorithm for structural classification of XML data. Machine Learning Journal, 62(1–2), 137–170.CrossRef

Titel: Evaluation of an associative classifier based on position-constrained frequent/closed subtree mining
verfasst von: Dang Bach Bui
Fedja Hadzic
Andrea Tagarelli
Michael Hecker
Publikationsdatum: 01.12.2015
Verlag: Springer US
Erschienen in: Journal of Intelligent Information Systems / Ausgabe 3/2015
Print ISSN: 0925-9902
Elektronische ISSN: 1573-7675
DOI: https://doi.org/10.1007/s10844-014-0312-9

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 3/2015

Mining coverage patterns from transactional databases

Guest editor’s introduction: special issue on quality issues, measures of interestingness and evaluation of data mining models

Optimizing text classification through efficient feature selection based on quality metric

Subspace clustering of data streams: new algorithms and effective evaluation measures

Objectively evaluating condensed representations and interestingness measures for frequent itemset mining

Cost-based quality measures in subgroup discovery

Premium Partner