Skip to main content
Erschienen in: Journal of Intelligent Information Systems 3/2015

01.12.2015

Evaluation of an associative classifier based on position-constrained frequent/closed subtree mining

verfasst von: Dang Bach Bui, Fedja Hadzic, Andrea Tagarelli, Michael Hecker

Erschienen in: Journal of Intelligent Information Systems | Ausgabe 3/2015

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config
loading …

Abstract

Tree-structured data are popular in many domains making structural classification an important task. In this paper, an associative classification method is introduced based on a structure preserving flat representation of trees. A major difference to traditional tree mining techniques is that subtrees are constrained by the position in the original trees, leading to a drastic reduction in the number of rules generated, especially with data having great structural variation among tree instances. This characteristic would be desired in the current status of frequent pattern mining, where excessive patterns hinder the practical use of results. However the question remains whether this reduction comes at a high cost in accuracy and coverage rate reduction. We explore this aspect and compare the approach with a state-of-the-art structural classifier based on same subtree type, but not positional constrained in any way. We investigate the effect of using different types of frequent pattern (frequent or closed), or subtree types (induced, embedded or embedded-plus-disconnected subtrees) to the performance of the two classifiers. Different rule strength measures such as confidence, weighted confidence and likelihood are also examined in our study. The experiments on three real-world data sets reveal important similarities and differences between the methods.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

  • über 102.000 Bücher
  • über 537 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Maschinenbau + Werkstoffe
  • Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 390 Zeitschriften

aus folgenden Fachgebieten:

  • Automobil + Motoren
  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Elektrotechnik + Elektronik
  • Energie + Nachhaltigkeit
  • Maschinenbau + Werkstoffe




 

Jetzt Wissensvorsprung sichern!

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

  • über 67.000 Bücher
  • über 340 Zeitschriften

aus folgenden Fachgebieten:

  • Bauwesen + Immobilien
  • Business IT + Informatik
  • Finance + Banking
  • Management + Führung
  • Marketing + Vertrieb
  • Versicherung + Risiko




Jetzt Wissensvorsprung sichern!

Literatur
Zurück zum Zitat Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In International conference on very large data bases (VLDB’94) (pp. 487–499). Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In International conference on very large data bases (VLDB’94) (pp. 487–499).
Zurück zum Zitat Arimura, H., & Uno, T. (2005). An output-polynomial time algorithm for mining frequent closed attribute trees. In International conference on inductive logic programming (ILP’05) (pp. 1–19). Arimura, H., & Uno, T. (2005). An output-polynomial time algorithm for mining frequent closed attribute trees. In International conference on inductive logic programming (ILP’05) (pp. 1–19).
Zurück zum Zitat Asai, T., Abe, K., Kawasoe, S., Arimura, H., Satamoto, H., Arikawa, S. (2002). Efficient substructure discovery from large semi-structured data. In SIAM international conference on data mining (SIAM’02) (pp. 158–174). Asai, T., Abe, K., Kawasoe, S., Arimura, H., Satamoto, H., Arikawa, S. (2002). Efficient substructure discovery from large semi-structured data. In SIAM international conference on data mining (SIAM’02) (pp. 158–174).
Zurück zum Zitat Bose, R.P.J.C., & van der Aalst, W.M.P. (2011). Analysis of patient treatment procedures. In Business process management workshops (BPM’12) (pp. 165–166). Bose, R.P.J.C., & van der Aalst, W.M.P. (2011). Analysis of patient treatment procedures. In Business process management workshops (BPM’12) (pp. 165–166).
Zurück zum Zitat Bringmann, B., & Zimmermann, A. (2005). Tree2: decision trees for tree structured data. In European conference on principles and practice of knowledge discovery in databases (PKDD’05) (pp. 46–58). Bringmann, B., & Zimmermann, A. (2005). Tree2: decision trees for tree structured data. In European conference on principles and practice of knowledge discovery in databases (PKDD’05) (pp. 46–58).
Zurück zum Zitat Bui, D.B., Hadzic, F., Potdar, V. (2012). A framework for application of tree-Structured data mining to process log analysis. In Intelligent data engineering and automated learning (IDEAL’12) (pp. 423–434). Bui, D.B., Hadzic, F., Potdar, V. (2012). A framework for application of tree-Structured data mining to process log analysis. In Intelligent data engineering and automated learning (IDEAL’12) (pp. 423–434).
Zurück zum Zitat Chehreghani, M.H., Lucas, C., Rahgozar, M., Ghadimi, E. (2009). Efficient rule based structural algorithms for classification of tree structured data. Intelligent Data Analysis Journal, 13(1), 165–188. Chehreghani, M.H., Lucas, C., Rahgozar, M., Ghadimi, E. (2009). Efficient rule based structural algorithms for classification of tree structured data. Intelligent Data Analysis Journal, 13(1), 165–188.
Zurück zum Zitat Cheng, H., Yan, X., Han, J., Yu, P.S. (2008). Direct discriminative pattern mining for effective classification. In International conference on data engineering (ICDE’08) (pp. 169–178). Cheng, H., Yan, X., Han, J., Yu, P.S. (2008). Direct discriminative pattern mining for effective classification. In International conference on data engineering (ICDE’08) (pp. 169–178).
Zurück zum Zitat Chi, Y., Xia, Y., Yang, Y., Muntz, R. (2005a). Mining closed and maximal frequent subtrees from databases of labeled rooted trees. IEEE Transactions on Knowledge and Data Engineering, 17(2), 190–202.CrossRef Chi, Y., Xia, Y., Yang, Y., Muntz, R. (2005a). Mining closed and maximal frequent subtrees from databases of labeled rooted trees. IEEE Transactions on Knowledge and Data Engineering, 17(2), 190–202.CrossRef
Zurück zum Zitat Chi, Y., Muntz, R.R., Nijssen, S., Kok, J.N. (2005b). Freequent subtree mining—an overview. Fundamenta Informaticae, 66(1–2), 161–198.MATHMathSciNet Chi, Y., Muntz, R.R., Nijssen, S., Kok, J.N. (2005b). Freequent subtree mining—an overview. Fundamenta Informaticae, 66(1–2), 161–198.MATHMathSciNet
Zurück zum Zitat Costa, G., Ortale, R., Ritacco, E. (2013). X-class: associative classification of Xml documents by structure. ACM Transactions on Information Systems, 31(1), 3:1–3:40.CrossRef Costa, G., Ortale, R., Ritacco, E. (2013). X-class: associative classification of Xml documents by structure. ACM Transactions on Information Systems, 31(1), 3:1–3:40.CrossRef
Zurück zum Zitat De Knijf, J. (2006). FAT-CAT: frequent attributes tree based classification. In International conference on the initiative for the evaluation of XML retrieval (INEX’06) (pp. 485–496). De Knijf, J. (2006). FAT-CAT: frequent attributes tree based classification. In International conference on the initiative for the evaluation of XML retrieval (INEX’06) (pp. 485–496).
Zurück zum Zitat Dong, G., Zhang, X., Wong, L., Li, J. (1999). CAEP: classification by aggregating emerging patterns. In International conference on discovery science (DS’99) (pp. 30–42). Dong, G., Zhang, X., Wong, L., Li, J. (1999). CAEP: classification by aggregating emerging patterns. In International conference on discovery science (DS’99) (pp. 30–42).
Zurück zum Zitat Garboni, C., Masseglia, F., Trousse, B. (2006). Sequential pattern mining for structure-Based XML document classification. In International conference on initiative for the evaluation of XML retrieval (INEX’06) (pp. 458–468). Garboni, C., Masseglia, F., Trousse, B. (2006). Sequential pattern mining for structure-Based XML document classification. In International conference on initiative for the evaluation of XML retrieval (INEX’06) (pp. 458–468).
Zurück zum Zitat Geamsakul, W., Yoshida, T., Ohara, K., Motoda, H., Yokoi, H., Takabayashi, K. (2005). Constructing a decision tree for graph-structured data and its applications. Fundamenta Informaticae, 66(1–2), 131–160.MATHMathSciNet Geamsakul, W., Yoshida, T., Ohara, K., Motoda, H., Yokoi, H., Takabayashi, K. (2005). Constructing a decision tree for graph-structured data and its applications. Fundamenta Informaticae, 66(1–2), 131–160.MATHMathSciNet
Zurück zum Zitat Geng, L., & Hamilton, H.J. (2006). Interestingness measures for data mining: a survey. ACM Computing Surveys, 38(3). Geng, L., & Hamilton, H.J. (2006). Interestingness measures for data mining: a survey. ACM Computing Surveys, 38(3).
Zurück zum Zitat Hadzic, F. (2012). A structure preserving flat data format representation for tree-structured data. In PAKDD workshop on quality issues, measure of interestingness, and evaluation of data mining models (pp. 221–233). Hadzic, F. (2012). A structure preserving flat data format representation for tree-structured data. In PAKDD workshop on quality issues, measure of interestingness, and evaluation of data mining models (pp. 221–233).
Zurück zum Zitat Han, J., Pei, J., Yin, Y. (2000). Mining frequent patterns without candidate generation. In ACM SIGMOD International Conference on Management of Data (SIGMOD’00) (pp. 1–12). Han, J., Pei, J., Yin, Y. (2000). Mining frequent patterns without candidate generation. In ACM SIGMOD International Conference on Management of Data (SIGMOD’00) (pp. 1–12).
Zurück zum Zitat Han, J., Cheng, H., Xin, D., Yan, X. (2007). Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery, 15(1), 55–86.MathSciNetCrossRef Han, J., Cheng, H., Xin, D., Yan, X. (2007). Frequent pattern mining: current status and future directions. Data Mining and Knowledge Discovery, 15(1), 55–86.MathSciNetCrossRef
Zurück zum Zitat Helmer, S., Augsten, N., Bohlen, M. (2012). Measuring structural similarity of semistructured data based on information-theoretic approaches. VLDB Journal, 21(5), 677–702.CrossRef Helmer, S., Augsten, N., Bohlen, M. (2012). Measuring structural similarity of semistructured data based on information-theoretic approaches. VLDB Journal, 21(5), 677–702.CrossRef
Zurück zum Zitat Kim, H., Kim, S., Weninger, T., Han, J., Abdelzaher, T. (2010). NDPMine: efficiently mining discriminative numerical features for pattern-based classification. In European conference on machine learning and knowledge discovery in databases (ECML PKDD’10) (pp. 35–50). Kim, H., Kim, S., Weninger, T., Han, J., Abdelzaher, T. (2010). NDPMine: efficiently mining discriminative numerical features for pattern-based classification. In European conference on machine learning and knowledge discovery in databases (ECML PKDD’10) (pp. 35–50).
Zurück zum Zitat Le Bras, Y., Lenca, P., Lallich, S. (2011). Mining classification rules without support: an anti-monotone property of Jaccard measure. In International conference on discovery science (DS’11) (pp. 179–193). Le Bras, Y., Lenca, P., Lallich, S. (2011). Mining classification rules without support: an anti-monotone property of Jaccard measure. In International conference on discovery science (DS’11) (pp. 179–193).
Zurück zum Zitat Lenca, P., Meyer, P., Vaillant, B., Lallich, S. (2008). On selecting interestingness measures for association rules: user oriented description and multiple criteria decision aid. European Journal of Operational Research, 184(2), 610–626.MATHCrossRef Lenca, P., Meyer, P., Vaillant, B., Lallich, S. (2008). On selecting interestingness measures for association rules: user oriented description and multiple criteria decision aid. European Journal of Operational Research, 184(2), 610–626.MATHCrossRef
Zurück zum Zitat Li, W., Han, J., Pei, J. (2001). CMAR: accurate and efficient classification based on multiple class-association rules. In International conference on data mining (ICDM’01) (pp. 369–376). Li, W., Han, J., Pei, J. (2001). CMAR: accurate and efficient classification based on multiple class-association rules. In International conference on data mining (ICDM’01) (pp. 369–376).
Zurück zum Zitat Liu, B., Hsu, W., Ma, Y. (1998). Integrating classification and association rule mining. In International conference on knowledge discovery and data mining (KDD’98) (pp. 80–86). Liu, B., Hsu, W., Ma, Y. (1998). Integrating classification and association rule mining. In International conference on knowledge discovery and data mining (KDD’98) (pp. 80–86).
Zurück zum Zitat Quinlan, J. R., & Cameron-Jones, R. M. (1993). FOIL: a midterm report. In European conference on machine learning (ECML’93) (pp. 3–20). Quinlan, J. R., & Cameron-Jones, R. M. (1993). FOIL: a midterm report. In European conference on machine learning (ECML’93) (pp. 3–20).
Zurück zum Zitat Shasha, D., Wang, J.T.L., Shan, H., Zhang, K. (2002). ATreeGrep: approximate searching in unordered trees. In International conference on scientific and statistical database management (SSDBM’02) (pp. 89–98). Shasha, D., Wang, J.T.L., Shan, H., Zhang, K. (2002). ATreeGrep: approximate searching in unordered trees. In International conference on scientific and statistical database management (SSDBM’02) (pp. 89–98).
Zurück zum Zitat Termier, A., Rousset, M.C., Sebag, M. (2002). TreeFinder: a first step towards XML data mining. In IEEE international conference on data mining (ICDM’02) (pp. 450–457). Termier, A., Rousset, M.C., Sebag, M. (2002). TreeFinder: a first step towards XML data mining. In IEEE international conference on data mining (ICDM’02) (pp. 450–457).
Zurück zum Zitat Thabtah, F. (2007). A review of associative classification mining. Knowledge Engineering Review, 22(1), 37–65.CrossRef Thabtah, F. (2007). A review of associative classification mining. Knowledge Engineering Review, 22(1), 37–65.CrossRef
Zurück zum Zitat Thabtah, F., Cowling, P., Peng, Y. (2004). MMAC: a new multi-class, multi-label associative classification approach. In International conference on data mining (ICDM’04) (pp. 217–224). Thabtah, F., Cowling, P., Peng, Y. (2004). MMAC: a new multi-class, multi-label associative classification approach. In International conference on data mining (ICDM’04) (pp. 217–224).
Zurück zum Zitat Veloso, A., Meira, W., Zaki, M.J. (2006). Lazy associative classification. In International conference on data mining (ICDM’06) (pp. 645–654). Veloso, A., Meira, W., Zaki, M.J. (2006). Lazy associative classification. In International conference on data mining (ICDM’06) (pp. 645–654).
Zurück zum Zitat Wang, J., & Karypis, G. (2006). On mining instance-centric classification rules. IEEE Transactions on Knowledge and Data Engineering, 18(11), 1497–1511.CrossRef Wang, J., & Karypis, G. (2006). On mining instance-centric classification rules. IEEE Transactions on Knowledge and Data Engineering, 18(11), 1497–1511.CrossRef
Zurück zum Zitat Wang, K., Zhou, S., He, Y. (2000). Growing decision tree on support-less association rules. In ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD’00) (pp. 265–269). Wang, K., Zhou, S., He, Y. (2000). Growing decision tree on support-less association rules. In ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD’00) (pp. 265–269).
Zurück zum Zitat Wang, K., He, Y., Cheung, D.W. (2001). Mining confident rules without support requirement. In International conference on information and knowledge management (CIKM’01) (pp. 89–96). Wang, K., He, Y., Cheung, D.W. (2001). Mining confident rules without support requirement. In International conference on information and knowledge management (CIKM’01) (pp. 89–96).
Zurück zum Zitat Yin, X., & Han, J. (2003). CPAR: classification based on predictive association rule. In SIAM international conference on data mining (SIAM’03) (pp. 369–376). Yin, X., & Han, J. (2003). CPAR: classification based on predictive association rule. In SIAM international conference on data mining (SIAM’03) (pp. 369–376).
Zurück zum Zitat Zaki, M.J. (2005). Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 17(8), 1021–1035.CrossRef Zaki, M.J. (2005). Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 17(8), 1021–1035.CrossRef
Zurück zum Zitat Zaki, M.J., & Aggarwal, C.C. (2006). XRules: an effective algorithm for structural classification of XML data. Machine Learning Journal, 62(1–2), 137–170.CrossRef Zaki, M.J., & Aggarwal, C.C. (2006). XRules: an effective algorithm for structural classification of XML data. Machine Learning Journal, 62(1–2), 137–170.CrossRef
Metadaten
Titel
Evaluation of an associative classifier based on position-constrained frequent/closed subtree mining
verfasst von
Dang Bach Bui
Fedja Hadzic
Andrea Tagarelli
Michael Hecker
Publikationsdatum
01.12.2015
Verlag
Springer US
Erschienen in
Journal of Intelligent Information Systems / Ausgabe 3/2015
Print ISSN: 0925-9902
Elektronische ISSN: 1573-7675
DOI
https://doi.org/10.1007/s10844-014-0312-9

Weitere Artikel der Ausgabe 3/2015

Journal of Intelligent Information Systems 3/2015 Zur Ausgabe

Premium Partner