Published in: Software Quality Journal 2/2020

19 December 2019

Mining Association Rules from Code (MARC) to support legacy software management

Author: Christos Tjortjis


Abstract

This paper presents a methodology for Mining Association Rules from Code (MARC), aiming at capturing program structure, facilitating system understanding and supporting software management. MARC groups program entities (paragraphs or statements) based on similarities, such as variable use, data types and procedure calls. It comprises three stages: code parsing/analysis, association rule mining and rule grouping. Code is parsed to populate a database with records and their respective attributes. Association rules are then extracted from this database and subsequently processed to abstract programs into groups of interrelated entities: entities are grouped together if their attributes participate in common rules. This abstraction is performed at the program level or even the paragraph level, in contrast to other approaches that work at the system level. Groups can then be visualised as collections of interrelated entities. The methodology was evaluated using real-life COBOL programs. Results showed that the methodology facilitates program comprehension using source code only, in cases where domain knowledge and documentation are either unavailable or unreliable.
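The three stages named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the parsing stage is replaced by a hand-written, hypothetical table of COBOL paragraph names and the variables they use, the mining stage is a simplified pairwise support/confidence search rather than a full association rule miner, and the function names and thresholds (`mine_pair_rules`, `group_entities`, `min_support`, `min_confidence`) are assumptions made for illustration.

```python
from itertools import combinations

# Stage 1 (stand-in for parsing): hypothetical records, one per paragraph,
# whose attributes are the variables that paragraph uses.
records = {
    "READ-CUSTOMER":   {"CUST-ID", "CUST-REC"},
    "UPDATE-CUSTOMER": {"CUST-ID", "CUST-REC", "BALANCE"},
    "PRINT-CUSTOMER":  {"CUST-ID", "CUST-REC"},
    "CALC-TAX":        {"TAX-RATE", "AMOUNT"},
    "APPLY-TAX":       {"TAX-RATE", "AMOUNT", "BALANCE"},
}


def mine_pair_rules(records, min_support=0.3, min_confidence=0.8):
    """Stage 2 (simplified): find rules x -> y between single attributes."""
    n = len(records)
    items = sorted(set().union(*records.values()))

    def count(attrs):
        # Number of records that contain every attribute in attrs.
        return sum(1 for r in records.values() if attrs <= r)

    rules = []
    for a, b in combinations(items, 2):
        both = count({a, b})
        if both / n < min_support:
            continue  # the pair is too rare to be considered
        for x, y in ((a, b), (b, a)):
            if both / count({x}) >= min_confidence:
                rules.append((x, y))
    return rules


def group_entities(records, rules):
    """Stage 3 (simplified): group entities whose attributes share a rule."""
    groups = {}
    for x, y in rules:
        members = frozenset(
            name for name, attrs in records.items() if {x, y} <= attrs
        )
        groups.setdefault(members, []).append((x, y))
    return groups


rules = mine_pair_rules(records)
groups = group_entities(records, rules)
for members, shared_rules in groups.items():
    print(sorted(members), shared_rules)
```

On this toy input the sketch separates the customer-handling paragraphs from the tax-handling ones, because each set shares rules over its own variables. The attribute `BALANCE` appears in both sets but is too infrequent in combination with any other attribute to produce a rule under the assumed thresholds.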


Metadata
Title
Mining Association Rules from Code (MARC) to support legacy software management
Author
Christos Tjortjis
Publication date
19 December 2019
Publisher
Springer US
Published in
Software Quality Journal, Issue 2/2020
Print ISSN: 0963-9314
Electronic ISSN: 1573-1367
DOI
https://doi.org/10.1007/s11219-019-09480-3
