Published in: Software Quality Journal 2/2020

19-12-2019

Mining Association Rules from Code (MARC) to support legacy software management

Author: Christos Tjortjis

Abstract

This paper presents a methodology for Mining Association Rules from Code (MARC), which aims to capture program structure, facilitate system understanding and support software management. MARC groups program entities (paragraphs or statements) based on similarities such as variable use, data types and procedure calls. It comprises three stages: code parsing/analysis, association rule mining and rule grouping. First, code is parsed to populate a database with records and their respective attributes. Association rules are then extracted from this database and processed to abstract programs into groups of interrelated entities: entities are grouped together if their attributes participate in common rules. This abstraction is performed at the program level or even the paragraph level, in contrast to other approaches that work at the system level. The resulting groups can be visualised as collections of interrelated entities. The methodology was evaluated on real-life COBOL programs. Results showed that it facilitates program comprehension using source code alone, in settings where domain knowledge and documentation are either unavailable or unreliable.
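The three stages can be illustrated with a small Python sketch. It is not the MARC tool itself, and its details are all hypothetical:

* the attribute encoding (`var:`, `type:` and `call:` prefixes);
* the COBOL paragraph names;
* the support and confidence thresholds;
* the helper functions `frequent_itemsets`, `association_rules` and `group_entities`.

Stage 1 (parsing) is assumed to have already produced one attribute set per paragraph. Stage 2 uses a plain Apriori-style level-wise search, chosen here for simplicity; the paper's own mining algorithm may differ. Stage 3 applies one simple reading of "participate in common rules": paragraphs that contain every item of the same rule are merged with a union-find structure.

```python
from itertools import combinations

# Hypothetical output of stage 1 (parsing): one attribute set per COBOL
# paragraph, encoding variable use ("var:"), data types ("type:") and
# procedure calls ("call:"). All names are invented for illustration.
records = {
    "READ-CUSTOMER": {"var:CUST-ID", "var:CUST-NAME", "call:OPEN-FILE"},
    "VALIDATE-CUST": {"var:CUST-ID", "var:CUST-NAME", "type:PIC X"},
    "CALC-TAX":      {"var:GROSS-PAY", "var:TAX-RATE", "type:PIC 9"},
    "CALC-NET":      {"var:GROSS-PAY", "var:TAX-RATE", "var:NET-PAY"},
    "PRINT-REPORT":  {"var:NET-PAY", "call:WRITE-LINE"},
}


def frequent_itemsets(records, min_support):
    """Stage 2a (Apriori-style): return {itemset: count} for every attribute
    set contained in at least `min_support` records."""
    transactions = [frozenset(attrs) for attrs in records.values()]
    current = {frozenset([a]) for t in transactions for a in t}
    frequent, k = {}, 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-sets into k-sets; keep a candidate only if all
        # of its (k-1)-subsets are frequent (the Apriori pruning step).
        current = {
            a | b
            for a, b in combinations(level, 2)
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Stage 2b: derive rules (lhs, rhs, support, confidence) for lhs -> rhs."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


def group_entities(records, rules):
    """Stage 3: merge entities whose attributes contain all items of at least
    one common rule. Returns sorted groups of entity names."""
    parent = {e: e for e in records}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    for lhs, rhs, _, _ in rules:
        items = lhs | rhs
        members = [e for e, attrs in records.items() if items <= attrs]
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    groups = {}
    for e in records:
        groups.setdefault(find(e), []).append(e)
    return sorted(sorted(g) for g in groups.values())


frequent = frequent_itemsets(records, min_support=2)
rules = association_rules(frequent, min_confidence=0.8)
groups = group_entities(records, rules)
print(groups)
# [['CALC-NET', 'CALC-TAX'], ['PRINT-REPORT'], ['READ-CUSTOMER', 'VALIDATE-CUST']]
```

With these toy records, only {CUST-ID, CUST-NAME} and {GROSS-PAY, TAX-RATE} reach the support threshold. The sketch therefore yields two multi-paragraph groups and leaves PRINT-REPORT on its own. Brute-force support counting is adequate for the paragraphs of a single program, but larger inputs would call for a more efficient miner.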

Metadata
Title: Mining Association Rules from Code (MARC) to support legacy software management
Author: Christos Tjortjis
Publication date: 19-12-2019
Publisher: Springer US
Published in: Software Quality Journal / Issue 2/2020
Print ISSN: 0963-9314
Electronic ISSN: 1573-1367
DOI: https://doi.org/10.1007/s11219-019-09480-3
