Published in: International Journal on Software Tools for Technology Transfer 4/2019

20-03-2019 | Regular Paper

ASAP: A Source Code Authorship Program

Author: Matthew F. Tennyson


Abstract

Source code authorship attribution is the task of determining who wrote a computer program from its source code, usually when the author is unknown or disputed. Applications include software forensics, software copyright infringement cases, and plagiarism detection. Numerous methods of source code authorship attribution have been proposed and studied. However, there are no known easily accessible, user-friendly programs that perform this task. Instead, researchers typically develop software ad hoc for their own studies, and that software is rarely made publicly available. In this paper, we present a software tool called A Source Code Authorship Program (ASAP), which is suitable for both laypersons and experts. Authors can be attributed to individual documents one at a time, or complex authorship attribution experiments can be run on large datasets. The interface and implementation of ASAP are presented, and the tool is validated by using it to replicate previously published authorship attribution experiments.
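To make the attribution task concrete, the following is a minimal sketch of one simple approach, character n-gram profile intersection. It is an illustration only and is not taken from ASAP: the function names, the n-gram size, and the profile length are all assumptions chosen for the example.

```python
from collections import Counter


def ngram_profile(source, n=3, profile_length=50):
    """Return the set of the most frequent character n-grams in `source`.

    `n` and `profile_length` are illustrative defaults, not values from ASAP.
    """
    counts = Counter(source[i:i + n] for i in range(len(source) - n + 1))
    return {gram for gram, _ in counts.most_common(profile_length)}


def attribute(unknown, known, n=3, profile_length=50):
    """Pick the candidate author whose profile shares the most n-grams.

    `known` maps an author name to a list of source code samples by that author.
    """
    target = ngram_profile(unknown, n, profile_length)
    scores = {}
    for author, samples in known.items():
        # Build one profile per author from all of that author's samples.
        profile = ngram_profile("\n".join(samples), n, profile_length)
        scores[author] = len(target & profile)
    return max(scores, key=scores.get)


# Hypothetical samples: two authors with different formatting habits.
known = {
    "alice": ["for (int i = 0; i < n; i++) {\n    total += values[i];\n}\n"],
    "bob": ["for(int idx=0;idx<n;++idx)\n{\n\ttotal+=values[idx];\n}\n"],
}
unknown = "for (int j = 0; j < m; j++) {\n    sum += data[j];\n}\n"
predicted = attribute(unknown, known, profile_length=1000)
```

With these toy samples, the unknown snippet shares spacing and brace conventions with the first author, so its profile overlaps more with that author's profile. Real experiments would need many samples per author and careful choice of the parameters.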


Metadata
Title
ASAP: A Source Code Authorship Program
Author
Matthew F. Tennyson
Publication date
20-03-2019
Publisher
Springer Berlin Heidelberg
Published in
International Journal on Software Tools for Technology Transfer / Issue 4/2019
Print ISSN: 1433-2779
Electronic ISSN: 1433-2787
DOI
https://doi.org/10.1007/s10009-019-00517-3
