2024 | Original Paper | Book chapter

Review of Various Approaches for Authorship Identification in Digital Forensics

Written by: Riya Sanjesh, J. Alamelu Mangai

Published in: Proceedings of Third International Conference on Computing and Communication Networks

Publisher: Springer Nature Singapore

Abstract

Authorship identification involves extracting and analyzing an author's writing style. Digital forensics and cyber investigations use writing style to identify an author and the author's traits. Authorship identification has been carried out on long and short texts in English, Arabic, Chinese, and Greek. This paper, however, emphasizes a distinct and difficult scenario: determining whether two texts published in different discourse types were written by the same person. Based on a new corpus of English texts, we provide sets of texts in four discourse types: essays, emails, text messages, and business notes. The cross-discourse-type authorship verification task is highly challenging because of the differences in communicative intent, target audience, and level of formality. This paper evaluates various aspects of authorship identification and provides a thorough analysis of the assessment findings. It also explores language proficiency and the problems that arise in authorship identification tasks. A number of significant studies in the authorship identification domain were assessed with respect to their data, features, techniques, and outcomes. After reviewing this research, we conclude that the outcome of an authorship identification task depends primarily on the stylometric features selected and the dataset used. The features that prove useful also vary with the language.
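To make the notion of stylometric features concrete, here is a minimal Python sketch of how two texts could be compared by style rather than topic. It is not the method of the chapter or of any study it reviews. The feature set (function-word frequencies, mean word length, mean sentence length, type-token ratio), the ten-word `FUNCTION_WORDS` list, and the names `stylometric_vector` and `cosine_similarity` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption);
# published studies typically use much larger lists.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "is", "i", "it")


def stylometric_vector(text):
    """Map a text to a small vector of largely topic-independent style features."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # avoid division by zero on empty input
    counts = Counter(words)

    # Relative frequency of each function word.
    features = [counts[w] / n_words for w in FUNCTION_WORDS]
    # Mean word length in characters.
    features.append(sum(len(w) for w in words) / n_words)
    # Mean sentence length in words.
    features.append(len(words) / max(len(sentences), 1))
    # Type-token ratio (vocabulary richness).
    features.append(len(set(words)) / n_words)
    return features


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    email = "I will send the report in the morning. It is almost done."
    essay = "The report that I wrote is a study of style in the digital age."
    score = cosine_similarity(stylometric_vector(email), stylometric_vector(essay))
    print(f"style similarity: {score:.3f}")
```

The features here are left unscaled, so the sentence-length term tends to dominate the similarity. A realistic pipeline would standardize the features and learn a decision threshold from labeled same-author and different-author pairs, which is where the choice of features and dataset noted in the abstract becomes decisive.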

Metadata
Title
Review of Various Approaches for Authorship Identification in Digital Forensics
Written by
Riya Sanjesh
J. Alamelu Mangai
Copyright year
2024
Publisher
Springer Nature Singapore
DOI
https://doi.org/10.1007/978-981-97-0892-5_19