Towards an integrated e-mail forensic analysis framework

https://doi.org/10.1016/j.diin.2009.01.004

Abstract

Due to its simple and inherently vulnerable nature, e-mail communication is abused for numerous illegitimate purposes. E-mail spamming, phishing, drug trafficking, cyber bullying, racial vilification, child pornography, and sexual harassment are some common e-mail-mediated cyber crimes. Presently, there is no adequate proactive mechanism for securing e-mail systems. In this context, forensic analysis plays a major role by examining suspected e-mail accounts to gather evidence to prosecute criminals in a court of law. To accomplish this task, a forensic investigator needs efficient automated tools and techniques to perform a multi-staged analysis of e-mail ensembles accurately and in a timely fashion. In this article, we present our e-mail forensic analysis software tool, developed by integrating existing state-of-the-art statistical and machine-learning techniques complemented with social networking techniques. In this framework, we incorporate two authorship attribution approaches that we have proposed, one of which is presented for the first time in this article.

Section snippets

Motivations and background

In the majority of e-mail-mediated cyber crimes, the victimization tactics used vary from simple anonymity to identity theft and impersonation. Two inherent limitations expose e-mail communication to such illegitimate uses. First, there is no mechanism for message encryption at the sender end or for an integrity check at the recipient end. Second, the widely used e-mail protocol, Simple Mail Transfer Protocol (SMTP), lacks a source authentication mechanism. In fact, the metadata in the header

Proposed approach

The theoretical foundation of our framework rests on well-established techniques of statistical analysis, text mining (classification and clustering), and stylometric feature analysis, together with behavioral modeling achieved by using social networking techniques. Stylometry is the statistical study of writing style, characterized here by five types of features: lexical, syntactic, structural, domain-specific, and idiosyncratic (see Section 2.3.1). E-mail social network analysis is complemented by
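To make the idea of stylometric features concrete, the following is a minimal sketch, not the feature set used in the framework. It computes a handful of simple lexical, syntactic, and structural measures for one e-mail body; domain-specific and idiosyncratic features are left out. The class name `StylometrySketch`, the feature names, and the short function-word list are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: the features below are assumptions, not the IEFAF feature set.
class StylometrySketch {
    // A "word" here is simply a maximal run of Unicode letters.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");
    // Tiny hypothetical function-word list; real studies use much longer lists.
    private static final Set<String> FUNCTION_WORDS = Set.of("the", "of", "and", "to", "a", "in");

    static Map<String, Double> extractFeatures(String body) {
        int words = 0;
        int wordChars = 0;
        int functionWords = 0;
        Set<String> vocabulary = new HashSet<>();
        Matcher matcher = WORD.matcher(body);
        while (matcher.find()) {
            String word = matcher.group();
            String lower = word.toLowerCase(Locale.ROOT);
            words++;
            wordChars += word.length();
            vocabulary.add(lower);
            if (FUNCTION_WORDS.contains(lower)) {
                functionWords++;
            }
        }

        // Count common punctuation marks.
        int punctuation = 0;
        for (char c : body.toCharArray()) {
            if (",.;:!?".indexOf(c) >= 0) {
                punctuation++;
            }
        }

        // Line layout: total lines and blank lines.
        String[] lines = body.split("\n", -1);
        int blankLines = 0;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                blankLines++;
            }
        }

        Map<String, Double> features = new LinkedHashMap<>();
        features.put("lexical.wordCount", (double) words);
        features.put("lexical.avgWordLength", words == 0 ? 0.0 : (double) wordChars / words);
        features.put("lexical.vocabularyRichness", words == 0 ? 0.0 : (double) vocabulary.size() / words);
        features.put("syntactic.functionWordRatio", words == 0 ? 0.0 : (double) functionWords / words);
        features.put("syntactic.punctuationCount", (double) punctuation);
        features.put("structural.lineCount", (double) lines.length);
        features.put("structural.blankLineCount", (double) blankLines);
        return features;
    }

    public static void main(String[] args) {
        String body = "Hi Bob,\n\nSend the report to me. Send it today!\n";
        extractFeatures(body).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Each e-mail then becomes a numeric vector, which is the form that classification and clustering methods typically expect as input.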

Our framework (IEFAF)

IEFAF is an integrated analysis platform in which a security analyst can perform a variety of tasks related to e-mail analysis.

IEFAF is programmed in Java using several Java technologies, such as Java Swing, the Java Mail API, and JDBC. Swing is used to build the graphical interface and to render information in different visual formats (tree, list, picture, etc.). The Java Mail API is used to parse e-mails in several file formats and extract relevant information. JDBC allows us to connect to
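As a rough illustration of the parsing step, the sketch below extracts header fields from a raw message using only the Java standard library. It deliberately does not use the Java Mail API named above, and it is not how IEFAF is implemented; it only shows the kind of information that parsing makes available. The class name `HeaderParseSketch` and the sample message are hypothetical, and the parser handles only the basic "Name: value" layout with folded continuation lines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a stdlib-only header reader, not the Java Mail API and not IEFAF code.
class HeaderParseSketch {

    // Reads "Name: value" lines up to the first empty line.
    // Lines starting with a space or tab continue the previous header (folding).
    // Header names are lower-cased; repeated headers keep all values in order.
    static Map<String, List<String>> parseHeaders(String rawMessage) {
        Map<String, List<String>> headers = new LinkedHashMap<>();
        String name = null;
        StringBuilder value = new StringBuilder();

        for (String line : rawMessage.split("\r?\n", -1)) {
            if (line.isEmpty()) {
                break; // the empty line separates headers from the body
            }
            boolean continuation = line.charAt(0) == ' ' || line.charAt(0) == '\t';
            if (continuation) {
                if (name != null) {
                    value.append(' ').append(line.trim());
                }
                continue;
            }
            store(headers, name, value);
            int colon = line.indexOf(':');
            if (colon <= 0) {
                name = null; // skip a malformed line
                continue;
            }
            name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            value.setLength(0);
            value.append(line.substring(colon + 1).trim());
        }
        store(headers, name, value);
        return headers;
    }

    private static void store(Map<String, List<String>> headers, String name, StringBuilder value) {
        if (name != null) {
            headers.computeIfAbsent(name, key -> new ArrayList<>()).add(value.toString());
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample message.
        String raw = "From: alice@example.com\r\n"
                + "Received: from relay.example.org\r\n"
                + "\tby mx.example.com\r\n"
                + "Subject: Hello\r\n"
                + "\r\n"
                + "Message body.\r\n";
        parseHeaders(raw).forEach((name, values) -> System.out.println(name + " -> " + values));
    }
}
```

A full mail library additionally decodes encoded words, MIME parts, and attachments, which this sketch ignores.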

Conclusion

As a result of growing e-mail misuse, investigators need efficient automated methods and tools for analyzing e-mails. In our work, we developed an e-mail analysis framework to help investigators gather clues and evidence in investigations in which e-mail communication is relevant. The framework offers functionalities ranging from e-mail storing, editing, searching, and querying to more advanced ones such as authorship attribution and e-mail account localization.
