Graph Theoretic and Spectral Analysis of Enron Email Data

Computational & Mathematical Organization Theory

Abstract

Analysis of social networks to identify communities and model their evolution has been an active area of recent research. This paper analyzes the Enron email data set to discover structures within the organization. The analysis is based on constructing an email graph and studying its properties with both graph theoretical and spectral analysis techniques. The graph theoretical analysis includes the computation of several graph metrics, such as degree distribution, average distance ratio, clustering coefficient and compactness, over the email graph. The spectral analysis shows that the email adjacency matrix has a rank-2 approximation. Preprocessing of the data is shown to have a significant impact on the results, so a standard form is needed to establish benchmark data.
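As a rough illustration of the kind of analysis the abstract describes, the following sketch builds a small undirected email graph, computes node degrees and local clustering coefficients, and forms a rank-2 approximation of the adjacency matrix with a truncated SVD. It is not the authors' code: the toy `emails` list, the function names, and the choice to symmetrize the graph are assumptions made for illustration only. The average distance ratio and compactness metrics are left out because the abstract does not define them.

```python
import numpy as np

# Hypothetical toy log of (sender, recipient) pairs; this is NOT Enron data.
emails = [
    ("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"),
    ("b", "c"), ("c", "d"), ("d", "c"), ("d", "e"),
]

def build_adjacency(pairs):
    """Return (sorted node names, symmetric 0/1 adjacency matrix)."""
    nodes = sorted({name for pair in pairs for name in pair})
    index = {name: i for i, name in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for sender, recipient in pairs:
        if sender != recipient:  # ignore self-addressed mail
            A[index[sender], index[recipient]] = 1.0
            A[index[recipient], index[sender]] = 1.0  # assumption: undirected
    return nodes, A

def clustering_coefficients(A):
    """Local clustering coefficient of each node (0 where degree < 2)."""
    deg = A.sum(axis=1)
    # diag(A^3) counts closed walks of length 3, i.e. twice the triangles.
    triangles = np.diag(A @ A @ A) / 2.0
    possible = deg * (deg - 1) / 2.0
    return np.divide(triangles, possible,
                     out=np.zeros_like(triangles), where=possible > 0)

def rank_k_approximation(A, k):
    """Best rank-k approximation of A in the Frobenius norm, via SVD."""
    U, s, Vt = np.linalg.svd(A)
    return (U[:, :k] * s[:k]) @ Vt[:k, :], s

nodes, A = build_adjacency(emails)
degrees = A.sum(axis=1).astype(int).tolist()
cc = clustering_coefficients(A)
A2, s = rank_k_approximation(A, 2)
# Relative Frobenius error: how much of A the rank-2 matrix fails to capture.
rel_err = np.linalg.norm(A - A2) / np.linalg.norm(A)

print("degrees:", dict(zip(nodes, degrees)))
print("clustering:", dict(zip(nodes, np.round(cc, 3).tolist())))
print("relative error of rank-2 approximation:", round(float(rel_err), 3))
```

On a real corpus, the relative error indicates how well a rank-2 matrix summarizes the email adjacency structure. The value printed for this toy graph says nothing about the Enron result.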



Author information

Corresponding author

Correspondence to Anurat Chapanond.

Additional information

Anurat Chapanond is currently a Ph.D. student in Computer Science at RPI. He received a B.Eng. degree in Computer Engineering from Chiangmai University (Thailand) in 1997 and an M.S. in Computer Science from Columbia University in 2002. His research interests are in web data mining analyses and algorithms.

M.S. Krishnamoorthy received the B.E. degree (with honors) from Madras University in 1969, the M.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Kanpur, in 1971, and the Ph.D. degree in Computer Science, also from the Indian Institute of Technology, in 1976.

From 1976 to 1979, he was an Assistant Professor of Computer Science at the Indian Institute of Technology, Kanpur. From 1979 to 1985, he was an Assistant Professor of Computer Science at Rensselaer Polytechnic Institute, Troy, NY, and since 1985 he has been an Associate Professor of Computer Science at Rensselaer. Dr. Krishnamoorthy's research interests are in the design and analysis of combinatorial and algebraic algorithms, visualization algorithms and programming environments.

Bulent Yener is an Associate Professor in the Department of Computer Science and Co-Director of Pervasive Computing and Networking Center at Rensselaer Polytechnic Institute in Troy, New York. He is also a member of Griffiss Institute of Information Assurance.

Dr. Yener received M.S. and Ph.D. degrees in Computer Science, both from Columbia University, in 1987 and 1994, respectively. Before joining RPI, he was a Member of Technical Staff at Bell Laboratories in Murray Hill, New Jersey.

His current research interests include bioinformatics, medical informatics, routing problems in wireless networks, security and information assurance, and intelligence and security informatics. He has served on the Technical Program Committee of leading IEEE conferences and workshops. Currently, he is an associate editor of the ACM/Kluwer Winet journal and the IEEE Network Magazine. Dr. Yener is a Senior Member of the IEEE Computer Society.

About this article

Cite this article

Chapanond, A., Krishnamoorthy, M.S. & Yener, B. Graph Theoretic and Spectral Analysis of Enron Email Data. Comput Math Organiz Theor 11, 265–281 (2005). https://doi.org/10.1007/s10588-005-5381-4

  • DOI: https://doi.org/10.1007/s10588-005-5381-4
