
Email Surveillance Using Non-negative Matrix Factorization

Computational & Mathematical Organization Theory

Abstract

In this study, we apply a non-negative matrix factorization approach to the extraction and detection of concepts or topics from electronic mail messages. For the publicly released Enron electronic mail collection, we encode sparse term-by-message matrices and use a low-rank non-negative matrix factorization algorithm to preserve natural data non-negativity and avoid the subtractive basis vector and encoding interactions present in techniques such as principal component analysis. Results in topic detection and message clustering are discussed in the context of published Enron business practices and activities, and benchmarks addressing the computational complexity of our approach are provided. The resulting basis vectors and matrix projections can be used to identify and monitor underlying semantic features (topics) and message clusters at a high level, without the need to read individual electronic mail messages.
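As a rough illustration of the idea in the abstract, the sketch below factors a small term-by-message matrix A into non-negative factors W (term weights per topic) and H (topic weights per message). It is a minimal sketch under stated assumptions, not the implementation used in the study: the update rule is the standard multiplicative update for the Frobenius-norm objective, the function name nmf_multiplicative is hypothetical, and the term list and counts are invented toy data rather than anything taken from the Enron collection.

```python
import numpy as np

def nmf_multiplicative(A, k, iters=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix A (terms x messages) as W @ H.

    Uses multiplicative updates for the squared Frobenius error.
    This is an illustrative choice, not necessarily the paper's algorithm.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))  # basis vectors: one column per topic
    H = rng.random((k, n))  # encodings: one column per message
    for _ in range(iters):
        # Multiplying by ratios of non-negative terms keeps W and H non-negative.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy data: rows are terms, columns are messages.
terms = ["gas", "pipeline", "trading", "football", "game", "tickets"]
A = np.array([
    [2, 3, 1, 0, 0],
    [1, 2, 2, 0, 0],
    [3, 1, 2, 0, 0],
    [0, 0, 0, 2, 3],
    [0, 0, 0, 1, 2],
    [0, 0, 0, 3, 1],
], dtype=float)

W, H = nmf_multiplicative(A, k=2)

# Assign each message to the topic with the largest encoding weight.
clusters = H.argmax(axis=0)

# List the highest-weighted terms of each basis vector as a topic summary.
top_terms = [[terms[i] for i in np.argsort(W[:, j])[::-1][:3]] for j in range(2)]

# Relative reconstruction error of the rank-2 approximation.
rel_err = np.linalg.norm(A - W @ H) / np.linalg.norm(A)

print("message clusters:", clusters)
print("top terms per topic:", top_terms)
print("relative error:", round(float(rel_err), 3))
```

Because W and H contain no negative entries, each message is described as an additive combination of topics, which is what allows the columns of W to be read as term groupings. The result depends on the random initialization and the chosen rank k, so any particular clustering from a run like this should be treated as indicative only.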



Author information

Corresponding author

Correspondence to Michael W. Berry.

Additional information

Michael W. Berry is a Professor and Interim Department Head in the Department of Computer Science at the University of Tennessee and a faculty member in the Graduate School in Genome Science and Technology Program at the University of Tennessee and Oak Ridge National Laboratory. His research interests include information retrieval, data mining, scientific computing, computational science, and numerical linear algebra. He is a member of the Society for Industrial and Applied Mathematics (SIAM), the Association for Computing Machinery (ACM), and the Computer Society of the Institute of Electrical and Electronics Engineers (IEEE). Professor Berry is on the editorial boards of “Computing in Science and Engineering” (IEEE Computer Society and the American Institute of Physics) and the SIAM Journal on Scientific Computing.

Murray Browne is a Research Associate in the Department of Computer Science at the University of Tennessee. He is a member of the American Society for Information Science and Technology and has published numerous essays, book reviews, newspaper articles, and feature stories.


About this article

Cite this article

Berry, M.W., Browne, M. Email Surveillance Using Non-negative Matrix Factorization. Comput Math Organiz Theor 11, 249–264 (2005). https://doi.org/10.1007/s10588-005-5380-5


  • DOI: https://doi.org/10.1007/s10588-005-5380-5
