Abstract
A new method is proposed for detecting user access to irrelevant documents. It estimates how strongly a document's text belongs to the subject areas typical of the analyzed user. These typical subject areas are formed by subject area modeling implemented via orthonormal nonnegative matrix factorization. An experimental study on real corporate correspondence drawn from the Enron data set demonstrates the high classification accuracy of the proposed method relative to traditional approaches.
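The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps in plain nonnegative matrix factorization with Lee and Seung's multiplicative updates in place of the orthonormal variant, and it scores a document by the cosine similarity between the document and its reconstruction from the user's topics, which is an assumed scoring rule. The vocabulary, the toy term-document matrix, and all function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def update_h(a, w, h, eps=1e-9):
    """One multiplicative update of h for fixed w: h *= (w^T a) / (w^T w h)."""
    wt = transpose(w)
    num = matmul(wt, a)
    den = matmul(matmul(wt, w), h)
    return [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(len(h[0]))]
            for i in range(len(h))]


def nmf(a, k, iters=200, seed=0, eps=1e-9):
    """Factor a (terms x docs) into nonnegative w (terms x k) and h (k x docs)."""
    rng = random.Random(seed)
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(len(a))]
    h = [[rng.random() + 0.1 for _ in range(len(a[0]))] for _ in range(k)]
    for _ in range(iters):
        h = update_h(a, w, h, eps)
        # The update for w is the same rule applied to the transposed problem.
        w = transpose(update_h(transpose(a), transpose(h), transpose(w), eps))
    return w, h


def relevance(w, doc, iters=200, eps=1e-9):
    """Cosine similarity between doc and its nonnegative reconstruction from w.

    This scoring rule is an assumption made for illustration only.
    """
    col = [[x] for x in doc]                # document as a terms x 1 matrix
    h = [[1.0] for _ in range(len(w[0]))]   # topic weights, k x 1
    for _ in range(iters):
        h = update_h(col, w, h, eps)
    recon = [row[0] for row in matmul(w, h)]
    dot = sum(x * y for x, y in zip(doc, recon))
    norm = math.sqrt(sum(x * x for x in doc)) * math.sqrt(sum(y * y for y in recon))
    return dot / norm if norm > 0 else 0.0


# Hypothetical vocabulary: gas, pipeline, price, contract, party, lunch.
# Rows are terms, columns are documents the user normally works with.
user_docs = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
w, _ = nmf(user_docs, k=2)

score_relevant = relevance(w, [1, 1, 0, 0, 0, 0])    # uses the user's usual terms
score_irrelevant = relevance(w, [0, 0, 0, 0, 1, 1])  # uses terms the user never sees
print(score_relevant, score_irrelevant)
```

A document built from terms outside the user's topics cannot be reconstructed from them, so it scores lower than a document built from the user's usual vocabulary. A real system would need a threshold or a trained classifier on top of such scores, plus proper text preprocessing.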
Additional information
This work was supported by the RF Ministry of Education and Science, agreement no. 14.604.21.0056, unique identifier RFMEFI60414X0056.
Original Russian Text © I.V. Mashechkin, M.I. Petrovskii, D.V. Tsarev, 2016, published in Vestnik Moskovskogo Universiteta, Seriya 15: Vychislitel’naya Matematika i Kibernetika, 2016, No. 4, pp. 33–39.
About this article
Cite this article
Mashechkin, I.V., Petrovskii, M.I. & Tsarev, D.V. Machine learning methods for analyzing user behavior when accessing text data in information security problems. Moscow Univ. Comput. Math. Cybern. 40, 179–184 (2016). https://doi.org/10.3103/S0278641916040051
DOI: https://doi.org/10.3103/S0278641916040051