
Machine learning methods for analyzing user behavior when accessing text data in information security problems

Moscow University Computational Mathematics and Cybernetics

Abstract

A new method is proposed for detecting user access to irrelevant documents. It estimates how strongly a document's text belongs to the typical subject areas of the analyzed user. These typical subject areas are formed by subject-area modeling implemented via orthonormal nonnegative matrix factorization. An experimental study on real corporate correspondence drawn from the Enron data set shows that the proposed method achieves higher classification accuracy than traditional approaches.
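The abstract does not give the algorithm's details, so what follows is only a minimal Python sketch of the general idea under explicit assumptions. Documents are assumed to be bag-of-words count vectors over a fixed vocabulary (the paper's actual preprocessing and term weighting are not described here). The user's typical subject areas are taken to be the columns of W from an NMF whose multiplicative updates push W toward orthonormal columns; this is one published family of orthogonal-NMF updates, not necessarily the one the authors used. "Membership" is scored, hypothetically, as the cosine between a document and its approximate projection onto the span of W, with an arbitrary threshold of 0.5. The names `onmf`, `membership`, and `is_irrelevant` are illustrative and do not come from the paper.

```python
import math
import random


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def onmf(x, k, iters=300, seed=0, eps=1e-12):
    """Factor a nonnegative terms-by-documents matrix X (m x n) as X ~ W H,
    with multiplicative updates that push W (m x k) toward W^T W = I."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    xt = transpose(x)
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), the standard NMF update.
        wt = transpose(w)
        num = matmul(wt, x)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H X^T W), an update for the orthogonality-
        # constrained problem; the denominator is evaluated as W (H (X^T W))
        # to avoid building an m x m matrix.
        num = matmul(x, transpose(h))
        den = matmul(w, matmul(h, matmul(xt, w)))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h


def membership(w, doc):
    """Cosine between a document vector d and its approximate projection
    W (W^T d) onto the user's topic span: near 1 = on-profile, 0 = no overlap."""
    coeffs = [sum(w[i][j] * doc[i] for i in range(len(doc))) for j in range(len(w[0]))]
    proj = [sum(w[i][j] * coeffs[j] for j in range(len(coeffs))) for i in range(len(doc))]
    norm_d = math.sqrt(sum(v * v for v in doc))
    norm_p = math.sqrt(sum(v * v for v in proj))
    if norm_d == 0.0 or norm_p == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(doc, proj)) / (norm_d * norm_p)


def is_irrelevant(score, threshold=0.5):
    # Hypothetical decision rule; the threshold would need tuning on real data.
    return score < threshold


# Toy 6-term vocabulary. Rows of user_docs are the user's past documents:
# terms 0-2 form one subject area, terms 3-4 another, term 5 never occurs.
user_docs = [[2, 1, 1, 0, 0, 0], [1, 2, 1, 0, 0, 0],
             [0, 0, 0, 2, 1, 0], [0, 0, 0, 1, 2, 0]]
X = transpose(user_docs)  # terms x documents
W, H = onmf(X, k=2)
in_score = membership(W, [1, 1, 1, 0, 0, 0])   # uses one topic's vocabulary
out_score = membership(W, [0, 0, 0, 0, 0, 3])  # only the unseen term: 0.0
print(in_score, out_score, is_irrelevant(out_score))
```

Scoring with the cosine between a document and its projection, rather than with raw reconstruction error, keeps the score in [0, 1] for nonnegative inputs regardless of document length. Whether this matches the paper's actual membership estimate is not stated in the abstract.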



Author information

Correspondence to I. V. Mashechkin.

Additional information

This work was supported by the RF Ministry of Education and Science, agreement no. 14.604.21.0056, unique identifier RFMEFI60414X0056.

Original Russian Text © I.V. Mashechkin, M.I. Petrovskii, D.V. Tsarev, 2016, published in Vestnik Moskovskogo Universiteta, Seriya 15: Vychislitel’naya Matematika i Kibernetika, 2016, No. 4, pp. 33–39.

About this article


Cite this article

Mashechkin, I.V., Petrovskii, M.I. & Tsarev, D.V. Machine learning methods for analyzing user behavior when accessing text data in information security problems. Moscow Univ. Comput. Math. Cybern. 40, 179–184 (2016). https://doi.org/10.3103/S0278641916040051


  • DOI: https://doi.org/10.3103/S0278641916040051
