Machine learning methods for analyzing user behavior when accessing text data in information security problems

Mashechkin, I. V.; Petrovskii, M. I.; Tsarev, D. V.

doi:10.3103/S0278641916040051

Published: 12 November 2016

Moscow University Computational Mathematics and Cybernetics, Volume 40, pages 179–184 (2016)

Abstract

A new method for detecting user access to irrelevant documents based on estimating the document text membership in typical subject areas of the analyzed user is proposed. The typical subject areas are formed using subject area modeling implemented via orthonormal nonnegative matrix factorization. An experimental study with real corporate correspondence formed from an Enron data set demonstrates the high classification accuracy of the proposed method, compared to traditional approaches.
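
The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it uses plain multiplicative-update NMF rather than the orthonormal variant named in the abstract, a toy term-count matrix instead of the Enron data, and the cosine similarity between a document and its reconstruction from the learned topics as an assumed membership score. All function and variable names are hypothetical.

```python
import math
import random

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf(v, k, iters=200, seed=0):
    """Factorize a nonnegative terms-by-documents matrix v as w * h.

    Assumption: plain multiplicative updates with no orthogonality
    constraint, used here as a stand-in for orthonormal NMF.
    """
    rng = random.Random(seed)
    n_terms, n_docs = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_terms)]
    h = [[rng.random() + 0.1 for _ in range(n_docs)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(n_docs)]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + EPS) for j in range(k)]
             for i in range(n_terms)]
    return w, h


def membership_score(w, doc, iters=200):
    """Hypothetical membership score for one document.

    Fits nonnegative topic weights with w held fixed, then returns the
    cosine similarity between doc and its reconstruction from the topics.
    """
    k = len(w[0])
    wt = transpose(w)
    wtw = matmul(wt, w)
    wtd = [sum(x * y for x, y in zip(row, doc)) for row in wt]
    weights = [1.0] * k
    for _ in range(iters):
        den = [sum(wtw[i][j] * weights[j] for j in range(k)) for i in range(k)]
        weights = [weights[i] * wtd[i] / (den[i] + EPS) for i in range(k)]
    recon = [sum(w_ij * weights[j] for j, w_ij in enumerate(row)) for row in w]
    dot = sum(x * y for x, y in zip(doc, recon))
    norm = math.sqrt(sum(x * x for x in doc)) * math.sqrt(sum(x * x for x in recon))
    return dot / norm if norm > 0 else 0.0


# Toy term counts: 6 terms (rows) by 5 documents the user normally reads.
# Terms 0-2 and 3-4 form two subject areas; term 5 never occurs for this user.
user_docs = [
    [3, 2, 0, 0, 1],
    [2, 3, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 3, 2, 1],
    [0, 0, 2, 3, 1],
    [0, 0, 0, 0, 0],
]
w, _ = nmf(user_docs, k=2)
typical = membership_score(w, [2, 2, 1, 0, 0, 0])
irrelevant = membership_score(w, [0, 0, 0, 0, 0, 4])
print(f"typical={typical:.3f} irrelevant={irrelevant:.3f}")
```

In this sketch a document built from terms the user never reads cannot be reconstructed from the user's topics, so its score is lower than that of a document in a familiar subject area. A real system would need a threshold or a trained classifier on such scores, which this example does not attempt.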


Author information

Authors and Affiliations

Department of Computational Mathematics and Cybernetics, Moscow State University, Moscow, 119991, Russia
I. V. Mashechkin, M. I. Petrovskii & D. V. Tsarev

Corresponding author

Correspondence to I. V. Mashechkin.

Additional information

This work was supported by the RF Ministry of Education and Science, agreement no. 14.604.21.0056, unique identifier RFMEFI60414X0056.

Original Russian Text © I.V. Mashechkin, M.I. Petrovskii, D.V. Tsarev, 2016, published in Vestnik Moskovskogo Universiteta, Seriya 15: Vychislitel’naya Matematika i Kibernetika, 2016, No. 4, pp. 33–39.

About this article

Cite this article

Mashechkin, I.V., Petrovskii, M.I. & Tsarev, D.V. Machine learning methods for analyzing user behavior when accessing text data in information security problems. Moscow Univ. Comput. Math. Cybern. 40, 179–184 (2016). https://doi.org/10.3103/S0278641916040051

Received: 06 April 2016
Published: 12 November 2016
Issue Date: October 2016
DOI: https://doi.org/10.3103/S0278641916040051
