DOI: 10.1145/3229607.3229612
research-article

Stream-based Machine Learning for Network Security and Anomaly Detection

Published: 07 August 2018

ABSTRACT

Data stream machine learning is rapidly gaining popularity within the network monitoring community, as the volume of data produced by network devices and end-user terminals exceeds the memory constraints of standard monitoring equipment. Critical network monitoring applications, such as the detection of anomalies, network attacks, and intrusions, require fast and continuous mechanisms for on-line analysis of data streams. In this paper we consider a stream-based machine learning approach for network security and anomaly detection, applying and evaluating multiple machine learning algorithms in the analysis of continuously evolving network data streams. The continuous evolution of data stream analysis algorithms from the data stream mining domain, together with the multiple evaluation approaches conceived for benchmarking such algorithms, makes it difficult to choose the appropriate machine learning model. Results of the different evaluation approaches may differ significantly, and it is crucial to determine which approach best reflects algorithm performance. We therefore compare and analyze the results of the most recent evaluation approaches for sequential data on commonly used batch-based machine learning algorithms and their corresponding stream-based extensions, for the specific problem of on-line network security and anomaly detection. Consistent with our previous findings on off-line machine learning approaches for network security and anomaly detection, our results suggest that adaptive random forests and stochastic gradient descent models are able to keep up with important concept drifts in the underlying network data streams, maintaining high accuracy through continuous re-training at concept drift detection times.
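
As a rough illustration of the stream-based setting described in the abstract, the sketch below runs a prequential (test-then-train) evaluation of a logistic model trained by stochastic gradient descent on a synthetic stream with one abrupt concept drift. It is not the authors' implementation or data: the stream generator, the class and function names (`make_stream`, `OnlineLogisticSGD`, `prequential_accuracy`), the learning rate, and the window size are all assumptions chosen for the example, and the model simply keeps training on every instance instead of using an explicit drift detector.

```python
import math
import random
from collections import deque


def make_stream(n, drift_at, seed=0):
    """Yield (features, label) pairs; the labelling rule flips at index drift_at."""
    rng = random.Random(seed)
    for i in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        label = 1 if x[0] + x[1] > 0 else 0
        if i >= drift_at:
            label = 1 - label  # abrupt concept drift (synthetic assumption)
        yield x, label


class OnlineLogisticSGD:
    """Logistic regression updated one instance at a time with plain SGD."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn_one(self, x, y):
        err = self.predict_proba(x) - y  # gradient of the log loss w.r.t. z
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def prequential_accuracy(stream, model, window=200):
    """Test on each instance first, then train on it; report windowed accuracy."""
    recent = deque(maxlen=window)
    curve = []
    for x, y in stream:
        recent.append(1 if model.predict(x) == y else 0)  # test
        model.learn_one(x, y)                             # then train
        curve.append(sum(recent) / len(recent))
    return curve


curve = prequential_accuracy(make_stream(4000, drift_at=2000), OnlineLogisticSGD(2))
print(f"windowed accuracy before drift:  {curve[1999]:.2f}")
print(f"windowed accuracy after drift:   {curve[2100]:.2f}")
print(f"windowed accuracy at stream end: {curve[-1]:.2f}")
```

In this toy setting the windowed accuracy dips right after the drift point and recovers as the model continues to learn from new instances. A sliding window is only one of several prequential estimators; fading factors are a common alternative.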


Published in: Big-DAMA '18: Proceedings of the 2018 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks, August 2018, 58 pages
ISBN: 9781450359047
DOI: 10.1145/3229607

        Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 7 of 11 submissions, 64%
