Abstract
Conventional network traffic detection methods based on data mining cannot efficiently handle high-throughput traffic with concept drift. Data stream mining techniques can classify evolving data streams, although most of them require completely labeled data. This paper proposes an improved data stream mining algorithm for online network traffic classification that learns incrementally from both labeled and unlabeled flows. The algorithm combines incremental k-means with a self-training semi-supervised method to continuously update the classification model as new flow instances arrive. The experimental results show that the proposed algorithm classifies 325 thousand flow instances per second and achieves up to 91–94 % average accuracy, even when only 10 % of the input flows are labeled. It also maintains high accuracy in the presence of concept drift. Although the Drift Detection Method detects drifts in the evaluated datasets, the proposed method with incremental learning achieves up to 91–94 % accuracy, compared to 60–69 % without incremental learning.
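The combination of incremental k-means and self-training mentioned above can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail. The class name, the `radius` threshold for opening a new cluster, the majority-vote cluster label, and the two-feature toy flows are all assumptions made for illustration.

```python
import math
from collections import Counter


class IncrementalSelfTrainingKMeans:
    """Toy sketch (hypothetical): centroids updated online by labeled and unlabeled flows."""

    def __init__(self, radius):
        self.radius = radius   # assumed distance threshold for opening a new cluster
        self.centroids = []    # running mean of each cluster
        self.counts = []       # number of instances absorbed by each cluster
        self.votes = []        # per-cluster tally of true labels seen so far

    def _nearest(self, x):
        """Return (index, distance) of the closest centroid, or (None, inf) if none exist."""
        best, best_d = None, math.inf
        for i, c in enumerate(self.centroids):
            d = math.dist(x, c)
            if d < best_d:
                best, best_d = i, d
        return best, best_d

    def predict(self, x):
        """Classify a flow by the majority label of its nearest cluster."""
        i, _ = self._nearest(x)
        if i is None:
            return None
        return self.votes[i].most_common(1)[0][0]

    def learn(self, x, label=None):
        """Update the model with one flow; label=None means the flow is unlabeled."""
        i, d = self._nearest(x)
        if i is None or d > self.radius:
            if label is None:
                return  # unlabeled and far from every cluster: nothing to learn from
            self.centroids.append(list(x))
            self.counts.append(1)
            self.votes.append(Counter([label]))
            return
        # Incremental mean update. An unlabeled flow is implicitly given the
        # cluster's majority label (self-training) and still moves the centroid.
        self.counts[i] += 1
        n = self.counts[i]
        self.centroids[i] = [c + (v - c) / n for c, v in zip(self.centroids[i], x)]
        if label is not None:
            self.votes[i][label] += 1


# Usage with made-up two-feature flows
model = IncrementalSelfTrainingKMeans(radius=2.0)
model.learn((0.0, 0.0), "web")    # labeled flow opens a cluster
model.learn((10.0, 10.0), "p2p")  # distant labeled flow opens a second cluster
model.learn((1.0, 0.0))           # unlabeled flow is absorbed by the "web" cluster
print(model.predict((0.4, 0.1)))
```

Because each update touches only one centroid and keeps no history of past flows, the per-instance cost stays constant, which is the property that makes this style of learner suitable for high-throughput streams.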
About this article
Cite this article
Loo, H.R., Marsono, M.N. Online network traffic classification with incremental learning. Evolving Systems 7, 129–143 (2016). https://doi.org/10.1007/s12530-016-9152-x
DOI: https://doi.org/10.1007/s12530-016-9152-x