Published in: Journal of Network and Systems Management, Issue 1/2015

01.01.2015

Automated Dataset Generation for Training Peer-to-Peer Machine Learning Classifiers

Authors: Roozbeh Zarei, Alireza Monemi, Muhammad Nadzir Marsono

Abstract

Peer-to-peer (P2P) traffic classification based on flow statistics has been shown to detect P2P traffic accurately. However, the performance of a machine learning classifier depends on the quality and recency of its training dataset, so classifying P2P traffic on-line requires overcoming these limitations. This paper proposes automated training dataset generation for on-line P2P traffic classification to allow frequent classifier retraining. The proposed two-stage training dataset generator (TSTDG) combines a 3-class heuristic classifier and a 3-class statistical classifier to generate a training dataset automatically. In the heuristic stage, traffic is classified as P2P, non-P2P, or unknown. In the statistical stage, a dual Decision Tree is built from the dataset generated in the heuristic stage to reduce the amount of traffic classified as unknown. The final training dataset is generated from all flows classified in these two stages. The proposed system was evaluated on traces captured from a campus network. The overall results show that the TSTDG can generate an accurate training dataset, classifying around 94 % of all flows with high accuracy (98.59 %) and a low false positive rate (1.27 %).
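The two-stage idea in the abstract can be illustrated with a small Python sketch. It is not the authors' TSTDG: the port lists in `heuristic_stage` are hypothetical stand-ins for the paper's heuristics, the per-flow feature values are invented, and a depth-one decision stump (split on misclassification count, not C4.5's criterion) replaces the paper's dual Decision Tree. Flows that the heuristic stage leaves unknown are relabeled by the stump only when their leaf is at least `min_purity` pure, a hypothetical confidence rule; otherwise they stay unknown and are left out of the generated training set.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Sequence, Tuple


class Label(Enum):
    P2P = "P2P"
    NON_P2P = "non-P2P"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class FlowKey:
    """5-tuple that identifies a flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str


@dataclass
class Flow:
    key: FlowKey
    features: Tuple[float, ...]  # hypothetical per-flow statistics


# Hypothetical port lists standing in for the paper's heuristic rules.
P2P_PORTS = {4662, 6881}
NON_P2P_PORTS = {53, 80, 443}


def heuristic_stage(flow: Flow) -> Label:
    """Stage 1: 3-class heuristic labelling (toy port-based rules)."""
    ports = {flow.key.src_port, flow.key.dst_port}
    if ports & P2P_PORTS:
        return Label.P2P
    if ports & NON_P2P_PORTS:
        return Label.NON_P2P
    return Label.UNKNOWN


@dataclass
class Stump:
    """Depth-one decision tree; each leaf keeps its class counts."""
    feature: int
    threshold: float
    left: Dict[Label, int]   # flows with x[feature] <= threshold
    right: Dict[Label, int]  # flows with x[feature] > threshold

    def predict(self, x: Sequence[float], min_purity: float) -> Label:
        counts = self.left if x[self.feature] <= self.threshold else self.right
        label, n = max(counts.items(), key=lambda kv: kv[1])
        # Abstain (stay unknown) when the leaf is not pure enough.
        return label if n / sum(counts.values()) >= min_purity else Label.UNKNOWN


def fit_stump(xs: List[Tuple[float, ...]], ys: List[Label]) -> Stump:
    """Pick the threshold split with the fewest misclassified training flows."""
    best = None
    for f in range(len(xs[0]) if xs else 0):
        values = sorted({x[f] for x in xs})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint, so both leaves are non-empty
            left, right = Counter(), Counter()
            for x, y in zip(xs, ys):
                (left if x[f] <= t else right)[y] += 1
            errors = sum(sum(c.values()) - max(c.values()) for c in (left, right))
            if best is None or errors < best[0]:
                best = (errors, Stump(f, t, dict(left), dict(right)))
    if best is None:
        raise ValueError("need labelled flows with at least two distinct feature values")
    return best[1]


def generate_training_set(flows: List[Flow], min_purity: float = 0.9) -> List[Tuple[Flow, Label]]:
    """Stage 2 relabels heuristic unknowns; keep every flow that ends up labelled."""
    stage1 = [(f, heuristic_stage(f)) for f in flows]
    known = [(f, y) for f, y in stage1 if y is not Label.UNKNOWN]
    stump = fit_stump([f.features for f, _ in known], [y for _, y in known])
    dataset = []
    for f, y in stage1:
        if y is Label.UNKNOWN:
            y = stump.predict(f.features, min_purity)
        if y is not Label.UNKNOWN:
            dataset.append((f, y))
    return dataset
```

The fraction `len(dataset) / len(flows)` plays the role of the coverage figure reported in the abstract (around 94 % in the authors' evaluation); with the toy rules above it depends entirely on the invented ports and features. Raising `min_purity` trades coverage for label quality, which is the same tension the statistical stage has to manage.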

Footnotes
1
Flows are distinguished by the 5-tuple [Source IP, Destination IP, Source Port, Destination Port, Protocol].
 
2
J48 is an open-source Java implementation of the C4.5 algorithm in the Weka toolkit.
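C4.5, which J48 implements, chooses splits by gain ratio: the information gain of an attribute divided by its split information, which penalizes attributes that fragment the data into many small partitions. The sketch below computes this criterion for a discrete attribute in plain Python. It is an illustration, not Weka's code, and it omits C4.5's thresholding of continuous attributes, missing-value handling, and pruning.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a class-label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 split criterion for a discrete attribute: information gain / split info."""
    n = len(labels)
    partitions: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(attribute, labels):
        partitions.setdefault(value, []).append(label)
    weights = [len(p) / n for p in partitions.values()]
    # Expected entropy of the class label after splitting on the attribute.
    remainder = sum(w * entropy(p) for w, p in zip(weights, partitions.values()))
    gain = entropy(labels) - remainder
    # Split information grows with the number (and evenness) of partitions.
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain / split_info if split_info > 0 else 0.0
```

For a balanced sample of two P2P and two non-P2P flows, a two-valued attribute that separates the classes perfectly scores 1.0, while an attribute that puts each flow in its own partition scores only 0.5, even though both have an information gain of 1 bit.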
 
Metadata
Title
Automated Dataset Generation for Training Peer-to-Peer Machine Learning Classifiers
Authors
Roozbeh Zarei
Alireza Monemi
Muhammad Nadzir Marsono
Publication date
01.01.2015
Publisher
Springer US
Published in
Journal of Network and Systems Management / Issue 1/2015
Print ISSN: 1064-7570
Electronic ISSN: 1573-7705
DOI
https://doi.org/10.1007/s10922-013-9279-z
