Published in: Peer-to-Peer Networking and Applications 2/2009

1 June 2009

Exploiting unlabeled data to improve peer-to-peer traffic classification using incremental tri-training method

Written by: Bijan Raahemi, Weicai Zhong, Jing Liu

Abstract

Unlabeled training examples are readily available in many applications, but labeled examples are expensive to obtain. For instance, in our previous work on classifying peer-to-peer (P2P) Internet traffic, we observed that only about 25% of examples can be labeled as “P2P” or “NonP2P” using a port-based heuristic rule. We expect that even fewer examples can be labeled in the future as more and more P2P applications use dynamic ports. This motivates us to investigate techniques that improve the accuracy of P2P traffic classification by exploiting unlabeled examples. In addition, Internet data flows dynamically and in large volumes (streaming data). In P2P applications, new communities of peers often join and old communities often leave, so classifiers must be able to update the model incrementally and to deal with concept drift. Based on these requirements, this paper proposes an incremental Tri-Training (iTT) algorithm. We tested our approach on a real data stream with 7.2 million labeled examples and 20.4 million unlabeled examples. The results show that the iTT algorithm can improve the accuracy of P2P traffic classification by exploiting unlabeled examples. In addition, it can effectively deal with the dynamic nature of streaming data to detect changes in communities of peers. We extracted attributes only from the IP layer, eliminating the privacy concerns associated with techniques that use deep packet inspection.
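The abstract does not spell out the iTT algorithm, so the following is only a minimal sketch of one round of the basic tri-training idea it builds on: three classifiers are trained on the labeled data, and each is then retrained with the unlabeled examples that the other two label identically. Everything concrete here is an assumption made for illustration: the `NearestCentroid` stand-in learner, the toy two-dimensional features, and the deterministic training subsets (used in place of bootstrap sampling so the example is reproducible). The error-rate check of full tri-training, incremental updates, and concept-drift handling are all omitted.

```python
from collections import Counter


class NearestCentroid:
    """Tiny stand-in base learner (hypothetical; not the paper's classifier)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, x):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=dist2)


def tri_training_round(X_lab, y_lab, X_unl):
    """One simplified tri-training round; returns three refined classifiers."""
    # Give each classifier a different view of the labeled data.
    # (Assumption: deterministic subsets instead of bootstrap samples.)
    subsets = []
    for i in range(3):
        keep = [j for j in range(len(X_lab)) if j % 3 != i]
        subsets.append(([X_lab[j] for j in keep], [y_lab[j] for j in keep]))
    clfs = [NearestCentroid().fit(Xs, ys) for Xs, ys in subsets]

    refined = []
    for i in range(3):
        a, b = (clfs[j] for j in range(3) if j != i)
        Xs, ys = list(subsets[i][0]), list(subsets[i][1])
        for x in X_unl:
            label_a, label_b = a.predict(x), b.predict(x)
            if label_a == label_b:  # the other two agree: accept as pseudo-label
                Xs.append(x)
                ys.append(label_a)
        refined.append(NearestCentroid().fit(Xs, ys))
    return refined


def vote(clfs, x):
    """Final prediction by majority vote of the three classifiers."""
    return Counter(c.predict(x) for c in clfs).most_common(1)[0][0]


# Toy data: made-up 2-D feature vectors, not the paper's IP-layer attributes.
X_lab = [(0.9, 1.1), (5.1, 4.9), (1.0, 0.8), (4.8, 5.2), (1.2, 1.0), (5.0, 5.0)]
y_lab = ["P2P", "NonP2P", "P2P", "NonP2P", "P2P", "NonP2P"]
X_unl = [(1.1, 0.9), (0.7, 1.2), (4.9, 5.1), (5.3, 4.7)]

clfs = tri_training_round(X_lab, y_lab, X_unl)
print(vote(clfs, (1.0, 1.0)), vote(clfs, (5.0, 5.0)))
```

In a streaming setting such as the one the abstract describes, a round like this would presumably be repeated as new batches arrive, but how iTT schedules those updates is not stated here.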

Footnotes
1
“length” denotes the length field after “ > ” in Table 1.
 
Metadata
Title
Exploiting unlabeled data to improve peer-to-peer traffic classification using incremental tri-training method
Written by
Bijan Raahemi
Weicai Zhong
Jing Liu
Publication date
1 June 2009
Publisher
Springer US
Published in
Peer-to-Peer Networking and Applications / Issue 2/2009
Print ISSN: 1936-6442
Electronic ISSN: 1936-6450
DOI
https://doi.org/10.1007/s12083-008-0022-6
