2011 | Original Paper | Book Chapter

Detection of Illegitimate Emails Using Boosting Algorithm

Authors: Sarwat Nizamani, Nasrullah Memon, Uffe Kock Wiil

Published in: Counterterrorism and Open Source Intelligence

Publisher: Springer Vienna

Abstract

In this paper, we report on experiments to detect illegitimate emails using a boosting algorithm. We call an email illegitimate if it is not useful to the receiver or to society. We divide the problem into two major areas of illegitimate email detection: suspicious email detection and spam email detection. Boosting can raise the accuracy of traditional classification algorithms, but it requires choosing a suitable weak learner as well as the number of boosting iterations. We propose suitable weak learners and parameter settings for the boosting algorithm for this task. We first analyze the problem using base learners alone, then apply the boosting algorithm with suitable weak learners and parameter settings, such as the number of boosting iterations. We propose a Naive Bayes classifier as a suitable weak learner for the boosting algorithm; it achieves maximum performance with very few boosting iterations.
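To make the idea of boosting a Naive Bayes weak learner concrete, here is a minimal sketch in plain Python. It assumes an AdaBoost-style reweighting scheme and a Bernoulli Naive Bayes model over binary word-presence features; the abstract does not state which boosting variant or feature representation the authors used, so both are assumptions. The word list and the six toy emails are hypothetical and serve only as an illustration.

```python
import math

def train_weighted_nb(X, y, w):
    """Fit a Bernoulli Naive Bayes model on binary features using example weights."""
    n_features = len(X[0])
    total = sum(w)
    model = {}
    for c in (0, 1):
        wc = sum(wi for wi, yi in zip(w, y) if yi == c)
        log_prior = math.log((wc + 1.0) / (total + 2.0))  # smoothed class prior
        probs = []
        for j in range(n_features):
            present = sum(wi for xi, yi, wi in zip(X, y, w) if yi == c and xi[j] == 1)
            probs.append((present + 1.0) / (wc + 2.0))  # Laplace smoothing
        model[c] = (log_prior, probs)
    return model

def nb_predict(model, x):
    """Return the class with the highest log-posterior score."""
    scores = {}
    for c, (log_prior, probs) in model.items():
        s = log_prior
        for xj, p in zip(x, probs):
            s += math.log(p if xj == 1 else 1.0 - p)
        scores[c] = s
    return max(scores, key=scores.get)

def adaboost(X, y, n_rounds=5):
    """AdaBoost-style loop: reweight misclassified examples, collect weighted models."""
    n = len(X)
    w = [1.0] * n  # weights are kept summing to n so the smoothing stays meaningful
    ensemble = []
    for _ in range(n_rounds):
        model = train_weighted_nb(X, y, w)
        preds = [nb_predict(model, x) for x in X]
        err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi) / sum(w)
        if err >= 0.5:  # weak learner no better than chance: stop
            if not ensemble:
                ensemble.append((1.0, model))
            break
        perfect = err == 0.0
        err = max(err, 1e-10)  # avoid division by zero
        alpha = math.log((1.0 - err) / err)  # vote weight of this round's model
        ensemble.append((alpha, model))
        if perfect:  # nothing left to reweight
            break
        w = [wi * math.exp(alpha) if p != yi else wi for wi, p, yi in zip(w, preds, y)]
        s = sum(w)
        w = [wi * n / s for wi in w]
    return ensemble

def ensemble_predict(ensemble, x):
    """Weighted majority vote over the boosted models."""
    votes = {0: 0.0, 1: 0.0}
    for alpha, model in ensemble:
        votes[nb_predict(model, x)] += alpha
    return max(votes, key=votes.get)

# Hypothetical toy data: feature order is ["free", "winner", "meeting", "report"];
# label 1 = illegitimate, 0 = legitimate.
X = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 1, 0, 0],
     [0, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]

ensemble = adaboost(X, y, n_rounds=5)
print(ensemble_predict(ensemble, [1, 1, 0, 0]))
print(ensemble_predict(ensemble, [0, 0, 1, 1]))
```

The parameter `n_rounds` corresponds to the number of boosting iterations mentioned in the abstract. The loop stops early once the weak learner either classifies every training example correctly or does no better than chance.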

Title: Detection of Illegitimate Emails Using Boosting Algorithm
Authors: Sarwat Nizamani, Nasrullah Memon, Uffe Kock Wiil
Publisher: Springer Vienna
Book: Counterterrorism and Open Source Intelligence
Print ISBN: 978-3-7091-0387-6
Electronic ISBN: 978-3-7091-0388-3
Copyright year: 2011
DOI: https://doi.org/10.1007/978-3-7091-0388-3_13
