2011 | OriginalPaper | Book chapter

Detection of Illegitimate Emails Using Boosting Algorithm

Written by: Sarwat Nizamani, Nasrullah Memon, Uffe Kock Wiil

Published in: Counterterrorism and Open Source Intelligence

Publisher: Springer Vienna

Abstract

In this paper, we report on experiments to detect illegitimate emails using a boosting algorithm. We call an email illegitimate if it is not useful to the receiver or to society. We divide the problem into two major areas of illegitimate email detection: suspicious email detection and spam email detection. Boosting can raise the accuracy of traditional classification algorithms, but it requires choosing a suitable weak learner as well as the number of boosting iterations. We propose suitable weak learners and parameter settings for the boosting algorithm for this task. We first analyzed the problem using the base learners alone, and then applied the boosting algorithm with suitable weak learners and parameter settings such as the number of boosting iterations. We propose a Naive Bayes classifier as a suitable weak learner for the boosting algorithm: it achieves maximum performance with very few boosting iterations.
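To make the combination described in the abstract concrete, the following is a minimal sketch of AdaBoost with a weighted multinomial Naive Bayes weak learner. It is not the authors' implementation or experimental setup: the toy emails, the labels (+1 for illegitimate, -1 for legitimate), the function names, the Laplace smoothing and the default of five boosting rounds are all assumptions made for illustration.

```python
import math
from collections import defaultdict

EPS = 1e-10  # avoids log(0) and division by zero when a round makes no mistakes


def train_weighted_nb(docs, labels, weights):
    """Fit multinomial Naive Bayes where every document counts with its weight."""
    n = len(docs)
    vocab = {tok for doc in docs for tok in doc}
    classes = sorted(set(labels))
    class_mass = {c: 0.0 for c in classes}
    token_mass = {c: defaultdict(float) for c in classes}
    for doc, y, w in zip(docs, labels, weights):
        scaled = w * n  # with uniform weights this reproduces plain counts
        class_mass[y] += scaled
        for tok in doc:
            token_mass[y][tok] += scaled
    model = {}
    for c in classes:
        denom = sum(token_mass[c].values()) + len(vocab)  # Laplace smoothing
        log_like = {t: math.log((token_mass[c][t] + 1.0) / denom) for t in vocab}
        model[c] = (math.log(class_mass[c] / n), log_like)
    return model


def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen tokens are skipped."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like.get(tok, 0.0) for tok in doc)
    return max(model, key=score)


def adaboost_nb(docs, labels, rounds=5):
    """AdaBoost for labels in {-1, +1} using Naive Bayes as the weak learner."""
    n = len(docs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = train_weighted_nb(docs, labels, weights)
        preds = [predict_nb(model, d) for d in docs]
        err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
        if err >= 0.5:
            break  # weak learner is no better than chance on the weighted data
        alpha = 0.5 * math.log((1.0 - err + EPS) / (err + EPS))
        ensemble.append((alpha, model))
        if err < EPS:
            break  # training data already fitted; further rounds add nothing
        # Raise the weight of misclassified emails, lower it for correct ones.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, labels, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict_boosted(ensemble, doc):
    """Weighted vote of all weak learners; ties go to the +1 class."""
    vote = sum(alpha * predict_nb(model, doc) for alpha, model in ensemble)
    return 1 if vote >= 0 else -1


# Hypothetical toy corpus: +1 = illegitimate, -1 = legitimate.
DOCS = [s.split() for s in [
    "win free prize now",
    "free money offer now",
    "cheap pills free offer",
    "meeting agenda for monday",
    "project report attached",
    "lunch on monday",
]]
LABELS = [1, 1, 1, -1, -1, -1]

ensemble = adaboost_nb(DOCS, LABELS, rounds=5)
print(len(ensemble), [predict_boosted(ensemble, d) for d in DOCS])
```

The `rounds` argument plays the role of the number of boosting iterations mentioned above. On data this small the loop stops early, so the sketch only shows the mechanics and says nothing about the accuracy reported in the chapter.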

Metadata
Title
Detection of Illegitimate Emails Using Boosting Algorithm
Written by
Sarwat Nizamani
Nasrullah Memon
Uffe Kock Wiil
Copyright year
2011
Publisher
Springer Vienna
DOI
https://doi.org/10.1007/978-3-7091-0388-3_13