Published in: Cognitive Computation 3/2014

01.09.2014

A Two-Stage Methodology Using K-NN and False-Positive Minimizing ELM for Nominal Data Classification

Authors: Anton Akusok, Yoan Miche, Jozsef Hegedus, Rui Nian, Amaury Lendasse

Abstract

This paper addresses decision-making on nominal data under specific constraints. The goal of the proposed methodology is to build a decision-making model that classifies as many samples as possible while avoiding false positives at all costs, in the smallest possible computational time. Under such constraints, one of the best types of model for the final decision process is the cognitive-inspired extreme learning machine (ELM). A two-stage decision methodology combining two types of classifiers, the distance-based K-NN and the cognitive-based ELM, provides a fast classification decision on a sample, keeping false positives as low as possible while classifying as many samples as possible (high coverage). The methodology has only two parameters, which set, respectively, the precision of the distance approximation and the final trade-off between false-positive rate and coverage. Experimental results on a specific dataset provided by F-Secure Corporation show that this methodology provides a rapid decision on new samples, with direct control over the false positives and thus over the decision capabilities of the model.
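The trade-off between false-positive rate and coverage mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a hypothetical classifier that outputs a real-valued score per sample (positive class near +1, negative class near -1) and a hypothetical `threshold` parameter below which the model abstains. The names `decide`, `evaluate`, `scores` and `labels` are illustrative only, and the data are made up for the example.

```python
from typing import List, Optional, Tuple


def decide(score: float, threshold: float) -> Optional[int]:
    """Return +1 or -1 when the score is confident enough, else None (abstain)."""
    if score >= threshold:
        return 1
    if score <= -threshold:
        return -1
    return None


def evaluate(scores: List[float], labels: List[int],
             threshold: float) -> Tuple[float, float]:
    """Return (false_positive_rate, coverage) for a given abstention threshold.

    false_positive_rate: share of true negatives that were labelled positive.
    coverage: share of all samples that received a decision at all.
    """
    decisions = [decide(s, threshold) for s in scores]
    decided = sum(1 for d in decisions if d is not None)
    coverage = decided / len(scores)
    negatives = sum(1 for y in labels if y == -1)
    false_pos = sum(1 for d, y in zip(decisions, labels) if d == 1 and y == -1)
    fpr = false_pos / negatives if negatives else 0.0
    return fpr, coverage


# Toy scores and ground-truth labels (assumed data, for illustration only).
scores = [0.9, 0.2, -0.8, 0.4, -0.1, -0.6]
labels = [1, -1, -1, 1, 1, -1]
```

On this toy data, a threshold of 0.0 decides every sample but produces one false positive, whereas a threshold of 0.3 removes that false positive at the price of leaving two samples undecided. Raising the threshold therefore trades coverage for a lower false-positive rate, which is the kind of control the abstract describes.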

Footnotes
1
Details of the implementation are not given in this paper, but can be found in the publications and deliverables of the Finnish ICT SHOK Programme Future Internet: http://www.futureinternet.fi
 
Metadata
Title
A Two-Stage Methodology Using K-NN and False-Positive Minimizing ELM for Nominal Data Classification
Authors
Anton Akusok
Yoan Miche
Jozsef Hegedus
Rui Nian
Amaury Lendasse
Publication date
01.09.2014
Publisher
Springer US
Published in
Cognitive Computation / Issue 3/2014
Print ISSN: 1866-9956
Electronic ISSN: 1866-9964
DOI
https://doi.org/10.1007/s12559-014-9253-4
