
2021 | OriginalPaper | Chapter

Explainable Multiple Instance Learning with Instance Selection Randomized Trees

Authors: Tomáš Komárek, Jan Brabec, Petr Somol

Published in: Machine Learning and Knowledge Discovery in Databases. Research Track

Publisher: Springer International Publishing


Abstract

Multiple Instance Learning (MIL) aims at extracting patterns from a collection of samples, where individual samples (called bags) are represented by a group of multiple feature vectors (called instances) instead of a single feature vector. Grouping instances into bags not only helps to formulate some learning problems more naturally, it also significantly reduces label acquisition costs, as only the labels for bags are needed, not for the inner instances. However, in application domains where inference transparency is demanded, such as network security, the sample attribution requirements are often asymmetric with respect to the training and application phases. While in the training phase it is very convenient to supply labels only for bags, in the application phase it is generally not enough to provide decisions only at the bag level, because the inferred verdicts need to be explained at the level of individual instances. Unfortunately, the majority of recent MIL classifiers do not focus on this real-world need. In this paper, we address this problem and propose a new tree-based MIL classifier able to identify the instances responsible for positive bag predictions. Results from an empirical evaluation on a large-scale network security dataset also show that the classifier achieves superior performance when compared with prior-art methods.
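To make the bag and instance terminology concrete, the following is a minimal Python sketch. It is not the tree-based classifier proposed in the paper. It assumes the standard MIL assumption (a bag is positive if at least one of its instances is positive), and `score_instance`, the threshold and the toy data are hypothetical names and values introduced only for illustration. It shows how a bag-level verdict can be returned together with the indices of the instances that triggered it.

```python
from typing import Callable, List, Sequence, Tuple

Instance = Sequence[float]  # one feature vector
Bag = List[Instance]        # one sample: a group of instances


def score_instance(x: Instance) -> float:
    """Hypothetical instance scorer, assumed here for illustration only."""
    return x[0] - x[1]


def predict_bag(
    bag: Bag,
    scorer: Callable[[Instance], float] = score_instance,
    threshold: float = 0.0,
) -> Tuple[bool, List[int]]:
    """Return the bag-level decision and the indices of the instances behind it.

    Under the standard MIL assumption, the bag is positive as soon as
    at least one instance scores above the threshold.
    """
    scores = [scorer(x) for x in bag]
    responsible = [i for i, s in enumerate(scores) if s > threshold]
    return bool(responsible), responsible


# Toy bags (made-up numbers): labels exist only per bag during training,
# yet the prediction can still be attributed to individual instances.
bags = [
    [[0.1, 0.5], [0.2, 0.9]],
    [[0.3, 0.4], [2.0, 0.1], [0.0, 0.0]],
]
for bag in bags:
    print(predict_bag(bag))
```

In this sketch the second bag is flagged because of its instance at index 1 alone, which is the kind of instance-level explanation the abstract asks for at application time.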

Footnotes
1
For example, a seemingly legitimate request to google.com might in reality be related to malicious activity when it is issued by malware checking the Internet connection. Similarly, requesting ad servers in low volumes is considered legitimate behavior, but higher volumes might indicate a click-fraud infection.
 
2
The term "extremely" in Extremely Randomized Trees [11] corresponds to setting \(T=1\).
 
5
MI-SVM is trained with Algorithm 1 on the complete feature space (\(\mathbf {s}\) is a vector of ones).
 
6
36 virtual Intel Xeon CPUs @ 2.9 GHz and 60 GB of memory.
 
7
It was shown in the work on BLRT [14], and we confirm it for ISRT in Sect. 4.2, that tuning these parameters usually does not bring any additional performance.
 
8
Precision answers the question "What percentage of false alarms will the network administrators have to deal with?", whereas the false positive rate answers "What percentage of clean users will be bothered?".
 
9
This way of identifying malicious communications is not very effective in production, since new threats are not yet on the deny list and first need to be discovered.
 
11
AUC is agnostic to class imbalance and classifier’s decision threshold value.
 
12
The best model is assigned the lowest rank (i.e. one).
 
13
The performance of any two classifiers is significantly different if the corresponding average ranks differ by at least the critical difference, which is (for 12 datasets, four methods and \(\alpha =0.05\)) approximately 1.35.
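The rank-based comparison described in footnotes 12 and 13 can be sketched in Python as follows. The footnotes do not name the statistical test, so this sketch assumes the critical difference of the Nemenyi post-hoc test as described by Demšar (2006), \(CD = q_\alpha \sqrt{k(k+1)/(6N)}\) for \(k\) methods and \(N\) datasets. The function names and the `Q_ALPHA_005` table are introduced here for illustration. Under that assumption, 12 datasets and four methods give a value close to the 1.35 quoted above.

```python
import math
from typing import List, Sequence

# Critical values q_alpha of the two-tailed Nemenyi test at alpha = 0.05,
# keyed by the number of compared methods k (as tabulated by Demsar, 2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728}


def ranks_for_dataset(scores: Sequence[float]) -> List[float]:
    """Rank methods on one dataset; the best (highest) score gets rank 1.

    Tied methods share the mean of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda m: -scores[m])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of methods tied with the one at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def critical_difference(num_methods: int, num_datasets: int) -> float:
    """Critical difference of average ranks under the Nemenyi test (assumed)."""
    q_alpha = Q_ALPHA_005[num_methods]
    return q_alpha * math.sqrt(num_methods * (num_methods + 1) / (6.0 * num_datasets))


def significantly_different(avg_rank_a: float, avg_rank_b: float, cd: float) -> bool:
    """Two methods differ significantly if their average ranks differ by >= CD."""
    return abs(avg_rank_a - avg_rank_b) >= cd


cd = critical_difference(num_methods=4, num_datasets=12)
print(round(cd, 2))
```

Average ranks are obtained by applying `ranks_for_dataset` to each dataset's scores and averaging per method; any two methods whose average ranks are at least `cd` apart would then be reported as significantly different.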
 
Literature
5.
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. CRC Press, Boca Raton (1984)
12.
Ho, T.: The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 20, 832–844 (1998)
15.
Li, K., Chen, R., Gu, L., Liu, C., Yin, J.: A method based on statistical characteristics for detection malware requests in network traffic. In: 2018 IEEE Third International Conference on Data Science in Cyberspace (DSC), pp. 527–532 (2018). https://doi.org/10.1109/DSC.2018.00084
16.
Machlica, L., Bartos, K., Sofka, M.: Learning detectors of malicious web requests for intrusion detection in network traffic (2017)
18.
Pevny, T., Somol, P.: Discriminative models for multi-instance problems with tree structure. In: Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, pp. 83–91. AISec 2016. Association for Computing Machinery, New York, NY, USA (2016). https://doi.org/10.1145/2996758.2996761
21.
Shalev-Shwartz, S., Singer, Y., Srebro, N.: Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of the 24th International Conference on Machine Learning, pp. 807–814. ICML 2007. Association for Computing Machinery, New York, NY, USA (2007). https://doi.org/10.1145/1273496.1273598
Metadata
Title
Explainable Multiple Instance Learning with Instance Selection Randomized Trees
Authors
Tomáš Komárek
Jan Brabec
Petr Somol
Copyright Year
2021
DOI
https://doi.org/10.1007/978-3-030-86520-7_44
