2015 | OriginalPaper | Book chapter
Are Your Training Datasets Yet Relevant?
An Investigation into the Importance of Timeline in Machine Learning-Based Malware Detection
Written by: Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein, Yves Le Traon
Published in: Engineering Secure Software and Systems
Publisher: Springer International Publishing
In this paper, we consider the relevance of timeline in the construction of datasets, to highlight its impact on the performance of a machine learning-based malware detection scheme. In particular, we show that simply picking a random set of known malware to train a malware detector, as is done in many assessment scenarios from the literature, yields significantly biased results. In the process of assessing the extent of this impact through various experiments, we were also able to confirm a number of intuitive assumptions about Android malware. For instance, we discuss the existence of Android malware lineages and how they could impact the performance of malware detection in the wild.
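
The contrast between a randomly picked training set and a timeline-aware one can be illustrated with a minimal sketch. It is not the authors' experimental setup: the `App` record, its `first_seen` timestamp, the helper names, and the synthetic data are all assumptions made for illustration. The sketch only shows how a random split can place apps in the training set that are newer than apps in the test set, which a chronological split rules out by construction.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class App:
    """Hypothetical record for one labeled Android app (illustrative only)."""
    name: str
    first_seen: date  # assumed timestamp, e.g. a packaging or collection date
    is_malware: bool


def random_split(apps, test_fraction=0.5, seed=0):
    """Time-agnostic split: shuffle, then cut. Training may contain 'future' apps."""
    shuffled = list(apps)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def chronological_split(apps, cutoff):
    """Time-aware split: train only on apps first seen strictly before the cutoff."""
    train = [a for a in apps if a.first_seen < cutoff]
    test = [a for a in apps if a.first_seen >= cutoff]
    return train, test


def leaked_from_future(train, test):
    """Count training apps that are newer than the oldest test app."""
    if not train or not test:
        return 0
    oldest_test = min(a.first_seen for a in test)
    return sum(1 for a in train if a.first_seen > oldest_test)


# Synthetic data: one app per month over two years, every third one "malware".
apps = [
    App(f"app{i}", date(2011 + i // 12, i % 12 + 1, 1), i % 3 == 0)
    for i in range(24)
]

rand_train, rand_test = random_split(apps)
time_train, time_test = chronological_split(apps, cutoff=date(2012, 1, 1))

print("random split, future apps in training:",
      leaked_from_future(rand_train, rand_test))
print("chronological split, future apps in training:",
      leaked_from_future(time_train, time_test))
```

In a realistic deployment a detector can only learn from samples known before the apps it must classify, so the chronological variant is the closer analogue; the random variant lets the detector learn from samples that would not yet have been available.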