Introduction
Generating extensive YARA rules and associated issues
Generating optimised YARA rules and associated issues
Background: techniques employed for malware analysis
YARA rules
Fuzzy hashing
SSDEEP
SDHASH
mvHASH-B
Import hashing
Fuzzy rules
Related work: recent malware analysis techniques applied to ransomware
Data collection: collection of malware (ransomware) and goodware samples
Experimental evaluation of employed techniques: malware analysis using fuzzy hashing, import hashing and YARA rules
Fuzzy hashing: methodology
Fuzzy hashing: experiment
Matching criteria | WannaCry: SSDEEP detection rate (%) | WannaCry: SDHASH detection rate (%) | WannaCry: mvHASH-B detection rate (%) | Locky: SSDEEP detection rate (%) | Locky: SDHASH detection rate (%) | Locky: mvHASH-B detection rate (%) | Cerber: SSDEEP detection rate (%) | Cerber: SDHASH detection rate (%) | Cerber: mvHASH-B detection rate (%) | CryptoWall: SSDEEP detection rate (%) | CryptoWall: SDHASH detection rate (%) | CryptoWall: mvHASH-B detection rate (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Fuzzy similarity scores (1–100%) | 91.2 | 93.6 | 90 | 42 | 58.4 | 72.4 | 33.6 | 71.2 | 94.8 | 28 | 52.4 | 83.6 |
Fuzzy similarity scores > 10% | 91.2 | 93.6 | 90 | 42 | 38.4 | 64 | 33.6 | 62.8 | 90.4 | 28 | 32.8 | 56.8 |
Fuzzy similarity scores > 20% | 91.2 | 90 | 84.4 | 41.6 | 35.6 | 36.4 | 33.6 | 37.6 | 36.8 | 28 | 24 | 20.8 |
Fuzzy similarity scores > 30% | 90.8 | 90 | 84.4 | 41.6 | 30.4 | 33.6 | 33.6 | 28.4 | 36 | 28 | 20.4 | 20.4 |
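The matching criteria in the table above amount to counting how many samples have a best fuzzy similarity score above a given cut-off. The following is a minimal sketch of that calculation, not the tooling used in the experiment. The function name `detection_rate` and the scores in `example_scores` are hypothetical and are not data from the study.

```python
from typing import Iterable


def detection_rate(best_scores: Iterable[int], threshold: int) -> float:
    """Return the percentage of samples whose best similarity score exceeds `threshold`."""
    scores = list(best_scores)
    if not scores:
        raise ValueError("at least one sample is required")
    detected = sum(1 for score in scores if score > threshold)
    return 100.0 * detected / len(scores)


# Hypothetical best-match similarity scores (0 means no match) for ten samples.
# These values are illustrative only and are NOT taken from the study.
example_scores = [0, 0, 5, 12, 18, 25, 31, 47, 88, 100]

# A threshold of 0 corresponds to the "1-100%" criterion (any non-zero score counts).
rates = {t: detection_rate(example_scores, t) for t in (0, 10, 20, 30)}

for threshold, rate in rates.items():
    print(f"similarity > {threshold}%: {rate:.1f}% detected")
```

Raising the threshold can only keep or lower the detection rate, which matches the pattern in every column of the table.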
Import hashing: methodology
Import hashing: experiment
YARA rules: methodology
YARA rules: experiment
Proposed technique-I: malware analysis using proposed enhanced YARA rules
Enhanced YARA rules: methodology
Ransomware category | Import hashing detection rate (%) |
---|---|
WannaCry ransomware | 87.6 |
Locky ransomware | 31.6 |
Cerber ransomware | 61.6 |
CryptoWall ransomware | 27.2 |
Ransomware category | YARA rules* detection rate (%) |
---|---|
WannaCry ransomware | 89.6 |
Locky ransomware | 54.4 |
Cerber ransomware | 77.2 |
CryptoWall ransomware | 27.6 |
Ransomware category | Detection rate of enhanced YARA rules based on SSDEEP (similarity score > 30%) (%) | Detection rate of enhanced YARA rules based on SDHASH (similarity score > 30%) (%) | Detection rate of enhanced YARA rules based on mvHASH-B (similarity score > 30%) (%) |
---|---|---|---|
WannaCry ransomware | 93.2 | 92.8 | 92 |
Locky ransomware | 59.6 | 58 | 58.4 |
Cerber ransomware | 77.2 | 77.2 | 77.2 |
CryptoWall ransomware | 38.4 | 34.8 | 34.4 |
Ransomware category | SSDEEP fuzzy hashing detection rate (%) | SDHASH fuzzy hashing detection rate (%) | mvHASH-B fuzzy hashing detection rate (%) | IMPHASH import hashing detection rate (%) | YARA rules detection rate (%) | Enhanced YARA rules detection rate (%) |
---|---|---|---|---|---|---|
WannaCry ransomware | 90.8 | 90 | 84.4 | 87.6 | 89.6 | 93.2 |
Locky ransomware | 41.6 | 30.4 | 33.6 | 31.6 | 54.4 | 59.6 |
Cerber ransomware | 33.6 | 28.4 | 36 | 61.6 | 77.2 | 77.2 |
CryptoWall ransomware | 28 | 20.4 | 20.4 | 27.2 | 27.6 | 38.4 |
Enhanced YARA rules: experiment
Comparative evaluation of the analysis results of enhanced YARA rules with different analysis methods
Comparison based on similarity detection results
Evaluation metric | SSDEEP fuzzy hashing (%) | SDHASH fuzzy hashing (%) | mvHASH-B fuzzy hashing (%) | IMPHASH import hashing (%) | YARA rules (%) | Enhanced YARA rules (%) |
---|---|---|---|---|---|---|
Accuracy | 74.25 | 71.15 | 71.80 | 76.00 | 79.80 | 83.55 |
Precision | 100 | 90.19 | 90.08 | 100 | 95.99 | 96.27 |
Recall | 48.50 | 42.30 | 43.60 | 52.00 | 62.20 | 67.10 |
F1-Score | 65.32 | 57.59 | 58.76 | 68.42 | 75.49 | 79.08 |
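The recall and F1-score rows can be related to the other figures with two short formulas. The sketch below assumes that recall is the mean of the four per-family detection rates (that is, that each ransomware family contributes equally many samples). The tabulated values are consistent with this, but the assumption is an inference rather than something stated here. The helper names `mean_rate` and `f1_score` are hypothetical.

```python
from typing import Sequence


def mean_rate(per_family_rates: Sequence[float]) -> float:
    """Average per-family detection rates (%), assuming equally sized families."""
    return sum(per_family_rates) / len(per_family_rates)


def f1_score(precision_pct: float, recall_pct: float) -> float:
    """Harmonic mean of precision and recall, both given and returned in percent."""
    if precision_pct + recall_pct == 0:
        return 0.0
    return 2 * precision_pct * recall_pct / (precision_pct + recall_pct)


# Per-family detection rates (%) of the enhanced YARA rules, copied from the
# comparison table: WannaCry, Locky, Cerber, CryptoWall.
enhanced_rates = [93.2, 59.6, 77.2, 38.4]

recall = mean_rate(enhanced_rates)   # compare with the tabulated recall of 67.10
f1 = f1_score(96.27, recall)         # 96.27 is the tabulated precision

print(f"recall = {recall:.2f}%, F1 = {f1:.2f}%")
```

Because the tabulated precision is itself rounded, an F1-score recomputed this way may differ from the published figure in the last decimal place for some columns.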
Comparison based on evaluation metrics
Proposed technique-II: malware analysis using proposed embedded YARA rules
Embedded YARA rules: methodology
Embedded YARA rules: development of fuzzy rules
Ransomware category | YARA rules similarity detection rate (%) | Enhanced YARA rules (with fuzzy hash) similarity detection rate (%) | Embedded YARA rules (with fuzzy hash and fuzzy rules) similarity detection rate (%) |
---|---|---|---|
WannaCry ransomware | 89.6 | 93.2 | 95.2 |
Locky ransomware | 54.4 | 59.6 | 65.6 |
Cerber ransomware | 77.2 | 77.2 | 82.8 |
CryptoWall ransomware | 27.6 | 38.4 | 50.4 |
Embedded YARA rules: experiment
Evaluation metric | Basic YARA rules (%) | Enhanced YARA rules (%) | Embedded YARA rules (%) |
---|---|---|---|
Accuracy | 79.80 | 83.55 | 86.75 |
Precision | 95.99 | 96.27 | 96.58 |
Recall | 62.20 | 67.10 | 73.50 |
F1-Score | 75.49 | 79.08 | 83.48 |
Advantages and limitations of the proposed technique
Advantages of the proposed embedded YARA rules
- Extending search scope: Fuzzy rules can combine multiple parameters and their complex conditions to produce a single approximated output.
- Extending result scope: In addition to the malware alerts raised by YARA rules, fuzzy rules reveal the degree of similarity to malware (Less Likely Malware, Likely Malware, and Most Likely Malware).
- Aiding in analysis: The fuzzy membership results can help security experts analyse or classify samples and apply appropriate actions to specific groups without a deep-dive investigation into each sample.
- Improving detection rate: Fuzzy hashing complements YARA rules because it looks for structural similarity between two files in their entirety, which still works when the selected IoC strings cannot be found in the sample. It can therefore still trigger fuzzy rules and detect more malware samples than YARA rules alone.
- Maintaining performance: Fuzzy hashing is one of the fastest analysis methods and generates a compact hash, so it does not affect the overall performance of the combined analysis process.
- Accuracy improvement: When fuzzy hashing finds an exactly matching sample, it generates the strongest similarity score (1 or 100%), which increases the accuracy of the overall result and of any further clustering or classification.
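To illustrate how fuzzy rules might combine a YARA rule result with a fuzzy hash similarity score into one graded output, here is a minimal max-min inference sketch. It is not the rule base developed in this work: the membership breakpoints (30 and 70), the three rules, and the names `ramp_up` and `classify` are all illustrative assumptions.

```python
def ramp_up(x: float, lo: float, hi: float) -> float:
    """Membership that is 0 below `lo`, 1 above `hi`, and linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def classify(yara_match: bool, similarity: float) -> str:
    """Combine a YARA verdict and a fuzzy hash score (0-100) into one label."""
    # If neither indicator fires, the fuzzy rules have nothing to work with.
    if not yara_match and similarity <= 0:
        return "Not detected"

    # Hypothetical membership functions for the similarity score.
    high = ramp_up(similarity, 30, 70)
    low = 1.0 - high
    yara = 1.0 if yara_match else 0.0

    # Hypothetical rule base: AND is min, OR is max.
    strengths = {
        "Most Likely Malware": min(yara, high),
        "Likely Malware": max(min(yara, low), min(1.0 - yara, high)),
        "Less Likely Malware": min(1.0 - yara, low),
    }
    # Pick the strongest rule; on a tie the more severe label wins (dict order).
    return max(strengths, key=strengths.get)


for case in [(True, 90), (True, 0), (False, 85), (False, 20), (False, 0)]:
    print(case, "->", classify(*case))
```

The third case shows the complementary role of fuzzy hashing: a sample with no YARA match is still graded when its similarity score is high. The last case shows that nothing is reported when both indicators are silent.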
Limitations of the proposed embedded YARA rules
- Dependency of fuzzy rules: The result of the fuzzy rules depends on the values of the YARA rule and fuzzy hash indicators, so if both fail to flag a sample, the fuzzy rules will miss it as well.
- Not a rule optimisation approach: The proposed approach does not aim to generate optimised YARA rules; its focus is to increase the effectiveness of existing YARA rules. Its success therefore also depends on the quality of the rules themselves.
- Trusted code: If YARA rules are created from trusted code, the number of false positives will increase [15], which will affect the outcomes of the proposed approach.
- Fuzzy structural similarity: Fuzzy hashing can only discover structural or syntactic similarity, not behavioural or semantic similarity. It is therefore only a complementary method to YARA rules and does not offer the same effectiveness as YARA rules.
- Fuzzy similarity scores: Different security analysts may analyse and use the similarity scores provided by fuzzy hashing differently, reaching different conclusions from the same scores.
- Scalability of advanced YARA rules: Writing advanced YARA rules at scale is a challenging task in general [15], and this also applies to embedded YARA rules.