1 Introduction
- A set of generic metrics to describe, quantify, and compare the scanning and propagation capabilities and the efficiency of malware from a communication network perspective.
- Three prototypical malware categories featuring:
  - Different scanning behaviors, from which we extract details for effective detection mechanisms.
  - Different propagation behaviors, from which we extract details for effective detection and protection mechanisms.
- A mesh-network-based smart grid simulation environment with a backhaul network infrastructure that is attacked using the different malware categories. The results allow us to infer distinct malware behaviors and anomalies that help us define defensive solutions.
- A specific flow-based anomaly detection method that can detect covert malicious packets inside legitimate flows.
- Several countermeasures that support the defense against these malware types.
2 System model and operational environment
2.1 Topology
2.2 Communication
Parameter | Field nodes | Gateways |
---|---|---|
Number of nodes | 49 per subnet | 1 per subnet |
Total number of nodes | 192 | 4 |
Sending cycle [s] | 60 | 1 |
Distance [m] | 100 | \(>\,700\) |
Connection to | Local gateways | Neighbor gateways |
Legitimate data [kbyte] | 100 | 100 |
3 Attack model
3.1 Pandemic malware
- Automatically initiates unsolicited TCP connections: Infected nodes connect to potential new victims and transfer a self-carried payload, regardless of native network traffic.
- Small payload size: A simple monomorphic payload of 500 bytes reflects that this malware type has no advanced on-board features and no obfuscation capability.
3.2 Endemic malware
- Automatically initiates unsolicited TCP connections: Infected nodes connect to discovered victims and transfer a self-carried payload.
- Large payload size: A large, complex payload (5000 bytes) represents more capable on-board features, which are transferred as a polymorphic payload. These features include hiding the malware from detection mechanisms.
3.3 Contagion malware
- Passive scanning: This method leaves no traceable scanning anomalies in the network.
- No unsolicited TCP connections: This malware type exploits application-layer vulnerabilities of the target to transfer its payload over legitimate TCP connections between host and target, just before the connection is closed.
- Large payload size: A large, complex payload (5000 bytes) represents more capable on-board features, which are transferred as a metamorphic payload. These features include hiding the malware from detection mechanisms and obfuscating it within native network traffic.
Characteristic | Pandemic | Endemic | Contagion |
---|---|---|---|
Scanning behavior | Topological scan | Hit-list scan | Passive scan |
Scan rate [1/s] | 100 | 1 | N.A. |
Propagation | Self-carried | Self-carried | Embedded |
Payload [byte] | 500 | 5000 | 5000 |
Payload morphism | Monomorphic | Polymorphic | Metamorphic |
4 Metrics
4.1 Node infection ratio
Notation:
- s = Number of sub-networks
- \(n_{inf}\)(i) = Number of infected nodes in sub-network (i)
- \(n_{host}\)(i) = Number of existing nodes in sub-network (i)
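The notation above does not spell out the ratio itself. A minimal sketch, assuming \(R_{inf}\) is the mean over the s sub-networks of \(n_{inf}(i)/n_{host}(i)\) and that \(R_{clean} = 1 - R_{inf}\). The second assumption is consistent with Section 5, where the two reported values always sum to 100%. The function names are hypothetical.

```python
from typing import Sequence


def infection_ratio(n_inf: Sequence[int], n_host: Sequence[int]) -> float:
    """Assumed R_inf: mean over the s sub-networks of n_inf(i) / n_host(i)."""
    if len(n_inf) != len(n_host) or not n_host:
        raise ValueError("need one (n_inf, n_host) pair per sub-network")
    s = len(n_host)
    return sum(inf / host for inf, host in zip(n_inf, n_host)) / s


def clean_ratio(n_inf: Sequence[int], n_host: Sequence[int]) -> float:
    """Assumed R_clean: share of nodes that escape infection, 1 - R_inf."""
    return 1.0 - infection_ratio(n_inf, n_host)


# Hypothetical counts for s = 4 sub-networks (not simulation results).
n_host = [49, 49, 49, 49]
n_inf = [49, 44, 43, 39]
print(f"R_inf   = {infection_ratio(n_inf, n_host):.2%}")
print(f"R_clean = {clean_ratio(n_inf, n_host):.2%}")
```

Because all sub-networks here have the same size, averaging per-sub-network ratios gives the same value as dividing total infected nodes by total nodes. The two readings only differ when sub-network sizes differ.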
4.2 Infection times
- \(T_{first.GW}\) = Time until the first gateway is infected
- \(T_{last.GW}\) = Time until the last gateway is infected
- \(T_{\textit{75}\%.\textit{nodes}}\) = Time until 75% of all nodes are infected
- \(T_{last.node}\) = Time until the last eventually infected field node is infected; \(T_{last.node}\le T_{all.nodes}\)
- \(T_{all.nodes}\) = Time until all nodes on the network are infected; \(\infty \) if some nodes are never infected
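All five times can be read off a single table of per-node infection timestamps. A minimal sketch, assuming each node's infection time is logged, with `math.inf` marking nodes that are never infected. The helper name and the node IDs are hypothetical. Following the definition above, the 75% mark is counted over all nodes, gateways included.

```python
import math
from typing import AbstractSet, Mapping


def infection_times(t_inf: Mapping[str, float],
                    gateways: AbstractSet[str]) -> dict[str, float]:
    """Derive the infection times of Section 4.2 from per-node timestamps.

    t_inf maps every node ID (gateways and field nodes) to the time in
    seconds at which it was infected, or math.inf if it never was.
    """
    all_times = sorted(t_inf.values())
    gw_times = sorted(t_inf[g] for g in gateways)
    field_times = [t for n, t in t_inf.items() if n not in gateways]
    infected_field = [t for t in field_times if math.isfinite(t)]
    # Assumption: 75 % is counted over all nodes, gateways included.
    k75 = math.ceil(0.75 * len(all_times))
    return {
        "T_first.GW": gw_times[0],
        "T_last.GW": gw_times[-1],
        "T_75%.nodes": all_times[k75 - 1],
        # Last field node among those that are infected at all.
        "T_last.node": max(infected_field, default=math.inf),
        # Stays math.inf whenever at least one node is never infected.
        "T_all.nodes": all_times[-1],
    }


# Hypothetical timestamps: f0 is the initially infected node at t = 0,
# and f3 is never infected.
t_inf = {"gw0": 2.0, "gw1": 3.0, "f0": 0.0, "f1": 5.0, "f2": 7.0, "f3": math.inf}
times = infection_times(t_inf, {"gw0", "gw1"})
```

The example reproduces the pattern reported for endemic and contagion malware: \(T_{last.node}\) is finite while \(T_{all.nodes} = \infty\).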
4.3 Scanning stealthiness
Notation:
- \(n_{host}\)(i) = Number of existing nodes in sub-network (i)
- \(n_{addr}\)(i) = Number of theoretically available addresses in sub-network (i)
- \(n_{scn}\)(i) = Number of all scans in sub-network (i)
4.4 Scanning efficiency
- One source node per sub-network scans all existing nodes, every scan succeeds, no packets are lost, and no other host takes part in scanning. All scanning effort then falls on a single node, which slows scanning down. In addition, the scanning source is easily detected if the network is monitored.
- The scanning is highly coordinated. One option is sequential scanning, in which each node scans and infects exactly one node and then stops, after which the next node continues. Sequential scanning is slow and fails as soon as a scanning packet is lost. An alternative is to coordinate scanning through control traffic (C&C), but such traffic requires additional effort and also reduces stealthiness.
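The two scenarios above are idealized cases in which every scan reaches an existing node that has not yet been scanned. A minimal sketch of the efficiency metric follows. It assumes \(E_{scn}\) is the share of useful scans among all \(n_{scn}(i)\) scans, averaged over sub-networks, where a scan counts as useful when it is the first probe of an existing node. The exact formula is not stated here. The function name and scan-log format are hypothetical.

```python
import math
from typing import AbstractSet, Sequence


def scanning_efficiency(scans: Sequence[Sequence[str]],
                        hosts: Sequence[AbstractSet[str]]) -> float:
    """Assumed E_scn: mean over sub-networks of useful scans / all scans.

    scans[i] lists every address probed in sub-network i, so
    n_scn(i) = len(scans[i]); hosts[i] holds the addresses of its
    n_host(i) existing nodes. A probe is useful only the first time it
    reaches an existing node; rescans and probes of unused addresses
    merely inflate n_scn(i).
    """
    if len(scans) != len(hosts):
        raise ValueError("need one scan log and one host set per sub-network")
    ratios = []
    for probed, existing in zip(scans, hosts):
        if not probed:
            continue  # no scans in this sub-network: efficiency undefined
        useful = len(set(probed) & set(existing))
        ratios.append(useful / len(probed))
    return sum(ratios) / len(ratios) if ratios else math.nan


# Ideal case (first scenario): each existing node is scanned exactly once.
ideal = scanning_efficiency([["a1", "a2", "a3"]], [{"a1", "a2", "a3"}])
# Redundant case: one rescan of b1 and one probe of an unused address b9.
noisy = scanning_efficiency([["b1", "b1", "b2", "b9"]], [{"b1", "b2"}])
```

Under this reading, a pandemic worm in which every infected node rescans the whole sub-network accumulates many redundant probes, which pushes \(E_{scn}\) toward the low values reported in Section 5.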
4.5 Propagation stealthiness
Notation:
- \(T_{active}\) = Time of malware activity, used to normalize infectious traffic to overall traffic such that \(B_{mal} = 0\) for all \(t > T_{active}\)
- \(B_{mal}(T_{active},i)\) = Bytes associated with unsolicited traffic in sub-network i during the interval \((0, T_{active})\)
- \(B_{total}(T_{active},i)\) = Bytes of total traffic in sub-network i during the interval \((0, T_{active})\)
Notation:
- \(F_{mal}(T_{active},i)\) = Number of TCP flows with unsolicited traffic in sub-network i during the interval \((0, T_{active})\)
- \(F_{total}(T_{active},i)\) = Number of TCP flows in effective overall traffic in sub-network i during the interval \((0, T_{active})\)
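Section 5 reports the byte share \(U_{tr}\) and the flow share \(U_{TCP}\) built from these quantities. A minimal sketch, assuming \(U_{tr} = B_{mal}/B_{total}\) and \(U_{TCP} = F_{mal}/F_{total}\), with counts summed over sub-networks unless a single sub-network is selected. The flow-record format and function name are hypothetical and do not represent a specific monitoring tool's output.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical per-flow record from a traffic monitor."""
    subnet: int
    start: float        # flow start time [s]
    n_bytes: int        # bytes carried by the flow
    unsolicited: bool   # True if the flow was opened by the malware


def propagation_stealthiness(flows: Iterable[FlowRecord], t_active: float,
                             subnet: Optional[int] = None) -> Tuple[float, float]:
    """Return assumed (U_tr, U_TCP) for flows that start within (0, T_active).

    With subnet=None the byte and flow counts are summed over all
    sub-networks; otherwise only sub-network `subnet` is considered.
    """
    b_mal = b_total = f_mal = f_total = 0
    for f in flows:
        if subnet is not None and f.subnet != subnet:
            continue
        if not 0.0 < f.start < t_active:
            continue  # outside the observation interval (0, T_active)
        b_total += f.n_bytes
        f_total += 1
        if f.unsolicited:
            b_mal += f.n_bytes
            f_mal += 1
    u_tr = b_mal / b_total if b_total else 0.0
    u_tcp = f_mal / f_total if f_total else 0.0
    return u_tr, u_tcp


# Hypothetical traffic: three 100-kbyte legitimate reports, two 500-byte
# malicious transfers, and one malicious flow after T_active (ignored).
flows = [FlowRecord(0, t, 100_000, False) for t in (1.0, 2.0, 3.0)]
flows += [FlowRecord(0, t, 500, True) for t in (1.5, 2.5)]
flows.append(FlowRecord(0, 99.0, 500, True))
u_tr, u_tcp = propagation_stealthiness(flows, t_active=10.0)
```

The hypothetical example shows why the two metrics can diverge. Small malicious transfers barely move the byte share (about 0.33%), but each one still counts as a whole flow, so the flow share reaches 40%.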
4.6 Summary of all notations
Notation | Explanation |
---|---|
s | Number of sub-networks |
\(n_{inf}\)(i) | Number of infected nodes in sub-network (i) |
\(n_{host}\)(i) | Number of existing nodes in sub-network (i) |
\(T_{first.GW}\) | Time until the first gateway is infected |
\(T_{last.GW}\) | Time until the last gateway is infected |
\(T_{\textit{75}\%.\textit{nodes}}\) | Time until 75% of all nodes are infected |
\(T_{last.node}\) | Time until the last eventually infected field node is infected; \(T_{last.node} \le T_{all.nodes}\) |
\(T_{all.nodes}\) | Time until all nodes on the network are infected; may be \(\infty \) |
\(n_{addr}\)(i) | Number of theoretically available addresses per sub-network (i) |
\(n_{scn}\)(i) | Number of all scans per sub-network (i), cf. Figure 4 |
\(T_{active}\) | Time of malware activity, used to normalize infectious traffic to overall traffic |
\(B_{mal}(T_{active},i)\) | Bytes associated with unsolicited (malicious) traffic in subnet i during the interval \((0, T_{active})\) |
\(B_{total}(T_{active},i)\) | Bytes of effective overall traffic (legitimate and malicious) in subnet i during \((0, T_{active})\) |
\(F_{mal}(T_{active},i)\) | Number of TCP flows associated with unsolicited traffic in subnet i during the interval \((0, T_{active})\) |
\(F_{total}(T_{active},i)\) | Number of TCP flows in effective overall traffic in subnet i during the interval \((0, T_{active})\) |
5 Results
5.1 Pandemic malware
- Aggressive scanning: The high scanning ratio (\(R_{scn} = 94.42\%\)) leaves few nodes unscanned (\(R_{uscn} = 5.58\%\)) but produces detectable anomalies. The scanning efficiency (\(E_{scn} = 1.66\%\)) is very low because every infected node scans the entire sub-network. Figure 6 shows that scanning traffic dominates all other network traffic, consuming bandwidth, postponing legitimate connections, and producing highly visible anomalies.
- Unsolicited connections: Pandemic malware opens outbound TCP connections regardless of native network traffic. Because its payload is small, malicious bytes account for only \(U_{tr} = 2.54\%\) of overall traffic. However, maliciously established TCP connections account for \(U_{TCP} = 60.46\%\) of all TCP connections during the interval \((0, T_{active})\). Unsolicited connections that do not match the patterns expected from legitimate applications indicate that illegitimate services are using the data link.
- \(R_{inf} = 100\%\): all nodes in this network have been infected.
- \(R_{clean} = 0\%\): no nodes have slipped through the infection process.
- Infection times:
  - \(T_{first.GW} = 1.97\,\hbox {s}\); the entire first sub-network can be controlled.
  - \(T_{last.GW} = 3.12\,\hbox {s}\); all sub-networks can be controlled.
  - \(T_{\textit{75}\%.\textit{nodes}} = 7.69\,\hbox {s}\); 75% of all field nodes can be controlled selectively.
  - \(T_{last.node} = T_{all.nodes}\)
  - \(T_{all.nodes} = 63.03\,\hbox {s}\); 100% of all field nodes can be controlled selectively.
5.2 Endemic malware
- Permutation-hit-list scanning: Endemic malware follows an inconspicuous scanning strategy. Its low scanning output (\(R_{scn} = 16.51\%\)) produces fewer detectable anomalies and leaves \(R_{uscn} = 83.49\%\) of nodes unscanned. Although the hit-list is optimized, rescanning still occurs, yielding a scanning efficiency of \(E_{scn} = 10.91\%\), which is much higher than that of pandemic malware. Figure 8 shows that scanning traffic is very low compared to payload and legitimate traffic.
- Unsolicited connections: The malware opens outbound TCP connections to transfer its payload. This payload is large (5000 bytes) compared to that of pandemic malware (500 bytes) and therefore produces higher, potentially detectable traffic peaks, reflected in \(U_{tr} = 6.49\%\). Maliciously established TCP connections account for \(U_{TCP} = 64.70\%\) of all TCP connections during the interval \((0, T_{active})\). These unsolicited connections do not match the patterns expected from legitimate smart grid applications and are therefore a sign that illegitimate services are using the data link.
- \(R_{inf} = 89.11\%\) of all nodes have been infected.
- \(R_{clean} = 10.89\%\) of all nodes have been missed by the infection process.
- Infection times:
  - \(T_{first.GW} = 11.25\,\hbox {s}\); the entire first sub-network can be controlled.
  - \(T_{last.GW} = 15.99\,\hbox {s}\); all sub-networks can be controlled.
  - \(T_{\textit{75}\%.\textit{nodes}} = 28.00\,\hbox {s}\); 75% of all field nodes can be controlled selectively.
  - \(T_{last.node} = 55.54\,\hbox {s}\); 89% of all field nodes can be controlled selectively.
  - \(T_{all.nodes} = \infty \)
5.3 Contagion malware
- Passive scanning: This malware type does not actively scan the network; hence \(R_{scn} = 0\%\), \(R_{uscn} = 100\%\), and \(E_{scn}\) is not applicable. Scanning produces no detectable anomalies.
- No unsolicited connections: This malware type appends its payload to legitimate connections, so no unsolicited connections can be detected. Therefore, \(U_{tr}\) and \(U_{TCP}\) are both 0%.
- \(R_{inf} = 89.60\%\) of all nodes have been infected.
- \(R_{clean} = 10.40\%\) of all nodes have not been infected.
- Infection times:
  - \(T_{first.GW} = 126.83\,\hbox {s}\); the entire first sub-network can be controlled.
  - \(T_{last.GW} = 148.52\,\hbox {s}\); all sub-networks can be controlled.
  - \(T_{\textit{75}\%.\textit{nodes}} = 201.50\,\hbox {s}\); 75% of all nodes can be controlled selectively.
  - \(T_{last.node} = 316.97\,\hbox {s}\); 89% of all field nodes can be controlled selectively.
  - \(T_{all.nodes} = \infty \)
- Waiting periods: The infection graph (cf. Figure 9) shows long waiting periods between infections. They result from the propagation strategy, which appends the payload to the end of legitimate TCP connections to reduce anomalous output. The smart grid application must first transfer all of its legitimate data so that legitimate traffic is not postponed. Only then does the malware hijack the TCP flow from the local smart grid application and append its payload before the connection is closed. Consequently, the malware can only spread when a legitimate connection takes place, so infections are paced by the regular sending cycle of the field nodes (60 s, cf. Table 1).
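Because the embedded payload can only travel with the next legitimate connection, each infection is followed by an idle period before the malware can propagate further. A minimal sketch, assuming strictly periodic sending with a fixed per-node phase offset. The 60 s field-node cycle comes from Table 1. The phase value and the function names are hypothetical.

```python
import math


def next_opportunity(t_infected: float, cycle: float, phase: float = 0.0) -> float:
    """Earliest send time >= t_infected for a node that opens a legitimate
    connection at times phase + k * cycle (k = 0, 1, 2, ...)."""
    if cycle <= 0:
        raise ValueError("cycle must be positive")
    k = max(math.ceil((t_infected - phase) / cycle), 0)
    return phase + k * cycle


def waiting_time(t_infected: float, cycle: float, phase: float = 0.0) -> float:
    """Idle time before the embedded payload can ride on a legitimate flow."""
    return next_opportunity(t_infected, cycle, phase) - t_infected


# Field node with a 60 s cycle and a hypothetical 12.5 s phase, infected at
# t = 100 s: its next report starts at 132.5 s, so the malware idles 32.5 s.
field_wait = waiting_time(100.0, cycle=60.0, phase=12.5)
```

Under this model, the wait for a field node always lies in [0 s, 60 s). Chaining several such waits along an infection path is one way to see why contagion malware needs hundreds of seconds where pandemic malware needs only a few.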
6 Countermeasures
6.1 Pandemic malware countermeasures
- Restricted Virtual Local Area Networks (VLANs).
- Perimeter firewalls between network segments.
- Data origin authentication using asymmetric cryptography (digital signatures).
6.2 Endemic malware countermeasures
- All defensive measures mentioned for pandemic malware.
- Intrusion detection systems that can detect unsolicited traffic and scanning behavior.
6.3 Contagion malware countermeasures
- All defensive measures mentioned for pandemic and endemic malware.
- Firewalls or intrusion detection systems that check every connection against its expected legitimate behavior.
6.4 Anomaly detection specific to contagion malware
Notation:
- \(F_{covert}(T_{active},i)\) = Number of flows in sub-network i containing malicious data during the time interval \([0, T_{active}]\)
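The formula for \(A_{flow}\) is not reproduced here. A minimal sketch follows, assuming \(A_{flow} = F_{covert}/F_{total}\) and using a simple volume check. Each legitimate report is assumed to carry the 100 kbyte of legitimate data from Table 1, so a flow carrying noticeably more is flagged as possibly containing an appended payload. The tolerance, record format, and function names are hypothetical. This volume heuristic is only one possible flow anomaly check and is not necessarily the detector evaluated in this work.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow summary: start time and application bytes carried."""
    start: float   # flow start time [s]
    n_bytes: int   # bytes transferred before the connection closed


def is_covert(flow: Flow, expected_bytes: int, tolerance: float) -> bool:
    """Flag a legitimate-looking flow that carries more data than expected."""
    return flow.n_bytes > expected_bytes * (1.0 + tolerance)


def flow_anomaly_ratio(flows: Iterable[Flow], t_active: float,
                       expected_bytes: int = 100_000,
                       tolerance: float = 0.02) -> float:
    """Assumed A_flow = F_covert / F_total for flows starting in [0, T_active]."""
    window = [f for f in flows if 0.0 <= f.start <= t_active]
    if not window:
        return 0.0
    f_covert = sum(is_covert(f, expected_bytes, tolerance) for f in window)
    return f_covert / len(window)


# Hypothetical traffic: 17 clean 100-kbyte reports and 3 reports with a
# 5000-byte payload appended before connection close.
flows = [Flow(float(t), 100_000) for t in range(17)]
flows += [Flow(20.0 + t, 105_000) for t in range(3)]
a_flow = flow_anomaly_ratio(flows, t_active=60.0)
```

A fixed volume threshold works only if legitimate flow sizes are predictable. In a smart grid with fixed reporting payloads, that assumption is plausible. It would need per-application baselines in more variable traffic.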
6.5 Results summary
Metric | Pandemic | Endemic | Contagion |
---|---|---|---|
\(R_{inf}\) [%] | 100.00 | 89.11 | 89.60 |
\(R_{clean}\) [%] | 0.00 | 10.89 | 10.40 |
\(T_{first.GW}\) [s] | 1.97 | 11.25 | 126.83 |
\(T_{last.GW}\) [s] | 3.12 | 15.99 | 148.52 |
\(T_{\textit{75}\%.\textit{nodes}}\) [s] | 7.69 | 28.00 | 201.50 |
\(T_{last.node}\) [s] | 63.03 | 55.54 | 316.97 |
\(T_{all.nodes}\) [s] | 63.03 | \(\infty \) | \(\infty \) |
\(R_{scn}\) [%] | 94.42 | 16.51 | 0.00 |
\(R_{uscn}\) [%] | 5.58 | 83.49 | 100.00 |
\(E_{scn}\) [%] | 1.66 | 10.91 | N.A. |
\(U_{tr}\) [%] | 2.54 | 6.49 | 0 |
\(U_{TCP}\) [%] | 60.46 | 64.70 | 0 |
\(A_{flow}\) [%] | N.A. | N.A. | 15.70 |