Introduction
-
Due to processing limitations, the generated and correlated logs are mostly coarse-grained. Coarse-grained logs lead to data loss, prevent precise correlation of low-level operating system events with network events, and hinder the reconstruction of attack vectors.
-
Due to processing limitations, the available solutions use short time windows for correlating alerts, and hence they fail to detect slow attacks and long-term attack vectors.
-
Since detecting low-level APT attacks requires processing large event logs, and detecting slow APTs makes the processing problem much harder, one of our contributions is an approach that detects both low-level and slow APT attacks.
-
Using a long sliding window to detect slow APTs. We propose a Vermiform sliding window that analyzes and correlates system events instead of using a fixed-size time window.
-
Using the Scalable Semantic Analytics Stack (SANSA) [17], a Spark-based big-data inference engine, for scalable semantic correlation.
-
Although SANSA is a capable inference engine for processing huge numbers of events, its processing power is still limited. We use event abstraction to reduce the number of events, speed up inference, and detect very slow APTs (whose attack duration is several years instead of several months). By abstracting old events, we retain them as history in the detection process instead of discarding them as the time window moves.
Preliminaries
-
Malicious: An attack vector \(\nu _i\) is malicious if it violates the security policies implicitly or explicitly.
-
Minimal: An attack vector \(\nu _i\) is minimal if the exclusion of any event \(e_i\) from \(\nu _i\) reduces the maliciousness of \(\nu _i\) [16]. Suppose that function \(\zeta :Event^{\mathcal {I}} \rightarrow {\mathbb {N}}\) gives the maliciousness value of an event set; then \(E_i \subseteq Event^{\mathcal {I}}\) is more malicious than \(E_j \subseteq Event^{\mathcal {I}}\) if and only if \(\zeta (E_i)>\zeta (E_j)\) [16]. In other words, minimality means:$$\begin{aligned} \forall e_i \in \nu _i, \zeta (\nu _i - \{e_i\}) < \zeta (\nu _i). \end{aligned}$$(7)
-
Connected: An attack vector \(\nu _i\) is connected if the relations between all the events of \(\nu _i\) construct a connected directed acyclic graph (a small check of this property is sketched after this list). In other words:$$\begin{aligned}&(\nu _i,\sim ) \ is\ Partial\ Order \wedge \not \exists \nu _1, \nu _2, \nu _1 \subseteq \nu _i \wedge \nonumber \\&\quad \nu _2 \subseteq \nu _i \wedge \nu _i=[\nu _1 \cup \nu _2]~ \wedge ~[\nu _1 \cap \nu _2] =\varnothing ~ \wedge \nonumber \\&\quad \not \exists e_1, e_2, e_1 \in \nu _1 \wedge e_2 \in \nu _2 \wedge (e_1 \sim e_2 \vee e_2 \sim e_1). \end{aligned}$$(8)
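To make the connectivity requirement concrete, the following minimal Python sketch checks the property behind Eq. (8): the events of a candidate attack vector, linked by the \(\sim\) relations, must form a single connected directed acyclic graph. The event and relation encodings are our own illustration, not part of the paper's implementation.

```python
from collections import defaultdict

def is_connected_dag(events, relations):
    """Check the 'connected' property of Eq. (8): the events of an attack
    vector, linked by the ~ relations, must form a single connected DAG.
    `events`: set of event ids; `relations`: set of (e_i, e_j) pairs with e_i ~ e_j."""
    if not events:
        return False
    succ = defaultdict(set)   # directed edges, for the acyclicity check
    adj = defaultdict(set)    # undirected view, for the connectivity check
    for a, b in relations:
        succ[a].add(b)
        adj[a].add(b)
        adj[b].add(a)

    # Connectivity: a traversal from any event must reach every event;
    # otherwise the vector splits into the partition forbidden by Eq. (8).
    start = next(iter(events))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    if seen != set(events):
        return False

    # Acyclicity: DFS coloring (0 = unvisited, 1 = on stack, 2 = done).
    color = dict.fromkeys(events, 0)
    def has_cycle(node):
        color[node] = 1
        for nxt in succ[node]:
            if color[nxt] == 1 or (color[nxt] == 0 and has_cycle(nxt)):
                return True
        color[node] = 2
        return False
    return not any(color[e] == 0 and has_cycle(e) for e in events)
```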
Background and problem statement
APT characteristics
-
Using trusted events and agents to perform malicious activities: this method takes advantage of techniques such as injecting malicious code into trusted applications (e.g., the Gauss APT [15]), using stolen digital certificates (e.g., the Stuxnet APT [15]), using genuine recognized removable media to bypass the data loss prevention (DLP) system (e.g., the Project Sauron APT [20]), and exploiting human errors to infiltrate the victim's system.
-
Performing the malicious actions gradually: some APTs (e.g., the Carbanak APT [15]), especially malware that performs data exfiltration, steal sensitive data gradually to hide from intrusion and anomaly detection systems. For example, to exfiltrate 1 GB of data from the victim's system, the malware breaks the data into many tiny parts (e.g., less than 1 MB each) and exfiltrates them slowly over several days.
Related works
Problem statement
Proposed approach
Semantic correlation
-
TBox: This box defines the system ontology and the relations between the system entities. For example, the ontology of the Windows operating system is shown in Fig. 5. As shown in this figure, class Object consists of three subclasses: KernelObj, UserObj, and GDIObj. As another example, subject Thread is a part of subject Process.
-
ABox: This box consists of four sub-boxes as follows:

Instances or Individual Storage: All instances of the system ontology are stored in Individual Storage, for example, all intercepted events, subject instances, and object instances. In other words, Process \(p_i\) and Object \(o_j\) are samples of instances stored in Individual Storage.

Memory/Manipulation Storage (MStore): This sub-box uses two functions: Memory (me) and Manipulation (ma). Function \(me:Subject^{\mathcal {I}} \longrightarrow {\mathcal {P}}(Object^{\mathcal {I}})\) determines the objects that are read, explicitly or implicitly, by a specific subject. Function \(ma:Subject^{\mathcal {I}} \longrightarrow {\mathcal {P}}(Object^{\mathcal {I}})\) determines the objects that are written, explicitly or implicitly, or deleted by a specific subject. For example, \(o_i \in me(s_i)\) means subject \(s_i\) has read object \(o_i\), and \(o_j \in ma(s_i)\) means object \(o_j\) has been written by subject \(s_i\). These two functions are defined for detecting violations of confidentiality and integrity, respectively.

Security Policy (PStore): The security policy, which is defined in "Preliminaries", is stored in PStore. Note that by using the ontology, we can define high-level, more abstract security policies and then infer the low-level ones. The general format of a security policy is shown in Algorithm 1. For example, in Fig. 2, the main security policy is \(o_1 \notin me(s_2)\), which means subject \(s_2\) should not read object \(o_1\). As another example, consider Fig. 6: if no data should be extracted from the local network to the Internet, the security policy can be defined as \(LNO(o_i) \wedge PNS(s_j) \longrightarrow o_i \notin me(s_j)\), where \(LNO \sqsubseteq Object\) and \(PNS \sqsubseteq Subject\). In this scenario, data can be exfiltrated through different channels (e.g., the network buffer, a USB drive, or a CD-ROM). Since we use the system ontology, it is not necessary to define several security policies, because all data transmission devices (e.g., network buffer, USB drive, or CD-ROM) are a type of PNO object. For more details about the security policy, readers are referred to [16].

Event Relations: Two events can be related to each other by the relations between their subjects, objects, or actions. For example, relation \(e_i \overset{}{\sim } e_j\), defined for two events \(e_i\) and \(e_j\), is stored in this sub-box. This sub-box contains the events that are related to each other based on relation rules, which are described in the RBox subsection below.
-
RBox: This box consists of two sub-boxes as follows:

Relation Rules: As mentioned before, two events can be related to each other based on their subjects, objects, or actions. All types of relation rules are described in Table 1. For example, as shown in this table, relation \(\ e_i \overset{wr}{\sim } e_j\) means \(a_i=W\) (Write) and \(a_j=R\) (Read). According to this table, we can define approximately 500 relation rules for event correlation (precisely \((3+1) \times 4 \times (6\times 6)\) rules, some of which are meaningless). For example, relation rule \(e_i \overset{tewrip}{\sim } e_j\) is equal to the three event relations \(e_i \overset{te}{\sim } e_j\), \(e_i \overset{wr}{\sim } e_j\), and \(e_i \overset{ip}{\sim } e_j\). A small sketch of evaluating such composite rules is given after Table 1.

Indirect Access Rules: Some event relations result in indirect changes to the values of me and ma of subjects (e.g., as shown in Fig. 2). The rules used to detect such indirect changes are defined as Indirect Access Rules. These rules, which can be used for detecting low-level APTs, are defined in the "Expanding: knowledge-based inference" section.
Relation type | # | Event relation | Meaning |
---|---|---|---|
Object | 1 | \(\ e_i \overset{te}{\sim } e_j\) | \((t_i < t_j) \wedge (o_i=o_j)\) |
2 | \(e_i \overset{ti}{\sim } e_j\) | \((t_i < t_j) \wedge \left( (o_i \ne o_j) \wedge ( o_i \xrightarrow {partOf} o_j ) \right)\) | |
3 | \(e_i \overset{pi}{\sim } e_j\) | \((t_i < t_j) \wedge \left( (o_i \ne o_j) \wedge ( o_j \xrightarrow {partOf} o_i ) \right)\) | |
Subject | 4 | \(e_i \overset{it}{\sim } e_j\) | \((t_i < t_j) \wedge (s_i = s_j) \wedge Thread (s_i)\) |
5 | \(e_i \overset{ip}{\sim } e_j\) | \((t_i < t_j) \wedge Thread(s_i)\wedge Thread(s_j) \wedge \exists s_k, (Process(s_k) \wedge s_i \xrightarrow {partOf} s_k \wedge s_j \xrightarrow {partOf} s_k)\) | |
6 | \(e_i \overset{ih}{\sim } e_j\) | \((t_i < t_j) \wedge Thread(s_i) \wedge Thread(s_j) \wedge \exists s_k,s_m, (\ Process(s_k) \wedge Process(s_m) \wedge s_i \xrightarrow {partOf} s_k \wedge\) \(s_j \xrightarrow {partOf} s_m \wedge s_k \ne s_m\ ) \wedge \exists s_t, (\ Host(s_t)\wedge s_k \xrightarrow {partOf} s_t \wedge s_m \xrightarrow {partOf} s_t)\) | |
7 | \(e_i \overset{bh}{\sim } e_j\) | \((t_i < t_j) \wedge Thread(s_i) \wedge Thread(s_j) \wedge \exists s_k,s_m, ( Process(s_k) \wedge Process(s_m) \wedge s_i \xrightarrow {partOf} s_k \wedge\) \(s_j \xrightarrow {partOf} s_m \wedge s_k \ne s_m ) \wedge \ \exists s_t,s_p, (Host(s_t) \wedge Host(s_p) \wedge\) \( s_k \xrightarrow {partOf} s_t \wedge \ s_m \xrightarrow {partOf} s_p \wedge \ s_t \ne s_p\ ) \) | |
Action | 8 | \(\ e_i \overset{XY}{\sim } e_j\) | \((t_i < t_j) \wedge (a_i=X) \wedge (a_j=Y)\) |
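As an illustration of Table 1, the following minimal Python sketch evaluates the object relation te, the action relation of row 8, and their composition into the tewr rule; the tuple encoding of events is an assumption of the sketch.

```python
from collections import namedtuple

# An event <subject, object, action, time>; the field names are our own.
Event = namedtuple("Event", ["subject", "obj", "action", "time"])

def rel_te(ei, ej):
    """Object relation 1 of Table 1: (t_i < t_j) and o_i = o_j."""
    return ei.time < ej.time and ei.obj == ej.obj

def rel_action(ei, ej, x, y):
    """Action relation 8 of Table 1: (t_i < t_j) and a_i = X and a_j = Y."""
    return ei.time < ej.time and ei.action == x and ej.action == y

def rel_tewr(ei, ej):
    """Composite rule e_i ~tewr e_j: same object, written and then read."""
    return rel_te(ei, ej) and rel_action(ei, ej, "W", "R")

# The events of Fig. 2: s_1 writes o_2 at t_2, then s_2 reads o_2 at t_3.
e2 = Event("s1", "o2", "W", 2)
e3 = Event("s2", "o2", "R", 3)
assert rel_tewr(e2, e3)   # e_2 ~tewr e_3, as in the running example below
```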
Step | Component | Sample (based on Fig. 2). |
---|---|---|
Step 1 | Event Interception | \(e_1=\langle s_1, o_1, R, t_1\rangle ,\ e_2= \langle s_1, o_2, W, t_2 \rangle ,\ e_3= \langle s_2,o_2, R, t_3 \rangle\) |
Step 2 | Event Normalization | Using OWL-DL to define the events and initialize me and ma. |
Memory/Manipulation Storage (MStore) | \(me(s_1)=\{o_1\}\), \(ma(s_1)=\{o_2\}\), \(me(s_2)=\{o_2\}\), \(ma(s_2)=\varnothing\) | |
Step 3 | Individual Storage | \(Event(e_1), Subject(s_1), Object(o_1), ...\) |
System Ontology (TBox) | \(\sqsubseteq , \ partOf,\ parentOf\), ... | |
Relation Rules (RBox) | \((time(e_i)< time(e_j) \wedge object(e_i)=object(e_j) \wedge action(e_i)=W\) \(\wedge action(e_j)=R ) \Longrightarrow e_i \overset{tewr}{\sim } e_j\) | |
Inference Engine | \(e_2 \overset{tewr}{\sim } e_3\) | |
Step 4 | ABox and MStore | \(e_2 \overset{tewr}{\sim } e_3, me(s_1)=\{o_1\}, ma(s_1)=\{o_2\}, me(s_2)=\{o_2\},ma(s_2)=\varnothing\) |
System Ontology (TBox) | \(\sqsubseteq , \ partOf,\ parentOf\), ... | |
Indirect Access Rules (RBox) | \(\forall e_i,e_j, ( Event(e_i) \wedge Event(e_j) \wedge e_i \overset{tewr}{\sim } e_j \Rightarrow me(subject(e_j))=\) \(me(subject(e_j)) \cup me(subject(e_i)) )\) | |
Inference Engine | \(\varvec{ me(s_2)=me(s_2) \cup me(s_1)}\) | |
Step 5 | Memory/Manipulation Storage (MStore) | \(me(s_1)=\{o_1\},\ ma(s_1)=\{o_2\},me(s_2)=\{o_2\} \cup \varvec{\{o_1\}}\) |
Security Policy Store (PStore) | \(o_1 \notin me(s_2)\) | |
Policy Checker | Alert |
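The five steps of this example can be condensed into a few lines of illustrative Python; the dictionaries standing in for MStore and PStore are assumptions of the sketch, not the paper's OWL-based storage.

```python
# After Step 2 (Event Normalization): explicit reads and writes.
me = {"s1": {"o1"}, "s2": {"o2"}}
ma = {"s1": {"o2"}, "s2": set()}

# Step 4, Indirect Access Rule: e_i ~tewr e_j  =>  me(subject(e_j)) absorbs
# me(subject(e_i)).  Step 3 inferred e_2 ~tewr e_3 with subjects s_1, s_2.
tewr_pairs = [("s1", "s2")]
for src, dst in tewr_pairs:
    me[dst] |= me[src]                 # me(s_2) = {o_2} U {o_1}

# Step 5, Policy Checker: the policy o_1 not in me(s_2) is stored in PStore.
forbidden_reads = [("s2", "o1")]
for subj, obj in forbidden_reads:
    if obj in me[subj]:
        print(f"Alert: {subj} has (indirectly) read {obj}")  # fires here
```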
-
RBox: A large RBox can strongly increase the reasoning time. However, since in the field of APT detection the number of rules in the RBox is limited (at most about 500 rules, as discussed in the Relation Rules subsection), the RBox does not have a great impact on the reasoning time.
-
TBox: Since we use the TBox for storing the ontology of the Windows operating system, and the size of the Windows ontology is not very large, the TBox does not pose a challenge for the reasoning time.
-
ABox: All instances of OWL classes and relations (e.g., events, subjects, objects, actions, and relations) are stored in the ABox. Since the number of collected events (and consequently the number of event relations) grows rapidly over time, the size of the ABox increases significantly through event interception and event correlation. This problem is more evident for slow APTs. In this paper, we propose an approach for overcoming this problem.
Step 1: event interception
Step 2: event normalization
Step 3: big event set processing
Step 4: policy checking
-
Confidentiality violation: A subject \(s_i\) that reads an object \(o_i\) (\(o_i \in me(s_i)\)) violates confidentiality if and only if the security policy does not permit, explicitly or implicitly, reading object \(o_i\) by subject \(s_i\). In other words:$$\begin{aligned}&Subject(s_i)\wedge Object(o_i) \wedge o_i \in me(s_i) \wedge \\&\quad \langle s_i,o_i,R\rangle \in SP \Rightarrow confidentialityViolation=True \end{aligned}$$
-
Integrity violation: A subject \(s_i\) that writes an object \(o_i\) (\(o_i \in ma(s_i)\)) violates integrity if and only if the security policy does not permit, explicitly or implicitly, writing object \(o_i\) by subject \(s_i\). In other words:$$\begin{aligned}&Subject(s_i)\wedge Object(o_i) \wedge o_i \in ma(s_i) \wedge \nonumber \\&\quad \langle s_i,o_i,W\rangle \in SP \Rightarrow integrityViolation=True. \end{aligned}$$
-
Determining the events that cause the security policy violation; these are the malicious events.
-
Tracing back the attack vectors that contain the malicious events.
-
Determining the first events of attack vectors.
-
The subjects of these events are the origins of the APT attacks (a small trace-back sketch follows this list).
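A minimal sketch of this trace-back is given below, under the assumption that the inferred event relations are available as directed edges; all helper names are illustrative.

```python
from collections import defaultdict

def attack_origins(malicious_events, relations, subject_of):
    """Walk the ~ relations backwards from the malicious events and return
    the subjects of the earliest related events (the attack origins).
    `relations`: iterable of (e_i, e_j) pairs; `subject_of`: event -> subject."""
    pred = defaultdict(set)
    for a, b in relations:
        pred[b].add(a)

    origins, seen = set(), set()
    stack = list(malicious_events)
    while stack:
        ev = stack.pop()
        if ev in seen:
            continue
        seen.add(ev)
        if not pred[ev]:                  # no earlier related event:
            origins.add(subject_of[ev])   # the vector starts here
        else:
            stack.extend(pred[ev])
    return origins

# Fig. 2 scenario: e_2 ~tewr e_3 and e_3 violates the policy.
print(attack_origins({"e3"}, [("e2", "e3")], {"e2": "s1", "e3": "s2"}))
# -> {'s1'}: subject s_1 is the origin of the attack vector
```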
Big event knowledge-based processing
Vermiform window
-
Expanding: In the expanding step, newly intercepted events are appended to the window immediately; the events in the window are correlated every eight hours, after which the inference engine checks for security policy violations. The process of event correlation in the expanding movement is discussed in the "Expanding: knowledge-based inference" section.
-
Shrinking: In some situations, the number of events in the Vermiform window becomes so high that no new events can be appended to the window. In this state, the prevalent approach is to eliminate some of the old events to free space for the new ones. Since eliminating old events reduces the accuracy of the detection approach, we instead create a history, or abstraction, of the old events. The process of event abstraction is discussed in the "Shrinking: event abstraction" section. The main purpose of shrinking is thus to reduce the number of events in the sliding window; a sketch of the window's two movements follows this list.
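The two movements can be summarized by the following sketch; the frame-count capacity stands in for \(S_{max}\), the abstraction callback stands in for the rules of Tables 5, 6, and 7, and the eight-hourly correlation pass is assumed to run outside the class.

```python
class VermiformWindow:
    """A sketch of the Vermiform sliding window: it expands until its
    capacity bound (S_max) is reached, then shrinks by abstracting the
    oldest events into a history instead of discarding them."""

    def __init__(self, capacity, abstract):
        self.capacity = capacity   # S_max, about one billion frames
        self.abstract = abstract   # callable: old events -> abstracted history
        self.events = []           # current window contents, oldest first

    def append(self, event):
        """Expanding: newly intercepted events are appended immediately."""
        if len(self.events) >= self.capacity:
            self.shrink()
        self.events.append(event)

    def shrink(self, fraction=0.5):
        """Shrinking: replace the oldest `fraction` of the window with its
        abstraction, which stays in the window as history."""
        cut = int(len(self.events) * fraction)
        history = self.abstract(self.events[:cut])
        self.events = history + self.events[cut:]
```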
Expanding: knowledge-based inference
# | Rule |
---|---|
1 | \(Subject(s_i) \wedge Subject(s_j) \wedge (s_i \xrightarrow {partOf} s_j ) \Longrightarrow me(s_j)= me(s_j) \cup me(s_i)\) |
2 | \(Subject(s_i) \wedge Subject(s_j) \wedge (s_i \xrightarrow {partOf} s_j ) \Longrightarrow ma(s_j)= ma(s_j) \cup ma(s_i)\) |
3 | \(Process(s_i) \wedge Process(s_j) \wedge (s_i \xrightarrow {parentOf} s_j ) \Longrightarrow me(s_j)= me(s_j) \cup me(s_i)\) |
4 | \(Process(s_i) \wedge Process(s_j) \wedge (s_i \xrightarrow {parentOf} s_j ) \Longrightarrow ma(s_j)= ma(s_j) \cup ma(s_i)\) |
5 | \(Event(e_i) \wedge Object(o_j) \wedge (o_j \xrightarrow {partOf} object(e_i) ) \wedge (action(e_i)=R) \Longrightarrow me(subject(e_i)):=me(subject(e_i)) \cup \{o_j\}\) |
6 | \(Event(e_i) \wedge Object(o_j) \wedge (o_j \xrightarrow {partOf} object(e_i) ) \wedge (action(e_i)=W ) \Longrightarrow ma(subject(e_i)):=ma(subject(e_i)) \cup \{o_j\}\) |
7 | \(Event(e_i) \wedge Object(o_j) \wedge (o_j \xrightarrow {partOf} object(e_i) ) \wedge (action(e_i)=D) \Longrightarrow ma(subject(e_i)):=ma(subject(e_i)) \cup \{o_j\}\) |
8 | \(Event(e_i) \wedge Event(e_j) \wedge action(e_i)=W \wedge action(e_j)=R \wedge object(e_i) = object(e_j) \wedge UT(subject(e_i)) \Rightarrow me(subject(e_j))= me(subject(e_j)) \cup me(subject(e_i))\) |
9 | \(Event(e_i) \wedge Event(e_j) \wedge action(e_i)=W \wedge action(e_j)=R \wedge object(e_i) \xrightarrow {partOf} object(e_j) \wedge UT(subject(e_i)) \Rightarrow me(subject(e_j))= me(subject(e_j)) \cup me(subject(e_i))\) |
10 | \(Event(e_i) \wedge Event(e_j) \wedge UT(subject(e_i)) \wedge (action(e_i)=W) \wedge (object(e_i)=subject(e_j)) \wedge (action(e_j)=W)\) |
\(\Rightarrow ma(subject(e_i)):=ma(subject(e_i)) \cup \{object(e_j)\}\) | |
11 | \(Event(e_i) \wedge Event(e_j) \wedge UT(subject(e_i)) \wedge (action(e_i)=W) \wedge (object(e_i)=subject(e_j)) \wedge (action(e_j)=D)\) |
\(\Rightarrow ma(subject(e_i)):=ma(subject(e_i)) \cup \{object(e_j)\}\) | |
12 | \(Event(e_i) \wedge Event(e_j) \wedge UT(subject(e_i)) \wedge (action(e_i)=C) \wedge (object(e_i)=subject(e_j)) \wedge (action(e_j)=W)\) |
\(\Rightarrow ma(subject(e_i)):=ma(subject(e_i)) \cup \{object(e_j)\}\) | |
13 | \(Event(e_i) \wedge Event(e_j) \wedge UT(subject(e_i)) \wedge (action(e_i)=C) \wedge (object(e_i)=subject(e_j)) \wedge (action(e_j)=D)\) |
\(\Rightarrow ma(subject(e_i)):=ma(subject(e_i)) \cup \{object(e_j)\}\) |
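These rules are monotone: they only ever add objects to me and ma, so a forward-chaining engine can apply them repeatedly until a fixpoint, which is essentially how OWL-Horst-style inference evaluates them. The sketch below does this for rules 1 and 2; the data structures are assumptions of the sketch.

```python
def apply_part_of_rules(part_of, me, ma):
    """Forward-chain rules 1 and 2 of the table above until a fixpoint:
    if s_i is part of s_j, then me(s_j) and ma(s_j) absorb me(s_i), ma(s_i).
    `part_of`: set of (s_i, s_j) pairs; `me`, `ma`: subject -> set of objects."""
    changed = True
    while changed:                        # iterate until no set grows
        changed = False
        for si, sj in part_of:
            for store in (me, ma):
                before = len(store[sj])
                store[sj] |= store[si]
                changed |= len(store[sj]) > before
    return me, ma

# A thread t_1 that is part of process p_1: p_1 inherits what t_1 has read.
me = {"t1": {"o1"}, "p1": set()}
ma = {"t1": set(), "p1": set()}
apply_part_of_rules({("t1", "p1")}, me, ma)
assert me["p1"] == {"o1"}
```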
# | Rule | Description |
---|---|---|
1 | \(Subject(s_i) \wedge Subject(s_j) \wedge (s_i \xrightarrow {partOf} s_j ) \wedge UT(s_i) \Longrightarrow UT(s_j)\) | Untrusted subpart |
2 | \(Event(e_i) \wedge Event(e_j) \wedge e_i \overset{tewr}{\sim } e_j \wedge UT(subject(e_i)) \Rightarrow UT(object(e_j))\) | Untrusted input |
3 | \(Event(e_i) \wedge Event(e_j) \wedge e_i \overset{tiwr}{\sim } e_j \wedge UT(subject(e_i)) \Rightarrow UT(subject(e_j))\) | Untrusted input |
4 | \(Event(e_i) \wedge Event(e_j) \wedge e_i \overset{piwr}{\sim } e_j \wedge UT(subject(e_i)) \Rightarrow UT(subject(e_j))\) | Untrusted input |
5 | \(Event(e_i) \wedge UT(subject(e_i)) \wedge (action(e_i)= W) \wedge Subject(object(e_i)) \Rightarrow UT(object(e_i))\) | Injection |
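Untrustedness propagates in the same forward-chaining manner. The sketch below iterates rules 1 and 2 of this table until no new entity becomes untrusted (UT); the pair encodings are illustrative assumptions.

```python
def propagate_untrusted(ut, part_of, tewr_pairs):
    """Forward-chain rules 1 and 2 above. `ut`: set of untrusted entities;
    `part_of`: (s_i, s_j) pairs with s_i part of s_j; `tewr_pairs`:
    (subject(e_i), object(e_j)) pairs for inferred e_i ~tewr e_j relations."""
    changed = True
    while changed:
        changed = False
        for si, sj in part_of:            # rule 1: untrusted subpart
            if si in ut and sj not in ut:
                ut.add(sj); changed = True
        for si, oj in tewr_pairs:         # rule 2: untrusted input
            if si in ut and oj not in ut:
                ut.add(oj); changed = True
    return ut

# An untrusted thread taints its process, whose output taints what it wrote.
print(propagate_untrusted({"t1"}, {("t1", "p1")}, [("p1", "o5")]))
# -> {'t1', 'p1', 'o5'}
```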
-
\(T_{min}\) or minimum time of the Vermiform window in expanding: Since the duration of a typical APT attack is about several months (at most one year), we should at least analyze and correlate the events intercepted in this interval. In other words:$$\begin{aligned} T_{min}=t_{r} -t_{h} \simeq 12~Months. \end{aligned}$$(10)
-
\(S_{max}\) or maximum size of the Vermiform window in expanding: As mentioned before, the window reaches its maximum length when it is fully expanded. This maximum length depends on the processing power of the inference engine. Since the processing power of our inference engine is limited to approximately one billion frames, the maximum number of frames that can be in the Vermiform window is approximately one billion. In other words:$$\begin{aligned} S_{max}=s_{max}-s_0 \simeq 1 ~billion~ frames. \end{aligned}$$(11)
-
\(S_{min}\) or minimum size of the Vermiform window in expanding: To succeed in discovering the APT attacks, the inference engine must ensure that:$$\begin{aligned} S_{max} \ge S_{min}. \end{aligned}$$(12)\(S_{min}\) is the number of generated frames from timestamp \(t_{h}\) to \(t_{r}\). Since this value is time dependent, differs from one event set to another, and depends on the size of the victim's computer network, the exact value of \(S_{min}\) is calculated for each event set ES as follows:$$\begin{aligned} S_{min}(ES)= \sum _{t=t_{h}}^{t=t_{r}} f(ES ,t), \end{aligned}$$(13)where f is the frame function (which is defined in "Preliminaries") and ES is the set of all collected events in the expanding step of the Vermiform window.
-
\(T_{max}\) or maximum time of the Vermiform window in expanding: To succeed in discovering the APT attacks, the inference engine must ensure that:$$\begin{aligned} T_{max} \ge T_{min}. \end{aligned}$$(14)\(T_{max}\) is the maximum time of the sliding window in expanding. Since this value differs from one event set to another and depends on the size of the victim's computer network, the exact value of \(T_{max}\) is calculated for each event set ES as below:$$\begin{aligned} T_{max}= Max\{time(e_k)\ | e_k \in ES \} - Min \{time(e_k)\ | e_k \in ES \}, \end{aligned}$$(15)where ES is the set of all collected events in the expanding step of the Vermiform window. A small sketch of checking these bounds follows this list.
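Assuming the frame function \(f\) is available as a callable and timestamps are at month granularity, the four bounds reduce to the checks in the following sketch; the function and variable names are ours.

```python
S_MAX = 1_000_000_000   # Eq. (11): about one billion frames
T_MIN = 12              # Eq. (10): about 12 months

def window_bounds_hold(events, frames_at):
    """Check Eqs. (12) and (14) for an event set ES.
    `events`: list of (event_id, month) pairs; `frames_at(t)` stands in for
    the frame function f(ES, t) of the paper."""
    months = [t for _, t in events]
    t_h, t_r = min(months), max(months)
    s_min = sum(frames_at(t) for t in range(t_h, t_r + 1))   # Eq. (13)
    t_max = t_r - t_h                                        # Eq. (15)
    return s_min <= S_MAX and t_max >= T_MIN                 # Eqs. (12), (14)
```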
Shrinking: event abstraction
-
There is no capacity to store more events than those appearing in WE in the current expanding step of the Vermiform window. In other words:$$\begin{aligned} S_h(ES) < \sum _{t=0}^{t=\infty } f(WE ,t), \end{aligned}$$(18)where ES is the set of all collected events in the current expanding step of the Vermiform window.
-
The size of AE, kept as history in the Vermiform window, is much less than the maximum size of the current expanding step. In other words:$$\begin{aligned} S_h(ES) \geqslant \sum _{t=0}^{t=\infty } f(AE ,t). \end{aligned}$$(19)
-
Abstraction based on actions: To increase the expressive power and accuracy, all system actions were defined in terms of six basic actions (Read, Write, Execute, Delete, Access, and Create). However, some of these actions (i.e., Create, Delete, and Access) can be replaced by others (i.e., Read and Write). To this aim, we define a set of abstraction rules, shown in Table 5. According to these rules, actions Create and Delete are replaced by action Write, action Access is replaced by action Read, and action Execute is eliminated, because it does not affect event correlation.
-
Abstraction based on subjects: Considering the ontology of the Windows operating system (shown in Fig. 5), we can replace lower-level subjects with upper-level ones. For example, a Thread can be abstracted as a Process, or a Process can be abstracted as a User or a Host. Hence, we define four abstraction rules based on the subjects' relations (shown in Table 6). For example, rule \(R_1S\) means that each event generated by a thread \(s_i\) is regarded as generated by its containing process \(s_j\).
-
Abstraction based on objects: Similar to the subjects, we can replace lower-level objects with upper-level ones. For example, a Socket can be abstracted as a File or a Device, and a Device can be abstracted as a KernelObj. Hence, we define an abstraction rule based on the objects' relations (shown in Table 7) to abstract the events and remove redundant information. Rule \(R_1O\) says that if subject \(s_i\) wants to read object \(o_i\), and this subject had already read object \(o_j\), where \(o_i\) is a part of \(o_j\), then this event can be abstracted by an event with fewer details, such as \(e_i=\langle s_i,o_j,R, t_i\rangle\).
Abstraction rule | Abstraction step | Abstraction condition | Abstraction operation |
---|---|---|---|
\(R_1A\) | 1.1 | \(\forall e_i=\langle s_i,o_i,a_i, t_i\rangle , Event(e_i) \wedge action(e_i)= C\) | \(e_i=\langle s_i,o_i,W, t_i\rangle\) |
1.2 | \(\forall e_i=\langle s_i,o_i,a_i, t_i\rangle , Event(e_i) \wedge action(e_i)= D\) | \(e_i=\langle s_i,o_i,W, t_i\rangle\) | |
1.3 | \(\forall e_i=\langle s_i,o_i,a_i, t_i\rangle , Event(e_i) \wedge action(e_i)= A\) | \(e_i=\langle s_i,o_i,R, t_i\rangle\) | |
1.4 | \(\forall e_i=\langle s_i,o_i,a_i, t_i\rangle , Event(e_i) \wedge Process(s_i)\wedge Process(o_i)\wedge action(e_i)= E\) | Delete \(e_i\) and add \(s_i \xrightarrow {parentOf} o_i\) |
Abstraction rule | Abstraction condition | Abstraction operation |
---|---|---|
\(R_1S\) | \(\forall e_i=\langle s_i,o_i,a_i, t_i\rangle , Event(e_i) \wedge Thread(s_i) \wedge \exists s_j, Process(s_j) \wedge s_i \xrightarrow {partOf} s_j\) | \(e_i=\langle s_j,o_i,a_i, t_i\rangle\) |
\(R_2S\) | \(\forall e_i=\langle s_i,o_i,a_i, t_i\rangle , Event(e_i) \wedge Process(s_i) \wedge \exists s_j, Process(s_j) \wedge s_i \xrightarrow {parentOf} s_j\) | \(e_i=\langle s_j,o_i,a_i, t_i\rangle\) |
\(R_3S\) | \(\forall e_i=\langle s_i,o_i,a_i, t_i\rangle , Event(e_i) \wedge Process(s_i) \wedge \exists s_j, User(s_j) \wedge s_i \xrightarrow {partOf} s_j\) | \(e_i=\langle s_j,o_i,a_i, t_i\rangle\) |
\(R_4S\) | \(\forall e_i=\langle s_i,o_i,a_i, t_i\rangle , Event(e_i) \wedge Process(s_i) \wedge \exists s_j, Host(s_j) \wedge s_i \xrightarrow {partOf} s_j\) | \(e_i=\langle s_j,o_i,a_i, t_i\rangle\) |
Abstraction rule | Abstraction step | Abstraction condition | Abstraction operation |
---|---|---|---|
\(R_1O\) | 1.1 | \(\forall e_i=\langle s_i,o_i,R, t_i\rangle , Event(e_i) \wedge Object(o_i) \wedge \exists o_j, Object(o_j) \wedge o_i \xrightarrow {partOf} o_j\) | \(e_i=\langle s_i, partOf (o_j),R, t_i\rangle\) |
1.2 | \(\forall e_i=\langle s_i,o_i,W, t_i\rangle , Event(e_i) \wedge Object(o_i) \wedge \exists o_j, Object(o_j) \wedge o_i \xrightarrow {partOf} o_j\) | \(e_i=\langle s_i, partOf ( o_j),W, t_i\rangle\) |
Abstraction level | Abstraction rules | Average event reduction rate | Average impact on detecting very slow attacks | Average impact on detecting slow attacks |
---|---|---|---|---|
\(L_0\) | No abstraction | 0 % | 0 % | 0 % |
\(L_1\) | \(R_1A\) | 12 % | +2.8 % | 0 % |
\(L_2\) | \(L_1 \wedge R_1S\) | 7 % | +0.12 % | 0 % |
\(L_3\) | \(L_2 \wedge R_1O\) | 11 % | +1.5 % | −0.9 % |
\(L_4\) | \(L_3 \wedge R_2S\) | 18 % | +3.24 % | 0 % |
\(L_5\) | \(L_4 \wedge R_3S\) | 47 % | −1.11 % | −6.8 % |
\(L_6\) | \(L_5 \wedge R_4S\) | 51 % | −4.3 % | −12.12 %
# | System calls | Description |
---|---|---|
1 | VirtualAllocEx | \(e_1= \langle s_1, o_1, a_1, t_1 \rangle , a_1=C,\) \(Thread(s_1), s_1 \xrightarrow {partOf} p_1,\) \(Memory(o_1), o_1 \xrightarrow {partOf} p_2\) |
2 | WriteProcessMemory | \(e_2= \langle s_1, o_1, a_2, t_2 \rangle , a_2= W, t_2=t_1+\epsilon\) |
3 | CreateRemoteThread | \(e_3= \langle s_1, o_2, a_3, t_3 \rangle , a_3=C, Thread(o_2), o_2 \xrightarrow {partOf} p_2, o_1 \xrightarrow {partOf} o_2, t_3=t_2+\epsilon\) |
4 | SetThreadContext | \(e_4= \langle s_1, o_3, a_4, t_4 \rangle , a_4=W, Context(o_3),\) \(o_3 \xrightarrow {partOf} o_2, t_4=t_3+\epsilon ,\) |
5 | ResumeThread | \(e_5= \langle s_1, o_2, a_5, t_5 \rangle , a_5=E, t_5=t_4 + \epsilon\) |
6 | DeleteFile | \(e_6= \langle o_2, o_4, a_6, t_6 \rangle , a_6=D, File(o_4), t_6=t_5 + \epsilon\) |
# | \(L_0\) | \(L_1 (R_1A)\) | \(L_2 (L_1 \wedge R_1S)\) | \(L_3 (L_2 \wedge R_1O)\) | End of abstraction |
---|---|---|---|---|---|
1 | \(e_1= \langle s_1, o_1, C, t_1 \rangle\) | \(e_1= \langle s_1,o_1, {\varvec{W}}, t_1 \rangle\) | \(\varvec{ (Redundant)}\) | – | – |
2 | \(e_2= \langle s_1, o_1, W, t_2 \rangle\) | \(e_2= \langle s_1,o_1, W, t_2 \rangle\) | \(e_2= \langle \varvec{p_1},o_1, W, t_2 \rangle\) | \(e_2= \langle p_1,\varvec{p_2}, W, t_2 \rangle\) | \(\varvec{(Redundant)}\) |
3 | \(e_3= \langle s_1, o_2, C, t_3 \rangle\) | \(e_3= \langle s_1,o_2, {\varvec{W}}, t_3 \rangle\) | \(e_3= \langle \varvec{p_1}, o_2, W, t_3 \rangle\) | \(e_3= \langle p_1,\varvec{p_2}, W, t_3 \rangle\) | \(\varvec{ (Redundant)}\) |
4 | \(e_4= \langle s_1, o_3, W, t_4 \rangle\) | \(e_4= \langle s_1,o_3, W, t_4 \rangle\) | \(e_4= \langle \varvec{p_1}, o_3, W, t_4\rangle\) | \(e_4= \langle p_1, \varvec{p_2}, W, t_4 \rangle\) | \(e_4= \langle p_1, p_2, W, t_4 \rangle\) |
5 | \(e_5= \langle s_1, o_2, E, t_5 \rangle\) | \(\varvec{(Deleted)}\) | – | – | – |
6 | \(e_6= \langle o_2, o_4, D, t_6 \rangle\) | \(e_6= \langle o_2, o_4, {\varvec{W}}, t_6 \rangle\) | \(e_6= \langle \varvec{p_2}, o_4, W, t_6 \rangle\) | \(e_6= \langle p_2, \varvec{o_4}, W, t_6 \rangle\) | \(e_6= \langle p_2, o_4, W, t_6 \rangle\) |
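The first two abstraction levels of this example can be reproduced with a short sketch; the tuple encoding and helper names are ours, the object-level rule \(R_1O\) (level \(L_3\)) is omitted for brevity, and where the table marks the earlier of two duplicates as redundant, the sketch simply keeps the earliest copy (either choice removes one duplicate).

```python
# partOf facts from the code-injection example: thread s_1 belongs to
# process p_1, and the injected remote thread o_2 belongs to process p_2.
thread_to_process = {"s1": "p1", "o2": "p2"}

def abstract_actions(events):
    """Level L_1 (rule R_1A): rewrite C and D to W, A to R, drop E events."""
    out = []
    for s, o, a, t in events:
        if a == "E":
            continue                          # step 1.4: Execute is removed
        out.append((s, o, {"C": "W", "D": "W", "A": "R"}.get(a, a), t))
    return out

def abstract_subjects(events):
    """Level L_2 (rule R_1S): attribute thread events to their containing
    process, then drop events that become duplicates ('Redundant')."""
    seen, out = set(), []
    for s, o, a, t in events:
        s = thread_to_process.get(s, s)
        if (s, o, a) not in seen:
            seen.add((s, o, a))
            out.append((s, o, a, t))
    return out

events = [("s1", "o1", "C", 1), ("s1", "o1", "W", 2), ("s1", "o2", "C", 3),
          ("s1", "o3", "W", 4), ("s1", "o2", "E", 5), ("o2", "o4", "D", 6)]
print(abstract_subjects(abstract_actions(events)))
# -> four events with subjects p_1/p_2, matching level L_2 of the example
```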
Other restrictions
Evaluation
Dataset
-
These datasets contain conventional and simple attacks, whereas APT attacks exhibit many complex behaviors.
-
These datasets do not contain any hybrid, low-level, and slow attacks, which are prevalent in APT attacks.
-
These datasets do not contain any host-based event logs and mostly contain network-based logs and attacks, which are not sufficient for APT detection.
-
Attack durations in these datasets are limited to at most several weeks, which is not suitable for evaluating slow attacks.
-
The volume of these datasets is limited to several gigabytes, which is not sufficient for evaluating the scalability of detection approaches.
-
The architecture of the testbed used for creating this dataset is shown in Fig. 13. As shown in this figure, the test network contains four sub-networks. The first subnet is the Internet, which is the intrusion path into the organization's network. The second subnet is CafeNet, which is connected to the Internet. The third is the Corporate network, which contains the local services of an organization and is connected to CafeNet, and the fourth subnet is the Critical network, which is air-gapped and isolated from the other networks.
-
This dataset contains nine APT scenarios, which are shown in Table 11. These scenarios are abstracted from some well-known APT samples reported in [15]. Some APT scenarios in this table were implemented based on the available source code, reversed code, and published vulnerabilities of these malware samples on the Internet. Then, all APTs were run in the testbed. Some were run concurrently, and some, with overlapping scenarios, were run asynchronously.
-
The behavior of the malware is intercepted in the operating system in two different ways: kernel events are intercepted by a Windows driver implemented for hooking and a mini-filter driver using call-back functions, and user events are intercepted by the EasyHook library [60] through code injection. Interception at the hypervisor level is implemented by customizing a version of Ether [61] on the Xen hypervisor. The network events are collected by switch port mirroring.
-
The volume of the dataset is approximately 2 Terabytes.
-
The dataset contains low-level and slow attacks.
-
The dataset contains both the network and host event logs.
-
The total number of intercepted events in the test network is about 1.646 billion events.
-
We used seven hosts (one of which belonged to the attacker) for the simulation and assumed one user per host.
-
Attacks with different durations were considered. We deployed one short attack with a one-day duration, two almost-slow attacks with one-month durations, four slow attacks with durations of five, seven, and eleven months, and two very slow attacks with durations of fourteen and fifteen months.
-
The simulated network contained 110 benign processes and 9 attack vectors. Each attack vector contained several sub-processes. Our dataset contains the operating system and network event logs for all processes.
-
The normal behaviors were generated by real users in the CafeNet and Corporate networks. Since the actions in the Critical network do not depend heavily on users, the normal behaviors of that network were simulated by software programs running without user interaction.
# | Adapted from | Attack type: Multi-step | Attack type: Low-level | Attack type: Slow | Propagation channels | Purpose of function | Attack duration | Special feature
---|---|---|---|---|---|---|---|---
1 | Project Sauron APT [20] | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | Internet and USB device | Sabotage | 7 months | Bypass air-gapped network |
2 | Flame APT [62] | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) (very slow) | Internet | Data theft | 11 months | Low-level data exfiltration |
3 | – | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | Internet and LAN spreading | Data wiping | 5 months | Code injection
4 | WannaCry APT [65] | \(\checkmark\) | \(\checkmark\) | – | Internet | Ransomware | 1 day | Exploits |
5 | – | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) (very slow) | Internet and USB device | Data theft | 15 months | Exploits, social engineering
6 | – | \(\checkmark\) | \(\checkmark\) | – | Internet and USB device | Data theft | 1 month | Exploits, social engineering
7 | Poseidon APT [68] | \(\checkmark\) | \(\checkmark\) | – | Internet | Remote control | 1 month | Backdoor and code injection |
8 | Dark hotel [69] | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) (very slow) | Internet | Surveillance | 14 months | Stolen digital certificates |
9 | Dark hotel [69] | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | Internet | Surveillance | 9 months | Stolen digital certificates |
Experimental results
-
The processing time of the proposed approach for detecting hybrid and low-level APTs in the test networks was 6.1 hours.
Actual class\Predicted class | APT | Benign | Total | Recognition(%) |
---|---|---|---|---|
APT | 8 (TP) | 1 (FN) | 9 (P) | 88.88 (Sensitivity = \(\frac{TP}{P}\)) |
Benign | 12 (FP) | 98 (TN) | 110 (N) | 89.09 (Specificity = \(\frac{TN}{N}\)) |
Total | 20 | 99 | 119 | 89.07 (Accuracy = \(\frac{TP+TN}{P+N}\)) |
Other metrics (%) | 40.00 (Precision = \(\frac{TP}{TP+FP}\)) | 88.88 (Recall = \(\frac{TP}{TP+FN}\)) | 55.17 (F-Measure) | 10.92 (Error-rate = \(\frac{FP+FN}{P+N}\)) |
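For reference, the reported percentages follow directly from the confusion matrix; the short computation below reproduces them.

```python
# Recomputing the reported metrics from the confusion matrix above.
TP, FN, FP, TN = 8, 1, 12, 98
P, N = TP + FN, FP + TN                              # 9 APTs, 110 benign
sensitivity = TP / P                                 # 8/9     -> 88.88 %
specificity = TN / N                                 # 98/110  -> 89.09 %
accuracy = (TP + TN) / (P + N)                       # 106/119 -> 89.07 %
precision = TP / (TP + FP)                           # 8/20    -> 40.00 %
f_measure = 2 * precision * sensitivity / (precision + sensitivity)  # 55.17 %
error_rate = (FP + FN) / (P + N)                     # 13/119  -> 10.92 %
```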
APT sample | Number of APT events (P) | Number of other events (N) | TPR (%) | TNR (%) | Accuracy (%) | Precision (%) | Detection result |
---|---|---|---|---|---|---|---|
1 | 9.1 million | 1.637 billion | 86.29 | 90.14 | 90.08 | 4.59 | APT |
2 | 13.7 million | 1.6322 billion | 83.17 | 87.58 | 87.50 | 5.24 | APT |
3 | 6.3 million | 1.6397 billion | 94.9 | 89.94 | 89.91 | 3.47 | APT |
4 | 73 thousand | 1.64527 billion | 95.06 | 97.48 | 97.39 | 0.16 | APT |
5 | 26 million | 1.62 billion | 98.32 | 77.97 | 78.28 | 6.67 | APT |
6 | 1.6 million | 1.6444 billion | 81.02 | 95.1 | 94.85 | 1.50 | APT |
7 | 2.1 million | 1.6439 billion | 91.73 | 90.38 | 90.33 | 1.19 | APT |
8 | 27.7 million | 1.6183 billion | 60.33 | 89.87 | 89.35 | 8.70 | Benign |
9 | 10.1 million | 1.6359 billion | 94.13 | 97.80 | 97.17 | 16.96 | APT |
Method | Accuracy | Sensitivity | Specificity | Precision |
---|---|---|---|---|
Lajevardi & Amini [16] | 77.31 | 44.44 | 80.00 | 15.38 |
Our proposed approach | 89.07 | 88.88 | 89.09 | 40.00 |
Discussion
Method | Correlation type | Alert causal analysis | Hybrid correlation | Multi-step attack detection | Low-level attack detection | Slow attack detection
---|---|---|---|---|---|---
Debar and Wespi [73] | Alert Correlation | – | – | – | – | – |
Valeur et al. [74] | Alert Correlation | – | – | \(\checkmark\) | – | – |
Wang and Chiou [75] | Alert Correlation | \(\checkmark\) | – | – | – | – |
Valdes and Skinner [76] | Alert Correlation | – | – | \(\checkmark\) | – | – |
Julisch 2001 [77] | Alert Correlation | \(\checkmark\) | – | – | – | – |
Julisch 2003 [78] | Alert Correlation | \(\checkmark\) | – | – | – | – |
Al-Mamory and Zhang [79] | Alert Correlation | \(\checkmark\) | – | \(\checkmark\) | – | – |
Peng et al. [80] | Alert Correlation | \(\checkmark\) | – | – | – | – |
Qin and Lee [81] | Alert Correlation | \(\checkmark\) | – | \(\checkmark\) | – | – |
Goldman et al. [82] | Alert Correlation | \(\checkmark\) | – | \(\checkmark\) | – | – |
Viinikka et al. [83] | Alert Correlation | – | – | – | – | – |
Treinen and Thurimella [84] | Alert Correlation | \(\checkmark\) | – | \(\checkmark\) | – | – |
Ourston et al. [41] | Alert Correlation | – | – | \(\checkmark\) | – | – |
Ren et al. [85] | Alert Correlation | \(\checkmark\) | – | \(\checkmark\) | – | – |
Zhitang et al. [86] | Alert Correlation | – | – | \(\checkmark\) | – | – |
Ma et al. [87] | Alert Correlation | – | – | \(\checkmark\) | – | – |
Zhitang et al. [88] | Alert Correlation | – | – | \(\checkmark\) | – | – |
Farhadi et al. [89] | Alert Correlation | \(\checkmark\) | – | \(\checkmark\) | – | – |
Manganiello et al. [90] | Alert Correlation | \(\checkmark\) | – | \(\checkmark\) | – | – |
Soleimani and Ghorbani [91] | Alert Correlation | – | – | \(\checkmark\) | – | – |
Ramaki et al. [92] | Alert Correlation | \(\checkmark\) | – | \(\checkmark\) | – | – |
Ghafir et al. [40] | Alert Correlation | – | – | \(\checkmark\) | – | – |
Lajevardi and Amini [16] | Event Correlation | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | – |
Mohamed and Belaton [38] | Alert Correlation | \(\checkmark\) | – | \(\checkmark\) | – | – |
Our proposed approach | Event Correlation | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
-
Correlation algorithms: Since the solution for detecting low-level APTs is based on reasoning, the processing time depends on the reasoning algorithm. We use OWL-DL (like the solution in [93]) for reasoning with SANSA; in this setting, the complexity class of reasoning is ExpTime-complete. Since the SANSA inference engine uses OWL-Horst [94] forward-chaining inference and we do not use all features of basic description logic (Attributive Language with Complements), we can use OWL-Horst instead of OWL-DL to reduce the reasoning time. In this setting, the complexity class of reasoning is NP-complete, and in nontrivial cases it is polynomial [94].
-
SANSA framework: To detect slow APTs, we face the analysis of a huge number of frames (several billion). Since we use SANSA as the inference engine, the processing time is highly dependent on SANSA's processing power. Our experience shows that SANSA can analyze several million frames in a few seconds.
-
Processing infrastructure: Since SANSA uses Spark technology to process large volumes of data, the infrastructure used by Spark plays a key role in the processing time. We deployed our approach on a cluster with 8 computing nodes and 80 cores, 1 Terabyte of RAM, and an NFS (Network File System) server node with 10 Terabytes of capacity. The operating system was Rocks Cluster 7. A stand-in sketch of evaluating one relation rule on Spark follows this list.
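The paper does not include SANSA code, so the following PySpark sketch is only a stand-in: it shows how a single relation rule such as \(e_i \overset{tewr}{\sim } e_j\) can be evaluated as a distributed self-join over the normalized event set. The schema and the inline sample events are assumptions of the sketch.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("tewr-correlation").getOrCreate()

# Normalized events <subject, object, action, time>; in the deployed system
# these would be loaded from the RDF event store, not an inline list.
events = spark.createDataFrame(
    [("s1", "o1", "R", 1), ("s1", "o2", "W", 2), ("s2", "o2", "R", 3)],
    ["subject", "object", "action", "time"])

writes = events.filter(F.col("action") == "W").alias("w")
reads = events.filter(F.col("action") == "R").alias("r")

# e_i ~tewr e_j: same object, written before it is read.
tewr = writes.join(
    reads,
    (F.col("w.object") == F.col("r.object")) & (F.col("w.time") < F.col("r.time")))

# One row per correlated pair; here: s_1's write of o_2 is read by s_2.
tewr.select(F.col("w.subject").alias("writer"),
            F.col("r.subject").alias("reader"),
            F.col("w.object").alias("object")).show()
```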
Conclusion
-
One of the main challenges in the field of APTs is attack prediction. Our approach cannot predict an APT attack before the malware fulfills its malicious activity. Using machine learning algorithms, malicious activities could be predicted before an APT completes them.
-
The proposed approach has no solution for detecting APT attacks that violate the availability of the system; proposing an approach to address this weakness can be considered future work.
-
The proposed approach has no mechanism for verifying the generated alerts. Alert verification could help reduce the number of false alerts.
-
Another direction for future work is to improve and release our dataset so that other APT detection approaches can be evaluated against it more precisely.
-
Proposing a stream-based approach for detecting APT attacks that are not slow is another research topic that could be considered in the future.