1 Introduction
- We established a monitoring system that was continuously compromised by various adversary infrastructures throughout a one-year experiment.

- Our proposed system discloses adversaries' activities on compromised websites: traffic redirection to exploit websites, web access control to circumvent security inspection, phishing-based credential exfiltration, and mail-based drive-by downloads.

- We verified through a field experiment that the monitored information instantly revealed unknown malicious IP addresses and domains as soon as they were used, without the need for large-scale web crawling, and that most of this information was not contained in public blacklists.
2 Related work
2.1 Honeytoken
2.2 Collecting and analyzing exploit codes and malware
3 Assumption and preliminary investigation
4 Design of monitoring system
4.1 Analytical procedure
- Collecting malware: Our web client honeypot crawls malicious websites listed in the latest blacklist and collects the latest malware executables. The collected executables are sent to the malware sandbox.

- Distributing bait credentials: Our malware sandbox analyzes the collected malware executables. In each analysis, the sandbox randomly generates unique credentials. If an analyzed executable has information-leaking functionality, it exfiltrates information, e.g., these credentials, to a remote adversary.

- Monitoring fraudulent access and compromised web content: Our WCMS honeypot behaves as an FTP server. It creates a user directory for each account corresponding to a potentially leaked credential and awaits a masquerade attack using the stolen credentials. When a masquerader logs in and manipulates the files that make up the web content, the honeypot records the login, command, and file histories for that account.

- Inspecting compromised web content: The web content on our WCMS honeypot that has been compromised by a masquerader is assumed to be injected with redirect code leading to malicious websites for drive-by downloads. Our web client honeypot inspects this web content and collects information on unknown malicious websites as redirect destinations.
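The bait-credential step works because every sandbox run receives its own credential, so any later login to the WCMS honeypot can be traced back to the executable that leaked it. A minimal sketch of that bookkeeping, assuming an in-memory registry; `BaitRegistry`, `issue`, and `attribute` are hypothetical names, not the authors' implementation, and how the credential is planted in the sandbox is left out.

```python
import secrets
import string
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class BaitRegistry:
    """Hypothetical registry: bait username -> (password, analyzed sample hash)."""
    entries: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def issue(self, sample_sha256: str) -> Tuple[str, str]:
        """Generate a fresh, never-before-used credential for one sandbox run."""
        alphabet = string.ascii_lowercase + string.digits
        while True:
            user = "u" + "".join(secrets.choice(alphabet) for _ in range(10))
            if user not in self.entries:  # usernames stay unique across all runs
                break
        password = secrets.token_urlsafe(12)
        self.entries[user] = (password, sample_sha256)
        return user, password

    def attribute(self, user: str, password: str) -> Optional[str]:
        """Map an observed FTP login back to the sample whose run leaked it."""
        entry = self.entries.get(user)
        if entry is None or entry[0] != password:
            return None
        return entry[1]


if __name__ == "__main__":
    registry = BaitRegistry()
    user, pw = registry.issue("sha256-of-sample-1")  # placeholder hash
    # (hypothetical) the sandbox plants (user, pw) where the sample can steal it
    print(registry.attribute(user, pw))  # sha256-of-sample-1
```

Keying attribution on the credential itself, rather than on the source IP of the login, is what allows one executable to be linked to many masquerader IP addresses.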
4.2 Building blocks
4.2.1 Web client honeypot
4.2.2 Malware sandbox
4.2.3 Credential honeytoken
4.2.4 WCMS honeypot
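The WCMS honeypot described in this subsection preserves any file that a masquerader changes. One way to sketch that behavior, assuming a per-account baseline of SHA-256 digests taken from the original web content, is to rescan the user directory and archive each file whose digest differs. The function names and the archive naming scheme are illustrative assumptions.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot_changes(web_root: Path, baseline: dict, archive: Path) -> list:
    """Archive every file under web_root whose content differs from baseline.

    baseline maps a relative path to the SHA-256 of the last known content
    and is updated in place; deleted files are not tracked in this sketch.
    """
    changed = []
    for path in sorted(p for p in web_root.rglob("*") if p.is_file()):
        rel = path.relative_to(web_root).as_posix()
        digest = sha256_of(path)
        if baseline.get(rel) != digest:
            # Name the copy after its path and content hash so that every
            # distinct version of a file is kept side by side.
            dest = archive / f"{rel.replace('/', '__')}.{digest[:12]}"
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)
            baseline[rel] = digest
            changed.append(rel)
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root, archive = Path(tmp) / "user01", Path(tmp) / "archive"
        root.mkdir()
        page = root / "index.html"
        page.write_text("<html><body>hello</body></html>")
        baseline = {"index.html": sha256_of(page)}  # original deployed content
        page.write_text("<html><body>hello<script>/* injected */</script></body></html>")
        print(snapshot_changes(root, baseline, archive))  # ['index.html']
```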
html, php, js files) for each user directory of the FTP account and permits FTP users to access their own directory. Our WCMS honeypot stores FTP login, command, and file histories for each account. When an original file is changed, our WCMS honeypot preserves the changed file.

5 Field experiment
5.1 Malware collection
5.2 Compromised accounts
5.3 Adversary infrastructure
5.3.1 Graph partitioning
Adversary infrastructure | # of adversary’s IP addresses | # of compromised accounts | # of malware executables | # of fraudulent login events |
---|---|---|---|---|
Group A | 401 | 273 | 168 | 4921 |
Group B | 205 | 15 | 15 | 803 |
Group C | 26 | 3 | 3 | 28 |
Group D | 33 | 4 | 4 | 215 |
Group E | 25 | 1 | 1 | 91 |
Group F | 9 | 7 | 6 | 68 |
Group G | 3 | 19 | 18 | 149 |
Group H | 2 | 1 | 1 | 2 |
Group I | 3 | 1 | 1 | 4 |
Group J | 4 | 1 | 1 | 4 |
Group K | 4 | 1 | 1 | 20 |
Group L | 4 | 1 | 1 | 11 |
Group M | 1 | 1 | 1 | 1 |
Group N | 1 | 1 | 1 | 1 |
Group O | 1 | 1 | 1 | 1 |
Group P | 1 | 1 | 1 | 1 |
Total | 722 | 331 | 224 | 6320 |
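The graph used to derive the groups above is not spelled out in this section. One plausible construction, assumed here purely for illustration, treats adversary IP addresses and compromised accounts as nodes, adds an edge for each fraudulent login, and takes connected components as adversary infrastructures. A union-find sketch of that assumed procedure:

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb


def partition(logins):
    """Group (ip, account) login events into connected components."""
    uf = UnionFind()
    for ip, account in logins:
        # Tag node kinds so an IP string and an account name never collide.
        uf.union(("ip", ip), ("acct", account))
    groups = defaultdict(lambda: {"ips": set(), "accounts": set()})
    for node in list(uf.parent):
        kind, value = node
        groups[uf.find(node)]["ips" if kind == "ip" else "accounts"].add(value)
    return list(groups.values())
```

Under this assumption, an account reused from two IP addresses merges them into one group; per-group malware and login counts would then follow by joining each account back to its bait-credential record.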
5.3.2 Property of adversary infrastructure
5.4 Malware leaking information
Malware family | # of malware executables |
---|---|
A. McAfee | |
Generic BackDoor.* | 81 |
PWS-Zbot* | 52 |
BackDoor-FJW!* | 51 |
Other malware | 38 |
Unknown | 2 |
B. Kaspersky | |
Trojan-PSW.Win32.Tepfer.* | 69 |
Trojan-Downloader.Win32.Agent.* | 68 |
Trojan.Win32.Bublik.* | 14 |
Other malware | 49 |
Unknown | 24 |
C. Symantec | |
W32.Waledac.D* | 83 |
Trojan.Gen* | 47 |
SecShieldFraud* | 27 |
Other malware | 44 |
Unknown | 23 |
5.5 Credential leakage behavior
5.5.1 Internal behavior on infected host
A. Clients and targeting malware | # of malware executables |
---|---|
Targeted clients | |
\(A_1\) | 190 (84.82%) |
\(A_2\) | 179 (79.91%) |
\(A_3\) | 190 (84.82%) |
\(A_4\) | 190 (84.82%) |
B | 27 (12.05%) |
C | 27 (12.05%) |
D | 23 (10.26%) |
E | 163 (72.76%) |
F | 20 (8.92%) |
None | 23 (10.26%) |

B. Combination of targeted clients | # of malware executables |
---|---|
# of targeted clients | |
0 | 23 (10.26%) |
1 | 5 (2.23%) |
2 | 2 (0.89%) |
3 | 1 (0.44%) |
4 | 35 (15.62%) |
5 | 136 (60.71%) |
6 | 3 (1.33%) |
7 | 2 (0.89%) |
8 | 8 (3.57%) |
9 | 9 (4.01%) |
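Tables A and B describe the same 224 executables from two angles: an executable that targets k clients contributes k to the per-client counts, so the sum of k times the number of executables in each row of Table B must equal the sum of the per-client counts in Table A. The worked check below uses only the numbers in the two tables; the observation that the percentages are truncated rather than rounded (e.g., 23/224 = 10.267% is listed as 10.26%) is inferred from the values themselves.

```python
# Per-client counts from Table A (rows A1..A4, B..F) and the histogram from Table B.
per_client = {"A1": 190, "A2": 179, "A3": 190, "A4": 190,
              "B": 27, "C": 27, "D": 23, "E": 163, "F": 20}
histogram = {0: 23, 1: 5, 2: 2, 3: 1, 4: 35, 5: 136, 6: 3, 7: 2, 8: 8, 9: 9}
TOTAL = 224  # malware executables analyzed


def pct(n: int, total: int = TOTAL) -> float:
    """Percentage truncated (not rounded) to two decimals, as in the tables."""
    return (n * 10000 // total) / 100


# Every executable appears in exactly one histogram row.
assert sum(histogram.values()) == TOTAL
# Client "slots" counted both ways must agree: sum of k * n_k == sum of per-client counts.
slots = sum(k * n for k, n in histogram.items())
assert slots == sum(per_client.values())
print(slots, pct(190), pct(23))  # 1009 84.82 10.26
```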
5.5.2 Leakage communication
5.6 Adversary behavior on compromised server
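The session categories in this section are defined by FTP command patterns: a STOR of a file never retrieved in the session, a STOR after a RETR of the same file, an upload of a test file that is deleted (DELE) right after an HTTP probe, and a login with no file access before QUIT or disconnection. A minimal classifier sketch follows; the tag names are hypothetical, and because the HTTP probe of the test file is not visible in the FTP log, the sketch checks only STOR followed by DELE.

```python
def classify_session(commands):
    """Return illustrative behavior tags for one FTP session.

    commands: raw command lines in order, e.g. ["USER u1", "RETR index.php", ...].
    """
    retrieved, stored, tags = set(), set(), set()
    touched = False  # any file-access command seen after login
    for line in commands:
        verb, _, arg = line.strip().partition(" ")
        verb = verb.upper()
        if verb in ("RETR", "STOR", "DELE"):
            touched = True
        if verb == "RETR":
            retrieved.add(arg)
        elif verb == "STOR":
            # Overwriting a file fetched earlier in the session vs. a brand-new upload.
            tags.add("modify" if arg in retrieved else "new_upload")
            stored.add(arg)
        elif verb == "DELE" and arg in stored:
            # Uploaded and then deleted within the same session.
            tags.add("test_upload")
    if not touched:
        tags.add("login_only")
    return tags
```

A session can carry several tags at once; for example, a test upload is also a new upload under these rules.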
“STOR ‘fileA’,” where fileA is a new file that has never been retrieved before in the session (i.e., “RETR ‘fileA’” is not executed before the STOR command).

“STOR ‘fileB’” and executed after “RETR ‘fileB’” in a session.

(STOR) a test file and trying to access it on the website (i.e., accessing “http://ipaddr/filepath,” where ipaddr is the IP address of our WCMS honeypot and filepath is the uploaded file’s path as assumed by an adversary), and the test file is immediately deleted (DELE) just after the access.

(“QUIT” or session terminating) without file access commands (e.g., RETR, STOR) after login.

5.7 Malicious web content
5.7.1 Injected code
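Because the WCMS honeypot preserves both the original and the changed file, the injected code can be isolated by diffing the two versions. The section uses the diff command; the sketch below substitutes Python's standard `difflib.SequenceMatcher` as a stand-in and returns the lines that were inserted or replaced. The function name is illustrative.

```python
import difflib


def injected_lines(original: str, modified: str) -> list:
    """Return lines that appear in the modified file but not in the original."""
    a, b = original.splitlines(), modified.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'insert' = new lines; 'replace' = original lines rewritten in place.
        if tag in ("insert", "replace"):
            added.extend(b[j1:j2])
    return added


if __name__ == "__main__":
    before = "<html>\n<body>\nhello\n</body>\n</html>"
    after = "<html>\n<body>\nhello\n<script>eval(x)</script>\n</body>\n</html>"
    print(injected_lines(before, after))  # ['<script>eval(x)</script>']
```

Working on opcodes rather than on the `+`/`-` text of a unified diff avoids misreading injected lines that themselves begin with `+`.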
diff command. Almost all injected code is obfuscated JavaScript. This obfuscated code is decoded in the web browser and then automatically performs the redirection, e.g., by creating an iframe tag. In addition, characteristic marker comments of the adversary are added together with the obfuscated JavaScript. Adversaries continuously develop obfuscation algorithms to circumvent detection; therefore, obfuscated code must be captured and understood in a timely manner. Our WCMS honeypot can continuously and automatically collect injected malicious code, which should serve as good real-world samples for generating signatures or algorithms that detect compromised web content.

5.7.2 Traffic redirection
5.7.3 Web access control by server-side content
.htaccess, which is a configuration file for controlling web access, into their directories of the compromised web content. This .htaccess file is used for traffic redirection. To circumvent crawling-based inspection, it checks the referrer of the accessing web client and permits redirection to malicious websites only when the client has a certain referrer. We confirmed that the URLs of a portal website or search engines are listed in the referrer check routine. This means that only web clients arriving from certain portal sites, search engine sites, or social networking service sites are redirected to malicious websites. In other words, a web client honeypot that accesses a compromised website directly is not redirected and therefore cannot detect the malicious websites. The .htaccess file also implements HTTP-error-based redirection using an ErrorDocument directive. If .htaccess contains an “ErrorDocument 404 redirect-URL” directive, a user who mistakenly accesses a non-existent or erroneous URL is redirected to redirect-URL by an HTTP 302 redirect. This technique does not require injecting redirect code into the original web content; therefore, it can avoid being noticed by the legitimate WCMS administrator. When a web client mistakenly accesses a URL that is not found, it is redirected to an arbitrary URL, i.e., http://example.com/exploit.php in this case.

Blacklist | # of IP addresses | # of FQDNs |
---|---|---|
MDL: MalwareDomainList | 3498 | 3741 |
MP: MalwarePatrol | 5457 | 6425 |
UBL: UrlBlackList (malware) | 208,801 | 111,945 |
MDB: MalwareDomainBlockList | 3009 | 13,212 |
ZT: ZeuS Tracker | 1672 | 1971 |
CMX: CleanMX (viruses) | 65,456 | (n/a) |

Type of info. | Collected | \(\cap \) MDL | \(\cap \) MP | \(\cap \) UBL | \(\cap \) MDB | \(\cap \) ZT | \(\cap \) CMX |
---|---|---|---|---|---|---|---|
Masquerader | 722 | 5 | 2 | 10 | 3 | 1 | 30 |
TDS_A | 9476 | 2 | 11 | 55 | 1 | 2 | 136 |
TDS_B | 33 | 7 | 0 | 10 | 3 | 0 | 6 |
Blackhole | 24 | 15 | 1 | 3 | 5 | 0 | 12 |
Redkit | 97 | 69 | 3 | 15 | 8 | 2 | 16 |
Phoenix | 29 | 3 | 0 | 13 | 1 | 2 | 8 |
Incognito | 18 | 7 | 1 | 1 | 1 | 1 | 0 |
Neosploit | 19 | 7 | 0 | 5 | 1 | 2 | 8 |
Total | 10,420 | 113 | 18 | 102 | 21 | 8 | 209 |
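The two .htaccess techniques described in Section 5.7.3 leave recognizable directives in the file. A heuristic auditor sketch follows, assuming that the referrer check is written with mod_rewrite's `RewriteCond %{HTTP_REFERER}` (an assumption; a referrer check can also be expressed with other directives) and that the ErrorDocument target is a full remote URL, which Apache answers with a redirect to the client. The pattern and function names are illustrative.

```python
import re

# Heuristic patterns for the two techniques (not an exhaustive detector).
REFERER_COND = re.compile(r"^\s*RewriteCond\s+%\{HTTP_REFERER\}\s+(\S+)", re.I)
ERRORDOC_REMOTE = re.compile(r"^\s*ErrorDocument\s+(\d{3})\s+(https?://\S+)", re.I)


def audit_htaccess(text):
    """Return (line number, finding, detail) tuples for suspicious directives."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if m := REFERER_COND.match(line):
            # Redirect conditioned on where the visitor came from.
            findings.append((lineno, "referer-check", m.group(1)))
        elif m := ERRORDOC_REMOTE.match(line):
            # Error page pointing off-site: the client is redirected there.
            findings.append((lineno, f"errordocument-{m.group(1)}", m.group(2)))
    return findings


if __name__ == "__main__":
    sample = (
        "RewriteEngine On\n"
        "RewriteCond %{HTTP_REFERER} google\\. [NC]\n"
        "RewriteRule .* http://example.com/exploit.php [R=302,L]\n"
        "ErrorDocument 404 http://example.com/exploit.php\n"
    )
    for finding in audit_htaccess(sample):
        print(finding)
```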
5.7.4 Phishing page
5.7.5 Mailing infrastructure
6 Evaluation
6.1 Comparison with public blacklists
Type of info. | Collected | \(\cap \) MDL | \(\cap \) MP | \(\cap \) UBL | \(\cap \) MDB | \(\cap \) ZT | \(\cap \) CMX |
---|---|---|---|---|---|---|---|
Masquerader | (n/a) | (n/a) | (n/a) | (n/a) | (n/a) | (n/a) | (n/a) |
TDS_A | 84 | 0 | 0 | 31 | 5 | 0 | (n/a) |
TDS_B | 525 | 3 | 0 | 19 | 11 | 0 | (n/a) |
Blackhole | 127 | 3 | 0 | 0 | 0 | 0 | (n/a) |
Redkit | 82 | 34 | 0 | 13 | 9 | 0 | (n/a) |
Phoenix | 43 | 1 | 0 | 11 | 0 | 0 | (n/a) |
Incognito | 32 | 2 | 0 | 5 | 5 | 0 | (n/a) |
Neosploit | 7 | 1 | 0 | 11 | 0 | 0 | (n/a) |
Total | 900 | 44 | 0 | 81 | 30 | 0 | (n/a) |
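Each "\(\cap \)" column in the tables above is the size of a set intersection between the collected indicators (IP addresses or FQDNs) and one public blacklist. A minimal sketch of that computation with hypothetical toy data; the extra `unlisted` count (indicators absent from every blacklist) is an illustrative addition that the tables do not report per row.

```python
def overlap_table(collected: set, blacklists: dict) -> dict:
    """Count how many collected indicators appear in each blacklist."""
    row = {"Collected": len(collected)}
    for name, entries in blacklists.items():
        row[f"∩ {name}"] = len(collected & entries)
    # Indicators found by the honeypots but listed in no blacklist at all.
    row["unlisted"] = len(collected - set().union(*blacklists.values()))
    return row


if __name__ == "__main__":
    collected = {"a.example", "b.example", "c.example"}  # hypothetical FQDNs
    blacklists = {"MDL": {"a.example", "z.example"},
                  "UBL": {"a.example", "b.example"}}
    print(overlap_table(collected, blacklists))
    # {'Collected': 3, '∩ MDL': 1, '∩ UBL': 2, 'unlisted': 1}
```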
6.2 Rapidity of malicious website discovery
WHOIS information. Note that \(T_{\mathrm{latency}} < 0\) is possible (i.e., \(T_{\mathrm{register}}\) is newer than \(T_{\mathrm{discover}}\)) because \(T_{\mathrm{register}}\) is rewritten when the owner of a domain changes the domain information. Fig. 14 shows the distributions of discovery latency, excluding domains for which \(T_{\mathrm{register}}\) could not be obtained from WHOIS. The results show that the domains of the malicious websites we discovered have considerably shorter discovery latency than those of blacklisted websites. Our proposed system can instantaneously discover malicious websites when they are used for an attack.
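The note above that \(T_{\mathrm{latency}} < 0\) whenever \(T_{\mathrm{register}}\) is newer than \(T_{\mathrm{discover}}\) implies \(T_{\mathrm{latency}} = T_{\mathrm{discover}} - T_{\mathrm{register}}\). A small sketch of the per-domain computation, assuming day granularity and using `None` for domains whose registration date WHOIS did not return (these are excluded, as in Fig. 14); the function names are illustrative.

```python
from datetime import date
from typing import Iterable, Optional, Tuple


def discovery_latency(discovered: date, registered: date) -> int:
    """T_latency = T_discover - T_register in days; negative if WHOIS was updated later."""
    return (discovered - registered).days


def latencies(records: Iterable[Tuple[str, date, Optional[date]]]) -> dict:
    """records: (domain, T_discover, T_register or None) -> {domain: latency in days}."""
    return {domain: discovery_latency(t_disc, t_reg)
            for domain, t_disc, t_reg in records
            if t_reg is not None}  # skip domains with no WHOIS registration date
```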