
Fighting unicode-obfuscated spam

Authors: Changwei Liu (Indiana University), Sid Stamm (Indiana University)

eCrime '07: Proceedings of the anti-phishing working groups 2nd annual eCrime researchers summit, October 2007, Pages 45–59, https://doi.org/10.1145/1299015.1299020

Published: 04 October 2007

ABSTRACT

In the last few years, spammers have increasingly used obfuscation to get spam emails past filters. The standard method is to use images that look like text, since typical spam filters are unable to parse such messages; this is the approach used in so-called "rock phishing". To fight image-based spam, many spam filters use heuristic rules that flag emails containing images; since few legitimate emails consist mainly of one large image, this helps detect image-based spam. Spammers are thus interested in circumventing these methods. Unicode transliteration is a convenient tool for them: because Unicode contains many characters that are distinct but look very similar, a spammer can replace a message's characters at random with look-alikes, creating a large number of homomorphic clones of the same-looking message and hiding blacklisted words from filters. To defend against these Unicode-obfuscated spam emails, we developed a prototype tool that can be used with SpamAssassin to block spam obfuscated in this way by mapping polymorphic messages to a common, more homogeneous representation. This representation can then be filtered using traditional methods. We demonstrate the ease with which Unicode polymorphism can be used to circumvent spam filters such as SpamAssassin, and then describe a de-obfuscation technique that can be used to catch messages that have been obfuscated in this fashion.
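The idea of mapping look-alike variants to one common representation can be illustrated with a minimal Python sketch. This is not the authors' prototype or its SpamAssassin integration: the `CONFUSABLES` table and the `deobfuscate` function are hypothetical names, and the table is a tiny hand-picked sample chosen for illustration. The sketch assumes that Unicode NFKD normalization plus a small look-alike map is enough to show the principle.

```python
import unicodedata

# Hypothetical, hand-picked sample of look-alike characters.
# This is an illustrative assumption, not the mapping used in the paper.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def deobfuscate(text: str) -> str:
    """Map a possibly obfuscated string to a common, mostly ASCII form."""
    out = []
    # NFKD splits accented letters into base letter + combining mark and
    # folds compatibility forms (e.g. fullwidth letters) to plain ones.
    for ch in unicodedata.normalize("NFKD", text):
        if unicodedata.combining(ch):
            continue  # drop accents and other combining marks
        out.append(CONFUSABLES.get(ch, ch))  # replace known look-alikes
    return "".join(out)


if __name__ == "__main__":
    # "Viagra" written with a Ukrainian i, an accented a, and a Cyrillic a.
    print(deobfuscate("V\u0456\u00e1gr\u0430"))  # prints "Viagra"
```

Once every variant collapses to the same string, a conventional keyword or Bayesian filter can be applied to the normalized text; a realistic table would need to cover far more characters than this sample.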

Index Terms
1. Information systems
  1. World Wide Web
    1. Web applications
      1. Internet communications tools
2. Social and professional topics
  1. Computing / technology policy

Published in
eCrime '07: Proceedings of the anti-phishing working groups 2nd annual eCrime researchers summit
October 2007
90 pages
ISBN: 9781595939395
DOI: 10.1145/1299015
General Chair:
Lorrie Faith Cranor
Carnegie Mellon University
Copyright © 2007 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
SpamAssassin
deobfuscated emails
obfuscated emails
spam emails
unicode characters