Abstract
Rogue certificates are valid certificates issued by a legitimate certificate authority (CA) that are nonetheless untrustworthy, yet web browsers and users still trust them. Under the current public key infrastructure, a window of vulnerability exists between the time a rogue certificate is issued and the time it is detected. Rogue certificates from recent compromises have been trusted for weeks before detection and revocation. Previous proposals to close this window require changes to the infrastructure, Internet protocols, or end-user experience. We present a method for detecting rogue certificates from trusted CAs, developed from a large and timely collection of certificates. The method automates classification by building machine-learning models with deep neural networks (DNNs). Despite the scarcity of rogue instances in the dataset, the DNN produced a classification method that proved itself both in simulation and in the July 2014 compromise of the India CCA. We report the details of the classification method and show that it is repeatable, for example with datasets obtained from crawling. We also describe the classification performance under our current research deployment.
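The scarcity of rogue instances mentioned above affects how any such classifier should be judged. The following is a minimal sketch, not the authors' evaluation: the counts, the `always_benign` baseline, and the `detector` predictions are all hypothetical and chosen only to show why raw accuracy can mislead when rogue certificates are rare, and why precision and recall on the rogue class are more informative.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int = 0  # rogue certificates flagged as rogue
    fp: int = 0  # benign certificates flagged as rogue
    tn: int = 0  # benign certificates passed as benign
    fn: int = 0  # rogue certificates passed as benign


def confusion(labels, predictions):
    """Count outcomes, treating label 1 (rogue) as the positive class."""
    c = Confusion()
    for y, p in zip(labels, predictions):
        if y == 1 and p == 1:
            c.tp += 1
        elif y == 0 and p == 1:
            c.fp += 1
        elif y == 0 and p == 0:
            c.tn += 1
        else:
            c.fn += 1
    return c


def accuracy(c):
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c):
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c):
    rogue = c.tp + c.fn
    return c.tp / rogue if rogue else 0.0


# Hypothetical dataset: 9,990 benign certificates followed by 10 rogue ones.
labels = [0] * 9990 + [1] * 10

# Baseline that labels every certificate benign.
always_benign = [0] * 10000

# Hypothetical detector: 20 false alarms, and 8 of the 10 rogue certificates caught.
detector = [1] * 20 + [0] * 9970 + [1] * 8 + [0] * 2

for name, preds in [("always_benign", always_benign), ("detector", detector)]:
    c = confusion(labels, preds)
    print(f"{name}: accuracy={accuracy(c):.4f} "
          f"precision={precision(c):.4f} recall={recall(c):.4f}")
```

In this made-up example the baseline reaches higher accuracy than the detector while catching no rogue certificates at all, which is why per-class metrics matter for a dataset of this shape.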
Index Terms
- Detection of Rogue Certificates from Trusted Certificate Authorities Using Deep Neural Networks