ABSTRACT
Programmable switches potentially make it easier to run flexible network monitoring queries at line rate, and scalable stream processors make it possible to fuse data streams to answer more sophisticated queries about the network in real time. However, processing such queries at high traffic rates requires both the switches and the stream processors to filter the traffic iteratively and adaptively, so as to extract only the traffic that is of interest to the query at hand. While others have previously observed that network monitoring is a streaming analytics problem, our main contribution in this paper is the design and implementation of Sonata, a closed-loop system that enables network operators to perform streaming analytics for network monitoring applications at scale. To achieve this objective, Sonata allows operators to express a network monitoring query by treating each packet as a tuple. More importantly, Sonata allows them to partition the query across both the switches and the stream processor; through iterative refinement, Sonata's runtime attempts to extract only the traffic that pertains to the query, thus ensuring that the stream processor can scale to satisfy a large number of queries for traffic at very high rates. Using a simple example query involving DNS reflection attacks and traffic traces from one of the world's largest IXPs, we show that Sonata can capture 95% of all traffic pertaining to the query while reducing the overall data rate by a factor of about 400 and the number of required counters by four orders of magnitude.
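To make the ideas of "each packet as a tuple" and "iterative refinement" concrete, the following is a minimal Python sketch and not Sonata's actual interface. The `Packet` fields, the function names, the prefix levels (/8, /16, /32), and the threshold are all assumptions chosen for illustration. The sketch filters DNS responses, counts them per destination prefix, and zooms in only on prefixes that crossed the threshold at the coarser level, so that later levels need fewer counters.

```python
from collections import Counter, namedtuple
from ipaddress import ip_network

# Hypothetical packet tuple; the field names are illustrative, not Sonata's schema.
Packet = namedtuple("Packet", ["src_ip", "dst_ip", "proto", "src_port"])


def prefix(ip, length):
    """Return the enclosing IPv4 prefix of the given length, e.g. '10.0.0.0/8'."""
    return str(ip_network(f"{ip}/{length}", strict=False))


def dns_responses(packets):
    """Filter step: keep only UDP packets sent from port 53 (DNS responses)."""
    return (p for p in packets if p.proto == "udp" and p.src_port == 53)


def refine(packets, levels=(8, 16, 32), threshold=3):
    """Coarse-to-fine search for destinations that receive many DNS responses.

    At each level, packets are counted per destination prefix, but only if
    they fall inside a prefix flagged at the previous, coarser level.
    For simplicity the same packet list is replayed at every level; a live
    system would instead apply each level to a later window of traffic.

    Returns (flagged prefixes at the finest level, counters used per level).
    """
    suspects = None   # None means no restriction has been learned yet
    prev_len = None   # prefix length used to build `suspects`
    counters_used = []
    for length in levels:
        counts = Counter()
        for p in dns_responses(packets):
            if suspects is not None and prefix(p.dst_ip, prev_len) not in suspects:
                continue  # outside every flagged prefix: not counted at this level
            counts[prefix(p.dst_ip, length)] += 1
        counters_used.append(len(counts))
        suspects = {key for key, n in counts.items() if n >= threshold}
        prev_len = length
    return suspects, counters_used
```

Calling `refine(packets)` on a list of `Packet` tuples returns the flagged /32 destinations together with the number of counters allocated at each level. In this sketch the filter and per-prefix counting stand in for the part of a query that could run on a switch, while the thresholding and the choice of the next level stand in for the stream processor's side of the loop; how Sonata actually splits the work is described in the paper, not here.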
- Network Monitoring as a Streaming Analytics Problem