Abstract
This paper introduces Trill -- a new query processor for analytics. Trill fulfills a combination of three requirements for a query processor to serve the diverse big data analytics space: (1) Query Model: Trill is based on a tempo-relational model that enables it to handle streaming and relational queries with early results, across the latency spectrum from real-time to offline; (2) Fabric and Language Integration: Trill is architected as a high-level language library that supports rich data-types and user libraries, and integrates well with existing distribution fabrics and applications; and (3) Performance: Trill's throughput is high across the latency spectrum. For streaming data, Trill's throughput is 2-4 orders of magnitude higher than comparable streaming engines. For offline relational queries, Trill's throughput is comparable to a major modern commercial columnar DBMS.
Trill uses a streaming batched-columnar data representation with a new dynamic compilation-based system architecture that addresses all these requirements. In this paper, we describe Trill's new design and architecture, and report experimental results that demonstrate Trill's high performance across diverse analytics scenarios. We also describe how Trill's ability to support diverse analytics has resulted in its adoption across many usage scenarios at Microsoft.
- Badrish Chandramouli, Jonathan Goldstein, Songyun Duan. Temporal Analytics on Big Data for Web advertising. In ICDE, 2012. Google ScholarDigital Library
- Badrish Chandramouli, Jonathan Goldstein, Abdul Quamar. Scalable Progressive Analytics on Big Data in the Cloud. In VLDB, 2014. Google ScholarDigital Library
- R. Barga et al. Consistent Streaming Through Time: A Vision for Event Stream Processing. In CIDR, 2007.Google Scholar
- Reactive Extensions for .NET. http://aka.ms/rx.Google Scholar
- M. Barnett et al. Stat! - An Interactive Analytics Environment for Big Data. In SIGMOD, 2013. Google ScholarDigital Library
- P. Larson et al. Enhancements to SQL Server Column Stores. In VLDB, 2013.Google ScholarDigital Library
- J. Talbot et al. Phoenix++: Modular MapReduce for Shared-Memory Systems. In Intl. Workshop on MapReduce and its Applications, 2011. Google ScholarDigital Library
- Microsoft StreamInsight. http://aka.ms/stream.Google Scholar
- D. Maier, J. Li, P. Tucker, K. Tufte, V. Papadimos: Semantics of Data Streams and Operators. ICDT 2005: 37--52. Google ScholarDigital Library
- D. Abadi et al. The design of the Borealis stream processing engine. In CIDR, 2005.Google Scholar
- M. Hammad et al.: Nile: A Query Processing Engine for Data Streams. ICDE 2004: 851. Google ScholarDigital Library
- Actian Vectorwise DBMS. http://www.actian.com/.Google Scholar
- H. Lim et al. How to fit when no one size fits. In CIDR, 2013.Google Scholar
- Vertica. http://www.vertica.com/.Google Scholar
- B. Chandramouli et al. Accurate Latency Estimation in a Distributed Event Processing System. In ICDE, 2011. Google ScholarDigital Library
- P. Bernstein et al. Orleans: Distributed Virtual Actors for Programmability and Scalability. MSR Technical Report (MSR-TR-2014-41, 24). http://aka.ms/Ykyqft.Google Scholar
- Apache Hadoop 2.3.0 (YARN). http://aka.ms/Quslzk.Google Scholar
- B. Chun et al. REEF: Retainable Evaluator Execution Framework. PVLDB 6(12): 1370--1373 (2013). Google ScholarDigital Library
- Microsoft Avro Library. http://aka.ms/Nxbdwg.Google Scholar
- R. Chaiken et al. SCOPE: easy and efficient parallel processing of massive data sets. PVLDB, 1(2), 2008. Google ScholarDigital Library
- Expression Trees. http://aka.ms/K0fzli.Google Scholar
- D. Knuth, J. Morris, and V. Pratt. Fast pattern matching in strings. SIAM Journal on Computing (1977).Google Scholar
- B. Chandramouli et al. The Trill Incremental Analytics Engine. Microsoft Research Technical Report MSR-TR-2014-54, April 2014. http://aka.ms/trill-tr.Google Scholar
- M. Stonebraker et al. C-Store -- A Column-Oriented DBMS. In VLDB, 2005. Google ScholarDigital Library
- P. A. Boncz, M. Zukowski, and N. Nes, MonetDB/X100: Hyper-pipelining query execution. CIDR, 2005, 225--237.Google Scholar
- M.-C. Albutiu et al. Massively parallel sort-merge joins in main memory multi-core database systems. In VLDB, 2012. Google ScholarDigital Library
- A. Pavlo et al. A comparison of approaches to large-scale data analysis. In SIGMOD, 2009. Google ScholarDigital Library
- C. Engle et al. Shark: Fast Data Analysis Using Coarse-grained Distributed Memory. In SIGMOD, 2012. Google ScholarDigital Library
- Apache Storm. http://storm.incubator.apache.org/.Google Scholar
- B. Babcock et al. Models and issues in data stream systems. In PODS 2002. Google ScholarDigital Library
- C. Jensen et al. Temporal Specialization. In ICDE, 1992. Google ScholarDigital Library
- The LINQ Project. http://aka.ms/rjhi00.Google Scholar
- BlinkDB. http://blinkdb.org/.Google Scholar
- M. Zaharia et al. Discretized Streams: Fault-Tolerant Streaming Computation at Scale. In SOSP, 2013. Google ScholarDigital Library
- D. Murray et al. Naiad: A Timely Dataflow System. In SOSP, 2013. Google ScholarDigital Library
- SQL Server CLR integration. http://aka.ms/Bbtg44.Google Scholar
- E. Liarou et al. Enhanced Stream Processing in a DBMS Kernel. In EDBT, 2013. Google ScholarDigital Library
- T. Akidau et al. MillWheel: Fault-Tolerant Stream Processing at Internet Scale. In VLDB, 2013. Google ScholarDigital Library
- U. Cetintemel et al. S-Store: A Streaming NewSQL System for Big Velocity Applications. In VLDB, 2014. Google ScholarDigital Library
Index Terms
- Trill: a high-performance incremental query processor for diverse analytics
Recommendations
Design and Analysis of a TRILL - OpenFlow Bridge
2016 IEEE Global Communications Conference (GLOBECOM)Cloud computing is expected to keep growing and therefore more and more data centers are being built. In parallel of this increasing amount of data centers, there are also more cloud providers. On one side, this heterogeneity in the cloud market allows a ...
Comments