research-article

Trill: a high-performance incremental query processor for diverse analytics

Published: 01 December 2014

Abstract

This paper introduces Trill -- a new query processor for analytics. Trill fulfills a combination of three requirements for a query processor to serve the diverse big data analytics space: (1) Query Model: Trill is based on a tempo-relational model that enables it to handle streaming and relational queries with early results, across the latency spectrum from real-time to offline; (2) Fabric and Language Integration: Trill is architected as a high-level language library that supports rich data-types and user libraries, and integrates well with existing distribution fabrics and applications; and (3) Performance: Trill's throughput is high across the latency spectrum. For streaming data, Trill's throughput is 2-4 orders of magnitude higher than comparable streaming engines. For offline relational queries, Trill's throughput is comparable to a major modern commercial columnar DBMS.

Trill uses a streaming batched-columnar data representation with a new dynamic compilation-based system architecture that addresses all these requirements. In this paper, we describe Trill's new design and architecture, and report experimental results that demonstrate Trill's high performance across diverse analytics scenarios. We also describe how Trill's ability to support diverse analytics has resulted in its adoption across many usage scenarios at Microsoft.

Published in: Proceedings of the VLDB Endowment, Volume 8, Issue 4, December 2014, 132 pages

Publisher: VLDB Endowment