research-article

Trill: a high-performance incremental query processor for diverse analytics

Authors:
Badrish Chandramouli (Microsoft Research)
Jonathan Goldstein (Microsoft Research)
Mike Barnett (Microsoft Research)
Robert DeLine (Microsoft Research)
Danyel Fisher (Microsoft Research)
John C. Platt (Microsoft Research)
James F. Terwilliger (Microsoft Research)
John Wernsing (Microsoft Research)

Proceedings of the VLDB Endowment, Volume 8, Issue 4, pp. 401–412. https://doi.org/10.14778/2735496.2735503

Published: 01 December 2014

Abstract

This paper introduces Trill -- a new query processor for analytics. Trill fulfills a combination of three requirements for a query processor to serve the diverse big data analytics space: (1) Query Model: Trill is based on a tempo-relational model that enables it to handle streaming and relational queries with early results, across the latency spectrum from real-time to offline; (2) Fabric and Language Integration: Trill is architected as a high-level language library that supports rich data-types and user libraries, and integrates well with existing distribution fabrics and applications; and (3) Performance: Trill's throughput is high across the latency spectrum. For streaming data, Trill's throughput is 2-4 orders of magnitude higher than comparable streaming engines. For offline relational queries, Trill's throughput is comparable to a major modern commercial columnar DBMS.

Trill uses a streaming batched-columnar data representation with a new dynamic compilation-based system architecture that addresses all these requirements. In this paper, we describe Trill's new design and architecture, and report experimental results that demonstrate Trill's high performance across diverse analytics scenarios. We also describe how Trill's ability to support diverse analytics has resulted in its adoption across many usage scenarios at Microsoft.
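
To make the idea of a batched-columnar stream with early results more concrete, the following is a minimal sketch in Python. It is not Trill's API (Trill is described above only as a high-level language library), and every name in it (`ColumnBatch`, `where`, `TumblingCount`, `run_example`) is hypothetical. It assumes that events arrive in batches stored one column per field, that a filter marks rows in a mask rather than copying them, and that an aggregate keeps running state so it can report a result after every batch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ColumnBatch:
    """A batch of events stored column-wise (hypothetical layout)."""
    timestamps: List[int]   # one event time per row
    values: List[int]       # one payload value per row
    keep: List[bool] = field(default_factory=list)  # rows still live after filtering

    def __post_init__(self) -> None:
        # A fresh batch starts with every row live.
        if not self.keep:
            self.keep = [True] * len(self.timestamps)


def where(batch: ColumnBatch, predicate: Callable[[int], bool]) -> ColumnBatch:
    """Filter by updating the keep-mask; the data columns are shared, not copied."""
    keep = [k and predicate(v) for k, v in zip(batch.keep, batch.values)]
    return ColumnBatch(batch.timestamps, batch.values, keep)


class TumblingCount:
    """Count of live events per fixed-size time window, maintained incrementally."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.counts: Dict[int, int] = {}

    def process(self, batch: ColumnBatch) -> Dict[int, int]:
        for t, k in zip(batch.timestamps, batch.keep):
            if k:
                start = t - t % self.window  # window start time
                self.counts[start] = self.counts.get(start, 0) + 1
        # Snapshot of the counts so far: an early result after each batch.
        return dict(self.counts)


def run_example() -> Dict[int, int]:
    agg = TumblingCount(window=10)
    first = ColumnBatch([1, 4, 12], [5, 50, 500])
    second = ColumnBatch([15, 23], [70, 3])
    agg.process(where(first, lambda v: v >= 10))
    return agg.process(where(second, lambda v: v >= 10))
```

Calling `run_example()` feeds two batches through the same filter and aggregate. The same code serves a streaming query (small batches, results after each one) and an offline query (one large batch), which is the kind of latency spectrum the abstract refers to.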


Index Terms

Trill: a high-performance incremental query processor for diverse analytics
1. Information systems
  1. Data management systems
    1. Database management system engines
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Index terms have been assigned to the content through auto-classification.


Published in

Proceedings of the VLDB Endowment, Volume 8, Issue 4
December 2014
132 pages
ISSN: 2150-8097
Publisher: VLDB Endowment

