Continuous analytics over discontinuous streams

Authors:
Sailesh Krishnamurthy
Michael J. Franklin
Jeffrey Davis
Daniel Farina
Pasha Golovko
Alan Li
Neil Thombre

Affiliation (all authors): Truviso, Inc., Foster City, CA, USA

SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, June 2010, Pages 1081–1092
https://doi.org/10.1145/1807167.1807290

Published: 6 June 2010

ABSTRACT

Continuous analytics systems that enable query processing over streams of data have emerged as key solutions for dealing with massive data volumes and demands for low latency. These systems have been heavily influenced by an assumption that data streams can be viewed as sequences of data that arrive more or less in order. The reality, however, is that streams are often not so well behaved, and disruptions of various sorts are endemic. We argue, therefore, that stream processing needs a fundamental rethink, and we advocate a unified approach toward continuous analytics over discontinuous streaming data. Our approach is based on a simple insight: using techniques inspired by data-parallel query processing, queries can be performed over independent sub-streams with arbitrary time ranges in parallel, generating partial results. The consolidation of the partial results over each sub-stream can then be deferred, on demand, to the time at which the results are actually used. In this paper, we describe how the Truviso Continuous Analytics system implements this type of order-independent processing. Not only does the approach provide the first real solution to the problem of processing streaming data that arrives arbitrarily late, it also serves as a critical building block for solutions to a host of hard problems such as parallelism, recovery, transactional consistency, high availability, failover, and replication.
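The idea of computing partial results over independent sub-streams and consolidating them only on demand can be illustrated with a small sketch. This is not the Truviso implementation, which the abstract does not detail. It is a minimal Python example under stated assumptions: a hypothetical 60-second tumbling window, a (count, sum) partial state, and illustrative function names (`partial_aggregate`, `process_sub_streams`, `consolidate`) that do not come from the paper.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

WINDOW_SECONDS = 60  # assumption: hypothetical tumbling-window width


def partial_aggregate(sub_stream):
    """Reduce one sub-stream of (timestamp, value) pairs to per-window
    partial states of the form (count, sum)."""
    partials = defaultdict(lambda: [0, 0.0])
    for timestamp, value in sub_stream:
        window_start = timestamp - timestamp % WINDOW_SECONDS
        state = partials[window_start]
        state[0] += 1
        state[1] += value
    return {window: tuple(state) for window, state in partials.items()}


def process_sub_streams(sub_streams):
    """Aggregate each sub-stream independently, in parallel."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(partial_aggregate, sub_streams))


def consolidate(partials_list):
    """Merge partial states when a result is requested. The merge is
    commutative and associative, so the arrival order of the sub-streams
    does not affect the merged counts."""
    merged = defaultdict(lambda: [0, 0.0])
    for partials in partials_list:
        for window_start, (count, total) in partials.items():
            merged[window_start][0] += count
            merged[window_start][1] += total
    return {
        window: {"count": count, "sum": total, "avg": total / count}
        for window, (count, total) in sorted(merged.items())
    }


# Hypothetical data: one sub-stream arrives on time, another arrives late
# but covers an overlapping time range.
on_time = [(0, 1.0), (30, 3.0), (61, 10.0)]
late = [(15, 5.0), (59, 7.0)]

partials = process_sub_streams([on_time, late])
result = consolidate(partials)
```

Because each late batch only contributes another partial state, it never forces the earlier windows to be recomputed; the cost of combining them is paid when the result is read.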

Index Terms

Continuous analytics over discontinuous streams
1. Information systems
  1. Data management systems
    1. Database management system engines
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Published in
SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
June 2010
1286 pages
ISBN: 9781450300322
DOI: 10.1145/1807167
General Chair: Ahmed Elmagarmid, Purdue University, USA
Program Chair: Divyakant Agrawal, University of California at Santa Barbara, USA
Copyright © 2010 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
continuous queries
order
out-of-order
streams
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%