DOI: 10.1145/1807167.1807290
research-article

Continuous analytics over discontinuous streams

Published: 06 June 2010

ABSTRACT

Continuous analytics systems that enable query processing over streams of data have emerged as key solutions for dealing with massive data volumes and demands for low latency. These systems have been heavily influenced by an assumption that data streams can be viewed as sequences of data that arrive more or less in order. The reality, however, is that streams are not often so well behaved and disruptions of various sorts are endemic. We argue, therefore, that stream processing needs a fundamental rethink and advocate a unified approach toward continuous analytics over discontinuous streaming data. Our approach is based on a simple insight: using techniques inspired by data parallel query processing, queries can be performed over independent sub-streams with arbitrary time ranges in parallel, generating partial results. The consolidation of the partial results over each sub-stream can then be deferred to the time at which the results are actually used, on an on-demand basis. In this paper, we describe how the Truviso Continuous Analytics system implements this type of order-independent processing. Not only does the approach provide the first real solution to the problem of processing streaming data that arrives arbitrarily late, it also serves as a critical building block for solutions to a host of hard problems such as parallelism, recovery, transactional consistency, high availability, failover, and replication.
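
The core idea in the abstract, independent partial results per sub-stream with consolidation deferred until read time, can be illustrated with a small sketch. This is not the Truviso implementation, which the abstract does not detail. The names `Partial`, `process_substream`, and `consolidate`, the fixed-size tumbling windows, and the count/sum aggregate are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Partial:
    """Hypothetical mergeable partial aggregate for one window (count and sum)."""
    count: int = 0
    total: float = 0.0

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value

    def merge(self, other: "Partial") -> "Partial":
        # Merging is commutative and associative, so sub-streams can be
        # combined in any order.
        return Partial(self.count + other.count, self.total + other.total)


def process_substream(events, window_size):
    """Aggregate one sub-stream on its own, in whatever order events arrive.

    events: iterable of (timestamp, value) pairs with integer timestamps.
    Returns a dict mapping window start time to a Partial.
    """
    partials = defaultdict(Partial)
    for ts, value in events:
        window_start = ts - (ts % window_size)
        partials[window_start].add(value)
    return dict(partials)


def consolidate(partial_results, window_start):
    """Merge the partials for one window across sub-streams, at read time."""
    merged = Partial()
    for partials in partial_results:
        if window_start in partials:
            merged = merged.merge(partials[window_start])
    return merged


# Usage: one sub-stream arrives on time, another arrives late.
on_time = [(1, 10.0), (3, 20.0), (12, 5.0)]
late = [(2, 30.0), (14, 15.0)]
results = [process_substream(on_time, 10), process_substream(late, 10)]
print(consolidate(results, 0))   # Partial(count=3, total=60.0)
print(consolidate(results, 10))  # Partial(count=2, total=20.0)
```

Because each sub-stream is aggregated without reference to the others, late data in this sketch only adds another partial result to be merged and does not force reprocessing of earlier input.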

          • Published in

            SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
            June 2010
            1286 pages
            ISBN:9781450300322
            DOI:10.1145/1807167

            Copyright © 2010 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 785 of 4,003 submissions, 20%
