skip to main content
10.1145/1376916.1376958acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
research-article

Shape sensitive geometric monitoring

Published:09 June 2008Publication History

ABSTRACT

A fundamental problem in distributed computation is the distributed evaluation of functions. The goal is to determine the value of a function over a set of distributed inputs, in a communication efficient manner. Specifically, we assume that each node holds a time varying input vector, and we are interested in determining, at any given time, whether the value of an arbitrary function on the average of these vectors crosses a predetermined threshold.

In this paper, we introduce a new method for monitoring distributed data, which we term shape sensitive geometric monitoring. It is based on a geometric interpretation of the problem, which enables to define local constraints on the data received at the nodes. It is guaranteed that as long as none of these constraints has been violated, the value of the function does not cross the threshold. We generalize previous work on geometric monitoring, and solve two problems which seriously hampered its performance: as opposed to the constraints used so far, which depend only on the current values of the local input vectors, here we incorporate their temporal behavior into the constraints. Also, the new constraints are tailored to the geometric properties of the specific function which is being monitored, while the previous constraints were generic.

Experimental results on real world data reveal that using the new geometric constraints reduces communication by up to three orders of magnitude in comparison to existing approaches, and considerably narrows the gap between existing results and a newly defined lower bound on the communication complexity.

References

  1. Shipra Agrawal, Supratim Deb, K. V. M. Naidu, and Rajeev Rastogi. Efficient detection of distributed constraint violations. In ICDE '07, pages 1320--1324.Google ScholarGoogle Scholar
  2. Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In STOC '96, pages 20--29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Arvind Arasu and Gurmeet Singh Manku. Approximate counts and quantiles over sliding windows. In PODS '04, pages 286--296. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. Models and issues in data stream systems. In PODS '02, pages 1--16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Brian Babcock and Chris Olston. Distributed top-k monitoring. In SIGMOD '03, pages 28--39. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Donald Carney, Ugur ¸Cetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Greg Seidman, Michael Stonebraker, Nesime Tatbul, and Stanley B. Zdonik. Monitoring streams - a new class of data management applications. In VLDB '02, pages 215--226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Amit Chakrabarti, Graham Cormode, and Andrew McGregor. A near-optimal algorithm for computing the entropy of a stream. In SODA '07. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In ICALP '02, pages 693--703. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Edith Cohen and Martin J. Strauss. Maintaining time-decaying stream aggregates. J. Algorithms, 59(1):19--36. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. Cormode, R. Keralapura, and J. Ramimirtham. Communication-efficient distributed monitoring of thresholded counts. In SIGMOD '06. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Graham Cormode and Minos Garofalakis. Sketching streams through the net: distributed approximate query tracking. In VLDB '05, pages 13--24. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Graham Cormode, Minos Garofalakis, S. Muthukrishnan, and Rajeev Rastogi. Holistic aggregates in a networked world: distributed tracking of approximate quantiles. In SIGMOD '05, pages 25--36. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Graham Cormode, S. Muthukrishnan, and Wei Zhuang. Conquering the divide: Continuous clustering of distributed data streams. In ICDE '07, pages 1036--1045.Google ScholarGoogle Scholar
  14. Graham Cormode, S. Muthukrishnan, and Wei Zhuang. What's different: Distributed, continuous monitoring of duplicate-resilient aggregates on data streams. In ICDE '06, page 57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Abhinandan Das, Sumit Ganguly, Minos Garofalakis, and Rajeev Rastogi. Distributed set-expression cardinality estimation. In VLDB '04, pages 312--323. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows: (extended abstract). In SODA '02, pages 635--644. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Mark Dilman and Danny Raz. Efficient reactive monitoring. In INFOCOM '01, pages 1012--1019.Google ScholarGoogle Scholar
  18. Gereon Frahling, Piotr Indyk, and Christian Sohler. Sampling in dynamic data streams and applications. In SCG '05, pages 142--149. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Ling Huang, Minos Garofalakis, Joseph Hellerstein, Anthony Joseph, and Nina Taft. Toward sophisticated detection with distributed triggers. In MineNet '06, pages 311--316. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Ling Huang, Minos Garofalakis, Joseph Hellerstein, Anthony Joseph, and Nina Taft. Toward sophisticated detection with distributed triggers. In MineNet '06, pages 311--316. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Ankur Jain, Joseph M. Hellerstein, Sylvia Ratnasamy, and David Wetherall. A wakeup call for internet monitoring systems: The case for distributed triggers. In Proc. 3rd ACM SIGCOMM Workshop on Hot Topics in Networks (HotNets), 2004.Google ScholarGoogle Scholar
  22. David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. Rcv1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361--397, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. Rcv1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361--397, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Samuel Madden, Mehul Shah, Joseph M. Hellerstein, and Vijayshankar Raman. Continuously adaptive continuous queries over streams. In SIGMOD '02, pages 49--60. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Amit Manjhi, Vladislav Shkapenyuk, Kedar Dhamdhere, and Christopher Olston. Finding (recently) frequent items in distributed data streams. In ICDE '05, pages 767--778. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Gurmeet Singh Manku and Rajeev Motwani. Approximate frequency counts over data streams. In VLDB '02, pages 346--357. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Gurmeet Singh Manku and Rajeev Motwani. Approximate frequency counts over data streams. In VLDB '02, pages 346--357. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. P.A. Parrilo. Semidefinite programming relaxations for semialgebraic problems. Mathematical Programming, 96(2):293--320, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  29. T.G. Rose, M. Stevenson, and M. Whitehead. The Reuters Corpus Volume 1 - from Yesterday's News to Tomorrow's Language Resources. In LREC '02, pages 827--832.Google ScholarGoogle Scholar
  30. Izchak Sharfman, Assaf Schuster, and Daniel Keren. A geometric approach to monitoring threshold functions over distributed data streams. In SIGMOD '06, pages 301--312. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Yiming Yang and Jan O. Pedersen. A comparative study on feature selection in text categorization. In ICML '97, pages 412--420. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Byoung-Kee Yi, Nikolaos Sidiropoulos, Theodore Johnson, H. V. Jagadish, Christos Faloutsos, and Alexandros Biliris. Online data mining for co-evolving time sequences. In ICDE '00, page 13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Yonggang Jerry Zhao, Ramesh Govindan, and Deborah Estrin. Computing aggregates for monitoring wireless sensor networks. In SNPA 03.Google ScholarGoogle Scholar
  34. Yunyue Zhu and Dennis Shasha. Statstream: Statistical monitoring of thousands of data streams in real time. In VLDB '02, pages 358--369. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Shape sensitive geometric monitoring

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        PODS '08: Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
        June 2008
        330 pages
        ISBN:9781605581521
        DOI:10.1145/1376916

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 9 June 2008

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Author Tags

        Qualifiers

        • research-article

        Acceptance Rates

        PODS '08 Paper Acceptance Rate28of159submissions,18%Overall Acceptance Rate642of2,707submissions,24%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader