ABSTRACT
A fundamental problem in distributed computation is the distributed evaluation of functions. The goal is to determine the value of a function over a set of distributed inputs, in a communication efficient manner. Specifically, we assume that each node holds a time varying input vector, and we are interested in determining, at any given time, whether the value of an arbitrary function on the average of these vectors crosses a predetermined threshold.
In this paper, we introduce a new method for monitoring distributed data, which we term shape sensitive geometric monitoring. It is based on a geometric interpretation of the problem, which enables to define local constraints on the data received at the nodes. It is guaranteed that as long as none of these constraints has been violated, the value of the function does not cross the threshold. We generalize previous work on geometric monitoring, and solve two problems which seriously hampered its performance: as opposed to the constraints used so far, which depend only on the current values of the local input vectors, here we incorporate their temporal behavior into the constraints. Also, the new constraints are tailored to the geometric properties of the specific function which is being monitored, while the previous constraints were generic.
Experimental results on real world data reveal that using the new geometric constraints reduces communication by up to three orders of magnitude in comparison to existing approaches, and considerably narrows the gap between existing results and a newly defined lower bound on the communication complexity.
- Shipra Agrawal, Supratim Deb, K. V. M. Naidu, and Rajeev Rastogi. Efficient detection of distributed constraint violations. In ICDE '07, pages 1320--1324.Google Scholar
- Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In STOC '96, pages 20--29. Google ScholarDigital Library
- Arvind Arasu and Gurmeet Singh Manku. Approximate counts and quantiles over sliding windows. In PODS '04, pages 286--296. Google ScholarDigital Library
- Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. Models and issues in data stream systems. In PODS '02, pages 1--16. Google ScholarDigital Library
- Brian Babcock and Chris Olston. Distributed top-k monitoring. In SIGMOD '03, pages 28--39. Google ScholarDigital Library
- Donald Carney, Ugur ¸Cetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Greg Seidman, Michael Stonebraker, Nesime Tatbul, and Stanley B. Zdonik. Monitoring streams - a new class of data management applications. In VLDB '02, pages 215--226. Google ScholarDigital Library
- Amit Chakrabarti, Graham Cormode, and Andrew McGregor. A near-optimal algorithm for computing the entropy of a stream. In SODA '07. Google ScholarDigital Library
- Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In ICALP '02, pages 693--703. Google ScholarDigital Library
- Edith Cohen and Martin J. Strauss. Maintaining time-decaying stream aggregates. J. Algorithms, 59(1):19--36. Google ScholarDigital Library
- G. Cormode, R. Keralapura, and J. Ramimirtham. Communication-efficient distributed monitoring of thresholded counts. In SIGMOD '06. Google ScholarDigital Library
- Graham Cormode and Minos Garofalakis. Sketching streams through the net: distributed approximate query tracking. In VLDB '05, pages 13--24. Google ScholarDigital Library
- Graham Cormode, Minos Garofalakis, S. Muthukrishnan, and Rajeev Rastogi. Holistic aggregates in a networked world: distributed tracking of approximate quantiles. In SIGMOD '05, pages 25--36. Google ScholarDigital Library
- Graham Cormode, S. Muthukrishnan, and Wei Zhuang. Conquering the divide: Continuous clustering of distributed data streams. In ICDE '07, pages 1036--1045.Google Scholar
- Graham Cormode, S. Muthukrishnan, and Wei Zhuang. What's different: Distributed, continuous monitoring of duplicate-resilient aggregates on data streams. In ICDE '06, page 57. Google ScholarDigital Library
- Abhinandan Das, Sumit Ganguly, Minos Garofalakis, and Rajeev Rastogi. Distributed set-expression cardinality estimation. In VLDB '04, pages 312--323. Google ScholarDigital Library
- Mayur Datar, Aristides Gionis, Piotr Indyk, and Rajeev Motwani. Maintaining stream statistics over sliding windows: (extended abstract). In SODA '02, pages 635--644. Google ScholarDigital Library
- Mark Dilman and Danny Raz. Efficient reactive monitoring. In INFOCOM '01, pages 1012--1019.Google Scholar
- Gereon Frahling, Piotr Indyk, and Christian Sohler. Sampling in dynamic data streams and applications. In SCG '05, pages 142--149. Google ScholarDigital Library
- Ling Huang, Minos Garofalakis, Joseph Hellerstein, Anthony Joseph, and Nina Taft. Toward sophisticated detection with distributed triggers. In MineNet '06, pages 311--316. Google ScholarDigital Library
- Ling Huang, Minos Garofalakis, Joseph Hellerstein, Anthony Joseph, and Nina Taft. Toward sophisticated detection with distributed triggers. In MineNet '06, pages 311--316. Google ScholarDigital Library
- Ankur Jain, Joseph M. Hellerstein, Sylvia Ratnasamy, and David Wetherall. A wakeup call for internet monitoring systems: The case for distributed triggers. In Proc. 3rd ACM SIGCOMM Workshop on Hot Topics in Networks (HotNets), 2004.Google Scholar
- David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. Rcv1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361--397, 2004. Google ScholarDigital Library
- David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. Rcv1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361--397, 2004. Google ScholarDigital Library
- Samuel Madden, Mehul Shah, Joseph M. Hellerstein, and Vijayshankar Raman. Continuously adaptive continuous queries over streams. In SIGMOD '02, pages 49--60. Google ScholarDigital Library
- Amit Manjhi, Vladislav Shkapenyuk, Kedar Dhamdhere, and Christopher Olston. Finding (recently) frequent items in distributed data streams. In ICDE '05, pages 767--778. Google ScholarDigital Library
- Gurmeet Singh Manku and Rajeev Motwani. Approximate frequency counts over data streams. In VLDB '02, pages 346--357. Google ScholarDigital Library
- Gurmeet Singh Manku and Rajeev Motwani. Approximate frequency counts over data streams. In VLDB '02, pages 346--357. Google ScholarDigital Library
- P.A. Parrilo. Semidefinite programming relaxations for semialgebraic problems. Mathematical Programming, 96(2):293--320, 2003.Google ScholarCross Ref
- T.G. Rose, M. Stevenson, and M. Whitehead. The Reuters Corpus Volume 1 - from Yesterday's News to Tomorrow's Language Resources. In LREC '02, pages 827--832.Google Scholar
- Izchak Sharfman, Assaf Schuster, and Daniel Keren. A geometric approach to monitoring threshold functions over distributed data streams. In SIGMOD '06, pages 301--312. Google ScholarDigital Library
- Yiming Yang and Jan O. Pedersen. A comparative study on feature selection in text categorization. In ICML '97, pages 412--420. Google ScholarDigital Library
- Byoung-Kee Yi, Nikolaos Sidiropoulos, Theodore Johnson, H. V. Jagadish, Christos Faloutsos, and Alexandros Biliris. Online data mining for co-evolving time sequences. In ICDE '00, page 13. Google ScholarDigital Library
- Yonggang Jerry Zhao, Ramesh Govindan, and Deborah Estrin. Computing aggregates for monitoring wireless sensor networks. In SNPA 03.Google Scholar
- Yunyue Zhu and Dennis Shasha. Statstream: Statistical monitoring of thousands of data streams in real time. In VLDB '02, pages 358--369. Google ScholarDigital Library
Index Terms
- Shape sensitive geometric monitoring
Recommendations
Shape Sensitive Geometric Monitoring
An important problem in distributed, dynamic databases is to continuously monitor the value of a function defined on the nodes, and check that it satisfies some threshold constraint. We introduce a monitoring method, based on a geometric interpretation ...
Geometric modeling in shape space
SIGGRAPH '07: ACM SIGGRAPH 2007 papersWe present a novel framework to treat shapes in the setting of Riemannian geometry. Shapes -- triangular meshes or more generally straight line graphs in Euclidean space -- are treated as points in a shape space. We introduce useful Riemannian metrics ...
Geometric modeling in shape space
We present a novel framework to treat shapes in the setting of Riemannian geometry. Shapes -- triangular meshes or more generally straight line graphs in Euclidean space -- are treated as points in a shape space. We introduce useful Riemannian metrics ...
Comments