ABSTRACT
In several database applications, parameters like selectivities and load are known only with some associated uncertainty, which is specified, or modeled, as a distribution over values. The performance of query optimizers and monitoring schemes can be improved by spending resources like time or bandwidth in observing or resolving these parameters, so that better query plans can be generated. In a resource-constrained situation, deciding which parameters to observe in order to best optimize the expected quality of the plan generated (or in general, optimize the expected value of a certain objective function) itself becomes an interesting optimization problem.We present a framework for studying such problems, and present several scenarios arising in anomaly detection in complex systems, monitoring extreme values in sensor networks, load shedding in data stream systems, and estimating rates in wireless channels and minimum latency routes in networks, which can be modeled in this framework with the appropriate objective functions.Even for several simple objective functions, we show the problems are Np-Hard. We present greedy algorithms with good performance bounds. The proof of the performance bounds are via novel sub-modularity arguments.
- A. Akella, B. M. Maggs, S. Seshan, A. Shaikh, and R. K. Sitaraman. A measurement-based analysis of multihoming. In ACM SIGCOMM Conference, pages 353--364, 2003.]] Google ScholarDigital Library
- R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pages 261--272, 2000.]] Google ScholarDigital Library
- B. Babcock and S. Chaudhuri. Towards a robust query optimizer: A principled and practical approach. In SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pages 119--130, 2005.]] Google ScholarDigital Library
- B. Babcock and C. Olston. Distributed top-k monitoring. In SIGMOD '03: Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pages 28--39, 2003.]] Google ScholarDigital Library
- S. Babu and P. Bizarro. Proactive reoptimization. In Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, 2005.]] Google ScholarDigital Library
- R. Carr, L. Fleischer, V. Leung, and C. Phillips. Strengthening integrality gaps for capacitated network design and covering problems. In Proc. of the Annual ACM-SIAM Symp. on Discrete Algorithms, 2000.]] Google ScholarDigital Library
- M. Charikar, R. Fagin, J. Kleinberg, P. Raghavan, and A. Sahai. Querying priced information. Proc. of the Annual ACM Symp. on Theory of Computing, 2000.]] Google ScholarDigital Library
- F. Chu, J. Halpern, and J. Gehrke. Least expected cost query optimization: What can we expect? Proc. of the ACM Symp. on Principles of Database Systems, 2002.]] Google ScholarDigital Library
- F. Chu, J. Halpern, and P. Seshadri. Least expected cost query optimization: An exercise in utility. In Proc. of the ACM Symp. on Principles of Database Systems, 1999.]] Google ScholarDigital Library
- B. Dean, M. Goemans, and J. Vondrák. Approximating the stochastic knapsack problem: The benefit of adaptivity. Proc. of the Annual Symp. on Foundations of Computer Science, 2004.]] Google ScholarDigital Library
- A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Model-driven data acquisition in sensor networks. In Proc. of the 2004 Intl. Conf. on Very Large Data Bases, 2004.]]Google ScholarDigital Library
- T. Feder, R. Motwani, R. Panigrahy, C. Olston, and J. Widom. Computing the median with uncertainty. SIAM J. Comput., 32(2), 2003.]] Google ScholarDigital Library
- A. Goel and P. Indyk. Stochastic load balancing and related problems. Proc. of the Annual Symp. on Foundations of Computer Science, 1999.]] Google ScholarDigital Library
- P. K. Gummadi, H. V. Madhyastha, S. D. Gribble, H. M. Levy, and D. Wetherall. Improving the reliability of internet paths with one-hop source routing. In 6th Symposium on Operating System Design and Implementation (OSDI), pages 183--198, 2004.]] Google ScholarDigital Library
- A. Gupta, M. Pál, R. Ravi, and A. Sinha. Boosted sampling: Approximation algorithms for stochastic optimization. Proc. of the Annual ACM Symp. on Theory of Computing, 2004.]] Google ScholarDigital Library
- A. Gupta, R. Ravi, and A. Sinha. An edge in time saves nine: LP rounding approximation algorithms for stochastic network design. In Proc. of the Annual Symp. on Foundations of Computer Science, pages 218--227, 2004.]] Google ScholarDigital Library
- N. Immorlica, D. Karger, M. Minkoff, and V. Mirrokni. On the costs and benefits of procrastination: Approximation algorithms for stochastic combinatorial optimization problems. In Proc. of the Annual ACM-SIAM Symp. on Discrete Algorithms, 2004.]] Google ScholarDigital Library
- S. Khanna and W-C. Tan. On computing functions with uncertainty. In Proc. of the ACM Symp. on Principles of Database Systems, 2001.]] Google ScholarDigital Library
- J. Kleinberg, Y. Rabani, and É. Tardos. Allocating bandwidth for bursty connections. SIAM J. Comput, 30(1), 2000.]] Google ScholarDigital Library
- S. Kolliopoulos and N. Young. Tight approximation results for general covering integer programs. Proc. of the Annual Symp. on Foundations of Computer Science, 2001.]] Google ScholarDigital Library
- A. Krause and C. Guestrin. Near-optimal nonmyopic value of information in graphical models. Twenty-first Conference on Uncertainty in Artificial Intelligence (UAI 2005), 2005.]]Google ScholarDigital Library
- M. L. Massie, B. N. Chun, and D. E. Culler. The Ganglia distributed monitoring system: Design, implementation, and experience. Parallel Computing, 30(7), 2004.]]Google Scholar
- G. L. Nemhauser, L. A. Wolsey, and M. L. Fisher. An analysis of approximations for maximizing submodular set functions-I. Math Programming, 14(1):265--294, 1978.]]Google ScholarDigital Library
- C. Olston. Approximate Replication. PhD thesis, Stanford University, 2003.]] Google ScholarDigital Library
- D. Shmoys and C. Swamy. Stochastic optimization is (almost) as easy as discrete optimization. Proc. of the Annual Symp. on Foundations of Computer Science, 2004.]] Google ScholarDigital Library
- A. Silberstein, R. Braynard, C. Ellis, K. Munagala, and J. Yang. A sampling based approach to optimizing top-k queries in sensor networks. In Proc. of the Intl. Conf. on Data Engineering, 2006.]] Google ScholarDigital Library
- N. Tatbul, U. Etintemel, S. Zdonik, M. Chemiack, and M. Stonebraker. Load shedding in a data stream manager. In Proc. of the Intl. Conf. on Very Large Data Bases, 2003.]]Google ScholarCross Ref
Index Terms
- Asking the right questions: model-driven optimization using probes
Recommendations
Asking and Answering Questions during a Programming Change Task
Little is known about the specific kinds of questions programmers ask when evolving a code base and how well existing tools support those questions. To better support the activity of programming, answers are needed to three broad research questions: (1) ...
Relevant Answers to WH-Questions
We consider two issues relating to WH-questions:
(i) when you ask a WH-question you already have a description of the entity you are interested in, namely the description embodied in the question itself. You may even have very direct access to the entity – ...
Asking what no one has asked before: using phrase similarities to generate synthetic web search queries
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge managementThis paper introduces a method for automatically inferring meaningful, not-yet-submitted queries. The inferred queries fill some of the knowledge gaps between documents, on one hand, and known (i.e., already-submitted) queries, on the other hand. Thus, ...
Comments