Abstract
In the parallel pipelined filter ordering problem, we are given a set of n filters that run in parallel. The filters need to be applied to a stream of elements, to determine which elements pass all filters. Each filter has a rate limit ri on the number of elements it can process per unit time, and a selectivity pi, which is the probability that a random element will pass the filter. The goal is to maximize throughput. This problem appears naturally in a variety of settings, including parallel query optimization in databases and query processing over Web services.
We present an O(n3) algorithm for this problem, given tree-structured precedence constraints on the filters. This extends work of Condon et al. [2009] and Kodialam [2001], who presented algorithms for solving the problem without precedence constraints. Our algorithm is combinatorial and produces a sparse solution. Motivated by join operators in database queries, we also give algorithms for versions of the problem in which “filter” selectivities may be greater than or equal to 1.
We prove a strong connection between the more classical problem of minimizing total work in sequential filter ordering (A), and the parallel pipelined filter ordering problem (B). More precisely, we prove that A is solvable in polynomial time for a given class of precedence constraints if and only if B is as well. This equivalence allows us to show that B is NP-Hard in the presence of arbitrary precedence constraints (since A is known to be NP-Hard in that setting).
- Avnur, R. and Hellerstein, J. M. 2000. Eddies: Continuously adaptive query processing. In SIGMOD '00: Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM, 261--272. Google ScholarDigital Library
- Babu, S., Motwani, R., Munagala, K., Nishizawa, I., and Widom, J. 2004. Adaptive ordering of pipelined stream filters. In SIGMOD '04: Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, 407--418. Google ScholarDigital Library
- Bland, R. G., Goldfarb, D., and Todd, M. J. 1981. The ellipsoid method: A survey. Oper. Res. 29(6), 1039--1091.Google ScholarDigital Library
- Burge, J., Mingala, K., and Srivastava, U. 2005. Ordering pipelined query operators with precedence constraints. Tech. Rep. 2005-40, Stanford University.Google Scholar
- Chaudhuri, S., Dayal, U., and Yan, T. W. 1995. Join queries with external text sources: Execution and optimization techniques. In SIGMOD '95: Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, 410--422. Google ScholarDigital Library
- Condon, A., Deshpande, A., Hellerstein, L., and Wu, N. 2006. Flow algorithms for two pipelined filter ordering problems. In PODS '06: Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 26-28, 2006, Chicago, Illinois, USA. ACM, 193--202. Google ScholarDigital Library
- Condon, A., Deshpande, A., Hellerstein, L., and Wu, N. 2009. Algorithms for distributional and adversarial pipelined filter ordering problems. ACM Transactions on Algorithms 5, 2. Google ScholarDigital Library
- Dantzig, G. and Thapa, M. 2003. Linear Programming 2: Theory and Extensions. Springer. Google ScholarDigital Library
- Deshpande, A., Guestrin, C., Hong, W., and Madden, S. 2005. Exploiting correlated attributes in acquisitional query processing. In ICDE '05: Proceedings of the 21st International Conference on Data Engine. IEEE Computer Society, 143--154. Google ScholarDigital Library
- Deshpande, A. and Hellerstein, L. 2008. Flow algorithms for parallel query optimization. In ICDE '08: Proceedings of the 24th International Conference on Data Engineering. IEEE, 754--763. Google ScholarDigital Library
- Etzioni, O., Hanks, S., Jiang, T., Karp, R. M., Madani, O., and Waarts, O. 1996. Efficient information gathering on the internet. In FOCS '96: Proceedings of the 37th IEEE Symposium on Foundations of Computer Science. IEEE Computer Society, 234--243. Google ScholarDigital Library
- Feige, U., Lovász, L., and Tetali, P. 2004. Approximating min-sum set cover. Algorithmica 40, 4, 219--234. Google ScholarDigital Library
- Goldman, R. and Widom, J. 2000. WSQ/DSQ: A practical approach for combined querying of databases and the web. In SIGMOD '00: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. ACM, 285--296. Google ScholarDigital Library
- Ibaraki, T. and Kameda, T. 1984. On the optimal nesting order for computing n-relational joins. ACM Trans. Database Syst. 9, 3, 482--502. Google ScholarDigital Library
- Kaplan, H., Kushilevitz, E., and Mansour, Y. 2005. Learning with attribute costs. In STOC' 05: The Thirty-Seventh Annual ACM Symposium on Theory of Computing. ACM Press, 356--365. Google ScholarDigital Library
- Khachiyan, L. G. 1979. A polynomial algorithm in linear programming. Soviet Math. Dokl. 20(1), 191--194.Google Scholar
- Kodialam, M. 2001. The throughput of sequential testing. In Proceedings of the Conference on Integer Programming and Combinatorial Optimization (IPCO). Lecture Notes in Computer Science, vol. 2081. Springer-Verlag, 280--292. Google ScholarDigital Library
- Krishnamurthy, R., Boral, H., and Zaniolo, C. 1986. Optimization of nonrecursive queries. In VLDB '86: Proceedings of the 12th International Conference on Very Large Data Bases. Morgan Kaufmann, 128--137. Google ScholarDigital Library
- Lawler, E. L. 1978. Sequencing jobs to minimize total weighted completion time subject to precedence constraints. Ann. Disc. Math. 2, 75--90.Google ScholarCross Ref
- Liu, Z., Parthasarathy, S., Ranganathan, A., and Yang, H. 2008. A generic flow algorithm for shared filter ordering problems. In PODS '08: Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, 79--88. Google ScholarDigital Library
- Megiddo, N. and Chandrasekaran, R. 1989. On the epsilon-perturbation method for avoiding degeneracy. Oper. Res. Lett. 8, 305--308. Google ScholarDigital Library
- Schrijver, A. 1986. Theory of Linear and Integer Programming. Wiley-Interscience. Google ScholarDigital Library
- Shayman, M. and Fernandez-Gaucherand, E. 2001. Risk-sensitive decision-theoretic diagnosis. IEEE Trans. Automat. Cont. 46, 1166--1171.Google ScholarCross Ref
- Simon, H. and Kadane, J. 1975. Optimal problem-solving search: All-or-none solutions. Artif. Intell. 6, 235--247.Google ScholarCross Ref
- Srivastava, U., Munagala, K., and Widom, J. 2005. Operator placement for in-network stream query processing. In PODS '05: Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM, 250--258. Google ScholarDigital Library
- Srivastava, U., Munagala, K., Widom, J., and Motwani, R. 2006. Query optimization over web services. In VLDB '06: Proceedings of the 32nd International Conference on Very Large Data Bases. ACM, 355--366. Google ScholarDigital Library
Index Terms
- Parallel pipelined filter ordering with precedence constraints
Recommendations
Algorithms for distributional and adversarial pipelined filter ordering problems
Pipelined filter ordering is a central problem in database query optimization. The problem is to determine the optimal order in which to apply a given set of commutative filters (predicates) to a set of elements (the tuples of a relation), so as to find,...
Flow algorithms for two pipelined filter ordering problems
PODS '06: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsPipelined filter ordering is a central problem in database query optimization, and has received renewed attention recently in the context of environments such as the web, continuous high-speed data streams and sensor networks. We present algorithms for ...
Parallel database sorting
Sorting in database processing is frequently required through the use of Order By and Distinct clauses in SQL. Sorting is also widely known in computer science community at large. Sorting in general covers internal and external sorting. Past published ...
Comments