ABSTRACT
In distributed data-parallel computing, a user program is compiled into an execution plan graph (EPG), typically a directed acyclic graph. This EPG is the core data structure used by modern distributed execution engines for task distribution, job management, and fault tolerance. Once submitted for execution, the EPG remains largely unchanged at runtime except for some limited modifications. This makes it difficult to employ dynamic optimization techniques that could substantially improve the distributed execution based on runtime information.
This paper presents Optimus, a framework for dynamically rewriting an EPG at runtime. Optimus extends dynamic rewrite mechanisms present in systems such as Dryad and CIEL by integrating rewrite policy with a high-level data-parallel language, in this case DryadLINQ. This integration enables optimizations that require knowledge of the semantics of the computation, such as language customizations for domain-specific computations including matrix algebra. We describe the design and implementation of Optimus, outline its interfaces, and detail a number of rewriting techniques that address problems arising in distributed execution including data skew, dynamic data re-partitioning, handling unbounded iterative computations, and protecting important intermediate data for fault tolerance. We evaluate Optimus with real applications and data and show significant performance gains compared to manual optimization or customized systems. We demonstrate the versatility of dynamic EPG rewriting for data-parallel computing, and argue that it is an essential feature of any general-purpose distributed dataflow execution engine.
- P. K. Agarwal, G. Cormode, Z. Huang, J. M. Phillips, Z. Wei, and K. Yi. Mergeable summaries. In PODS, 2012. Google ScholarDigital Library
- S. Agarwal, S. Kandula, N. Bruno, M.-C. Wu, I. Stoica, and J. Zhou. Re-optimizing data parallel computing. In NSDI '12, April 2012. Google ScholarDigital Library
- G. Ananthanarayanan, S. Kandula, A. Greenberg, I. Stoica, Y. Lu, B. Saha, and E. Harris. Reining in the outliers in mapreduce clusters using mantri. In OSDI'10. Google ScholarDigital Library
- R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In ACM SIGMOD, 2000. Google ScholarDigital Library
- S. Babu, P. Bizarro, and D. DeWitt. Proactive re-optimization with Rio. In ACM SIGMOD, 2005. Google ScholarDigital Library
- J. Bennett and S. Lanning. The Netflix prize. In ACM SIGKDD 2007.Google Scholar
- K. Beyer, P. J. Haas, B. Reinwald, Y. Sismanis, and R. Gemulla. On synopses for distinct-value estimation under multiset operations. In SIGMOD 2007. Google ScholarDigital Library
- Y. Bu, B. Howe, M. Balazinska, and M. Ernst. Haloop: Efficient iterative data processing on large clusters. In VLDB, 2010. Google ScholarDigital Library
- C. Chambers, A. Raniwala, F. Perry, S. Adams, R. Henry, R. Bradshaw, and N. Weizenbaum. FlumeJava: easy, efficient data-parallel pipelines. In PLDI'10. Google ScholarDigital Library
- S. Chaudhuri, R. Motwani, and V. Narasayya. On random sampling over joins. In SIGMOD 1999. Google ScholarDigital Library
- S. Chaudhuri, R. Motwani, and V. R. Narasayya. Random sampling for histogram construction: How much is enough? In SIGMOD 1998. Google ScholarDigital Library
- G. Cormode and M. Hadjieleftheriou. Finding frequent items in data streams. In VLDB 2008. Google ScholarDigital Library
- J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107--113, 2008. Google ScholarDigital Library
- A. Deshpande, Z. Ives, and V. Raman. Adaptive query processing. Foundations and Trends in Databases, 1(1):1--140, 2007. Google ScholarDigital Library
- D. J. DeWitt, J. F. Naughton, D. A. Schneider, and S. Seshadri. Practical skew handling in parallel joins. In VLDB 1992. Google ScholarDigital Library
- E. Gonina, A. Kannan, J. Shafer, and M. Budiu. Parallelizing large-scale data processing applications with data skew: a case study in product-offer matching. In Intl. Workshop on MapReduce and its Applications (MAPREDUCE), 2011. Google ScholarDigital Library
- G. Graefe. Volcano - an extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng., 6(1):120--135, 1994. Google ScholarDigital Library
- P. Gunda, L. Ravindranath, C. A. Thekkath, Y. Yu, and L. Zhuang. Nectar: Automatic Management of Data and Computation in Datacenters. In OSDI'10. Google ScholarDigital Library
- The Hadoop project. http://hadoop.apache.org/.Google Scholar
- Hadoop NextGen MapReduce (YARN). http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html.Google Scholar
- B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. Joseph, R. Katz, S. Shenker, and I. Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI '11, March 2011. Google ScholarDigital Library
- M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: distributed data-parallel programs from sequential building blocks. In EuroSys 2007. Google ScholarDigital Library
- N. Kabra and D. J. DeWitt. Efficient mid-query re-optimization of sub-optimal query execution plans. In ACM SIGMOD, 1998. Google ScholarDigital Library
- A. Kannan, I. E. Givoni, R. Agrawal, and A. Fuxman. Matching unstructured product offers to structured product specifications. In ACM SIGKDD 2011. Google ScholarDigital Library
- Q. Ke, V. Prabhakaran, Y. Xie, Y. Yu, J. Wu, and J. Yang. Optimizing data partitioning for data-parallel computing. In HotOS '11, 2011. Google ScholarDigital Library
- S. Y. Ko, I. Hoque, B. Cho, and I. Gupta. On availability of intermediate data in cloud computations. In HotOS '09, 2009. Google ScholarDigital Library
- Y. Kwon, M. Balazinska, B. Howe, and J. Rolia. SkewTune: Mitigating skew in MapReduce applications. In ACM SIGMOD, 2012. Google ScholarDigital Library
- Mahout project. http://mahout.apache.org/.Google Scholar
- V. Markl, V. Raman, D. E. Simmen, G. M. Lohman, and H. Pirahesh. Robust query processing through progressive optimization. In ACM SIGMOD, pages 659--670, 2004. Google ScholarDigital Library
- P. Mishra and M. H. Eich. Join processing in relational databases. ACM Comput. Surv., 24(1):63--113, 1992. Google ScholarDigital Library
- D. G. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy, and S. Hand. Ciel: a universal execution engine for distributed data-flow computing. In NSDI 2011. Google ScholarDigital Library
- A. Okcan and M. Riedewald. Processing theta-joins using mapreduce. In ACM SIGMOD, pages 949--960, 2011. Google ScholarDigital Library
- C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig Latin: a not-so-foreign language for data processing. In SIGMOD 2008. Google ScholarDigital Library
- A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A comparison of approaches to large-scale data analysis. In ACM SIGMOD, 2009. Google ScholarDigital Library
- R. Power and J. Li. Piccolo: Building fast, distributed programs with partitioned tables. In OSDI, 2010. Google ScholarDigital Library
- Z. Qian, X. Chen, N. Kang, M. Chen, Y. Yu, T. Moscibroda, and Z. Zhang. MadLINQ: Large-scale distributed matrix computation for the cloud. In EuroSys 2012. Google ScholarDigital Library
- A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy. Hive: a warehousing solution over a Map-Reduce framework. Proc. VLDB Endow., 2(2), 2009. Google ScholarDigital Library
- Y. Xie, F. Yu, and M. Abadi. De-anonymizing the internet using unreliable IDs. In SIGCOMM 2009, August 2009. Google ScholarDigital Library
- Y. Yu, P. K. Gunda, and M. Isard. Distributed aggregation for data-parallel computing: interfaces and implementations. In SOSP, 2009. Google ScholarDigital Library
- Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In OSDI, 2008. Google ScholarDigital Library
- M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI 2012. Google ScholarDigital Library
- J. Zhang, R. C. Hucheng Zhou, X. Fan, Z. Guo, H. Lin, J. Y. Li, W. Lin, J. Zhou, and L. Zhou. Optimizing data shuffling in data-parallel computation by understanding user-defined functions. In NSDI, 2012. Google ScholarDigital Library
Index Terms
- Optimus: a dynamic rewriting framework for data-parallel execution plans
Recommendations
Optimus: an efficient dynamic resource scheduler for deep learning clusters
EuroSys '18: Proceedings of the Thirteenth EuroSys ConferenceDeep learning workloads are common in today's production clusters due to the proliferation of deep learning driven AI services (e.g., speech recognition, machine translation). A deep learning training job is resource-intensive and time-consuming. ...
Optimus: A Parallel Optimization Framework with Topology Aware PSO and Applications
SCC '12: Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and AnalysisThis research presents a parallel metaheuristic optimization framework, Optimus (Optimization Methods for Universal Simulators) for integration of a desired population-based search method with a target scientific application. Optimus includes a parallel ...
Optimus: efficient realization of streaming applications on FPGAs
CASES '08: Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systemsIn this paper, we introduce Optimus: an optimizing synthesis compiler for streaming applications. Optimus compiles programs written in a high level streaming language to either software or hardware implementations. The compiler uses a hierarchical ...
Comments