ABSTRACT
Writing real-time applications that react to vast amounts of incoming data is hard, because the volume of data requires distributed execution on a cluster architecture. We envision that such an application can be built as a data processing pipeline consisting of a set of generic, reactive components that may be reused in other applications. However, there is currently no programming model or framework that enables the reactive, scalable execution of such a pipeline on a cluster architecture. Our work introduces the notion of reactive workflows, a technique that combines concepts from scientific workflows and reactive programming. Reactive workflows enable the integration of these generic components into a single workflow that can be executed on a cluster architecture in a reactive, scalable way. To deploy these reactive workflows, we introduce a domain-specific language called Skitter. Skitter enables developers to write reactive components and compose them into reactive workflows, which Skitter’s runtime system can distribute over a cluster.
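The idea of composing generic, reactive components into a single workflow can be illustrated with a small sketch. It is written in Python, not in Skitter's own syntax, and it runs in a single process rather than on a cluster. The names `Component`, `Workflow`, `add`, `link`, and `push` are hypothetical and chosen only for this illustration; they are not taken from Skitter.

```python
from typing import Callable, Dict, List, Tuple


class Component:
    """A generic reactive component (hypothetical): reacts to each incoming value."""

    def __init__(self, name: str, react: Callable[[object], List[object]]):
        self.name = name
        self.react = react  # returns zero or more output values per input value


class Workflow:
    """A directed graph of components; values flow along the links (hypothetical)."""

    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}
        self.links: Dict[str, List[str]] = {}

    def add(self, component: Component) -> "Workflow":
        self.components[component.name] = component
        self.links.setdefault(component.name, [])
        return self

    def link(self, source: str, target: str) -> "Workflow":
        self.links[source].append(target)
        return self

    def push(self, entry: str, value: object) -> List[Tuple[str, object]]:
        """React to one incoming value; collect outputs of components without successors."""
        results: List[Tuple[str, object]] = []
        pending = [(entry, value)]
        while pending:
            name, item = pending.pop(0)
            for out in self.components[name].react(item):
                targets = self.links[name]
                if not targets:
                    results.append((name, out))
                for target in targets:
                    pending.append((target, out))
        return results


def build_example() -> Workflow:
    # Three reusable components composed into one pipeline: parse -> keep_even -> square.
    return (
        Workflow()
        .add(Component("parse", lambda line: [int(line)]))
        .add(Component("keep_even", lambda n: [n] if n % 2 == 0 else []))
        .add(Component("square", lambda n: [n * n]))
        .link("parse", "keep_even")
        .link("keep_even", "square")
    )


if __name__ == "__main__":
    workflow = build_example()
    for line in ["1", "2", "3", "4"]:
        print(line, "->", workflow.push("parse", line))
```

Each call to `push` represents one piece of incoming data, and the workflow reacts to it immediately instead of waiting for a complete batch. In a distributed setting, a runtime system would place the components on different cluster nodes; this sketch leaves that part out.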
Index Terms
- Skitter: a DSL for distributed reactive workflows