Elsevier

Future Generation Computer Systems

Volume 86, September 2018, Pages 1032-1041

A performance modeling framework for lambda architecture based applications

https://doi.org/10.1016/j.future.2017.07.033

Highlights

  • This paper presents a performance evaluation of Lambda architecture based computing systems.

  • The system is modeled by means of a multisolution approach.

  • Our work aims at providing insights to designers.

Abstract

The lambda architectural pattern makes it possible to overcome some limitations of data processing frameworks. It builds on the idea of having two different data processing streams on the same system: real-time computing for fast data streams, and batch computing for massive workloads whose processing can be delayed. While these two modes are clearly not new, lambda architectures allow them to coordinate their execution to avoid interference. However, resource allocation over cloud infrastructure greatly impacts overall performance (and, importantly, costs). If performance could be modeled in advance, architects could make better judgments about how to allocate their resources and use the systems more efficiently. In this paper, we present a modeling approach, based on multiformalism and multisolution techniques, that provides a fast evaluation tool to support design choices about parameters and eventually leads to better architecture designs.

Introduction

The coupled velocity and volume requirements of Big Data applications pose important challenges: algorithms must cope with them, and efficient architectural solutions are required. Data flows are generally characterized by high frequency and multiple streams, and contain information that needs timely processing to generate value. For example, the use of sensors in smart cities has increased knowledge about urban areas, allowing traffic to be managed more efficiently, energy costs to be reduced and citizen experiences to be improved. However, with this data explosion, it is a challenge to implement processing systems that extract meaning from data in a timely manner.

Although Big Data systems inherently exploit high parallelism, conventional architectural solutions may not be fast enough to meet these requirements. Specialized solutions are needed that use advanced software and/or hardware to process data in real time at minimum cost and implementation effort. An example of a design pattern used in industry is the lambda architecture, defined as “a data-processing design pattern to handle massive quantities of data and integrate batch and real-time processing within a single framework” [1]. A lambda architecture compliant design, together with a well-targeted deployment and a proper configuration of the parameters, is argued to enable an efficient implementation for real-time demands.

To support design choices and adaptation to workload dynamics, this paper, based on [2], proposes a modeling framework for performance evaluation of systems running applications based on the lambda architecture. To provide a user-oriented approach, the framework builds on SIMTHESys [3] to provide domain-oriented model specification languages. The underlying approach is based on multiformalism modeling and multisolution [4], by means of a domain specific language. The model is translated into different submodels, which are solved analytically by an iterative algorithm. The approach is demonstrated on a case study.

The paper is organized as follows: Section 2 introduces lambda architectures as implementations of the lambda pattern, and Section 3 briefly presents related work. Section 4 presents the modeling approach, which is based on SIMTHESys, and Section 5 describes the solution technique. Finally, Section 6 summarizes conclusions and presents future extensions of this work.

Section snippets

Lambda architecture design pattern

Cloud computing has introduced multiple opportunities such as infinite resource access and their management for complex tasks. Using cloud-based architectures has lowered the access barriers to large infrastructures by providing virtualization of remote resources at an affordable cost. Being a service-based computing approach (either as a Software as a Service (SaaS), Platform as a Service (PaaS) or Infrastructure as a Service (IaaS)), there is a significant effort to design systems to obtain

Related work

Performance evaluation of large computing facilities is extremely complex. At this scale, the interrelations and the abstraction layers produce a large number of mutual influences, and the number of parameters explodes. In these conditions, simulation approaches need longer times to produce significant and accurate results, and the design complexity may become unmanageable. Classical analytical approaches also generally suffer from state space explosion and may not be able to scale up to

Modeling approach

The performance of a system complying with the lambda architecture is influenced by several factors. The number of data streams that need to be processed and the frequency at which updates arrive determine the workload of the speed component of the architecture. The workload of the batch component is determined by the number of batch processes that are executed and by the frequency at which they occur. The workload of the serving component is defined by the number of queries to be executed and

Results

Models created according to the formalism presented in Section 4 are solved using a multisolution approach as a set of interdependent queuing networks. In particular, each subsystem identified by specific primitives is converted into either a queuing or deterministic timing model
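The idea of solving interdependent queuing submodels by iteration can be illustrated with a minimal sketch. It is not the algorithm used in the paper: it assumes two M/M/1 submodels (a speed layer and a batch layer) that share resources, and it assumes a hypothetical linear coupling in which each layer's effective service rate shrinks with the other layer's utilization. The function names, the `coupling` factor and all numeric rates are illustrative assumptions.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue (requires arrival_rate < service_rate)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def solve_coupled_layers(lam_speed, lam_batch, mu_speed, mu_batch,
                         coupling=0.5, tol=1e-9, max_iter=1000):
    """Fixed-point iteration over two submodels that slow each other down.

    Assumed (hypothetical) coupling: the effective service rate of a layer is
    its nominal rate scaled by (1 - coupling * utilization of the other layer).
    """
    util_speed = util_batch = 0.0  # start from the uncoupled guess
    for _ in range(max_iter):
        # Solve each submodel using the other's latest utilization estimate.
        eff_mu_speed = mu_speed * (1.0 - coupling * util_batch)
        eff_mu_batch = mu_batch * (1.0 - coupling * util_speed)
        new_util_speed = lam_speed / eff_mu_speed
        new_util_batch = lam_batch / eff_mu_batch
        if new_util_speed >= 1.0 or new_util_batch >= 1.0:
            raise ValueError("a submodel became unstable during the iteration")
        change = max(abs(new_util_speed - util_speed),
                     abs(new_util_batch - util_batch))
        util_speed, util_batch = new_util_speed, new_util_batch
        if change < tol:
            # Converged: report metrics at the fixed point.
            eff_mu_speed = mu_speed * (1.0 - coupling * util_batch)
            eff_mu_batch = mu_batch * (1.0 - coupling * util_speed)
            return {
                "util_speed": util_speed,
                "util_batch": util_batch,
                "resp_speed": mm1_response_time(lam_speed, eff_mu_speed),
                "resp_batch": mm1_response_time(lam_batch, eff_mu_batch),
            }
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    # Illustrative rates only (jobs per second); not taken from the paper.
    result = solve_coupled_layers(lam_speed=2.0, lam_batch=1.0,
                                  mu_speed=10.0, mu_batch=4.0)
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
```

Each pass solves one submodel with the other's most recent output as a parameter, and the loop stops when the exchanged quantities no longer change. A fast evaluation of this kind can be repeated over many parameter settings, which is the use case the paper targets.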

Conclusions

In this paper we presented a modeling approach that is suitable for performance evaluation of lambda architectures to support the design and assessment decision process. The proposed solution is able to provide a fast tool that works with synthetic characterizations, or hypotheses, about the workload and converges in a wide set of the parameter space. To let the approach be usable by domain experts that are not familiar with analytical modeling, we also provided a domain specific language that

Acknowledgment

This article is based upon work from COST Action IC1406 High-Performance Modelling and Simulation for Big Data Applications (cHiPSet), supported by COST (European Cooperation in Science and Technology).

M. Gribaudo is an Associate Professor at the Politecnico di Milano, Italy. He works in the performance evaluation group. His current research interests are multi-formalism modeling, queuing networks, fluid models, mean field analysis and spatial models. The main applications to which these methodologies are applied come from Big Data applications, Cloud Computing, Multi-Core Architectures and Wireless Sensor Networks.

References (25)

  • E. Barbierato et al., Exploiting product forms solution techniques in multiformalism modeling, Electron. Notes Theor. Comput. Sci. (2013)
  • R. Mian et al., Provisioning data analytic workloads in a cloud, Future Gener. Comput. Syst. (2013)
  • A. Castiglione et al., Exploiting mean field analysis to model performances of Big Data architectures, Future Gener. Comput. Syst. (2014)
  • Amazon, Lambda architecture for batch and real-time processing on AWS with Spark Streaming and Spark SQL. URL...
  • M. Kiran et al., Lambda architecture for cost-effective batch and speed big data processing
  • M. Gribaudo et al., An introduction to multiformalism modeling
  • H. Khazaei et al., Performance analysis of cloud computing centers using M/G/m/m+r queuing systems, IEEE Trans. Parallel Distrib. Syst. (2012)
  • D. Gill et al., Approaches for software performance modelling, cloud computing and openstack, Int. J. Comput. Appl. (2015)
  • M.A. Ismail, M.F. Ismail, H. Ahmed, Openstack cloud performance optimization using Linux services, in: 2015...
  • Y. Mei, L. Liu, X. Pu, S. Sivathanu, Performance measurements and analysis of network I/O applications in virtualized...
  • C. Olston et al., Pig latin: a not-so-foreign language for data processing
  • R. Chaiken et al., SCOPE: easy and efficient parallel processing of massive data sets, Proc. VLDB Endow. (2008)
M. Iacono is a tenured Assistant Professor and Senior Researcher (holding a qualification as Associate Professor) in Information Processing Systems at Dipartimento di Matematica e Fisica, Università degli Studi della Campania “Luigi Vanvitelli” (previously known as Seconda Università degli Studi di Napoli). He holds a Ph.D. degree in Electrical Engineering and an M.Sc. degree in Computer Engineering. He has published more than 70 peer-reviewed scientific papers in journals, books and conference proceedings and has served as editor, chairman, committee member and reviewer for around 25 journals and more than 100 conferences. He is a member of IEEE and other scientific societies. His research activity is mainly centered on the performance modeling of complex computer-based systems, with special attention to multiformalism modeling techniques, critical systems, Cloud and Big Data systems, and cyberphysical systems. More information is available at http://www.mauroiacono.com.

M. Kiran is a Research Scientist at LBNL, working on intent-based networking and engineering intelligent networks for optimizing performance and user experience. Her work focuses on learning and decentralized optimization of system architectures and algorithms for high performance computing, agent-based simulations, underlying networks and Cloud infrastructures. She has been exploring various platforms such as HPC grids, GPUs, Cloud and SDN-related technologies. She applies QoS and performance optimization, parallelization algorithms and software engineering principles to solve complex data-intensive problems such as large-scale simulations. Over the years, she has worked with biologists, economists and social scientists, building tools and optimizing architectures for multiple problems in their domains.
