Complex Event Processing (CEP) or stream data processing are becoming increasingly popular as the platform underlying event-driven solutions and applications in industries such as financial services, oil & gas, smart grids, health care, and IT monitoring. Satisfactory performance is crucial for any solution across these industries. Typically, performance of CEP engines is measured as (1)
, i.e., number of input events processed per second, and (2)
, which denotes the time it takes for the result (output events) to emerge from the system after the business event (input event) happened. While data rates are typically easy to measure by capturing the numbers of input events over time, latency is less well defined. As it turns out, a definition becomes particularly challenging in the presence of data arriving out of order. That means that the order in which events arrive at the system is different from the order of their timestamps. Many important distributed scenarios need to deal with out-of-order arrival because communication delays easily introduce disorder.
With out-of-order arrival, a CEP system cannot produce final answers as events arrive. Instead, time first needs to progress enough in the overall system before correct results can be produced. This introduces additional latency beyond the time it takes the system to perform the processing of the events. We denote the former as
and the latter as
. This paper discusses both types of latency in detail and defines them formally without depending on particular semantics of the CEP query plans. In addition, the paper suggests incorporating these definitions as metrics into the benchmarks that are being used to assess and compare CEP systems.