Chapter: 6

Complex Event Processing (CEP)

September 22, 2021
10 min

The guide to Complex Event Processing (CEP)

Complex Event Processing (CEP) is a set of techniques used for interpreting real-time information such as social media posts, stock market feeds, supply chains, and traffic reports.


There are many facets to complex event processing, ranging from the applied algorithms to software development patterns. This article will focus on the distributed aspect of implementing complex event processing. Since many of the fundamental notions that define complex event processing overlap with stream processing, we recommend first reading our article explaining stream processing concepts.

Event-driven architecture

In a traditional programming model, operations are performed on data continuously producing results. In contrast, an event-driven architecture is a passive model where data processing functions are triggered only on predefined events. 

The traditional model processes every data object and produces some output. Instead, the event-driven model keeps processing data objects but only generates output for user-provided events. Stream processing techniques depend on events to trigger procedures as ingested data passes through the pipeline stages. 

The architects of an event-based application define and register events with the event processing engine acting as a logical coordination layer. Once events are identified, architects define specific actions for those events. The event processing engine then monitors data streams for the registered events and, once detected, forwards the event to the consumers and triggers the reaction procedures to process the events. 

An event-driven architecture usually has three main components: 

  1. Event
  2. Event Engine
  3. Action

The diagram below shows how the same components support the CEP functionality for different use cases.

how the same components support the CEP functionality for different use cases


Event definition

User-defined events behave as the building blocks of the event processing pipeline. We can borrow concepts from object-oriented programming to help explain this aspect of CEP.  

In the object-oriented paradigm, programmers define object functions based on the application use cases. Suppose developers of a virtual reality application created an object of the type “avatar” using the avatar class with associated methods such as speaking and listening. In this case, the object-oriented framework has allowed developers to define classes with functions and methods to satisfy their specific application needs. 

Similarly, the first step in event processing is to define events according to the application’s specific use cases. For example, in a smart city IoT application where the input data stream contains many sensor readings, an event can be defined as the value of pollutants in the air. This measurement directly correlates to the air quality and can generate an alert if the air quality index reaches a predefined threshold. 

Event identification and mapping

Once events are defined, the next step is to identify and map data to events systematically. The event processing engine performs event identification and mapping based on user-provided criteria. The data variables are ingested in various formats and mapped into predefined events, which is the core processing unit of CEP.

Different use cases will define and map the same input into different events. Using our previous IoT smart city example, a weather application will define events related to weather conditions based on the sensor measurements. Meanwhile, city authorities will map the same inputs to events associated with traffic or air quality. 

Actions

Actions behave as sinks (a function designed to receive an incoming event) after the event engine has processed the events. Like the event definition stage, the user usually specifies the action associated with each event. The user-defined actions depend on the application use cases. A typical example of an action is an alert that says a specific threshold has been reached. For example, once a machine’s temperature sensor goes above 70 degrees celsius, an alert is generated.

Simple event processing

In simple (or traditional) event processing, an event directly associates with an event consumer or a reaction procedure. The processing engine logic for simple event processing is straightforward, such as comparing input events in real-time against a few preset rules defined based on the application use case. 

Simple Event Processing typically doesn’t refer to stored historical data and relies only on real-time data. In terms of processing capacity, it’s typically not compute-intensive and implemented on off-the-shelf hardware.

Complex event processing

Unlike simple event processing, complex event processing deals with detecting and managing patterns, combinations, and correlations of events. These can be causal, temporal, or spatial and generally require sophisticated semantics and techniques. 

CEP considers multiple data points from available resources and provides interfaces to make complex inferences from various aspects of this data. CEP engines possess the capability to take into account historical data and aggregations when processing and identifying events. 

A human analogy of a complex event processing scenario is when we see many of our colleagues leaving for home at the end of a workday. At the same time, we receive a text from our spouse to run an errand on the way home, and we happen to notice that it’s raining outside. We gather information from multiple sources, then combine them in our minds with our historical knowledge to conclude that we will probably be late for dinner.

CEP real-world use cases and application areas

Typical real-world use cases of CEP generally have three basic requirements:

  1. High heterogeneous input data volume
  2. Low latency
  3. Complex pattern recognition 

While the range of applications having these requirements is vast, we can typically categorize them into applications having one or more aspects of:

  1. “Situation awareness.”
  2. “Sense and respond.”
  3. “Track and trace.” 

CEP has practical applications in digital marketing (targeted advertising based on a viewer’s profile), fraud detection (detecting a fraudulent credit card transaction), supply chain management (calculating real-time inventory based on RFID), and firewall systems (detecting anomalies based on machine learning). While these are a few examples, the scope is quite broad, especially with the growing adoption of IoT devices and 5G wireless technologies.

Distributed CEP

CEP engines usually allow users to define events, apply one or more rules, and describe actions. For instance,

  • One application may be interested in finding events of type A followed by type B. 
  • Another application may want to see event A without event B in the sequence of incoming events. 
  • A third application may have a relaxed requirement allowing other events between event A and event B.

This logic described above may seem simple at first glance, but the required computations are intensive when applied to “big data” containing millions of streaming events. CEP engines are usually much more compute-intensive than simple event processing engines and require the provisioning of a highly scalable application infrastructure. 

Low latency via in-memory processing

Modern distributed processing frameworks try to minimize high IO costs by keeping data in memory as much as possible. This computing approach allows them to reduce processing latencies essential for the real-time processing requirements of CEP engines.

High performance via parallel processing

Data distribution among worker nodes is a fundamental aspect of big data frameworks. By partitioning and distributing data among worker nodes, these frameworks can achieve high performance by applying processing logic on data in parallel. 

Horizontal scalability 

An essential aspect of processing a high volume of data is the ability to scale computing resources on-demand. Public cloud services such as Amazon Web Service (AWS) and open-source technologies like Kubernetes can instantly replicate and terminate processing nodes. Such an infrastructure capable of horizontal scaling should host CEP applications. 

Basic CEP patterns

Application developers can implement complex event processing logic by reusing standardized techniques (patterns) independently of their programming language of choice. The table below provides a subset of the patterns commonly used in real-time event processing applications:

Technique Description
Preprocessing Filtering, aggregating (multiple events); integrating (multiple event streams); abstracting (events into a common format); and transforming (event formats).
Event Correlation Establishing a correlation to conclude an outcome or detect an error.
Historical Data Storing events to include in a temporal analysis.
Detecting Event Sequences Maintaining a state based on the sequence of events.
Modeling Event Hierarchies Accounting for downstream and upstream impacts of events.
Detecting Relationships Uncovering correlation, membership, and causality.
Detecting Trends Calculating increases or decreases over time.
Machine Learning Applying a learning algorithm to historical and real-time events.
Alerts Alerting when a predefined condition is detected.

The patterns described in this table represent a subset of the functions available on modern frameworks designed for complex event processing. In the next section, we introduce the leading platforms in this emerging software category.

Distributed CEP frameworks

Apache Flink 

Apache Flink is a popular stream processing open source framework fundamentally conceived with streaming (versus batch processing) in mind. Flink has a sophisticated built-in CEP engine with a comprehensive API for pattern recognition and other CEP functions. 

The Flink API allows for the definition of a quantifier, condition-based singleton, and looping patterns. It also has capabilities for combined and grouped patterns such as “next”, “followedBy”, “followedByAny”, “notNext”, and “withIn”.

Apache Spark

While Apache Spark is one of the first general-purpose in-memory “big data” open source frameworks, it lacks built-in CEP capabilities. Some third-party efforts have focused on extending the capabilities of the Spark Streaming engine for CEP, but these functionalities are limited.

Macrometa

Macrometa is a noSQL database, pub/sub, event processing, and serverless edge computing platform for building geo-distributed and real-time applications. 


Learn more about Macrometa’s technology.