The Guide To Event Stream Processing

August 9, 2021
5 min read


Real-time applications require real-time data analysis. Such applications are used for social media, stock market, fraud detection, fleet management, traffic reporting, and customer service use cases. Here are two examples to spur your imagination:

Autonomous vehicles generate a vast amount of data from cameras, radars, light detection and ranging (LIDAR) sensors, which measure distances with pulsed lasers, and global positioning systems. Self-driving cars must analyze this data in real-time to obtain a three-dimensional view of their surroundings and avoid obstacles while navigating to a destination.

Supply chain and logistics management applications rely on barcode scanners and RFID (radio frequency identification) to determine the physical location of raw materials and finished goods traveling between suppliers, manufacturers, and consumers. They use this data to estimate delivery time, calculate inventory, and detect loss or theft.

The analytical engines supporting such real-time applications rely on two modern concepts covered in this guide: Stream processing and complex event processing (CEP). Some refer to the combination of the two as event stream processing. 

A couple of decades ago, the state-of-the-art tech in data processing was Online Analytical Processing (OLAP), a key enabler of business intelligence (BI). The acronym OLAP intentionally resembles OLTP, or Online Transaction Processing, the transactional workload of relational databases. Data feeding OLAP is Extracted, Transformed, and Loaded (ETL) into a data warehouse (a centralized repository of data integrated from various sources for analysis and reporting) and processed in overnight batches to form an OLAP cube, a multi-dimensional array of data optimized for rapid querying to generate BI reports.

Modern real-time applications can’t afford such delays. The data must be consumed, formatted, and analyzed at the same speed it’s generated. Starting in the 1990s, computer scientists conceived paradigms to analyze data in parallel pipelines as it streams in real-time (a.k.a. stream processing). The complex processing of the streaming events relies on several techniques, such as pattern detection, aggregation, relationship modeling, and correlation, to support real-time decisions (a.k.a. complex event processing).
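To make these ideas concrete, here is a minimal, framework-agnostic sketch of one common stream processing technique: a sliding-window aggregation over an in-memory event stream, combined with a simple pattern (threshold) detection on the aggregate. The `Event` type, window size, and threshold are illustrative assumptions, not part of any particular library.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Event:
    ts: float      # event timestamp in seconds (illustrative)
    value: float   # e.g., a sensor reading or transaction amount

def sliding_window_average(events, window_seconds):
    """Yield (timestamp, rolling average) per event, aggregating
    over a sliding time window of the given length."""
    window = deque()
    total = 0.0
    for ev in events:
        window.append(ev)
        total += ev.value
        # Evict events that have fallen out of the time window
        while window and ev.ts - window[0].ts > window_seconds:
            total -= window.popleft().value
        yield ev.ts, total / len(window)

# Pattern detection: flag timestamps where the rolling average spikes
stream = [Event(t, v) for t, v in [(0, 10), (1, 12), (2, 50), (3, 55), (4, 11)]]
alerts = [ts for ts, avg in sliding_window_average(stream, window_seconds=2)
          if avg > 30]
print(alerts)  # → [3, 4]
```

Real engines such as Flink or Spark Structured Streaming apply the same windowing idea, but distribute the state across a cluster and handle out-of-order events, fault tolerance, and backpressure.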

This guide explains stream processing and complex event processing concepts and reviews the software technologies used for their practical implementation. We aim to help developers overcome the challenges of implementing global stateful applications that require low data processing latencies. 

Chapter 1: Stream Processing. Learn the challenges, techniques, best practices, and latest technologies behind the emerging stream processing paradigm.

Chapter 2: DynamoDB Streams. Understand the use cases for DynamoDB Streams and follow implementation instructions along with examples.

Chapter 3: Google PubSub. Receive instructions along with best practices for implementing Google PubSub and compare it to alternative technologies.

Chapter 4: Kafka Alternatives. Discover the alternative technologies to Kafka, along with a tabular comparison of their pros and cons.

Chapter 5: Apache Spark Vs Flink. Compare the data processing approaches of Apache Spark and Apache Flink.

Chapter 6: Complex Event Processing. Explore complex event processing (CEP) techniques, patterns, and leading frameworks.

Chapter 7: Apache Spark vs Hadoop. Compare Spark to Hadoop in terms of real-time processing, operations cost, scheduling, fault tolerance, security, and more.

Chapter 8: Apache Beam Tutorial. Learn by example about Beam pipeline branching, composite transforms and other programming model concepts.

Chapter 9: Hadoop Streaming. Understand the use case behind Hadoop Streaming and how it compares to other streaming technologies.

Chapter 10: Apache Storm. Learn the architecture, best practices, and limitations of Apache Storm.

Chapter 11: Spark Structured Streaming. Process data in motion with Spark using Spark Structured Streaming and Discretized Streams (DStreams).

Chapter 12: Databricks vs Snowflake. Learn how Databricks and Snowflake differ from a developer’s perspective by comparing them across multiple dimensions.

Stay tuned. More chapters coming soon.

Subscribe to our LinkedIn Newsletter to receive more educational content