The guide to event stream processing
Real-time applications require real-time data analysis. They power use cases in social media, stock trading, fraud detection, fleet management, traffic reporting, and customer service. Here are two examples to spur your imagination:
Autonomous vehicles generate a vast amount of data from cameras, radars, light detection and ranging (LIDAR) sensors, which measure distances with pulsed lasers, and global positioning systems. Self-driving cars must analyze this data in real-time to build a three-dimensional view of their surroundings and avoid obstacles while navigating to a destination.
Supply chain and logistics management applications rely on barcode scanners and RFID (radio frequency identification) to determine the physical location of raw materials and finished goods traveling between suppliers, manufacturers, and consumers. They use this data to estimate delivery time, calculate inventory, and detect loss or theft.
The analytical engines supporting such real-time applications rely on two modern concepts covered in this guide: stream processing and complex event processing (CEP). Some refer to the combination of the two as event stream processing.
A couple of decades ago, the state of the art in data processing was Online Analytical Processing (OLAP), a key enabler of business intelligence (BI). The acronym OLAP intentionally resembles OLTP, or Online Transaction Processing, the workload of relational databases that feed a data warehouse (a centralized repository of data integrated from various sources for analysis and reporting). Data feeding OLAP is extracted, transformed, and loaded (ETL) in overnight batches to form an OLAP cube, a multi-dimensional array of data optimized for rapid querying to generate BI reports.
Modern real-time applications can't afford such processing delays. The data must be consumed, formatted, and analyzed at the same speed it's generated. Starting in the 1990s, computer scientists conceived paradigms to analyze data in parallel pipelines as it streams in real-time (a.k.a. stream processing). The complex processing of streaming events relies on techniques such as pattern detection, aggregation, relationship modeling, and correlation to support real-time decisions (a.k.a. complex event processing).
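To make these ideas concrete, here is a minimal sketch in Python of two of the techniques just mentioned: a sliding-window aggregation over an incoming event stream, with a simple threshold pattern that raises an alert when the aggregated value spikes. The `detect_spikes` function, window size, and threshold are illustrative assumptions, not part of any specific framework.

```python
from collections import deque

def detect_spikes(events, window_size=3, threshold=100):
    """Sliding-window aggregation with simple pattern detection.

    events: iterable of (timestamp, value) pairs arriving in order.
    Returns the timestamps at which the windowed sum exceeds threshold.
    """
    window = deque(maxlen=window_size)  # keeps only the newest events
    alerts = []
    for ts, value in events:
        window.append((ts, value))
        # Aggregate over the current window and test the pattern.
        if len(window) == window_size and sum(v for _, v in window) > threshold:
            alerts.append(ts)  # timestamp of the event that triggered the alert
    return alerts

# Hypothetical sensor readings: (timestamp, value)
stream = [(1, 10), (2, 20), (3, 30), (4, 80), (5, 5)]
print(detect_spikes(stream))  # → [4, 5]
```

Real stream processors apply the same window-and-aggregate logic, but distribute it across parallel pipelines and manage state, fault tolerance, and out-of-order events for you.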
This guide explains stream processing and complex event processing concepts and reviews the software technologies used to implement them in practice. We aim to help developers overcome the challenges of building global stateful applications that require low data processing latencies. Here's what each chapter covers:
Learn the challenges, techniques, best practices, and latest technologies behind the emerging stream processing paradigm.
Understand the use cases for DynamoDB Streams and follow implementation instructions along with examples.
Receive instructions along with best practices for implementing Google PubSub and compare it to alternative technologies.
Discover the alternative technologies to Kafka, along with a tabular comparison of their pros and cons.
Compare the data processing approaches of Apache Spark and Apache Flink.
Explore complex event processing (CEP) techniques, patterns, and leading frameworks.
Compare Spark to Hadoop in terms of real-time processing, operations cost, scheduling, fault tolerance, security, and more.
Learn by example about Beam pipeline branching, composite transforms, and other programming model concepts.
Understand the use case behind Hadoop Streaming and how it compares to other streaming technologies.
Learn the architecture, best practices, and limitations of Apache Storm.
Process data in motion with Spark using Spark Structured Streaming and Discretized Streams (DStream).
Learn how Databricks and Snowflake differ from a developer's perspective by comparing them across multiple dimensions.
Stay tuned. More chapters coming soon.