Apache Kafka is an open source software messaging bus that uses stream processing. Because it’s a distributed platform known for its scalability, resilience, and performance, Kafka has become very popular with large enterprises. In fact, 80% of the Fortune 500 use Kafka.
However, there’s no such thing as a one-size-fits-all solution. What’s best for Uber or PayPal may not be ideal for your application. Fortunately, there are several alternative messaging platforms available.
Knowing which platform is right for you requires understanding the pros and cons of each. To help you make the right choice, we’ll take an in-depth look at Apache Kafka and four popular Kafka alternatives.
Summary of Kafka alternatives
Four of the most popular Kafka alternatives are:
- Google Pub/Sub
- RabbitMQ
- Apache Pulsar
- Macrometa
Before taking a detailed look at Kafka and those four alternatives, let’s start with a high-level overview of features.
Understanding stream processing and other messaging types
Understanding why performant messaging is essential requires taking a step back and understanding how modern systems scale. Today, applications are generating data at a much higher rate than ever before. As a result, systems need to be scaled to meet the data processing requirements. There are two basic approaches to achieve the required scaling:
- Scaling up - adding more CPU and RAM to individual servers
- Scaling out - adding more nodes to server clusters (groups of servers)
Scaling up is efficient from a performance perspective but very expensive. On the other hand, scaling out is cheap since it leverages commodity hardware, but it introduces communication challenges in distributed systems.
In recent years, scaling out has become a popular choice for modern applications. To accommodate scaling out, different machines in a cluster need to exchange control and data messages efficiently. Because of this, messaging systems are the backbone of any distributed system.
Synchronous vs. asynchronous messaging
There are two ways systems can communicate with each other: synchronously or asynchronously. Synchronous messaging is when two systems are communicating with each other directly at the same time. Asynchronous messaging is when decoupled systems use an intermediary layer for communication. Because it enables loose coupling, asynchronous messaging is more scalable than synchronous messaging.
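The difference between the two styles can be sketched in a few lines of Python. This is only an illustration using the standard library: the `handle` function and the order names are hypothetical, and an in-process `queue.Queue` stands in for the intermediary layer that a real message broker would provide.

```python
import queue
import threading


def handle(order: str) -> str:
    """Hypothetical receiver logic: turn an order into a confirmation."""
    return f"confirmed:{order}"


def send_sync(order: str) -> str:
    # Synchronous: the sender calls the receiver directly and waits for the reply.
    return handle(order)


def send_async(buffer: queue.Queue, order: str) -> None:
    # Asynchronous: the sender hands the message to an intermediary and moves on.
    buffer.put(order)


def worker(buffer: queue.Queue, results: list) -> None:
    """Receiver loop: process messages until a None sentinel arrives."""
    while True:
        order = buffer.get()
        if order is None:
            break
        results.append(handle(order))


buffer: queue.Queue = queue.Queue()
results: list = []
consumer = threading.Thread(target=worker, args=(buffer, results))
consumer.start()

sync_reply = send_sync("order-1")  # the reply is available immediately
send_async(buffer, "order-2")      # no reply; the worker handles it later
send_async(buffer, "order-3")
buffer.put(None)                   # tell the worker to stop
consumer.join()
```

In the asynchronous case the sender never needs to know who processes the message or when, which is what makes it possible to add or replace receivers without touching the sender.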
Types of messaging
Traditionally, message queues such as MSMQ were used for message exchange between distributed systems.
More recently, pub/sub (short for publish and subscribe) has become the more popular approach to messaging because it allows multiple users to create subscriptions for the same data simultaneously. In this model, the producer sends the messages without the knowledge of the consumer. Producers do this by “publishing” a message to a topic. Similarly, consumers need to “subscribe” to topics with messages they want to receive.
With pub/sub, multiple consumers can receive and process the same messages independently. Pub/sub provides loose coupling between senders and receivers and allows subscribers to be added or removed on the fly, making it a more scalable approach. It also supports durable subscriptions, which means that if a consumer dies, the messages persist, and the subscription resumes where the consumer left off.
Message queues, which connect different systems (for example, by streaming data between microservices), work differently. Multiple senders (producers) can send messages to the queue. However, each message can only be consumed by a single consumer. Some message queue implementations provide a mechanism for a consumer to acknowledge that a message was processed successfully. Message queues are easily scalable because systems can add more producers and consumers independently.
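To make the contrast between the two models concrete, here is a minimal in-memory sketch. The `MessageQueue` and `PubSub` classes are hypothetical teaching aids, not the API of any real broker: the queue hands each message to exactly one consumer, while pub/sub gives every subscriber of a topic its own copy.

```python
from collections import defaultdict, deque


class MessageQueue:
    """Point-to-point: each message is consumed by exactly one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message means no other consumer will ever see it.
        return self._messages.popleft() if self._messages else None


class PubSub:
    """Publish/subscribe: every subscriber of a topic gets its own copy."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of inboxes

    def subscribe(self, topic):
        inbox = []
        self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        # The publisher does not know who the subscribers are, or how many exist.
        for inbox in self._subscribers[topic]:
            inbox.append(message)


q = MessageQueue()
q.send("job-1")
q.send("job-2")
worker_a = q.receive()  # first consumer takes job-1
worker_b = q.receive()  # second consumer takes job-2
worker_c = q.receive()  # nothing left for a third consumer

bus = PubSub()
billing = bus.subscribe("orders")
shipping = bus.subscribe("orders")
bus.publish("orders", "order-42")  # both inboxes receive the same message
```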
To summarize, the basic concepts in the messaging systems we’ve reviewed are:
- A message (or event) is a dataset shared across systems. It can be a control message containing coordination information about the systems or an informational message with data about systems.
- A message queue is a queue data structure that holds different messages for communication between disparate systems.
- A topic is a named category of messages that lets producers and consumers publish and subscribe to only the messages relevant to them.
An overview of Kafka and Kafka alternatives
Apache Kafka is a well-known open source platform for data ingestion and processing in real-time. More than just a message broker, Kafka is a distributed streaming platform. Kafka’s three main features are:
- Storing streams of data in an orderly fashion
- Processing the data in real time
- Publishing and subscribing to different data sources
Kafka is written in Scala and Java and combines both queueing and pub/sub messaging patterns. Queueing enables higher scalability since it lets multiple consumers share the work of reading a stream, with each message delivered to only one of them. Kafka improves on traditional queues, which don’t support multiple subscribers for the same data.
On the other hand, the pub/sub model sends each message to each subscriber, which isn’t the same as distributing the work. Hence, Kafka uses partitioned transaction logs to overcome that challenge and remain scalable. Each topic is a log of ordered messages broken into partitions, and each consumer in a group is assigned one or more partitions.
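The partitioning idea can be sketched as follows. This is a simplified model and not Kafka’s implementation: `PartitionedLog` and `assign` are hypothetical names, `zlib.crc32` is only a stand-in for the hash a real partitioner would use, and the round-robin assignment is one of several possible strategies.

```python
import zlib


class PartitionedLog:
    """A topic modeled as a fixed number of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # A stable hash sends every message with the same key to the same
        # partition, so ordering is preserved per key (and per partition).
        partition = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[partition].append(value)
        offset = len(self.partitions[partition]) - 1
        return partition, offset

    def read(self, partition, offset):
        # Reading does not remove anything; consumers only track an offset.
        return self.partitions[partition][offset:]


def assign(num_partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


log = PartitionedLog(num_partitions=3)
p1, o1 = log.append("user-7", "login")
p2, o2 = log.append("user-7", "purchase")  # same key, same partition
group = assign(3, ["c1", "c2"])            # {'c1': [0, 2], 'c2': [1]}
```

Because each partition has a single owner within a group, the work is divided like a queue, while a second group with its own assignment can still read the same data, as in pub/sub.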
Kafka uses partitioned transaction logs at the storage layer for streaming messages. This approach enables Kafka to handle trillions of events per day. The default retention period is seven days, but retention can be extended up to the available disk space.
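A rough sketch of time-based retention on a log is shown below. The `RetainedLog` class is hypothetical, and timestamps are passed in explicitly to keep the example deterministic. It drops individual records for simplicity, whereas a real broker such as Kafka removes whole log segments.

```python
from dataclasses import dataclass

SEVEN_DAYS = 7 * 24 * 60 * 60  # retention period in seconds
ONE_DAY = 24 * 60 * 60


@dataclass
class Record:
    offset: int
    timestamp: float
    value: str


class RetainedLog:
    """Append-only log that forgets records older than the retention period."""

    def __init__(self, retention_seconds=SEVEN_DAYS):
        self.retention_seconds = retention_seconds
        self.records = []
        self.next_offset = 0

    def append(self, value, now):
        self.records.append(Record(self.next_offset, now, value))
        self.next_offset += 1

    def enforce_retention(self, now):
        # Keep only records that are still inside the retention window.
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r.timestamp >= cutoff]

    def read_from(self, offset):
        # Reads are non-destructive, so the same data can be re-read later.
        return [r.value for r in self.records if r.offset >= offset]


log = RetainedLog()
log.append("a", now=0)
log.append("b", now=5 * ONE_DAY)
first_read = log.read_from(0)
second_read = log.read_from(0)         # re-reading returns the same data
log.enforce_retention(now=8 * ONE_DAY)  # "a" is now older than seven days
after_cleanup = log.read_from(0)
```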
To manage cluster metadata and coordinate brokers, Kafka has traditionally needed ZooKeeper. Unfortunately, setting up Kafka is complex. Setup requires two components: Kafka brokers and ZooKeeper nodes. Additionally, on-prem infrastructure requires domain expertise and significant operational effort. While it is possible to use a managed Kafka service, it can be very expensive.
The ZooKeeper requirement is the biggest bottleneck to Kafka’s scalability. Fortunately, newer Kafka versions remove the ZooKeeper dependency by replacing it with the built-in KRaft consensus protocol.
Google Pub/Sub is a service for messaging that leverages the pub/sub messaging pattern. Setting up an instance to run an application using Google Pub/Sub is easy because it’s a fully-managed cloud service. That means there can be less complexity with Google Pub/Sub than with Kafka (which requires machines, brokers, and ZooKeeper configuration).
With Google Pub/Sub, topics differentiate messages. Consumers use subscriptions to receive message notifications. Once they receive a message for processing, they send back an acknowledgment, as shown in the diagram below.
Google Pub/Sub supports “at least once” delivery, and by default it doesn’t offer any ordering guarantees. On the other hand, Kafka provides ordering guarantees per partition.
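The acknowledgment flow and its “at least once” consequence can be illustrated with a small simulation. This is not the Google Pub/Sub client API: the `Subscription` class, its `ack_deadline` parameter, and the explicit `now` clock are assumptions made for the sketch. The key behavior is that a message that is not acknowledged before its deadline is delivered again, so consumers may see duplicates.

```python
class Subscription:
    """Toy subscription with acknowledgment deadlines and redelivery."""

    def __init__(self, ack_deadline):
        self.ack_deadline = ack_deadline
        self._pending = []      # (id, message) pairs waiting for delivery
        self._outstanding = {}  # id -> (message, time at which it expires)
        self._next_id = 0

    def publish(self, message):
        self._pending.append((self._next_id, message))
        self._next_id += 1

    def pull(self, now):
        # Delivered messages whose deadline has passed are queued again.
        for msg_id, (message, deadline) in list(self._outstanding.items()):
            if now >= deadline:
                del self._outstanding[msg_id]
                self._pending.append((msg_id, message))
        delivered = self._pending
        self._pending = []
        for msg_id, message in delivered:
            self._outstanding[msg_id] = (message, now + self.ack_deadline)
        return delivered

    def ack(self, msg_id):
        # An acknowledged message is never redelivered.
        self._outstanding.pop(msg_id, None)


sub = Subscription(ack_deadline=10)
sub.publish("a")
sub.publish("b")
first = sub.pull(now=0)    # both messages are delivered
sub.ack(0)                 # only "a" is acknowledged
second = sub.pull(now=5)   # deadline for "b" has not passed yet
third = sub.pull(now=10)   # "b" is delivered a second time
```

Because "b" arrives twice, a consumer in this model has to be idempotent or deduplicate by message ID.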
Google Pub/Sub has durable storage and real-time message delivery. Users can configure the retention policy, but the max is seven days. It’s cheap to use for smaller projects since the first 10GB is free.
Google handles pub/sub operations, and other Google Cloud services can use Pub/Sub APIs for integration. Expansion into new regions is straightforward since Google already has data centers across the globe. Comparatively, Kafka requires a lot of operational effort to scale across regions. More importantly, cross data center replication happens using the Google network, which provides robust performance.
Google Pub/Sub has good performance, and it scales quickly. However, the more we scale, the more expensive it gets.
Google Pub/Sub offers a lite version that can be less expensive with lower availability and durability. However, users need to manually manage resources because it doesn’t scale automatically, and storage also needs to be provisioned manually.
Overall, if you’re already using GCP for other services, Google Pub/Sub is a much easier integration than Kafka.
RabbitMQ is one of the most commonly used multi-purpose messaging tools. It’s a “distributed message broker” and supports background tasks. It is written in Erlang and has commercial support available. It uses both message queueing and pub/sub.
RabbitMQ is recommended for communication or integration among long-running tasks or background jobs, whereas Kafka is primarily used to stream, store, and re-read data.
RabbitMQ uses the message exchange concept, where the publisher sends messages to an exchange and each consumer binds a queue to that exchange. It lets users define routing rules and filter the messages based on their specific needs. Kafka lacks a built-in mechanism for filtering messages this way. With Kafka, a subscriber will receive all the messages published on a particular topic.
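As an illustration of exchange-based routing, the sketch below models a topic-style exchange in plain Python. It does not use a RabbitMQ client library; `TopicExchange` and `topic_matches` are hypothetical names. The matching rules follow the AMQP topic convention, where routing keys are dot-separated words, `*` matches exactly one word, and `#` matches zero or more words.

```python
def topic_matches(pattern, routing_key):
    """Return True if a dot-separated routing key matches a binding pattern."""

    def match(p, k):
        if not p:
            return not k  # pattern exhausted: match only if the key is too
        if p[0] == "#":
            # '#' may absorb zero or more words of the key.
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if p[0] == "*" or p[0] == k[0]:
            return match(p[1:], k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


class TopicExchange:
    """Routes each published message to every queue whose binding matches."""

    def __init__(self):
        self.bindings = []  # (pattern, queue) pairs

    def bind(self, pattern):
        bound_queue = []
        self.bindings.append((pattern, bound_queue))
        return bound_queue

    def publish(self, routing_key, message):
        for pattern, bound_queue in self.bindings:
            if topic_matches(pattern, routing_key):
                bound_queue.append(message)


exchange = TopicExchange()
eu_orders = exchange.bind("orders.eu.*")  # only European order events
all_orders = exchange.bind("orders.#")    # every order event
exchange.publish("orders.eu.paid", "o1")
exchange.publish("orders.us.paid", "o2")
```

The filtering happens in the broker, so the consumer of `eu_orders` never receives or discards the US message itself.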
RabbitMQ is designed for vertical scaling. Horizontal scaling can hurt its performance because of the coordination required among nodes. On the other hand, Kafka is designed for horizontal scaling.
RabbitMQ provides message ordering guarantees only for messages published on one channel, passing through one exchange and one queue. It supports retries for messages that aren’t acknowledged, but acknowledged messages are removed once they are consumed. This aspect makes it less resilient compared to Kafka (which uses the available disk space to retain old messages) when it comes to recovering from an outage.
RabbitMQ is a mature platform that has been on the market since 2007. As a result, there is plenty of documentation and a large user base. You can find a lot of case studies and best practices online to help optimize performance.
Apache Pulsar, an open source distributed messaging system, is a recent addition to the available messaging technology choices. Pulsar started as a queuing system but evolved to support event streaming. It leverages the approaches used by several other messaging systems in a single platform.
Apache Pulsar uses a tiered architecture, with Apache BookKeeper providing storage. Adding dedicated BookKeeper storage is easy. Pulsar has a stateless broker that can connect to multiple BookKeeper nodes. Since the broker is stateless, it can scale up and down based on requirements. This loosely coupled architecture makes Pulsar highly scalable. However, repartitioning and replication are required once a broker is added to a cluster, and those tasks take time.
Apache Pulsar allows storage to scale without limit, but keeping everything in BookKeeper can be expensive. To address this, it uses tiered storage where older messages are offloaded from BookKeeper to cheaper storage, e.g., S3 (Amazon Simple Storage Service), GCS (Google Cloud Storage), or a similar file system. This architecture allows unlimited cost-effective storage scaling.
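The offloading idea can be sketched with two in-memory tiers. This is a conceptual model only: `TieredLog` and `hot_capacity` are hypothetical names, the `hot` list stands in for BookKeeper, and the `cold` list stands in for object storage. A real system would offload larger units of data rather than single messages.

```python
class TieredLog:
    """Keeps recent messages in a fast tier and moves older ones to a cheap tier."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = []   # (offset, message) pairs in the fast, expensive tier
        self.cold = []  # (offset, message) pairs in the slow, cheap tier
        self.next_offset = 0

    def append(self, message):
        self.hot.append((self.next_offset, message))
        self.next_offset += 1
        # Offload the oldest entries once the fast tier is over capacity.
        while len(self.hot) > self.hot_capacity:
            self.cold.append(self.hot.pop(0))

    def read(self, offset):
        # Readers use the same call regardless of which tier holds the data.
        for tier in (self.hot, self.cold):
            for stored_offset, message in tier:
                if stored_offset == offset:
                    return message
        raise KeyError(offset)


log = TieredLog(hot_capacity=2)
for i in range(4):
    log.append(f"m{i}")

oldest = log.read(0)  # served from the cold tier
newest = log.read(3)  # served from the hot tier
```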
Pulsar provides some features Kafka lacks, such as tiered storage and geo-replication. However, most of the features in Pulsar are also supported by Kafka. On the other hand, Kafka has quite a few features that Pulsar lacks. These include long-term storage, reduced infrastructure requirements (number of servers), and single save to disk for data.
Apache Pulsar provides support for both message queueing and event streaming in a single solution. However, its feature set is limited compared to Kafka, which provides exactly-once delivery, fault-tolerant state management, event-based processing, message XA transactions, and message filtering.
Setting up Apache Pulsar is complex, even compared to Kafka. It requires setting up four different components: brokers, Apache BookKeeper, RocksDB, and Apache ZooKeeper. That means there are two additional components Kafka doesn’t need. As a result, Pulsar requires more work to set up, debug, and maintain.
Apache Pulsar has potential, but it will take some time to mature and capture significant market share.
Macrometa GDN (Global Data Network) enables the building of real-time applications and APIs instantly across the globe without the operational hassle of infrastructure management. It also supports message queues and pub/sub via streams to build stateful, low-latency applications and data pipelines. In addition, it supports stateful event processing.
Macrometa streams are straightforward to set up compared to Kafka. With a few clicks, you can have your applications running in different regions of the world and close to your clients. It doesn’t require extra operational effort to set up replication. Streams support both message queue and pub/sub messaging patterns.
Macrometa streams have persistent storage that retains unacknowledged messages for three days. Once processed, messages are removed unless configured for retention. Streams also support time to live (TTL) for messages that haven’t been acknowledged. Additionally, streams automatically load-balance messages across consumers.
Macrometa streams support both synchronous and asynchronous modes for consumers and producers. As shown in the diagram below, pub/sub supports three different subscription modes: exclusive, shared, and failover.
The exclusive mode supports only a single consumer, while the shared mode allows multiple consumers, which receive messages in a round-robin fashion. Finally, failover mode allows multiple consumers with a master consumer and failover consumers who will receive messages only if the master consumer disconnects.
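The three subscription modes can be modeled as three small dispatchers. This sketch does not use the Macrometa SDK; the class names and the `attach`, `detach`, and `route` methods are hypothetical. Treating the first attached consumer as the master in failover mode is also an assumption made for illustration.

```python
class ExclusiveSubscription:
    """Only one consumer may attach; a second attempt is rejected."""

    def __init__(self):
        self.consumer = None

    def attach(self, name):
        if self.consumer is not None:
            raise RuntimeError("exclusive subscription already has a consumer")
        self.consumer = name

    def route(self, message):
        return self.consumer


class SharedSubscription:
    """Messages are spread across all attached consumers in round-robin order."""

    def __init__(self):
        self.consumers = []
        self._turn = 0

    def attach(self, name):
        self.consumers.append(name)

    def route(self, message):
        chosen = self.consumers[self._turn % len(self.consumers)]
        self._turn += 1
        return chosen


class FailoverSubscription:
    """The master gets every message; a standby takes over only if it leaves."""

    def __init__(self):
        self.consumers = []

    def attach(self, name):
        self.consumers.append(name)

    def detach(self, name):
        self.consumers.remove(name)

    def route(self, message):
        # Assumption: the earliest attached consumer acts as the master.
        return self.consumers[0]


shared = SharedSubscription()
shared.attach("a")
shared.attach("b")
shared_routes = [shared.route(m) for m in ("m1", "m2", "m3")]

failover = FailoverSubscription()
failover.attach("master")
failover.attach("standby")
before = failover.route("m1")  # master is connected
failover.detach("master")
after = failover.route("m2")   # standby takes over
```

Shared mode trades ordering for throughput, while exclusive and failover modes keep a single active reader so that messages are processed in order.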
Compared to Kafka, Macrometa is a relatively newer technology with a more limited user base, but documentation for developers is robust.
Apache Kafka is one of the most widely used messaging systems, but it is far from your only option. Apache Pulsar provides similar capabilities, and its tiered architecture provides enhanced scalability. RabbitMQ is more of a traditional messaging system, built for communication rather than for storing messages. Google Pub/Sub provides a pub/sub messaging pattern with almost no effort to set up but works best when used with other Google services. Finally, Macrometa offers similar messaging features integrated into an event stream processing platform with built-in geo-replication.