Apache Kafka and RabbitMQ are two popular open-source and commercially-supported pub/sub systems that have been around for almost a decade and have seen wide adoption. Given the popularity of these two systems and the fact that both are branded as pub/sub systems, two frequently asked questions in the relevant online forums are: how do they compare against each other and which one to use?
RabbitMQ is a “traditional” message broker that implements variety of messaging protocols. It was one of the first open source message brokers to achieve a reasonable level of features, client libraries, dev tools and quality documentation. RabbitMQ was originally developed to implement AMQP, an open wire protocol for messaging with powerful routing features. While Java has messaging standards like JMS, it’s not helpful for non-Java applications that need distributed messaging which is severely limiting to any integration scenario, microservice or monolithic. With the advent of AMQP, cross-language flexibility became real for open source message brokers
Apache Kafka is developed in Scala and started out at LinkedIn as a way to connect different internal systems. Kafka is well adopted today within the Apache Software Foundation ecosystem of products and is particularly useful in event-driven architecture.
Architecture and Design
RabbitMQ is designed as a general purpose message broker, employing several variations of point to point, request/reply and pub-sub communication styles patterns. It uses a smart broker / dumb consumer model, focused on consistent delivery of messages to consumers that consume at a roughly similar pace as the broker keeps track of consumer state. It is mature, performs well when configured correctly, is well supported (client libraries Java, .NET, node.js, Ruby, PHP and many more languages)
Communication in RabbitMQ can be either synchronous or asynchronous as needed. Publishers send messages to exchanges, and consumers retrieve messages from queues. Decoupling producers from queues via exchanges ensures that producers aren’t burdened with hard coded routing decisions. It has better option if you need to route your messages in complex ways to your consumers.
Apache Kafka is a message bus optimized for high volume publish-subscribe messages and streams, meant to be durable, fast and scalable. Kafka provides a durable message store, similar to a log, run in a server cluster, that stores streams of records in categories called topics.
It can also be seen as a durable message broker where applications can process and re-process streamed data on disk. It has a very simple routing approach
Every message consists of a key, a value, and a timestamp. Nearly the opposite of RabbitMQ, Kafka employs a dumb broker and uses smart consumers to read its buffer.
As the diagram above shows, Kafka does require external services to run – in this case Apache Zookeeper, which is often regarded as non-trivial to understand, setup and operate.
Requirements and Use Cases
Apache Kafka includes the broker itself, which is actually the best known and the most popular part of it, and has been designed and prominently marketed towards stream processing scenarios. In addition to that, Apache Kafka has recently added Kafka Streams. The documentation does a good job of discussing popular use cases like Website Activity Tracking, Metrics, Log Aggregation, Stream Processing, Event Sourcing and Commit logs.
RabbitMQ is a general purpose messaging solution, often used to allow web servers to respond to requests quickly instead of being forced to perform resource-heavy procedures while the user waits for the result. It’s also good for distributing a message to multiple recipients for consumption or for balancing loads between workers under high load (20k+/sec). When your requirements extend beyond throughput, RabbitMQ has a lot to offer: features for reliable delivery, routing, federation, HA, security, management tools and other features.
Apache Kafka has made strides in this area, and while it only ships a Java client, there is a growing catalog of community open source clients, ecosystem projects, and well as an adapter SDK allowing you to build your own system integration. Much of the configuration is done via .properties files or programmatically.
Kafka shines here by design: 100k/sec performance is often a key driver for people choosing Apache Kafka. Of course, message per second rates are tricky to state and quantify since they depend on so much including your environment and hardware, the nature of your workload, which delivery guarantees are used (e.g. persistent is costly, mirroring even more so), etc.