Kafka vs NATS JetStream
Kafka and NATS JetStream are both distributed messaging systems designed for high-throughput event streaming, but they approach the problem differently. Kafka prioritizes durable commit logs with strong ordering guarantees, while JetStream emphasizes lightweight operations with flexible delivery patterns. This article compares their architectures, delivery semantics, and operational characteristics to help you choose the right tool for your use case.
Core Concepts
Kafka
Kafka is a distributed commit log where messages are appended to partitioned topics. Each partition maintains strict ordering, and consumers track their position using offsets. Messages persist on disk with configurable retention, enabling replay from any point in time.
Key characteristics:
- Topics and Partitions: Topics are split into partitions for parallelism. Each partition is an ordered, immutable sequence of messages
- Consumer Groups: Multiple consumers in a group share partitions, with each partition assigned to exactly one consumer
- Offset Management: Consumers own their offsets, allowing independent progress and replay
- Replication: Partitions replicate across brokers with leader election for fault tolerance
NATS JetStream
JetStream is the persistence layer built on top of NATS core messaging. It adds durable streams that capture messages published to subjects, with flexible consumer options for processing.
Key characteristics:
- Subjects and Streams: Streams subscribe to subject patterns and persist matching messages
- Push and Pull Consumers: Consumers can receive messages via push (server-initiated) or pull (client-initiated) delivery
- Ack Policies: Configurable acknowledgment requirements (none, all, explicit) control message redelivery
- Retention Modes: Streams can retain messages by limits (size/count/age), interest (while consumers exist), or work-queue (until acknowledged)
Architecture Comparison
Kafka (KRaft Mode)
Since Kafka 3.3, the KRaft consensus protocol replaces ZooKeeper for metadata management. The architecture consists of:
Brokers: Store partition data and serve client requests. Each broker handles reads and writes for partitions where it is the leader.
Controllers: Manage cluster metadata, partition assignments, and leader elections. In KRaft mode, a subset of brokers run the controller quorum using Raft consensus.
Topics and Partitions: A topic is a logical grouping of partitions. Each partition is a log file on disk, replicated across multiple brokers based on the replication factor.
Producer
|
v
+-------------------------------------------+
| Kafka Cluster |
| |
| +--------+ +--------+ +--------+ |
| |Broker 1| |Broker 2| |Broker 3| |
| | | | | | | |
| | P0(L) | | P0(F) | | P1(L) | |
| | P1(F) | | P1(F) | | P0(F) | |
| +--------+ +--------+ +--------+ |
| |
| Controller Quorum (KRaft) |
+-------------------------------------------+
|
v
Consumer Group
NATS JetStream
JetStream runs as an embedded layer within NATS servers. Streams and consumers are first-class resources managed by the server.
Streams: Define which subjects to capture, storage backend (file or memory), and retention policy. Streams can be replicated across multiple servers.
Consumers: Named entities that track delivery state for a stream. Each consumer maintains its own sequence position and pending acknowledgments.
Clustering: NATS servers form clusters using NATS-native clustering. JetStream uses Raft consensus for stream and consumer metadata, with data replication configurable per stream.
Publisher
|
v
+-------------------------------------------+
| NATS Cluster |
| |
| +--------+ +--------+ +--------+ |
| |Server 1| |Server 2| |Server 3| |
| | | | | | | |
| | Stream | | Stream | | Stream | |
| | (R=3) | | (R=3) | | (R=3) | |
| +--------+ +--------+ +--------+ |
| |
| JetStream Raft Groups |
+-------------------------------------------+
|
v
Consumer (Push or Pull)
Delivery Semantics
Message Ordering
Kafka: Guarantees strict ordering within a partition. Messages sent to the same partition key are processed in order. Cross-partition ordering is not guaranteed.
JetStream: Guarantees ordering within a stream. Push consumers with max_ack_pending > 1 may see out-of-order processing if earlier messages are redelivered while later ones are being processed.
At-Least-Once Delivery
Both systems provide at-least-once delivery by default:
Kafka:
- Producer sends message with
acks=all - Broker writes to all in-sync replicas
- Consumer processes message and commits offset
- If consumer crashes before commit, message is redelivered
JetStream:
- Publisher sends message (optionally with acknowledgment)
- Stream replicates message based on
Rfactor - Consumer receives and acknowledges message
- If acknowledgment times out (
ack_wait), message is redelivered
Exactly-Once Semantics
Kafka: Supports exactly-once semantics (EOS) through idempotent producers and transactional messaging. Producers can write to multiple partitions atomically, and consumers can read committed messages only.
// Kafka idempotent producer
config := sarama.NewConfig()
config.Producer.Idempotent = true
config.Producer.RequiredAcks = sarama.WaitForAll
config.Producer.Return.Successes = true
config.Producer.Return.Errors = true
config.Net.MaxOpenRequests = 1
producer, _ := sarama.NewAsyncProducer(brokers, config)
JetStream: Supports message deduplication using the Nats-Msg-Id header. Publishers can set a unique ID, and the stream will reject duplicates within the deduplication window.
// JetStream with deduplication
js, _ := nc.JetStream()
_, err := js.Publish("orders.new", data, nats.MsgId("order-12345"))
Durability and Retention
Kafka Retention
Kafka offers two cleanup policies:
Delete: Remove segments older than retention.ms or when log size exceeds retention.bytes.
Compact: Keep only the latest value for each key. Useful for changelog topics where you need the current state.
# Time-based retention (7 days)
kafka-topics.sh --create --topic events \
--config retention.ms=604800000 \
--config cleanup.policy=delete
# Compacted topic for state
kafka-topics.sh --create --topic user-profiles \
--config cleanup.policy=compact \
--config min.cleanable.dirty.ratio=0.5
JetStream Retention
JetStream supports three retention modes:
Limits: Discard old messages when stream reaches max_msgs, max_bytes, or max_age. Default mode for most use cases.
Interest: Keep messages only while at least one consumer exists. Messages are removed when all consumers have acknowledged them.
WorkQueue: Each message is delivered to exactly one consumer. Once acknowledged, the message is removed from the stream.
# Limits-based retention
nats stream add EVENTS \
--subjects "events.>" \
--retention limits \
--max-bytes 10GB \
--max-age 7d
# Work queue for job processing
nats stream add JOBS \
--subjects "jobs.pending" \
--retention work \
--max-msgs 100000
Performance Characteristics
Throughput
Kafka optimizes for sustained high throughput:
- Batch writes reduce disk I/O overhead
- Zero-copy reads from page cache to network socket
- Compression (lz4, zstd, snappy) reduces network and storage
- Sequential disk access patterns maximize HDD/SSD performance
Typical throughput: 100MB/s to 1GB/s per broker depending on hardware and configuration.
JetStream optimizes for lower latency with moderate throughput:
- Smaller default batch sizes
- Synchronous acknowledgments by default
- Memory storage option for highest performance
- File storage with configurable sync policies
Typical throughput: 50MB/s to 200MB/s per server depending on storage backend and replication.
Latency
Kafka: Default configurations prioritize throughput over latency. Low-latency configurations:
config := sarama.NewConfig()
config.Producer.Flush.Frequency = 0 // Disable time-based batching
config.Producer.Flush.Messages = 1 // Send immediately
config.Producer.RequiredAcks = sarama.WaitForLocal // Reduced durability
With tuning, Kafka achieves 5-10ms p99 latency.
JetStream: Designed for low latency out of the box. Default configurations achieve sub-millisecond latency within a datacenter.
js, _ := nc.JetStream()
// Synchronous publish with confirmation
ack, _ := js.Publish("events.click", data)
fmt.Printf("Sequence: %d\n", ack.Sequence)
With replication (R=3), expect 1-5ms p99 latency.
Scalability
Kafka:
- Horizontal scaling by adding brokers
- Partition count determines parallelism
- Consumer group rebalancing redistributes partitions
- Topic with 100+ partitions common for high-throughput workloads
JetStream:
- Horizontal scaling by adding servers to cluster
- Subject-based routing allows many logical channels
- Consumer scale-out via multiple pull subscribers
- Superclusters for multi-region deployments
Go Client Examples
Kafka Producer and Consumer
package main
import (
"log"
"github.com/IBM/sarama"
)
func main() {
brokers := []string{"kafka-1:9092", "kafka-2:9092", "kafka-3:9092"}
config := sarama.NewConfig()
config.Producer.RequiredAcks = sarama.WaitForAll
config.Producer.Retry.Max = 3
config.Producer.Return.Successes = true
config.Producer.Return.Errors = true
producer, err := sarama.NewAsyncProducer(brokers, config)
if err != nil {
log.Fatalf("Failed to create producer: %v", err)
}
defer producer.Close()
// Handle successes and errors
go func() {
for msg := range producer.Successes() {
log.Printf("Message sent to partition %d at offset %d", msg.Partition, msg.Offset)
}
}()
go func() {
for err := range producer.Errors() {
log.Printf("Failed to send message: %v", err)
}
}()
// Send message
msg := &sarama.ProducerMessage{
Topic: "events",
Key: sarama.StringEncoder("user-123"),
Value: sarama.StringEncoder(`{"event":"click","page":"/products"}`),
}
producer.Input() <- msg
}
package main
import (
"context"
"log"
"github.com/IBM/sarama"
)
type EventHandler struct{}
func (h *EventHandler) Setup(_ sarama.ConsumerGroupSession) error { return nil }
func (h *EventHandler) Cleanup(_ sarama.ConsumerGroupSession) error { return nil }
func (h *EventHandler) ConsumeClaim(session sarama.ConsumerGroupSession, claim sarama.ConsumerGroupClaim) error {
for msg := range claim.Messages() {
log.Printf("Received: topic=%s partition=%d offset=%d key=%s value=%s",
msg.Topic, msg.Partition, msg.Offset, string(msg.Key), string(msg.Value))
session.MarkMessage(msg, "")
}
return nil
}
func main() {
brokers := []string{"kafka-1:9092", "kafka-2:9092", "kafka-3:9092"}
config := sarama.NewConfig()
config.Consumer.Group.Rebalance.Strategy = sarama.NewBalanceStrategyRoundRobin()
config.Consumer.Offsets.Initial = sarama.OffsetOldest
group, err := sarama.NewConsumerGroup(brokers, "event-processor", config)
if err != nil {
log.Fatalf("Failed to create consumer group: %v", err)
}
defer group.Close()
handler := &EventHandler{}
ctx := context.Background()
for {
if err := group.Consume(ctx, []string{"events"}, handler); err != nil {
log.Printf("Consumer error: %v", err)
}
}
}
JetStream Publisher and Consumer
package main
import (
"log"
"github.com/nats-io/nats.go"
)
func main() {
nc, err := nats.Connect("nats://nats-1:4222,nats://nats-2:4222,nats://nats-3:4222")
if err != nil {
log.Fatalf("Failed to connect: %v", err)
}
defer nc.Close()
js, err := nc.JetStream()
if err != nil {
log.Fatalf("Failed to get JetStream context: %v", err)
}
// Create stream if not exists
_, err = js.AddStream(&nats.StreamConfig{
Name: "EVENTS",
Subjects: []string{"events.>"},
Storage: nats.FileStorage,
Replicas: 3,
Retention: nats.LimitsPolicy,
MaxAge: 7 * 24 * 60 * 60 * 1e9, // 7 days in nanoseconds
})
if err != nil && err != nats.ErrStreamNameAlreadyInUse {
log.Fatalf("Failed to create stream: %v", err)
}
// Publish with acknowledgment
ack, err := js.Publish("events.click", []byte(`{"user":"123","page":"/products"}`))
if err != nil {
log.Fatalf("Failed to publish: %v", err)
}
log.Printf("Published to stream %s, sequence %d", ack.Stream, ack.Sequence)
}
package main
import (
"log"
"time"
"github.com/nats-io/nats.go"
)
func main() {
nc, err := nats.Connect("nats://nats-1:4222,nats://nats-2:4222,nats://nats-3:4222")
if err != nil {
log.Fatalf("Failed to connect: %v", err)
}
defer nc.Close()
js, err := nc.JetStream()
if err != nil {
log.Fatalf("Failed to get JetStream context: %v", err)
}
// Create durable pull consumer
_, err = js.AddConsumer("EVENTS", &nats.ConsumerConfig{
Durable: "event-processor",
AckPolicy: nats.AckExplicitPolicy,
AckWait: 30 * time.Second,
MaxAckPending: 1000,
DeliverPolicy: nats.DeliverAllPolicy,
})
if err != nil && err != nats.ErrConsumerNameAlreadyInUse {
log.Fatalf("Failed to create consumer: %v", err)
}
// Subscribe and process
sub, err := js.PullSubscribe("events.>", "event-processor")
if err != nil {
log.Fatalf("Failed to subscribe: %v", err)
}
for {
msgs, err := sub.Fetch(10, nats.MaxWait(5*time.Second))
if err != nil {
if err == nats.ErrTimeout {
continue
}
log.Printf("Fetch error: %v", err)
continue
}
for _, msg := range msgs {
log.Printf("Received: subject=%s data=%s", msg.Subject, string(msg.Data))
if err := msg.Ack(); err != nil {
log.Printf("Ack error: %v", err)
}
}
}
}
Operational Considerations
Kafka Operations
Cluster Management:
- Monitor
UnderReplicatedPartitionsfor replication health - Track consumer lag via
kafka-consumer-groups.shor metrics - Plan capacity based on partition count and retention size
- Use rack awareness for fault tolerance
Common Tasks:
# Check consumer group lag
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
--describe --group event-processor
# Reassign partitions after adding broker
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
--reassignment-json-file reassign.json --execute
# Increase partition count (cannot decrease)
kafka-topics.sh --bootstrap-server localhost:9092 \
--alter --topic events --partitions 12
Monitoring Metrics:
kafka.server:type=ReplicaManager,name=UnderReplicatedPartitionskafka.server:type=BrokerTopicMetrics,name=BytesInPerSeckafka.consumer:type=consumer-fetch-manager-metrics,client-id=*,topic=*,partition=*
JetStream Operations
Cluster Management:
- Monitor stream and consumer health via NATS CLI or API
- Watch for consumer redelivery rates indicating processing issues
- Plan storage based on stream retention settings
- Use placement tags for multi-region deployments
Common Tasks:
# View stream info
nats stream info EVENTS
# Check consumer status
nats consumer info EVENTS event-processor
# View pending messages
nats consumer next EVENTS event-processor --count 1
# Purge stream (delete all messages)
nats stream purge EVENTS
Monitoring Metrics:
jetstream_server_total_streamsjetstream_server_total_consumersjetstream_consumer_num_pendingjetstream_consumer_num_redelivered
Security
Both systems support TLS encryption and authentication:
Kafka:
- SASL authentication (PLAIN, SCRAM, GSSAPI, OAUTHBEARER)
- ACLs by principal, resource type (topic, group, cluster), and operation
- SSL/TLS for encryption in transit
JetStream:
- User credentials or NKey authentication
- Account-based multi-tenancy with resource limits
- Subject-level permissions for publish and subscribe
- TLS for encryption in transit
When to Choose Each
Choose Kafka When
- Long-term retention: You need to replay events from days, weeks, or months ago
- High throughput: Sustained multi-GB/s ingestion is required
- Stream processing: Using Kafka Streams, Flink, or similar frameworks
- Exactly-once semantics: Transactional processing across topics is required
- Ecosystem integration: Connecting to Kafka Connect, Schema Registry, or existing Kafka infrastructure
Choose JetStream When
- Low latency: Sub-millisecond message delivery is critical
- Simple operations: Smaller team without dedicated infrastructure expertise
- Many subjects: Thousands of logical channels with varying traffic patterns
- Mixed delivery: Need both push and pull consumption patterns
- Edge deployments: Lightweight footprint for edge or embedded systems
- Request-reply: Combining messaging with RPC-style communication
Hybrid Architectures
In practice, many organizations use both systems:
- JetStream for real-time event fan-out and edge aggregation
- Kafka as the central data lake for analytics and long-term storage
- Bridge services to replicate between systems
Edge Devices → NATS JetStream (Edge) → Bridge → Kafka (Central) → Analytics
↓
Local Processing
Summary
| Aspect | Kafka | NATS JetStream |
|---|---|---|
| Core Model | Distributed commit log | Subject-based streams |
| Ordering | Per partition | Per stream |
| Latency | 5-10ms (tuned) | Sub-millisecond |
| Throughput | 100MB/s - 1GB/s per broker | 50-200MB/s per server |
| Retention | Time/size-based, compaction | Limits, interest, work-queue |
| Consumers | Pull with offset tracking | Push or pull with ack tracking |
| Exactly-once | Transactions + idempotent producer | Message deduplication |
| Operations | More complex, JVM-based | Simpler, single binary |
Both systems are production-ready and well-documented. Kafka excels at high-throughput pipelines with long retention and exactly-once processing. JetStream excels at low-latency messaging with simpler operations and flexible delivery patterns. Choose based on your latency requirements, throughput needs, operational capacity, and existing infrastructure.