Kafka vs NATS JetStream

Kafka and NATS JetStream are both distributed messaging systems designed for high-throughput event streaming, but they approach the problem differently. Kafka prioritizes durable commit logs with strong ordering guarantees, while JetStream emphasizes lightweight operations with flexible delivery patterns. This article compares their architectures, delivery semantics, and operational characteristics to help you choose the right tool for your use case.

Core Concepts

Kafka

Kafka is a distributed commit log where messages are appended to partitioned topics. Each partition maintains strict ordering, and consumers track their position using offsets. Messages persist on disk with configurable retention, enabling replay from any point in time.

Key characteristics:

Topics and Partitions: Topics are split into partitions for parallelism. Each partition is an ordered, immutable sequence of messages
Consumer Groups: Multiple consumers in a group share partitions, with each partition assigned to exactly one consumer
Offset Management: Consumers own their offsets, allowing independent progress and replay
Replication: Partitions replicate across brokers with leader election for fault tolerance

NATS JetStream

JetStream is the persistence layer built on top of NATS core messaging. It adds durable streams that capture messages published to subjects, with flexible consumer options for processing.

Key characteristics:

Subjects and Streams: Streams subscribe to subject patterns and persist matching messages
Push and Pull Consumers: Consumers can receive messages via push (server-initiated) or pull (client-initiated) delivery
Ack Policies: Configurable acknowledgment requirements (none, all, explicit) control message redelivery
Retention Modes: Streams can retain messages by limits (size/count/age), interest (while consumers exist), or work-queue (until acknowledged)

Architecture Comparison

Kafka (KRaft Mode)

Since Kafka 3.3, the KRaft consensus protocol replaces ZooKeeper for metadata management. The architecture consists of:

Brokers: Store partition data and serve client requests. Each broker handles reads and writes for partitions where it is the leader.

Controllers: Manage cluster metadata, partition assignments, and leader elections. In KRaft mode, a subset of brokers run the controller quorum using Raft consensus.

Topics and Partitions: A topic is a logical grouping of partitions. Each partition is a log file on disk, replicated across multiple brokers based on the replication factor.

Producer
    |
    v
+-------------------------------------------+
|              Kafka Cluster                |
|                                           |
|  +--------+  +--------+  +--------+       |
|  |Broker 1|  |Broker 2|  |Broker 3|       |
|  |        |  |        |  |        |       |
|  | P0(L)  |  | P0(F)  |  | P1(L)  |       |
|  | P1(F)  |  | P1(F)  |  | P0(F)  |       |
|  +--------+  +--------+  +--------+       |
|                                           |
|  Controller Quorum (KRaft)                |
+-------------------------------------------+
    |
    v
Consumer Group

NATS JetStream

JetStream runs as an embedded layer within NATS servers. Streams and consumers are first-class resources managed by the server.

Streams: Define which subjects to capture, storage backend (file or memory), and retention policy. Streams can be replicated across multiple servers.

Consumers: Named entities that track delivery state for a stream. Each consumer maintains its own sequence position and pending acknowledgments.

Clustering: NATS servers form clusters using NATS-native clustering. JetStream uses Raft consensus for stream and consumer metadata, with data replication configurable per stream.

Publisher
    |
    v
+-------------------------------------------+
|              NATS Cluster                 |
|                                           |
|  +--------+  +--------+  +--------+       |
|  |Server 1|  |Server 2|  |Server 3|       |
|  |        |  |        |  |        |       |
|  | Stream |  | Stream |  | Stream |       |
|  | (R=3)  |  | (R=3)  |  | (R=3)  |       |
|  +--------+  +--------+  +--------+       |
|                                           |
|  JetStream Raft Groups                    |
+-------------------------------------------+
    |
    v
Consumer (Push or Pull)

Delivery Semantics

Message Ordering

Kafka: Guarantees strict ordering within a partition. Messages sent to the same partition key are processed in order. Cross-partition ordering is not guaranteed.

JetStream: Guarantees ordering within a stream. Push consumers with max_ack_pending > 1 may see out-of-order processing if earlier messages are redelivered while later ones are being processed.

At-Least-Once Delivery

Both systems provide at-least-once delivery by default:

Kafka:

Producer sends message with acks=all
Broker writes to all in-sync replicas
Consumer processes message and commits offset
If consumer crashes before commit, message is redelivered

JetStream:

Publisher sends message (optionally with acknowledgment)
Stream replicates message based on R factor
Consumer receives and acknowledges message
If acknowledgment times out (ack_wait), message is redelivered

Exactly-Once Semantics

Kafka: Supports exactly-once semantics (EOS) through idempotent producers and transactional messaging. Producers can write to multiple partitions atomically, and consumers can read committed messages only.

// Kafka idempotent producer
config := sarama.NewConfig()
config.Producer.Idempotent = true
config.Producer.RequiredAcks = sarama.WaitForAll
config.Producer.Return.Successes = true
config.Producer.Return.Errors = true
config.Net.MaxOpenRequests = 1

producer, _ := sarama.NewAsyncProducer(brokers, config)

JetStream: Supports message deduplication using the Nats-Msg-Id header. Publishers can set a unique ID, and the stream will reject duplicates within the deduplication window.

// JetStream with deduplication
js, _ := nc.JetStream()

_, err := js.Publish("orders.new", data, nats.MsgId("order-12345"))

Durability and Retention

Kafka Retention

Kafka offers two cleanup policies:

Delete: Remove segments older than retention.ms or when log size exceeds retention.bytes.

Compact: Keep only the latest value for each key. Useful for changelog topics where you need the current state.

# Time-based retention (7 days)
kafka-topics.sh --create --topic events \
  --config retention.ms=604800000 \
  --config cleanup.policy=delete

# Compacted topic for state
kafka-topics.sh --create --topic user-profiles \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.5

JetStream Retention

JetStream supports three retention modes:

Limits: Discard old messages when stream reaches max_msgs, max_bytes, or max_age. Default mode for most use cases.

Interest: Keep messages only while at least one consumer exists. Messages are removed when all consumers have acknowledged them.

WorkQueue: Each message is delivered to exactly one consumer. Once acknowledged, the message is removed from the stream.

# Limits-based retention
nats stream add EVENTS \
  --subjects "events.>" \
  --retention limits \
  --max-bytes 10GB \
  --max-age 7d

# Work queue for job processing
nats stream add JOBS \
  --subjects "jobs.pending" \
  --retention work \
  --max-msgs 100000

Performance Characteristics

Throughput

Kafka optimizes for sustained high throughput:

Batch writes reduce disk I/O overhead
Zero-copy reads from page cache to network socket
Compression (lz4, zstd, snappy) reduces network and storage
Sequential disk access patterns maximize HDD/SSD performance

Typical throughput: 100MB/s to 1GB/s per broker depending on hardware and configuration.

JetStream optimizes for lower latency with moderate throughput:

Smaller default batch sizes
Synchronous acknowledgments by default
Memory storage option for highest performance
File storage with configurable sync policies

Typical throughput: 50MB/s to 200MB/s per server depending on storage backend and replication.

Latency

Kafka: Default configurations prioritize throughput over latency. Low-latency configurations:

config := sarama.NewConfig()
config.Producer.Flush.Frequency = 0           // Disable time-based batching
config.Producer.Flush.Messages = 1            // Send immediately
config.Producer.RequiredAcks = sarama.WaitForLocal // Reduced durability

With tuning, Kafka achieves 5-10ms p99 latency.

JetStream: Designed for low latency out of the box. Default configurations achieve sub-millisecond latency within a datacenter.

js, _ := nc.JetStream()

// Synchronous publish with confirmation
ack, _ := js.Publish("events.click", data)
fmt.Printf("Sequence: %d\n", ack.Sequence)

With replication (R=3), expect 1-5ms p99 latency.

Scalability

Kafka:

Horizontal scaling by adding brokers
Partition count determines parallelism
Consumer group rebalancing redistributes partitions
Topic with 100+ partitions common for high-throughput workloads

JetStream:

Horizontal scaling by adding servers to cluster
Subject-based routing allows many logical channels
Consumer scale-out via multiple pull subscribers
Superclusters for multi-region deployments

Go Client Examples

Kafka Producer and Consumer

package main

import (
    "log"

    "github.com/IBM/sarama"
)

func main() {
    brokers := []string{"kafka-1:9092", "kafka-2:9092", "kafka-3:9092"}

    config := sarama.NewConfig()
    config.Producer.RequiredAcks = sarama.WaitForAll
    config.Producer.Retry.Max = 3
    config.Producer.Return.Successes = true
    config.Producer.Return.Errors = true

    producer, err := sarama.NewAsyncProducer(brokers, config)
    if err != nil {
        log.Fatalf("Failed to create producer: %v", err)
    }
    defer producer.Close()

    // Handle successes and errors
    go func() {
        for msg := range producer.Successes() {
            log.Printf("Message sent to partition %d at offset %d", msg.Partition, msg.Offset)
        }
    }()

    go func() {
        for err := range producer.Errors() {
            log.Printf("Failed to send message: %v", err)
        }
    }()

    // Send message
    msg := &sarama.ProducerMessage{
        Topic: "events",
        Key:   sarama.StringEncoder("user-123"),
        Value: sarama.StringEncoder(`{"event":"click","page":"/products"}`),
    }

    producer.Input() <- msg
}

package main

import (
    "context"
    "log"

    "github.com/IBM/sarama"
)

type EventHandler struct{}

func (h *EventHandler) Setup(_ sarama.ConsumerGroupSession) error   { return nil }
func (h *EventHandler) Cleanup(_ sarama.ConsumerGroupSession) error { return nil }

func (h *EventHandler) ConsumeClaim(session sarama.ConsumerGroupSession, claim sarama.ConsumerGroupClaim) error {
    for msg := range claim.Messages() {
        log.Printf("Received: topic=%s partition=%d offset=%d key=%s value=%s",
            msg.Topic, msg.Partition, msg.Offset, string(msg.Key), string(msg.Value))

        session.MarkMessage(msg, "")
    }
    return nil
}

func main() {
    brokers := []string{"kafka-1:9092", "kafka-2:9092", "kafka-3:9092"}

    config := sarama.NewConfig()
    config.Consumer.Group.Rebalance.Strategy = sarama.NewBalanceStrategyRoundRobin()
    config.Consumer.Offsets.Initial = sarama.OffsetOldest

    group, err := sarama.NewConsumerGroup(brokers, "event-processor", config)
    if err != nil {
        log.Fatalf("Failed to create consumer group: %v", err)
    }
    defer group.Close()

    handler := &EventHandler{}
    ctx := context.Background()

    for {
        if err := group.Consume(ctx, []string{"events"}, handler); err != nil {
            log.Printf("Consumer error: %v", err)
        }
    }
}

JetStream Publisher and Consumer

package main

import (
    "log"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, err := nats.Connect("nats://nats-1:4222,nats://nats-2:4222,nats://nats-3:4222")
    if err != nil {
        log.Fatalf("Failed to connect: %v", err)
    }
    defer nc.Close()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatalf("Failed to get JetStream context: %v", err)
    }

    // Create stream if not exists
    _, err = js.AddStream(&nats.StreamConfig{
        Name:      "EVENTS",
        Subjects:  []string{"events.>"},
        Storage:   nats.FileStorage,
        Replicas:  3,
        Retention: nats.LimitsPolicy,
        MaxAge:    7 * 24 * 60 * 60 * 1e9, // 7 days in nanoseconds
    })
    if err != nil && err != nats.ErrStreamNameAlreadyInUse {
        log.Fatalf("Failed to create stream: %v", err)
    }

    // Publish with acknowledgment
    ack, err := js.Publish("events.click", []byte(`{"user":"123","page":"/products"}`))
    if err != nil {
        log.Fatalf("Failed to publish: %v", err)
    }

    log.Printf("Published to stream %s, sequence %d", ack.Stream, ack.Sequence)
}

package main

import (
    "log"
    "time"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, err := nats.Connect("nats://nats-1:4222,nats://nats-2:4222,nats://nats-3:4222")
    if err != nil {
        log.Fatalf("Failed to connect: %v", err)
    }
    defer nc.Close()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatalf("Failed to get JetStream context: %v", err)
    }

    // Create durable pull consumer
    _, err = js.AddConsumer("EVENTS", &nats.ConsumerConfig{
        Durable:       "event-processor",
        AckPolicy:     nats.AckExplicitPolicy,
        AckWait:       30 * time.Second,
        MaxAckPending: 1000,
        DeliverPolicy: nats.DeliverAllPolicy,
    })
    if err != nil && err != nats.ErrConsumerNameAlreadyInUse {
        log.Fatalf("Failed to create consumer: %v", err)
    }

    // Subscribe and process
    sub, err := js.PullSubscribe("events.>", "event-processor")
    if err != nil {
        log.Fatalf("Failed to subscribe: %v", err)
    }

    for {
        msgs, err := sub.Fetch(10, nats.MaxWait(5*time.Second))
        if err != nil {
            if err == nats.ErrTimeout {
                continue
            }
            log.Printf("Fetch error: %v", err)
            continue
        }

        for _, msg := range msgs {
            log.Printf("Received: subject=%s data=%s", msg.Subject, string(msg.Data))

            if err := msg.Ack(); err != nil {
                log.Printf("Ack error: %v", err)
            }
        }
    }
}

Operational Considerations

Kafka Operations

Cluster Management:

Monitor UnderReplicatedPartitions for replication health
Track consumer lag via kafka-consumer-groups.sh or metrics
Plan capacity based on partition count and retention size
Use rack awareness for fault tolerance

Common Tasks:

# Check consumer group lag
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group event-processor

# Reassign partitions after adding broker
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassign.json --execute

# Increase partition count (cannot decrease)
kafka-topics.sh --bootstrap-server localhost:9092 \
  --alter --topic events --partitions 12

Monitoring Metrics:

kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*,topic=*,partition=*

JetStream Operations

Cluster Management:

Monitor stream and consumer health via NATS CLI or API
Watch for consumer redelivery rates indicating processing issues
Plan storage based on stream retention settings
Use placement tags for multi-region deployments

Common Tasks:

# View stream info
nats stream info EVENTS

# Check consumer status
nats consumer info EVENTS event-processor

# View pending messages
nats consumer next EVENTS event-processor --count 1

# Purge stream (delete all messages)
nats stream purge EVENTS

Monitoring Metrics:

jetstream_server_total_streams
jetstream_server_total_consumers
jetstream_consumer_num_pending
jetstream_consumer_num_redelivered

Security

Both systems support TLS encryption and authentication:

Kafka:

SASL authentication (PLAIN, SCRAM, GSSAPI, OAUTHBEARER)
ACLs by principal, resource type (topic, group, cluster), and operation
SSL/TLS for encryption in transit

JetStream:

User credentials or NKey authentication
Account-based multi-tenancy with resource limits
Subject-level permissions for publish and subscribe
TLS for encryption in transit

When to Choose Each

Choose Kafka When

Long-term retention: You need to replay events from days, weeks, or months ago
High throughput: Sustained multi-GB/s ingestion is required
Stream processing: Using Kafka Streams, Flink, or similar frameworks
Exactly-once semantics: Transactional processing across topics is required
Ecosystem integration: Connecting to Kafka Connect, Schema Registry, or existing Kafka infrastructure

Choose JetStream When

Low latency: Sub-millisecond message delivery is critical
Simple operations: Smaller team without dedicated infrastructure expertise
Many subjects: Thousands of logical channels with varying traffic patterns
Mixed delivery: Need both push and pull consumption patterns
Edge deployments: Lightweight footprint for edge or embedded systems
Request-reply: Combining messaging with RPC-style communication

Hybrid Architectures

In practice, many organizations use both systems:

JetStream for real-time event fan-out and edge aggregation
Kafka as the central data lake for analytics and long-term storage
Bridge services to replicate between systems

Edge Devices → NATS JetStream (Edge) → Bridge → Kafka (Central) → Analytics
                     ↓
              Local Processing

Summary

Aspect	Kafka	NATS JetStream
Core Model	Distributed commit log	Subject-based streams
Ordering	Per partition	Per stream
Latency	5-10ms (tuned)	Sub-millisecond
Throughput	100MB/s - 1GB/s per broker	50-200MB/s per server
Retention	Time/size-based, compaction	Limits, interest, work-queue
Consumers	Pull with offset tracking	Push or pull with ack tracking
Exactly-once	Transactions + idempotent producer	Message deduplication
Operations	More complex, JVM-based	Simpler, single binary

Both systems are production-ready and well-documented. Kafka excels at high-throughput pipelines with long retention and exactly-once processing. JetStream excels at low-latency messaging with simpler operations and flexible delivery patterns. Choose based on your latency requirements, throughput needs, operational capacity, and existing infrastructure.