Skip to main content Skip to sidebar

Kafka vs NATS JetStream

Kafka and NATS JetStream are both distributed messaging systems designed for high-throughput event streaming, but they approach the problem differently. Kafka prioritizes durable commit logs with strong ordering guarantees, while JetStream emphasizes lightweight operations with flexible delivery patterns. This article compares their architectures, delivery semantics, and operational characteristics to help you choose the right tool for your use case.

Core Concepts

Kafka

Kafka is a distributed commit log where messages are appended to partitioned topics. Each partition maintains strict ordering, and consumers track their position using offsets. Messages persist on disk with configurable retention, enabling replay from any point in time.

Key characteristics:

  • Topics and Partitions: Topics are split into partitions for parallelism. Each partition is an ordered, immutable sequence of messages
  • Consumer Groups: Multiple consumers in a group share partitions, with each partition assigned to exactly one consumer
  • Offset Management: Consumers own their offsets, allowing independent progress and replay
  • Replication: Partitions replicate across brokers with leader election for fault tolerance

NATS JetStream

JetStream is the persistence layer built on top of NATS core messaging. It adds durable streams that capture messages published to subjects, with flexible consumer options for processing.

Key characteristics:

  • Subjects and Streams: Streams subscribe to subject patterns and persist matching messages
  • Push and Pull Consumers: Consumers can receive messages via push (server-initiated) or pull (client-initiated) delivery
  • Ack Policies: Configurable acknowledgment requirements (none, all, explicit) control message redelivery
  • Retention Modes: Streams can retain messages by limits (size/count/age), interest (while consumers exist), or work-queue (until acknowledged)

Architecture Comparison

Kafka (KRaft Mode)

Since Kafka 3.3, the KRaft consensus protocol replaces ZooKeeper for metadata management. The architecture consists of:

Brokers: Store partition data and serve client requests. Each broker handles reads and writes for partitions where it is the leader.

Controllers: Manage cluster metadata, partition assignments, and leader elections. In KRaft mode, a subset of brokers run the controller quorum using Raft consensus.

Topics and Partitions: A topic is a logical grouping of partitions. Each partition is a log file on disk, replicated across multiple brokers based on the replication factor.

Producer
    |
    v
+-------------------------------------------+
|              Kafka Cluster                |
|                                           |
|  +--------+  +--------+  +--------+       |
|  |Broker 1|  |Broker 2|  |Broker 3|       |
|  |        |  |        |  |        |       |
|  | P0(L)  |  | P0(F)  |  | P1(L)  |       |
|  | P1(F)  |  | P1(F)  |  | P0(F)  |       |
|  +--------+  +--------+  +--------+       |
|                                           |
|  Controller Quorum (KRaft)                |
+-------------------------------------------+
    |
    v
Consumer Group

NATS JetStream

JetStream runs as an embedded layer within NATS servers. Streams and consumers are first-class resources managed by the server.

Streams: Define which subjects to capture, storage backend (file or memory), and retention policy. Streams can be replicated across multiple servers.

Consumers: Named entities that track delivery state for a stream. Each consumer maintains its own sequence position and pending acknowledgments.

Clustering: NATS servers form clusters using NATS-native clustering. JetStream uses Raft consensus for stream and consumer metadata, with data replication configurable per stream.

Publisher
    |
    v
+-------------------------------------------+
|              NATS Cluster                 |
|                                           |
|  +--------+  +--------+  +--------+       |
|  |Server 1|  |Server 2|  |Server 3|       |
|  |        |  |        |  |        |       |
|  | Stream |  | Stream |  | Stream |       |
|  | (R=3)  |  | (R=3)  |  | (R=3)  |       |
|  +--------+  +--------+  +--------+       |
|                                           |
|  JetStream Raft Groups                    |
+-------------------------------------------+
    |
    v
Consumer (Push or Pull)

Delivery Semantics

Message Ordering

Kafka: Guarantees strict ordering within a partition. Messages sent to the same partition key are processed in order. Cross-partition ordering is not guaranteed.

JetStream: Guarantees ordering within a stream. Push consumers with max_ack_pending > 1 may see out-of-order processing if earlier messages are redelivered while later ones are being processed.

At-Least-Once Delivery

Both systems provide at-least-once delivery by default:

Kafka:

  • Producer sends message with acks=all
  • Broker writes to all in-sync replicas
  • Consumer processes message and commits offset
  • If consumer crashes before commit, message is redelivered

JetStream:

  • Publisher sends message (optionally with acknowledgment)
  • Stream replicates message based on R factor
  • Consumer receives and acknowledges message
  • If acknowledgment times out (ack_wait), message is redelivered

Exactly-Once Semantics

Kafka: Supports exactly-once semantics (EOS) through idempotent producers and transactional messaging. Producers can write to multiple partitions atomically, and consumers can read committed messages only.

// Kafka idempotent producer
config := sarama.NewConfig()
config.Producer.Idempotent = true
config.Producer.RequiredAcks = sarama.WaitForAll
config.Producer.Return.Successes = true
config.Producer.Return.Errors = true
config.Net.MaxOpenRequests = 1

producer, _ := sarama.NewAsyncProducer(brokers, config)

JetStream: Supports message deduplication using the Nats-Msg-Id header. Publishers can set a unique ID, and the stream will reject duplicates within the deduplication window.

// JetStream with deduplication
js, _ := nc.JetStream()

_, err := js.Publish("orders.new", data, nats.MsgId("order-12345"))

Durability and Retention

Kafka Retention

Kafka offers two cleanup policies:

Delete: Remove segments older than retention.ms or when log size exceeds retention.bytes.

Compact: Keep only the latest value for each key. Useful for changelog topics where you need the current state.

# Time-based retention (7 days)
kafka-topics.sh --create --topic events \
  --config retention.ms=604800000 \
  --config cleanup.policy=delete

# Compacted topic for state
kafka-topics.sh --create --topic user-profiles \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.5

JetStream Retention

JetStream supports three retention modes:

Limits: Discard old messages when stream reaches max_msgs, max_bytes, or max_age. Default mode for most use cases.

Interest: Keep messages only while at least one consumer exists. Messages are removed when all consumers have acknowledged them.

WorkQueue: Each message is delivered to exactly one consumer. Once acknowledged, the message is removed from the stream.

# Limits-based retention
nats stream add EVENTS \
  --subjects "events.>" \
  --retention limits \
  --max-bytes 10GB \
  --max-age 7d

# Work queue for job processing
nats stream add JOBS \
  --subjects "jobs.pending" \
  --retention work \
  --max-msgs 100000

Performance Characteristics

Throughput

Kafka optimizes for sustained high throughput:

  • Batch writes reduce disk I/O overhead
  • Zero-copy reads from page cache to network socket
  • Compression (lz4, zstd, snappy) reduces network and storage
  • Sequential disk access patterns maximize HDD/SSD performance

Typical throughput: 100MB/s to 1GB/s per broker depending on hardware and configuration.

JetStream optimizes for lower latency with moderate throughput:

  • Smaller default batch sizes
  • Synchronous acknowledgments by default
  • Memory storage option for highest performance
  • File storage with configurable sync policies

Typical throughput: 50MB/s to 200MB/s per server depending on storage backend and replication.

Latency

Kafka: Default configurations prioritize throughput over latency. Low-latency configurations:

config := sarama.NewConfig()
config.Producer.Flush.Frequency = 0           // Disable time-based batching
config.Producer.Flush.Messages = 1            // Send immediately
config.Producer.RequiredAcks = sarama.WaitForLocal // Reduced durability

With tuning, Kafka achieves 5-10ms p99 latency.

JetStream: Designed for low latency out of the box. Default configurations achieve sub-millisecond latency within a datacenter.

js, _ := nc.JetStream()

// Synchronous publish with confirmation
ack, _ := js.Publish("events.click", data)
fmt.Printf("Sequence: %d\n", ack.Sequence)

With replication (R=3), expect 1-5ms p99 latency.

Scalability

Kafka:

  • Horizontal scaling by adding brokers
  • Partition count determines parallelism
  • Consumer group rebalancing redistributes partitions
  • Topic with 100+ partitions common for high-throughput workloads

JetStream:

  • Horizontal scaling by adding servers to cluster
  • Subject-based routing allows many logical channels
  • Consumer scale-out via multiple pull subscribers
  • Superclusters for multi-region deployments

Go Client Examples

Kafka Producer and Consumer

package main

import (
    "log"

    "github.com/IBM/sarama"
)

func main() {
    brokers := []string{"kafka-1:9092", "kafka-2:9092", "kafka-3:9092"}

    config := sarama.NewConfig()
    config.Producer.RequiredAcks = sarama.WaitForAll
    config.Producer.Retry.Max = 3
    config.Producer.Return.Successes = true
    config.Producer.Return.Errors = true

    producer, err := sarama.NewAsyncProducer(brokers, config)
    if err != nil {
        log.Fatalf("Failed to create producer: %v", err)
    }
    defer producer.Close()

    // Handle successes and errors
    go func() {
        for msg := range producer.Successes() {
            log.Printf("Message sent to partition %d at offset %d", msg.Partition, msg.Offset)
        }
    }()

    go func() {
        for err := range producer.Errors() {
            log.Printf("Failed to send message: %v", err)
        }
    }()

    // Send message
    msg := &sarama.ProducerMessage{
        Topic: "events",
        Key:   sarama.StringEncoder("user-123"),
        Value: sarama.StringEncoder(`{"event":"click","page":"/products"}`),
    }

    producer.Input() <- msg
}
package main

import (
    "context"
    "log"

    "github.com/IBM/sarama"
)

type EventHandler struct{}

func (h *EventHandler) Setup(_ sarama.ConsumerGroupSession) error   { return nil }
func (h *EventHandler) Cleanup(_ sarama.ConsumerGroupSession) error { return nil }

func (h *EventHandler) ConsumeClaim(session sarama.ConsumerGroupSession, claim sarama.ConsumerGroupClaim) error {
    for msg := range claim.Messages() {
        log.Printf("Received: topic=%s partition=%d offset=%d key=%s value=%s",
            msg.Topic, msg.Partition, msg.Offset, string(msg.Key), string(msg.Value))

        session.MarkMessage(msg, "")
    }
    return nil
}

func main() {
    brokers := []string{"kafka-1:9092", "kafka-2:9092", "kafka-3:9092"}

    config := sarama.NewConfig()
    config.Consumer.Group.Rebalance.Strategy = sarama.NewBalanceStrategyRoundRobin()
    config.Consumer.Offsets.Initial = sarama.OffsetOldest

    group, err := sarama.NewConsumerGroup(brokers, "event-processor", config)
    if err != nil {
        log.Fatalf("Failed to create consumer group: %v", err)
    }
    defer group.Close()

    handler := &EventHandler{}
    ctx := context.Background()

    for {
        if err := group.Consume(ctx, []string{"events"}, handler); err != nil {
            log.Printf("Consumer error: %v", err)
        }
    }
}

JetStream Publisher and Consumer

package main

import (
    "log"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, err := nats.Connect("nats://nats-1:4222,nats://nats-2:4222,nats://nats-3:4222")
    if err != nil {
        log.Fatalf("Failed to connect: %v", err)
    }
    defer nc.Close()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatalf("Failed to get JetStream context: %v", err)
    }

    // Create stream if not exists
    _, err = js.AddStream(&nats.StreamConfig{
        Name:      "EVENTS",
        Subjects:  []string{"events.>"},
        Storage:   nats.FileStorage,
        Replicas:  3,
        Retention: nats.LimitsPolicy,
        MaxAge:    7 * 24 * 60 * 60 * 1e9, // 7 days in nanoseconds
    })
    if err != nil && err != nats.ErrStreamNameAlreadyInUse {
        log.Fatalf("Failed to create stream: %v", err)
    }

    // Publish with acknowledgment
    ack, err := js.Publish("events.click", []byte(`{"user":"123","page":"/products"}`))
    if err != nil {
        log.Fatalf("Failed to publish: %v", err)
    }

    log.Printf("Published to stream %s, sequence %d", ack.Stream, ack.Sequence)
}
package main

import (
    "log"
    "time"

    "github.com/nats-io/nats.go"
)

func main() {
    nc, err := nats.Connect("nats://nats-1:4222,nats://nats-2:4222,nats://nats-3:4222")
    if err != nil {
        log.Fatalf("Failed to connect: %v", err)
    }
    defer nc.Close()

    js, err := nc.JetStream()
    if err != nil {
        log.Fatalf("Failed to get JetStream context: %v", err)
    }

    // Create durable pull consumer
    _, err = js.AddConsumer("EVENTS", &nats.ConsumerConfig{
        Durable:       "event-processor",
        AckPolicy:     nats.AckExplicitPolicy,
        AckWait:       30 * time.Second,
        MaxAckPending: 1000,
        DeliverPolicy: nats.DeliverAllPolicy,
    })
    if err != nil && err != nats.ErrConsumerNameAlreadyInUse {
        log.Fatalf("Failed to create consumer: %v", err)
    }

    // Subscribe and process
    sub, err := js.PullSubscribe("events.>", "event-processor")
    if err != nil {
        log.Fatalf("Failed to subscribe: %v", err)
    }

    for {
        msgs, err := sub.Fetch(10, nats.MaxWait(5*time.Second))
        if err != nil {
            if err == nats.ErrTimeout {
                continue
            }
            log.Printf("Fetch error: %v", err)
            continue
        }

        for _, msg := range msgs {
            log.Printf("Received: subject=%s data=%s", msg.Subject, string(msg.Data))

            if err := msg.Ack(); err != nil {
                log.Printf("Ack error: %v", err)
            }
        }
    }
}

Operational Considerations

Kafka Operations

Cluster Management:

  • Monitor UnderReplicatedPartitions for replication health
  • Track consumer lag via kafka-consumer-groups.sh or metrics
  • Plan capacity based on partition count and retention size
  • Use rack awareness for fault tolerance

Common Tasks:

# Check consumer group lag
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group event-processor

# Reassign partitions after adding broker
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassign.json --execute

# Increase partition count (cannot decrease)
kafka-topics.sh --bootstrap-server localhost:9092 \
  --alter --topic events --partitions 12

Monitoring Metrics:

  • kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
  • kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
  • kafka.consumer:type=consumer-fetch-manager-metrics,client-id=*,topic=*,partition=*

JetStream Operations

Cluster Management:

  • Monitor stream and consumer health via NATS CLI or API
  • Watch for consumer redelivery rates indicating processing issues
  • Plan storage based on stream retention settings
  • Use placement tags for multi-region deployments

Common Tasks:

# View stream info
nats stream info EVENTS

# Check consumer status
nats consumer info EVENTS event-processor

# View pending messages
nats consumer next EVENTS event-processor --count 1

# Purge stream (delete all messages)
nats stream purge EVENTS

Monitoring Metrics:

  • jetstream_server_total_streams
  • jetstream_server_total_consumers
  • jetstream_consumer_num_pending
  • jetstream_consumer_num_redelivered

Security

Both systems support TLS encryption and authentication:

Kafka:

  • SASL authentication (PLAIN, SCRAM, GSSAPI, OAUTHBEARER)
  • ACLs by principal, resource type (topic, group, cluster), and operation
  • SSL/TLS for encryption in transit

JetStream:

  • User credentials or NKey authentication
  • Account-based multi-tenancy with resource limits
  • Subject-level permissions for publish and subscribe
  • TLS for encryption in transit

When to Choose Each

Choose Kafka When

  • Long-term retention: You need to replay events from days, weeks, or months ago
  • High throughput: Sustained multi-GB/s ingestion is required
  • Stream processing: Using Kafka Streams, Flink, or similar frameworks
  • Exactly-once semantics: Transactional processing across topics is required
  • Ecosystem integration: Connecting to Kafka Connect, Schema Registry, or existing Kafka infrastructure

Choose JetStream When

  • Low latency: Sub-millisecond message delivery is critical
  • Simple operations: Smaller team without dedicated infrastructure expertise
  • Many subjects: Thousands of logical channels with varying traffic patterns
  • Mixed delivery: Need both push and pull consumption patterns
  • Edge deployments: Lightweight footprint for edge or embedded systems
  • Request-reply: Combining messaging with RPC-style communication

Hybrid Architectures

In practice, many organizations use both systems:

  • JetStream for real-time event fan-out and edge aggregation
  • Kafka as the central data lake for analytics and long-term storage
  • Bridge services to replicate between systems
Edge Devices → NATS JetStream (Edge) → Bridge → Kafka (Central) → Analytics
                     ↓
              Local Processing

Summary

AspectKafkaNATS JetStream
Core ModelDistributed commit logSubject-based streams
OrderingPer partitionPer stream
Latency5-10ms (tuned)Sub-millisecond
Throughput100MB/s - 1GB/s per broker50-200MB/s per server
RetentionTime/size-based, compactionLimits, interest, work-queue
ConsumersPull with offset trackingPush or pull with ack tracking
Exactly-onceTransactions + idempotent producerMessage deduplication
OperationsMore complex, JVM-basedSimpler, single binary

Both systems are production-ready and well-documented. Kafka excels at high-throughput pipelines with long retention and exactly-once processing. JetStream excels at low-latency messaging with simpler operations and flexible delivery patterns. Choose based on your latency requirements, throughput needs, operational capacity, and existing infrastructure.