How Kafka Bootstrap Connection Works
Understanding how Kafka clients connect to a cluster is fundamental for proper deployment and troubleshooting. The bootstrap connection process is often misunderstood, leading to connectivity issues in production. This article explains the mechanics of Kafka’s two-phase connection process and its implications.
The Two-Phase Connection Process
Kafka clients discover the cluster in two phases (bootstrap, then metadata discovery) and then connect directly to the brokers they need. This distinguishes Kafka from simpler client-server architectures:
- Bootstrap Phase: Initial connection using bootstrap servers
- Metadata Discovery Phase: Learning the full cluster topology
- Direct Connection Phase: Connecting directly to partition leaders
This design enables horizontal scalability and high availability but requires careful network configuration.
sequenceDiagram
participant C as Kafka Client
participant B as Bootstrap Server
participant K1 as Kafka Broker 1
participant K2 as Kafka Broker 2
participant K3 as Kafka Broker 3
Note over C,B: Phase 1: Bootstrap
C->>B: TCP Connect (kafka1.example.com:9092)
B-->>C: Connection Established
C->>B: Metadata Request
B-->>C: Cluster Metadata (Broker List + Topics)
Note over C,K3: Phase 2: Metadata Discovery
Note over C: Client learns:<br/>Broker 1: kafka1.internal:9092<br/>Broker 2: kafka2.internal:9092<br/>Broker 3: kafka3.internal:9092
Note over C,K3: Phase 3: Direct Connections
C->>K1: Direct TCP Connect
K1-->>C: Connected
C->>K2: Direct TCP Connect
K2-->>C: Connected
C->>K3: Direct TCP Connect
K3-->>C: Connected
Note over C,K3: Ready for Production/Consumption
C->>K1: Produce/Consume (Partition 0)
C->>K2: Produce/Consume (Partition 1)
C->>K3: Produce/Consume (Partition 2)
box rgb(225, 245, 255) Client
participant C
end
box rgb(255, 243, 205) Bootstrap
participant B
end
box rgb(212, 237, 218) Brokers
participant K1
participant K2
participant K3
end
Phase 1: Bootstrap Connection
What Are Bootstrap Servers?
Bootstrap servers are the initial contact points for Kafka clients. They’re specified in the client configuration:
package main

import (
	"log"

	"github.com/IBM/sarama"
)

func main() {
	config := sarama.NewConfig()
	config.Version = sarama.V3_5_0_0

	// Bootstrap servers: initial contact points, not necessarily the full broker list
	brokers := []string{
		"kafka1.example.com:9092",
		"kafka2.example.com:9092",
		"kafka3.example.com:9092",
	}

	client, err := sarama.NewClient(brokers, config)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
}
Bootstrap Connection Process
- Client attempts to connect to one of the bootstrap servers (many clients pick the order at random rather than following the list order)
- If the connection fails, it tries another server from the list
- It continues until a connection succeeds or all servers are exhausted
- Only ONE successful connection is needed
Important: You don’t need to list all brokers as bootstrap servers. A single reachable broker is sufficient, though multiple are recommended for redundancy.
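The failover loop described above can be sketched in a few lines of Go. This is a minimal illustration, not how any particular client library implements it: `dialFunc`, `firstReachable`, and `errRefused` are hypothetical names, the dialer is simulated so the example runs without a cluster, and servers are tried in list order for simplicity.

```go
package main

import (
	"errors"
	"fmt"
)

// dialFunc abstracts one connection attempt so the loop can run without a
// real cluster. It is a hypothetical helper, not part of any Kafka library.
type dialFunc func(addr string) error

// errRefused stands in for a failed TCP connect in this sketch.
var errRefused = errors.New("connection refused")

// firstReachable returns the first address that dial accepts. It stops at
// the first success because only one bootstrap connection is needed.
func firstReachable(addrs []string, dial dialFunc) (string, error) {
	if len(addrs) == 0 {
		return "", errors.New("no bootstrap servers configured")
	}
	var lastErr error
	for _, addr := range addrs {
		if lastErr = dial(addr); lastErr == nil {
			return addr, nil
		}
	}
	return "", fmt.Errorf("all %d bootstrap servers failed, last error: %w", len(addrs), lastErr)
}

func main() {
	// Simulated dialer: kafka1 is down, every other server accepts connections.
	dial := func(addr string) error {
		if addr == "kafka1.example.com:9092" {
			return errRefused
		}
		return nil
	}
	addrs := []string{
		"kafka1.example.com:9092",
		"kafka2.example.com:9092",
		"kafka3.example.com:9092",
	}
	addr, err := firstReachable(addrs, dial)
	if err != nil {
		fmt.Println("bootstrap failed:", err)
		return
	}
	fmt.Println("bootstrapped via", addr)
}
```

In a real check, the simulated dialer could be swapped for a function that performs a TCP connect with a timeout.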
graph LR
C[Kafka Client]
B1[kafka1.example.com:9092]
B2[kafka2.example.com:9092]
B3[kafka3.example.com:9092]
C -->|Try Connect 1| B1
B1 -.->|Failed| C
C -->|Try Connect 2| B2
B2 ==>|Success| C
style B1 fill:lightcoral
style B2 fill:lightgreen
style B3 fill:lightgray
Bootstrap Server Response
When a client connects to a bootstrap server, it sends a Metadata request. The broker responds with:
Cluster Metadata Response:
- Cluster ID
- Controller ID
- Broker List:
  - Broker ID: 1
    Host: kafka1.internal.example.com
    Port: 9092
  - Broker ID: 2
    Host: kafka2.internal.example.com
    Port: 9092
  - Broker ID: 3
    Host: kafka3.internal.example.com
    Port: 9092
- Topic Metadata:
  - Topic: events
    Partitions:
    - Partition: 0, Leader: 1, Replicas: [1,2], ISR: [1,2]
    - Partition: 1, Leader: 2, Replicas: [2,3], ISR: [2,3]
    - Partition: 2, Leader: 3, Replicas: [3,1], ISR: [3,1]
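To make the shape of this response concrete, the sketch below models it with plain Go structs and shows how a client could map a topic partition to the address of its leader. The types `Broker`, `PartitionMeta`, and `Metadata`, and the helper `sampleMetadata`, are simplified assumptions for illustration; real client libraries define their own, richer types.

```go
package main

import "fmt"

// Broker, PartitionMeta and Metadata are simplified, hypothetical types that
// mirror the fields listed above.
type Broker struct {
	ID   int32
	Host string
	Port int32
}

type PartitionMeta struct {
	ID       int32
	Leader   int32
	Replicas []int32
	ISR      []int32
}

type Metadata struct {
	Brokers []Broker
	Topics  map[string][]PartitionMeta
}

// LeaderAddr resolves a topic partition to the advertised host:port of its leader.
func (m Metadata) LeaderAddr(topic string, partition int32) (string, error) {
	for _, p := range m.Topics[topic] {
		if p.ID != partition {
			continue
		}
		for _, b := range m.Brokers {
			if b.ID == p.Leader {
				return fmt.Sprintf("%s:%d", b.Host, b.Port), nil
			}
		}
		return "", fmt.Errorf("leader %d of %s/%d is not in the broker list", p.Leader, topic, partition)
	}
	return "", fmt.Errorf("unknown topic or partition %s/%d", topic, partition)
}

// sampleMetadata reproduces the example response shown above.
func sampleMetadata() Metadata {
	return Metadata{
		Brokers: []Broker{
			{ID: 1, Host: "kafka1.internal.example.com", Port: 9092},
			{ID: 2, Host: "kafka2.internal.example.com", Port: 9092},
			{ID: 3, Host: "kafka3.internal.example.com", Port: 9092},
		},
		Topics: map[string][]PartitionMeta{
			"events": {
				{ID: 0, Leader: 1, Replicas: []int32{1, 2}, ISR: []int32{1, 2}},
				{ID: 1, Leader: 2, Replicas: []int32{2, 3}, ISR: []int32{2, 3}},
				{ID: 2, Leader: 3, Replicas: []int32{3, 1}, ISR: []int32{3, 1}},
			},
		},
	}
}

func main() {
	addr, err := sampleMetadata().LeaderAddr("events", 1)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("partition 1 leader:", addr)
}
```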
Phase 2: Metadata Discovery
After the bootstrap connection succeeds, the client learns the full cluster topology:
Broker Addresses in Metadata
The metadata response contains advertised.listeners addresses for each broker. These are NOT necessarily the same as bootstrap server addresses:
# Broker 1 configuration
advertised.listeners=PLAINTEXT://kafka1.internal.example.com:9092
# What clients receive in metadata
Host: kafka1.internal.example.com
Port: 9092
Common Pitfall: Address Mismatch
This is where most connection problems occur:
Scenario: Bootstrap via load balancer, metadata returns internal IPs
Bootstrap: kafka-lb.example.com:9092 Success
Metadata: 10.0.1.5:9092 Unreachable from client
The client successfully bootstraps but cannot connect to partition leaders because internal IPs aren’t routable from the client’s network.
sequenceDiagram
participant C as Client
participant LB as Load Balancer<br/>(kafka-lb.example.com)
participant B as Broker<br/>(10.0.1.5:9092)
C->>LB: Bootstrap Connect
LB->>B: Forward
B-->>LB: Metadata Response<br/>(advertised: 10.0.1.5:9092)
LB-->>C: Metadata Response
Note over C: Client tries to connect<br/>to 10.0.1.5:9092
C-xB: Direct Connect to 10.0.1.5:9092
Note over C,B: FAILS: Internal IP unreachable<br/>from client network
box rgb(225, 245, 255) Client Side
participant C
end
box rgb(255, 243, 205) Load Balancer
participant LB
end
box rgb(248, 215, 218) Unreachable Broker
participant B
end
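One cheap way to spot this mismatch early is to flag advertised addresses that are literal private IPs. The sketch below assumes the advertised addresses have already been read from the metadata as host:port strings; `likelyUnroutable` is a hypothetical helper, and it only gives a hint, since a private IP is perfectly fine when the client sits on the same network.

```go
package main

import (
	"fmt"
	"net"
)

// likelyUnroutable reports whether an advertised host:port uses a literal
// private, loopback, or link-local IP. Such addresses usually cannot be
// reached by clients outside the brokers' own network.
func likelyUnroutable(advertised string) bool {
	host, _, err := net.SplitHostPort(advertised)
	if err != nil {
		return false // not in host:port form, nothing to judge
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return false // a hostname: verify DNS from the client network instead
	}
	return ip.IsPrivate() || ip.IsLoopback() || ip.IsLinkLocalUnicast()
}

func main() {
	for _, addr := range []string{
		"10.0.1.5:9092",
		"203.0.113.10:9092",
		"kafka1.example.com:9092",
	} {
		fmt.Printf("%-26s likely unroutable from outside: %v\n", addr, likelyUnroutable(addr))
	}
}
```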
Phase 3: Direct Connections
Partition Leader Connections
After receiving metadata, the client connects directly to partition leaders:
// Client needs to produce to topic "events" partition 0
// Metadata shows partition 0 leader is broker 1 at kafka1.internal.example.com:9092
// Client attempts direct connection to kafka1.internal.example.com:9092
config := sarama.NewConfig()
config.Producer.Return.Successes = true // required by SyncProducer

// brokers is the bootstrap list; leader connections use the advertised addresses
producer, err := sarama.NewSyncProducer(brokers, config)
if err != nil {
	log.Fatal(err)
}
defer producer.Close()

msg := &sarama.ProducerMessage{
	Topic: "events",
	Value: sarama.StringEncoder("test message"),
}

partition, offset, err := producer.SendMessage(msg)
if err != nil {
	log.Printf("Failed to send message: %v", err)
} else {
	log.Printf("Message stored in partition %d at offset %d", partition, offset)
}
Why Direct Connections?
Kafka requires direct broker connections for several reasons:
- Performance: Eliminates proxy/load balancer overhead
- Partition Distribution: Different partitions live on different brokers
- Scalability: Load balancers become bottlenecks at high throughput
- Protocol Design: Produce and fetch requests must be sent to the leader of each partition, so a connection cannot be routed to an arbitrary broker
graph TB
C[Kafka Client]
subgraph "Kafka Cluster"
K1[Broker 1<br/>kafka1:9092]
K2[Broker 2<br/>kafka2:9092]
K3[Broker 3<br/>kafka3:9092]
end
subgraph "Topic: events"
P0[Partition 0<br/>Leader: Broker 1]
P1[Partition 1<br/>Leader: Broker 2]
P2[Partition 2<br/>Leader: Broker 3]
end
C -.->|Bootstrap| K2
C ==>|Direct: Partition 0| K1
C ==>|Direct: Partition 1| K2
C ==>|Direct: Partition 2| K3
P0 -.-> K1
P1 -.-> K2
P2 -.-> K3
style C fill:#e1f5ff,stroke:#0366d6,stroke-width:2px
style K1 fill:#d4edda,stroke:#28a745,stroke-width:2px
style K2 fill:#d4edda,stroke:#28a745,stroke-width:2px
style K3 fill:#d4edda,stroke:#28a745,stroke-width:2px
style P0 fill:#fff3cd,stroke:#ffc107,stroke-width:2px
style P1 fill:#fff3cd,stroke:#ffc107,stroke-width:2px
style P2 fill:#fff3cd,stroke:#ffc107,stroke-width:2px
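The layout in the diagram can be expressed as a small routing table. The sketch below inverts a partition-to-leader map into the set of brokers a client needs a connection to; `partitionsByLeader` is a hypothetical helper, and the leader assignments are the example values used in this article.

```go
package main

import (
	"fmt"
	"sort"
)

// partitionsByLeader inverts a partition->leader map into leader->partitions.
// Each key in the result is a broker the client must be able to reach directly.
func partitionsByLeader(leaders map[int32]int32) map[int32][]int32 {
	out := make(map[int32][]int32)
	for partition, leader := range leaders {
		out[leader] = append(out[leader], partition)
	}
	// Sort each partition list so the result does not depend on map iteration order.
	for _, ps := range out {
		sort.Slice(ps, func(i, j int) bool { return ps[i] < ps[j] })
	}
	return out
}

func main() {
	// Topic "events": partition -> leader broker ID.
	leaders := map[int32]int32{0: 1, 1: 2, 2: 3}
	byLeader := partitionsByLeader(leaders)

	ids := make([]int32, 0, len(byLeader))
	for id := range byLeader {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })

	for _, id := range ids {
		fmt.Printf("broker %d serves partitions %v\n", id, byLeader[id])
	}
}
```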
Network Configuration Requirements
Listener Configuration
Brokers must advertise addresses reachable by clients:
# Single network (simple case)
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://kafka1.example.com:9092
# Multiple networks (internal + external)
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093
advertised.listeners=INTERNAL://kafka1.internal:9092,EXTERNAL://kafka1.example.com:9093
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
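When auditing broker configuration, it can help to break an advertised.listeners value into its named parts. The following is a minimal sketch under the assumption that the value follows the NAME://host:port,NAME://host:port form shown above; `parseListeners` is a hypothetical helper, not part of Kafka or any client library.

```go
package main

import (
	"fmt"
	"strings"
)

// parseListeners splits a value such as
// "INTERNAL://kafka1.internal:9092,EXTERNAL://kafka1.example.com:9093"
// into a map from listener name to host:port.
func parseListeners(value string) (map[string]string, error) {
	out := make(map[string]string)
	for _, entry := range strings.Split(value, ",") {
		entry = strings.TrimSpace(entry)
		if entry == "" {
			continue
		}
		i := strings.Index(entry, "://")
		if i <= 0 || i+3 >= len(entry) {
			return nil, fmt.Errorf("malformed listener %q", entry)
		}
		out[entry[:i]] = entry[i+3:]
	}
	return out, nil
}

func main() {
	value := "INTERNAL://kafka1.internal:9092,EXTERNAL://kafka1.example.com:9093"
	listeners, err := parseListeners(value)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("internal clients use:", listeners["INTERNAL"])
	fmt.Println("external clients use:", listeners["EXTERNAL"])
}
```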
Client Network Requirements
For successful operation, clients must:
- Reach at least one bootstrap server
- Resolve all advertised listener hostnames
- Connect directly to ALL brokers in the cluster
- Maintain persistent TCP connections
Common Deployment Scenarios
Scenario 1: Same Network
Setup: Clients and brokers on same VPC/network
Client Network: 10.0.0.0/16
Broker Addresses: 10.0.1.5, 10.0.1.6, 10.0.1.7
Bootstrap: 10.0.1.5:9092 OK
Metadata: 10.0.1.5-7:9092 OK
Direct: 10.0.1.5-7:9092 OK
Configuration:
advertised.listeners=PLAINTEXT://10.0.1.5:9092
graph LR
C[Client<br/>10.0.0.100]
subgraph "VPC: 10.0.0.0/16"
K1[Broker 1<br/>10.0.1.5:9092]
K2[Broker 2<br/>10.0.1.6:9092]
K3[Broker 3<br/>10.0.1.7:9092]
end
C -.->|Bootstrap| K1
C ==>|Direct| K1
C ==>|Direct| K2
C ==>|Direct| K3
style C fill:#e1f5ff,stroke:#0366d6,stroke-width:2px
style K1 fill:#d4edda,stroke:#28a745,stroke-width:2px
style K2 fill:#d4edda,stroke:#28a745,stroke-width:2px
style K3 fill:#d4edda,stroke:#28a745,stroke-width:2px
Scenario 2: Across Networks (NAT)
Setup: External clients connecting through NAT/firewall
Internal Network: 10.0.0.0/16
External Network: Internet
Bootstrap: kafka.example.com:9092 OK
Metadata: 10.0.1.5:9092 FAIL (internal IP unreachable)
Solution 1: Use external DNS names (separate IPs)
# Broker 1
advertised.listeners=PLAINTEXT://kafka1.example.com:9092
# Broker 2
advertised.listeners=PLAINTEXT://kafka2.example.com:9092
# Broker 3
advertised.listeners=PLAINTEXT://kafka3.example.com:9092
Requirements:
- DNS resolution: kafka1.example.com → 203.0.113.10
- Port forwarding: 203.0.113.10:9092 → 10.0.1.5:9092
- Firewall rules: Allow TCP 9092 from client IPs
Solution 2: Use single IP with different ports
# Broker 1
advertised.listeners=PLAINTEXT://kafka.example.com:9092
# Broker 2
advertised.listeners=PLAINTEXT://kafka.example.com:9093
# Broker 3
advertised.listeners=PLAINTEXT://kafka.example.com:9094
Requirements:
- DNS resolution: kafka.example.com → 203.0.113.10
- Port forwarding:
- 203.0.113.10:9092 → 10.0.1.5:9092
- 203.0.113.10:9093 → 10.0.1.6:9092
- 203.0.113.10:9094 → 10.0.1.7:9092
- Firewall rules: Allow TCP 9092-9094 from client IPs
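The port mapping for Solution 2 is mechanical enough to generate. The sketch below derives, for each broker, the advertised.listeners value and the matching forwarding rule; `forward` and `planForwards` are hypothetical names, and the hostname, public IP, and broker IPs are the example values from this section.

```go
package main

import "fmt"

// forward pairs the listener a broker should advertise with the NAT rule
// that makes that listener reachable.
type forward struct {
	Advertised string // value for advertised.listeners on the broker
	Public     string // public ip:port on the firewall
	Internal   string // broker ip:port behind the firewall
}

// planForwards assigns consecutive public ports, starting at basePort, to
// brokers that all listen on internalPort.
func planForwards(host, publicIP string, basePort, internalPort int, brokerIPs []string) []forward {
	plan := make([]forward, 0, len(brokerIPs))
	for i, ip := range brokerIPs {
		port := basePort + i
		plan = append(plan, forward{
			Advertised: fmt.Sprintf("PLAINTEXT://%s:%d", host, port),
			Public:     fmt.Sprintf("%s:%d", publicIP, port),
			Internal:   fmt.Sprintf("%s:%d", ip, internalPort),
		})
	}
	return plan
}

func main() {
	brokers := []string{"10.0.1.5", "10.0.1.6", "10.0.1.7"}
	for i, f := range planForwards("kafka.example.com", "203.0.113.10", 9092, 9092, brokers) {
		fmt.Printf("broker %d: advertised.listeners=%s  forward %s -> %s\n",
			i+1, f.Advertised, f.Public, f.Internal)
	}
}
```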
Solution 1 Diagram: Separate DNS names and IPs
graph TB
C[External Client<br/>Internet]
subgraph "Firewall/NAT"
FW1[Public IP: 203.0.113.10<br/>kafka1.example.com:9092]
FW2[Public IP: 203.0.113.11<br/>kafka2.example.com:9092]
FW3[Public IP: 203.0.113.12<br/>kafka3.example.com:9092]
end
subgraph "Internal Network: 10.0.0.0/16"
K1[Broker 1<br/>10.0.1.5:9092]
K2[Broker 2<br/>10.0.1.6:9092]
K3[Broker 3<br/>10.0.1.7:9092]
end
C -->|Connect to<br/>kafka1.example.com| FW1
C -->|Connect to<br/>kafka2.example.com| FW2
C -->|Connect to<br/>kafka3.example.com| FW3
FW1 -->|Port Forward<br/>:9092 → 10.0.1.5:9092| K1
FW2 -->|Port Forward<br/>:9092 → 10.0.1.6:9092| K2
FW3 -->|Port Forward<br/>:9092 → 10.0.1.7:9092| K3
style C fill:#e1f5ff,stroke:#0366d6,stroke-width:2px
style FW1 fill:#fff3cd,stroke:#ffc107,stroke-width:2px
style FW2 fill:#fff3cd,stroke:#ffc107,stroke-width:2px
style FW3 fill:#fff3cd,stroke:#ffc107,stroke-width:2px
style K1 fill:#d4edda,stroke:#28a745,stroke-width:2px
style K2 fill:#d4edda,stroke:#28a745,stroke-width:2px
style K3 fill:#d4edda,stroke:#28a745,stroke-width:2px
Solution 2 Diagram: Single IP with port mapping
graph TB
C[External Client<br/>Internet]
subgraph "Firewall/NAT"
FW[Single Public IP: 203.0.113.10<br/>Ports: 9092, 9093, 9094]
end
subgraph "Internal Network: 10.0.0.0/16"
K1[Broker 1<br/>10.0.1.5:9092<br/>kafka.example.com:9092]
K2[Broker 2<br/>10.0.1.6:9092<br/>kafka.example.com:9093]
K3[Broker 3<br/>10.0.1.7:9092<br/>kafka.example.com:9094]
end
C -->|kafka.example.com:9092<br/>DNS→203.0.113.10| FW
FW -->|:9092 → 10.0.1.5:9092| K1
FW -->|:9093 → 10.0.1.6:9092| K2
FW -->|:9094 → 10.0.1.7:9092| K3
style C fill:#e1f5ff,stroke:#0366d6,stroke-width:2px
style FW fill:#fff3cd,stroke:#ffc107,stroke-width:2px
style K1 fill:#d4edda,stroke:#28a745,stroke-width:2px
style K2 fill:#d4edda,stroke:#28a745,stroke-width:2px
style K3 fill:#d4edda,stroke:#28a745,stroke-width:2px
Scenario 3: Multiple Client Networks
Setup: Internal microservices + external applications
listeners=INTERNAL://0.0.0.0:9092,EXTERNAL://0.0.0.0:9093
advertised.listeners=INTERNAL://kafka1.internal:9092,EXTERNAL://kafka1.example.com:9093
listener.security.protocol.map=INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
inter.broker.listener.name=INTERNAL
Client configuration:
// Internal client
internalBrokers := []string{"kafka1.internal:9092"}
// External client
externalBrokers := []string{"kafka1.example.com:9093"}
graph TB
IC[Internal Client<br/>Microservices]
EC[External Client<br/>Internet]
subgraph "Kafka Cluster"
K1I[Broker 1<br/>INTERNAL: kafka1.internal:9092<br/>EXTERNAL: kafka1.example.com:9093]
K2I[Broker 2<br/>INTERNAL: kafka2.internal:9092<br/>EXTERNAL: kafka2.example.com:9093]
K3I[Broker 3<br/>INTERNAL: kafka3.internal:9092<br/>EXTERNAL: kafka3.example.com:9093]
end
IC -.->|Connect to<br/>:9092| K1I
IC ==>|Direct| K1I
IC ==>|Direct| K2I
IC ==>|Direct| K3I
EC -.->|Connect to<br/>:9093| K1I
EC ==>|Direct| K1I
EC ==>|Direct| K2I
EC ==>|Direct| K3I
style IC fill:#d1ecf1,stroke:#17a2b8,stroke-width:2px
style EC fill:#e1f5ff,stroke:#0366d6,stroke-width:2px
style K1I fill:#d4edda,stroke:#28a745,stroke-width:2px
style K2I fill:#d4edda,stroke:#28a745,stroke-width:2px
style K3I fill:#d4edda,stroke:#28a745,stroke-width:2px
Troubleshooting Connection Issues
Diagnostic Steps
1. Verify Bootstrap Connection
# Test TCP connectivity
nc -zv kafka1.example.com 9092
# Test with kafkacat (renamed kcat in newer releases)
kafkacat -b kafka1.example.com:9092 -L
2. Check Metadata Response
# Full cluster metadata
kafkacat -b kafka1.example.com:9092 -L
# Output shows advertised addresses:
# broker 1 at kafka1.internal.example.com:9092
# broker 2 at kafka2.internal.example.com:9092
3. Verify Direct Connectivity
# Test connection to each advertised address
nc -zv kafka1.internal.example.com 9092
nc -zv kafka2.internal.example.com 9092
nc -zv kafka3.internal.example.com 9092
4. DNS Resolution
# Verify DNS resolves correctly from client network
nslookup kafka1.internal.example.com
dig kafka1.internal.example.com
flowchart TD
Start([Connection Issue])
Start --> Q1{Can connect to<br/>bootstrap server?}
Q1 -->|No| F1[Check DNS resolution<br/>Check firewall rules<br/>Verify bootstrap address]
Q1 -->|Yes| Q2{Receive metadata<br/>response?}
Q2 -->|No| F2[Check broker logs<br/>Verify client auth<br/>Check broker health]
Q2 -->|Yes| Q3{Can resolve<br/>advertised addresses?}
Q3 -->|No| F3[Fix DNS configuration<br/>Update /etc/hosts<br/>Check DNS servers]
Q3 -->|Yes| Q4{Can connect to<br/>all brokers?}
Q4 -->|No| F4[Check advertised listeners<br/>Verify network routing<br/>Check firewall rules]
Q4 -->|Yes| Success([Connection Working])
F1 --> End([Fix and Retry])
F2 --> End
F3 --> End
F4 --> End
style Start fill:#e1f5ff,stroke:#0366d6,stroke-width:2px
style Success fill:#d4edda,stroke:#28a745,stroke-width:3px
style End fill:#fff3cd,stroke:#ffc107,stroke-width:2px
style F1 fill:#f8d7da,stroke:#dc3545,stroke-width:2px
style F2 fill:#f8d7da,stroke:#dc3545,stroke-width:2px
style F3 fill:#f8d7da,stroke:#dc3545,stroke-width:2px
style F4 fill:#f8d7da,stroke:#dc3545,stroke-width:2px
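The decision flow above can also be captured as a small function, which is handy when the individual checks are automated. In this sketch the `checks` struct, the advice constants, and `diagnose` are hypothetical names; the four booleans are assumed to come from the diagnostic steps described earlier.

```go
package main

import "fmt"

// checks holds the outcome of the four diagnostic steps, in order.
type checks struct {
	BootstrapReachable  bool
	MetadataReceived    bool
	AddressesResolve    bool
	AllBrokersReachable bool
}

const (
	adviceBootstrap = "check DNS resolution, firewall rules, and the bootstrap address"
	adviceMetadata  = "check broker logs, client authentication, and broker health"
	adviceDNS       = "fix DNS configuration for the advertised hostnames"
	adviceRouting   = "check advertised listeners, network routing, and firewall rules"
	adviceOK        = "connection working"
)

// diagnose walks the checks in the same order as the flowchart and returns
// the advice for the first step that failed.
func diagnose(c checks) string {
	switch {
	case !c.BootstrapReachable:
		return adviceBootstrap
	case !c.MetadataReceived:
		return adviceMetadata
	case !c.AddressesResolve:
		return adviceDNS
	case !c.AllBrokersReachable:
		return adviceRouting
	default:
		return adviceOK
	}
}

func main() {
	// Typical address-mismatch case: bootstrap and metadata work,
	// hostnames resolve, but the advertised addresses are not routable.
	result := diagnose(checks{
		BootstrapReachable: true,
		MetadataReceived:   true,
		AddressesResolve:   true,
	})
	fmt.Println(result)
}
```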
Testing Connection in Go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/IBM/sarama"
)

func testKafkaConnection(brokers []string) error {
	config := sarama.NewConfig()
	config.Version = sarama.V3_5_0_0
	config.Net.DialTimeout = 10 * time.Second
	config.Net.ReadTimeout = 10 * time.Second
	config.Net.WriteTimeout = 10 * time.Second

	// Step 1: Create client (bootstrap connection and initial metadata fetch)
	client, err := sarama.NewClient(brokers, config)
	if err != nil {
		return fmt.Errorf("bootstrap connection failed: %w", err)
	}
	defer client.Close()

	// Step 2: Verify we can reach every broker at its advertised address
	brokerList := client.Brokers()
	fmt.Printf("Discovered %d brokers:\n", len(brokerList))
	for _, broker := range brokerList {
		addr := broker.Addr()
		fmt.Printf("  Broker %d: %s\n", broker.ID(), addr)

		// Open does not block; the client may also have opened this broker already
		if err := broker.Open(config); err != nil && err != sarama.ErrAlreadyConnected {
			return fmt.Errorf("cannot connect to broker %d at %s: %w",
				broker.ID(), addr, err)
		}

		// Connected reports the outcome of the connection attempt
		connected, err := broker.Connected()
		if err != nil {
			return fmt.Errorf("broker %d at %s not connected: %w",
				broker.ID(), addr, err)
		}
		if !connected {
			return fmt.Errorf("broker %d at %s not connected",
				broker.ID(), addr)
		}

		broker.Close()
		fmt.Printf("  Successfully connected to %s\n", addr)
	}

	// Step 3: Test topic metadata
	topics, err := client.Topics()
	if err != nil {
		return fmt.Errorf("failed to fetch topics: %w", err)
	}
	fmt.Printf("\nDiscovered %d topics\n", len(topics))
	return nil
}

func main() {
	brokers := []string{
		"kafka1.example.com:9092",
		"kafka2.example.com:9092",
		"kafka3.example.com:9092",
	}
	if err := testKafkaConnection(brokers); err != nil {
		log.Fatal(err)
	}
	fmt.Println("\nAll connectivity tests passed")
}
Best Practices
Bootstrap Server Configuration
- Use Multiple Bootstrap Servers: Provide 2-3 for redundancy
- Use Stable Addresses: DNS names preferred over IPs
- Test from Client Network: Verify reachability before deployment
Advertised Listener Configuration
- Use Client-Reachable Addresses: Test DNS resolution from client networks
- Avoid Internal IPs: Use DNS names that resolve correctly from all client locations
- Document Network Requirements: Maintain list of required connectivity
- Use Multiple Listeners: Separate internal/external traffic when needed
Monitoring and Maintenance
- Monitor Connection Metrics: Track connection failures and timeouts
- Log Bootstrap Attempts: Debug connectivity issues
- Validate Configuration Changes: Test before deploying
- Keep Client Libraries Updated: Latest versions have better error messages
Conclusion
Kafka’s connection process combines two discovery phases with a final direct-connection step:
- Bootstrap Phase: Connect to any listed server for initial contact
- Metadata Phase: Discover full cluster topology and advertised addresses
- Direct Phase: Connect directly to partition leaders
Critical Requirements:
- Clients must reach ALL brokers, not just bootstrap servers
- Advertised listeners must be resolvable and routable from client networks
- Direct TCP connections required (load balancers only for bootstrap)
Common Mistakes:
- Using internal IPs in advertised.listeners for external clients
- Assuming a load balancer handles all connections
- Not testing connectivity to all brokers before deployment
- Mixing network contexts without multiple listener configuration
Understanding this connection model is essential for successful Kafka deployments across diverse network topologies.