Rendezvous and Maglev Hashing

When building distributed systems, you often need to decide which server should handle a particular request or where to store a piece of data. Consistent hashing algorithms solve this problem elegantly. This post compares two popular approaches: Rendezvous Hashing (also known as Highest Random Weight) and Maglev Hashing (developed by Google).

The Problem

Imagine you have a caching layer with multiple servers. When a request comes in for a specific key, you need to determine which server should handle it. A naive approach using modulo (hash(key) % num_servers) works until a server fails or you add a new one. Then almost every key gets remapped to a different server, invalidating your entire cache.

[Diagram: with modulo hashing, removing one server causes most keys to remap to different servers. Both Rendezvous and Maglev hashing ensure that only keys from the removed server need to move.]
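To get a feel for the scale of the problem, here is a small sketch (FNV-1a from Go's hash/fnv, an arbitrary choice) that counts how many of 10,000 keys land on a different server when one of five servers is removed under modulo hashing:

package main

import (
    "fmt"
    "hash/fnv"
)

// hashKey returns a 64-bit FNV-1a hash of the key.
func hashKey(key string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(key))
    return h.Sum64()
}

func main() {
    const keys = 10000
    moved := 0
    for i := 0; i < keys; i++ {
        key := fmt.Sprintf("key-%d", i)
        before := hashKey(key) % 5 // 5 servers
        after := hashKey(key) % 4  // one server removed
        if before != after {
            moved++
        }
    }
    // With uniform hashes, roughly 80% of keys move, not the ideal 20%.
    fmt.Printf("%d of %d keys moved (%.0f%%)\n", moved, keys, float64(moved)/keys*100)
}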

Why Not Ring-Based Consistent Hashing?

You may have heard of ring-based consistent hashing (used by Dynamo, Cassandra). While effective, it has drawbacks:

  • Requires virtual nodes for even distribution, adding complexity
  • Node lookup requires a binary search or tree traversal over the ring (O(log n)); see the sketch after this list
  • Adding/removing nodes requires rebalancing the ring
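For reference, a minimal sketch of a ring lookup, with virtual nodes omitted and sort.Search standing in for whatever sorted structure a real implementation uses:

package main

import (
    "fmt"
    "hash/fnv"
    "sort"
)

type ringNode struct {
    pos    uint64 // position of the server on the ring
    server string
}

type Ring []ringNode // kept sorted by pos

func ringHash(s string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(s))
    return h.Sum64()
}

func NewRing(servers []string) Ring {
    r := make(Ring, 0, len(servers))
    for _, s := range servers {
        r = append(r, ringNode{pos: ringHash(s), server: s})
    }
    sort.Slice(r, func(i, j int) bool { return r[i].pos < r[j].pos })
    return r
}

// Get returns the first server clockwise from the key: an O(log n) binary search.
func (r Ring) Get(key string) string {
    h := ringHash(key)
    i := sort.Search(len(r), func(i int) bool { return r[i].pos >= h })
    if i == len(r) {
        i = 0 // wrap around the ring
    }
    return r[i].server
}

func main() {
    r := NewRing([]string{"server-a", "server-b", "server-c"})
    fmt.Println(r.Get("user:42"))
}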

Rendezvous and Maglev offer simpler alternatives with different trade-offs.

Rendezvous Hashing

Rendezvous Hashing, proposed in 1996 by Thaler and Ravishankar, uses a simple but powerful idea: for each key, calculate a score for every server and pick the server with the highest score.

How It Works

  1. For a given key, compute a hash combining the key and each server identifier
  2. The server with the highest hash value wins
  3. When a server is removed, keys previously assigned to it flow to their second-highest scoring server

Algorithm

// GetServer returns the server with the highest score for the given key.
// hash can be any 64-bit hash function, e.g. FNV-1a; the runnable example
// below uses hash/fnv.
func GetServer(key string, servers []string) string {
    var bestServer string
    var bestScore uint64

    for _, server := range servers {
        score := hash(key + server)
        if score > bestScore {
            bestScore = score
            bestServer = server
        }
    }

    return bestServer
}

Why It Works

The key insight is that hash functions produce uniformly distributed values. When you remove a server, only keys that had that server as their maximum will move. All other keys keep their existing assignment because their relative ordering remains unchanged.

For n servers, removing one server moves approximately 1/n of the keys, which is optimal.
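A quick way to check that claim is to assign a batch of keys, drop one of five servers, and count how many keys change hands. The sketch below repeats the hash and GetServer definitions so it runs on its own (FNV-1a is again an arbitrary choice):

package main

import (
    "fmt"
    "hash/fnv"
)

func hash(s string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(s))
    return h.Sum64()
}

// GetServer is the same rendezvous pick as the listing above.
func GetServer(key string, servers []string) string {
    var bestServer string
    var bestScore uint64
    for _, server := range servers {
        if score := hash(key + server); score > bestScore {
            bestScore, bestServer = score, server
        }
    }
    return bestServer
}

func main() {
    servers := []string{"s1", "s2", "s3", "s4", "s5"}
    reduced := servers[:4] // drop s5

    moved := 0
    for i := 0; i < 10000; i++ {
        key := fmt.Sprintf("key-%d", i)
        if GetServer(key, servers) != GetServer(key, reduced) {
            moved++
        }
    }
    // Only keys whose best server was s5 move: roughly 1/5 of them.
    fmt.Printf("%d of 10000 keys moved\n", moved)
}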

Adding Weights

Rendezvous hashing also supports weighted distribution. The obvious approach is to scale the score by the server's weight:

score := hash(key+server.ID) * server.Weight

This biases selection toward heavier servers, but the resulting shares are not actually proportional to the weights (and multiplying a 64-bit hash risks overflow). The standard weighted variant maps the hash to a uniform value h in (0, 1) and scores each server as -weight / ln(h); with that score, each server wins a fraction of keys proportional to its weight.
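A sketch of that weighted variant; the Server type, its fields, and the FNV-based hash are illustrative choices, not part of the original formulation:

package main

import (
    "fmt"
    "hash/fnv"
    "math"
)

// Server is a hypothetical weighted backend.
type Server struct {
    ID     string
    Weight float64
}

func hash64(s string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(s))
    return h.Sum64()
}

// score implements the weighted rendezvous score: map the top 53 bits of
// the hash to a uniform value h in (0, 1) and return -weight / ln(h).
// Higher is better, and each server wins a share of keys proportional to
// its weight.
func score(key string, s Server) float64 {
    h := (float64(hash64(key+s.ID)>>11) + 0.5) / (1 << 53)
    return -s.Weight / math.Log(h)
}

// GetWeightedServer picks the server with the highest weighted score.
func GetWeightedServer(key string, servers []Server) Server {
    best := servers[0]
    bestScore := math.Inf(-1)
    for _, s := range servers {
        if sc := score(key, s); sc > bestScore {
            bestScore, best = sc, s
        }
    }
    return best
}

func main() {
    servers := []Server{{"a", 1}, {"b", 2}, {"c", 1}}
    fmt.Println(GetWeightedServer("user:42", servers).ID)
}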

Characteristics

  • Lookup complexity: O(n), where n is the number of servers
  • Memory usage: O(n), just the server list
  • Minimal disruption: yes, only affected keys are remapped
  • Implementation: simple

Maglev Hashing

Maglev Hashing was developed by Google for their network load balancer (published in 2016). It uses a precomputed lookup table to achieve O(1) lookups while maintaining consistent hashing properties.

How It Works

  1. Create a lookup table of size M (typically a large prime number like 65537)
  2. Each server generates a permutation sequence based on two hash functions
  3. Servers take turns filling empty slots in the table using their permutation
  4. To find a server for a key, hash the key and look up the result in the table

The Permutation Sequence

Each server generates its preference order for table slots using two hash functions:

offset := hash1(server.ID) % M
skip := hash2(server.ID) % (M - 1) + 1

// Generate permutation for server
for j := 0; j < M; j++ {
    permutation[j] = (offset + j * skip) % M
}
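A helper that precomputes the preference order for every server could look like this. The post does not pin down hash1 and hash2, so this sketch derives both from salted FNV-1a (hash/fnv); any two reasonably independent hash functions would do:

// fnvHash hashes s with a salt so two hash functions can be derived
// from one primitive (an illustrative choice).
func fnvHash(salt, s string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(salt))
    h.Write([]byte(s))
    return h.Sum64()
}

func hash1(s string) uint64 { return fnvHash("offset:", s) }
func hash2(s string) uint64 { return fnvHash("skip:", s) }

// BuildPermutations returns, for each server, its preference order over
// the M table slots.
func BuildPermutations(servers []string, M int) [][]int {
    perms := make([][]int, len(servers))
    for i, server := range servers {
        offset := int(hash1(server) % uint64(M))
        skip := int(hash2(server)%uint64(M-1)) + 1

        perm := make([]int, M)
        for j := 0; j < M; j++ {
            perm[j] = (offset + j*skip) % M
        }
        perms[i] = perm
    }
    return perms
}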

Building the Table

// BuildTable fills a lookup table of size M (a prime much larger than the
// number of servers) by letting servers claim slots in round-robin order
// of their permutation preference.
func BuildTable(servers []string, M int) []int {
    perms := BuildPermutations(servers, M) // preference orders from above

    table := make([]int, M)
    for i := range table {
        table[i] = -1 // empty
    }

    next := make([]int, len(servers)) // position in each permutation
    filled := 0

    for filled < M {
        for i := range servers {
            // Walk this server's preference list past slots already taken.
            slot := perms[i][next[i]]
            for table[slot] != -1 {
                next[i]++
                slot = perms[i][next[i]]
            }
            table[slot] = i // claim the slot for server i
            next[i]++
            filled++

            if filled >= M {
                break
            }
        }
    }

    return table
}
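To complete the picture, the lookup itself is one hash plus one table index. The sketch below assumes the listings above are collected into a single main package with hash/fnv and fmt imported:

// Lookup maps a key to a server in O(1): one hash and one table index.
func Lookup(key string, table []int, servers []string) string {
    h := fnv.New64a()
    h.Write([]byte(key))
    return servers[table[h.Sum64()%uint64(len(table))]]
}

func main() {
    servers := []string{"backend-1", "backend-2", "backend-3"}
    const M = 65537 // prime table size

    table := BuildTable(servers, M)
    fmt.Println(Lookup("user:42", table, servers))
}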

Why Use a Prime Table Size?

Using a prime table size M guarantees that skip, which lies in [1, M-1], is coprime to M, so each server's sequence offset, offset + skip, offset + 2*skip, ... (mod M) visits every slot exactly once before repeating. If M shared a factor with skip, some slots would never appear in that server's permutation: with M = 10 and skip = 4, for example, the sequence cycles through only five of the ten slots.

Disruption Analysis

When a server is removed:

  • A new table must be built
  • Approximately M/n entries change (those that belonged to the removed server)
  • Actual disruption is slightly higher due to the round-robin filling process

Google reports that Maglev achieves “minimal disruption” comparable to consistent hashing.
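One way to check these numbers is to build the table with and without one server and count how many slots change owner. A rough sketch reusing BuildTable from above (exact counts depend on the hash functions):

// disruption returns how many lookup-table slots change owner when the
// last server is removed.
func disruption(servers []string, M int) int {
    full := BuildTable(servers, M)
    reduced := BuildTable(servers[:len(servers)-1], M)

    changed := 0
    for j := 0; j < M; j++ {
        // Surviving servers keep the same indices in both tables, so a
        // direct comparison counts real ownership changes.
        if full[j] != reduced[j] {
            changed++
        }
    }
    return changed
}

With five backends and M = 65537, this typically comes out somewhat above the minimum of M/5 ≈ 13,107 changed slots, which is the "slightly higher" disruption noted above.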

Characteristics

  • Lookup complexity: O(1)
  • Memory usage: O(M), where M is the table size
  • Minimal disruption: yes, approximately 1/n of keys remapped
  • Implementation: more complex

Side-by-Side Comparison

  • Lookup speed: Rendezvous O(n); Maglev O(1)
  • Memory: Rendezvous minimal; Maglev requires the lookup table
  • Build time: Rendezvous none; Maglev O(M * n)
  • Weight support: Rendezvous easy to add; Maglev requires multiple table entries
  • Best for: Rendezvous small server pools; Maglev large scale and high throughput
  • Disruption on change: Rendezvous optimal (about 1/n of keys); Maglev near-optimal

Real-World Use Cases

Rendezvous Hashing

  • Distributed caches (Memcached, Redis clusters) - Simple to implement, weights for heterogeneous servers
  • Database sharding - Deterministic routing without coordination
  • CDN edge selection - Pick closest/best edge server for a request
  • Service discovery - Route requests to healthy instances

Maglev Hashing

  • L4 load balancers - Google uses it for their global load balancing
  • High-throughput packet routing - Millions of decisions per second
  • Connection tracking - Ensure packets from same flow hit same backend
  • Software-defined networking - Consistent forwarding decisions across nodes

When to Use Which

Choose Rendezvous When

  • You have a small number of servers (under 100)
  • You need weighted distribution
  • Memory is constrained
  • Server list changes frequently
  • You want simple, auditable code

Choose Maglev When

  • You need O(1) lookups for high throughput
  • You have a large, relatively stable server pool
  • Memory for the lookup table is acceptable
  • You are building network-level load balancing
  • Lookup performance is critical (packet forwarding)

When to Use Neither

  • Stateful sessions requiring stickiness - Use session affinity instead
  • Geographic routing - Use latency-based or geo-IP routing
  • Heterogeneous workloads - Consider workload-aware scheduling

Practical Example

Consider a distributed cache with 10 servers handling 100,000 requests per second:

Rendezvous: Each request requires 10 hash computations. At 100K RPS, that is 1 million hash operations per second. This is usually acceptable for most applications.

Maglev: Each request requires 1 hash computation plus 1 table lookup. The same 100K RPS requires only 100K hash operations. The trade-off is maintaining a lookup table (typically 65537 entries or similar prime).

Memory Comparison

For 100 servers:

  • Rendezvous: ~100 server IDs in memory (~1-2 KB)
  • Maglev: 65537 entries * 8 bytes = ~512 KB lookup table

The memory difference is significant but both are trivial for modern systems.

Conclusion

Both algorithms solve the same fundamental problem but make different trade-offs:

  • Rendezvous Hashing is simpler, uses minimal memory, and works well for smaller server pools or when weighted distribution is needed
  • Maglev Hashing excels at scale with O(1) lookups but requires precomputation and more memory

For most applications, Rendezvous Hashing is a solid choice due to its simplicity. When you need to handle millions of lookups per second or have hundreds of servers, Maglev provides better performance characteristics.
