Rendezvous and Maglev Hashing

When building distributed systems, you often need to decide which server should handle a particular request or where to store a piece of data. Consistent hashing algorithms solve this problem elegantly. This post compares two popular approaches: Rendezvous Hashing (also known as Highest Random Weight) and Maglev Hashing (developed by Google).

The Problem

Imagine you have a caching layer with multiple servers. When a request comes in for a specific key, you need to determine which server should handle it. A naive approach using modulo (hash(key) % num_servers) works until a server fails or you add a new one. Then almost every key gets remapped to a different server, invalidating your entire cache.

[Diagram: with modulo hashing, removing one server causes most keys to remap to different servers. Both Rendezvous and Maglev hashing ensure that only keys from the removed server need to move.]
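To get a feel for the scale of the problem, here is a small sketch (FNV-1a from Go's hash/fnv, an arbitrary choice) that counts how many of 10,000 keys land on a different server when one of five servers is removed under modulo hashing:

package main

import (
    "fmt"
    "hash/fnv"
)

// hashKey returns a 64-bit FNV-1a hash of the key.
func hashKey(key string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(key))
    return h.Sum64()
}

func main() {
    const keys = 10000
    moved := 0
    for i := 0; i < keys; i++ {
        key := fmt.Sprintf("key-%d", i)
        before := hashKey(key) % 5 // 5 servers
        after := hashKey(key) % 4  // one server removed
        if before != after {
            moved++
        }
    }
    // With uniform hashes, roughly 80% of keys move, not the ideal 20%.
    fmt.Printf("%d of %d keys moved (%.0f%%)\n", moved, keys, float64(moved)/keys*100)
}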

Why Not Ring-Based Consistent Hashing?

You may have heard of ring-based consistent hashing (used by Dynamo, Cassandra). While effective, it has drawbacks:

  • Requires virtual nodes for even distribution, adding complexity
  • Node lookup requires a binary search or tree traversal over the ring (O(log n)); see the sketch after this list
  • Adding/removing nodes requires rebalancing the ring
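For reference, a minimal sketch of a ring lookup, with virtual nodes omitted and sort.Search standing in for whatever sorted structure a real implementation uses:

package main

import (
    "fmt"
    "hash/fnv"
    "sort"
)

type ringNode struct {
    pos    uint64 // position of the server on the ring
    server string
}

type Ring []ringNode // kept sorted by pos

func ringHash(s string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(s))
    return h.Sum64()
}

func NewRing(servers []string) Ring {
    r := make(Ring, 0, len(servers))
    for _, s := range servers {
        r = append(r, ringNode{pos: ringHash(s), server: s})
    }
    sort.Slice(r, func(i, j int) bool { return r[i].pos < r[j].pos })
    return r
}

// Get returns the first server clockwise from the key: an O(log n) binary search.
func (r Ring) Get(key string) string {
    h := ringHash(key)
    i := sort.Search(len(r), func(i int) bool { return r[i].pos >= h })
    if i == len(r) {
        i = 0 // wrap around the ring
    }
    return r[i].server
}

func main() {
    r := NewRing([]string{"server-a", "server-b", "server-c"})
    fmt.Println(r.Get("user:42"))
}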

Rendezvous and Maglev offer simpler alternatives with different trade-offs.

Rendezvous Hashing

Rendezvous Hashing, proposed in 1996 by Thaler and Ravishankar, uses a simple but powerful idea: for each key, calculate a score for every server and pick the server with the highest score.

How It Works

  1. For a given key, compute a hash combining the key and each server identifier
  2. The server with the highest hash value wins
  3. When a server is removed, keys previously assigned to it flow to their second-highest scoring server

Algorithm

// GetServer returns the server with the highest score for the given key.
// hash can be any 64-bit hash function, e.g. FNV-1a; the runnable example
// below uses hash/fnv.
func GetServer(key string, servers []string) string {
    var bestServer string
    var bestScore uint64

    for _, server := range servers {
        score := hash(key + server)
        if score > bestScore {
            bestScore = score
            bestServer = server
        }
    }

    return bestServer
}

Why It Works

The key insight is that hash functions produce uniformly distributed values. When you remove a server, only keys that had that server as their maximum will move. All other keys keep their existing assignment because their relative ordering remains unchanged.

For n servers, removing one server moves approximately 1/n of the keys, which is optimal.
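A quick way to check that claim is to assign a batch of keys, drop one of five servers, and count how many keys change hands. The sketch below repeats the hash and GetServer definitions so it runs on its own (FNV-1a is again an arbitrary choice):

package main

import (
    "fmt"
    "hash/fnv"
)

func hash(s string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(s))
    return h.Sum64()
}

// GetServer is the same rendezvous pick as the listing above.
func GetServer(key string, servers []string) string {
    var bestServer string
    var bestScore uint64
    for _, server := range servers {
        if score := hash(key + server); score > bestScore {
            bestScore, bestServer = score, server
        }
    }
    return bestServer
}

func main() {
    servers := []string{"s1", "s2", "s3", "s4", "s5"}
    reduced := servers[:4] // drop s5

    moved := 0
    for i := 0; i < 10000; i++ {
        key := fmt.Sprintf("key-%d", i)
        if GetServer(key, servers) != GetServer(key, reduced) {
            moved++
        }
    }
    // Only keys whose best server was s5 move: roughly 1/5 of them.
    fmt.Printf("%d of 10000 keys moved\n", moved)
}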

Adding Weights

Rendezvous hashing also supports weighted distribution. The obvious approach is to scale the score by the server's weight:

score := hash(key+server.ID) * server.Weight

This biases selection toward heavier servers, but the resulting shares are not actually proportional to the weights (and multiplying a 64-bit hash risks overflow). The standard weighted variant maps the hash to a uniform value h in (0, 1) and scores each server as -weight / ln(h); with that score, each server wins a fraction of keys proportional to its weight.
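A sketch of that weighted variant; the Server type, its fields, and the FNV-based hash are illustrative choices, not part of the original formulation:

package main

import (
    "fmt"
    "hash/fnv"
    "math"
)

// Server is a hypothetical weighted backend.
type Server struct {
    ID     string
    Weight float64
}

func hash64(s string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(s))
    return h.Sum64()
}

// score implements the weighted rendezvous score: map the top 53 bits of
// the hash to a uniform value h in (0, 1) and return -weight / ln(h).
// Higher is better, and each server wins a share of keys proportional to
// its weight.
func score(key string, s Server) float64 {
    h := (float64(hash64(key+s.ID)>>11) + 0.5) / (1 << 53)
    return -s.Weight / math.Log(h)
}

// GetWeightedServer picks the server with the highest weighted score.
func GetWeightedServer(key string, servers []Server) Server {
    best := servers[0]
    bestScore := math.Inf(-1)
    for _, s := range servers {
        if sc := score(key, s); sc > bestScore {
            bestScore, best = sc, s
        }
    }
    return best
}

func main() {
    servers := []Server{{"a", 1}, {"b", 2}, {"c", 1}}
    fmt.Println(GetWeightedServer("user:42", servers).ID)
}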

Characteristics

  • Lookup complexity: O(n), where n is the number of servers
  • Memory usage: O(n), just the server list
  • Minimal disruption: yes, only affected keys are remapped
  • Implementation: simple

Maglev Hashing

Maglev Hashing was developed by Google for their network load balancer (published in 2016). It uses a precomputed lookup table to achieve O(1) lookups while maintaining consistent hashing properties.

How It Works

  1. Create a lookup table of size M (typically a large prime number like 65537)
  2. Each server generates a permutation sequence based on two hash functions
  3. Servers take turns filling empty slots in the table using their permutation
  4. To find a server for a key, hash the key and look up the result in the table

The Permutation Sequence

Each server generates its preference order for table slots using two hash functions:

offset := hash1(server.ID) % M
skip := hash2(server.ID) % (M - 1) + 1

// Generate permutation for server
for j := 0; j < M; j++ {
    permutation[j] = (offset + j * skip) % M
}
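A helper that precomputes the preference order for every server could look like this. The post does not pin down hash1 and hash2, so this sketch derives both from salted FNV-1a (hash/fnv); any two reasonably independent hash functions would do:

// fnvHash hashes s with a salt so two hash functions can be derived
// from one primitive (an illustrative choice).
func fnvHash(salt, s string) uint64 {
    h := fnv.New64a()
    h.Write([]byte(salt))
    h.Write([]byte(s))
    return h.Sum64()
}

func hash1(s string) uint64 { return fnvHash("offset:", s) }
func hash2(s string) uint64 { return fnvHash("skip:", s) }

// BuildPermutations returns, for each server, its preference order over
// the M table slots.
func BuildPermutations(servers []string, M int) [][]int {
    perms := make([][]int, len(servers))
    for i, server := range servers {
        offset := int(hash1(server) % uint64(M))
        skip := int(hash2(server)%uint64(M-1)) + 1

        perm := make([]int, M)
        for j := 0; j < M; j++ {
            perm[j] = (offset + j*skip) % M
        }
        perms[i] = perm
    }
    return perms
}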

Building the Table

// BuildTable fills a lookup table of size M (a prime much larger than the
// number of servers) by letting servers claim slots in round-robin order
// of their permutation preference.
func BuildTable(servers []string, M int) []int {
    perms := BuildPermutations(servers, M) // preference orders from above

    table := make([]int, M)
    for i := range table {
        table[i] = -1 // empty
    }

    next := make([]int, len(servers)) // position in each permutation
    filled := 0

    for filled < M {
        for i := range servers {
            // Walk this server's preference list past slots already taken.
            slot := perms[i][next[i]]
            for table[slot] != -1 {
                next[i]++
                slot = perms[i][next[i]]
            }
            table[slot] = i // claim the slot for server i
            next[i]++
            filled++

            if filled >= M {
                break
            }
        }
    }

    return table
}
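To complete the picture, the lookup itself is one hash plus one table index. The sketch below assumes the listings above are collected into a single main package with hash/fnv and fmt imported:

// Lookup maps a key to a server in O(1): one hash and one table index.
func Lookup(key string, table []int, servers []string) string {
    h := fnv.New64a()
    h.Write([]byte(key))
    return servers[table[h.Sum64()%uint64(len(table))]]
}

func main() {
    servers := []string{"backend-1", "backend-2", "backend-3"}
    const M = 65537 // prime table size

    table := BuildTable(servers, M)
    fmt.Println(Lookup("user:42", table, servers))
}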

Why Use a Prime Table Size?

Using a prime table size M guarantees that skip, which lies in [1, M-1], is coprime to M, so each server's sequence offset, offset + skip, offset + 2*skip, ... (mod M) visits every slot exactly once before repeating. If M shared a factor with skip, some slots would never appear in that server's permutation: with M = 10 and skip = 4, for example, the sequence cycles through only five of the ten slots.

Disruption Analysis

When a server is removed:

  • A new table must be built
  • Approximately M/n entries change (those that belonged to the removed server)
  • Actual disruption is slightly higher due to the round-robin filling process

Google reports that Maglev achieves “minimal disruption” comparable to consistent hashing.
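One way to check these numbers is to build the table with and without one server and count how many slots change owner. A rough sketch reusing BuildTable from above (exact counts depend on the hash functions):

// disruption returns how many lookup-table slots change owner when the
// last server is removed.
func disruption(servers []string, M int) int {
    full := BuildTable(servers, M)
    reduced := BuildTable(servers[:len(servers)-1], M)

    changed := 0
    for j := 0; j < M; j++ {
        // Surviving servers keep the same indices in both tables, so a
        // direct comparison counts real ownership changes.
        if full[j] != reduced[j] {
            changed++
        }
    }
    return changed
}

With five backends and M = 65537, this typically comes out somewhat above the minimum of M/5 ≈ 13,107 changed slots, which is the "slightly higher" disruption noted above.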

Characteristics

  • Lookup complexity: O(1)
  • Memory usage: O(M), where M is the table size
  • Minimal disruption: yes, approximately 1/n of keys remapped
  • Implementation: more complex

Side-by-Side Comparison

  • Lookup speed: Rendezvous O(n); Maglev O(1)
  • Memory: Rendezvous minimal; Maglev requires the lookup table
  • Build time: Rendezvous none; Maglev O(M * n)
  • Weight support: Rendezvous easy to add; Maglev requires multiple table entries
  • Best for: Rendezvous small server pools; Maglev large scale and high throughput
  • Disruption on change: Rendezvous optimal (about 1/n of keys); Maglev near-optimal

Real-World Use Cases

Rendezvous Hashing

  • Distributed caches (Memcached, Redis clusters) - Simple to implement, weights for heterogeneous servers
  • Database sharding - Deterministic routing without coordination
  • CDN edge selection - Pick closest/best edge server for a request
  • Service discovery - Route requests to healthy instances

Maglev Hashing

  • L4 load balancers - Google uses it for their global load balancing
  • High-throughput packet routing - Millions of decisions per second
  • Connection tracking - Ensure packets from same flow hit same backend
  • Software-defined networking - Consistent forwarding decisions across nodes

When to Use Which

Choose Rendezvous When

  • You have a small number of servers (under 100)
  • You need weighted distribution
  • Memory is constrained
  • Server list changes frequently
  • You want simple, auditable code

Choose Maglev When

  • You need O(1) lookups for high throughput
  • You have a large, relatively stable server pool
  • Memory for the lookup table is acceptable
  • You are building network-level load balancing
  • Lookup performance is critical (packet forwarding)

When to Use Neither

  • Stateful sessions requiring stickiness - Use session affinity instead
  • Geographic routing - Use latency-based or geo-IP routing
  • Heterogeneous workloads - Consider workload-aware scheduling

Practical Example

Consider a distributed cache with 10 servers handling 100,000 requests per second:

Rendezvous: Each request requires 10 hash computations. At 100K RPS, that is 1 million hash operations per second. This is usually acceptable for most applications.

Maglev: Each request requires 1 hash computation plus 1 table lookup. The same 100K RPS requires only 100K hash operations. The trade-off is maintaining a lookup table (typically 65537 entries or similar prime).

Memory Comparison

For 100 servers:

  • Rendezvous: ~100 server IDs in memory (~1-2 KB)
  • Maglev: 65537 entries * 8 bytes = ~512 KB lookup table

The memory difference is significant but both are trivial for modern systems.

Conclusion

Both algorithms solve the same fundamental problem but make different trade-offs:

  • Rendezvous Hashing is simpler, uses minimal memory, and works well for smaller server pools or when weighted distribution is needed
  • Maglev Hashing excels at scale with O(1) lookups but requires precomputation and more memory

For most applications, Rendezvous Hashing is a solid choice due to its simplicity. When you need to handle millions of lookups per second or have hundreds of servers, Maglev provides better performance characteristics.
