
5 Strategies for Redis Sharding under Large Data Volumes

1. Modulo Sharding

Modulo sharding is the most intuitive hash-based sharding method: the key's hash value is taken modulo the number of nodes to determine which shard stores the key.

How it works

  • Calculate the hash value of the key
  • Take that hash modulo the total number of nodes to get the node index
  • Route the operation to the corresponding node

Implementation example

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

import java.util.ArrayList;
import java.util.List;

public class ModuloSharding {
    private List<JedisPool> shards;
    
    public ModuloSharding(List<String> redisHosts, int port) {
        shards = new ArrayList<>();
        for (String host : redisHosts) {
            shards.add(new JedisPool(new JedisPoolConfig(), host, port));
        }
    }
    
    private int getShardIndex(String key) {
        // floorMod keeps the index non-negative even if hashCode() is negative
        return Math.floorMod(key.hashCode(), shards.size());
    }
    
    public String get(String key) {
        int index = getShardIndex(key);
        try (Jedis jedis = shards.get(index).getResource()) {
            return jedis.get(key);
        }
    }
    
    public void set(String key, String value) {
        int index = getShardIndex(key);
        try (Jedis jedis = shards.get(index).getResource()) {
            jedis.set(key, value);
        }
    }
    
    // All keys need to be remapped when the number of nodes changes
    public void reshardData(List<String> newHosts, int port) {
        List<JedisPool> newShards = new ArrayList<>();
        for (String host : newHosts) {
            newShards.add(new JedisPool(new JedisPoolConfig(), host, port));
        }
        
        // Data migration is needed here: traverse all keys and reassign them.
        // A real implementation needs much more logic to migrate large amounts of data safely.
        // ...
        
        shards = newShards;
    }
}
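
A brief usage sketch (the class above plus placeholder host addresses, which are assumptions for illustration) shows why node changes are so expensive with this scheme: the same hash lands on a different shard as soon as the divisor changes.

List<String> hosts = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"); // placeholder hosts
ModuloSharding sharding = new ModuloSharding(hosts, 6379);

sharding.set("user:1000", "Zhang San");
String value = sharding.get("user:1000");

// With 3 nodes, a key whose hash is 10 maps to node 10 % 3 = 1;
// after adding a 4th node the same key maps to node 10 % 4 = 2,
// so nearly every existing key has to be migrated.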

Pros and cons

Advantages

  • Extremely simple to implement
  • Data distribution is relatively uniform when the number of nodes is fixed
  • Small computational overhead

Disadvantages

  • A large amount of data migration is required when the number of nodes changes (almost all keys are remapped)
  • Potential hotspot (hot key) problems
  • Not suitable for scenarios that require frequent scaling out or in

Applicable scenarios

  • Scenarios with a relatively fixed number of nodes
  • Small applications that value implementation simplicity and rarely need to scale
  • Systems with small data volumes that can tolerate a full migration

2. Proxy-based Sharding

Proxy sharding manages sharding logic by introducing an intermediate proxy layer. Common proxies include Twemproxy (nutcracker) and Codis.

How it works

  • The proxy serves as an intermediate layer between the application and the Redis nodes
  • Clients connect to the proxy rather than directly to Redis
  • The proxy routes each request to the correct Redis node according to its internal hashing algorithm

Twemproxy configuration example

alpha:
  listen: 127.0.0.1:22121
  hash: fnv1a_64
  distribution: ketama
  auto_eject_hosts: true
  redis: true
  server_retry_timeout: 2000
  server_failure_limit: 3
  servers:
   - 127.0.0.1:6379:1
   - 127.0.0.1:6380:1
   - 127.0.0.1:6381:1
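
Because Twemproxy speaks the Redis protocol, the application connects to the proxy's listen port as if it were a single Redis instance. A minimal Jedis sketch against the configuration above (the proxy address comes from the listen setting; this is illustrative, not a full production setup):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class TwemproxyClientExample {
    public static void main(String[] args) {
        // Connect to the Twemproxy listen address, not to an individual Redis node
        try (JedisPool pool = new JedisPool(new JedisPoolConfig(), "127.0.0.1", 22121)) {
            try (Jedis jedis = pool.getResource()) {
                // The proxy hashes the key (fnv1a_64 + ketama above) and routes the command
                jedis.set("user:1000", "Zhang San");
                System.out.println(jedis.get("user:1000"));
            }
        }
    }
}

Note that Twemproxy supports only a subset of Redis commands, so keep operations to simple single-key reads and writes through the proxy.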

Pros and cons

Advantages

  • Transparent to applications; clients do not need to be aware of sharding details
  • Reduces the number of connections between clients and Redis
  • Easy to manage and monitor

Disadvantages

  • Introduces a single point of failure risk
  • Adds extra network latency
  • Scaling usually requires manual intervention
  • The proxy layer may become a performance bottleneck

Applicable scenarios

  • Scenarios where minimal changes to existing systems are required
  • Unifying the sharding strategy across a multi-language environment
  • High-concurrency scenarios where the number of connections needs to be controlled

3. Redis Cluster

Redis Cluster is the official Redis clustering solution, supported since Redis 3.0.

How it works

  • Uses the concept of hash slots; there are 16384 slots in total
  • Each key is mapped to a slot by computing CRC16(key) mod 16384 (see the sketch below)
  • The slots are assigned to different nodes
  • Supports automatic data migration and replication between nodes
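
The routing rule itself is easy to reproduce in client code. A minimal sketch of the slot calculation, assuming a recent Jedis version where the CRC16 helper lives at redis.clients.jedis.util.JedisClusterCRC16:

import redis.clients.jedis.util.JedisClusterCRC16;

public class SlotExample {
    public static void main(String[] args) {
        // slot = CRC16(key) mod 16384
        String key = "user:1000";
        int slot = JedisClusterCRC16.getSlot(key);
        System.out.println(key + " -> slot " + slot); // some value in 0..16383
    }
}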

Configuration and setup

Node configuration example:

port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes

Create a cluster command:

redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1

Client support code example

import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;
import io.lettuce.core.cluster.api.sync.RedisAdvancedClusterCommands;

import java.time.Duration;

// Connect to Redis Cluster using the Lettuce client
RedisURI redisUri = RedisURI.Builder
    .redis("127.0.0.1", 7000)
    .withTimeout(Duration.ofSeconds(60))
    .build();

RedisClusterClient clusterClient = RedisClusterClient.create(redisUri);
StatefulRedisClusterConnection<String, String> connection = clusterClient.connect();
RedisAdvancedClusterCommands<String, String> commands = connection.sync();

// Normal operations; the client handles cluster routing transparently
commands.set("user:1000", "Zhang San");
String value = commands.get("user:1000");

Pros and cons

Advantages

  • Official native support with continuous updates and maintenance
  • Decentralized architecture with no single point of failure
  • Automatic failure detection and failover
  • Automatically handles data sharding and slot migration between nodes

Disadvantages

  • Clients need to support the cluster protocol
  • Multi-key operations are limited by the slot mechanism (keys must be in the same slot; hash tags, shown in the sketch after this list, can work around this)
  • Higher resource consumption and inter-node communication overhead
  • Configuration management is relatively complex
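
The multi-key limitation can often be worked around with hash tags: when a key contains a {...} section, only that section is hashed, so related keys can be forced into the same slot. A small sketch reusing the commands object from the Lettuce example above (key names are illustrative; java.util.List and io.lettuce.core.KeyValue imports are assumed):

// Both keys hash only on "user:1000", so they map to the same slot
commands.set("{user:1000}:profile", "Zhang San");
commands.set("{user:1000}:orders", "order-123");

// Multi-key operations are allowed because the keys share a slot
// (in recent Lettuce versions mget returns a list of KeyValue entries)
List<KeyValue<String, String>> values =
        commands.mget("{user:1000}:profile", "{user:1000}:orders");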

Applicable scenarios

  • Large-scale Redis deployments
  • Deployments that need high availability and automatic failover
  • Data volume and load that grow dynamically over time
  • Environments that want to stay within the official Redis ecosystem

4. Consistent hashing

The consistent hashing algorithm minimizes the number of keys that need to be remapped when nodes change, making it suitable for environments where the node set changes frequently.

How it works

  • The hash space is mapped onto a ring (0 to 2^32 - 1)
  • Redis nodes are mapped to points on the ring
  • Each key is assigned to the first node encountered clockwise from its hash position
  • Adding or removing a node only affects the keys mapped to its neighboring segment of the ring

Implementation example

import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashSharding {
    private final SortedMap<Integer, JedisPool> circle = new TreeMap<>();
    private final int numberOfReplicas;
    private final HashFunction hashFunction;
    
    public ConsistentHashSharding(List<String> nodes, int replicas) {
        this.numberOfReplicas = replicas;
        this.hashFunction = Hashing.murmur3_32();
        
        // Each node is given as "host:port"
        for (String node : nodes) {
            addNode(node);
        }
    }
    
    public void addNode(String node) {
        // Place several virtual nodes on the ring for each physical node
        // (a production version would share one JedisPool per physical node)
        for (int i = 0; i < numberOfReplicas; i++) {
            String virtualNode = node + "-" + i;
            int hash = hashFunction.hashString(virtualNode, StandardCharsets.UTF_8).asInt();
            circle.put(hash, new JedisPool(new JedisPoolConfig(), node.split(":")[0],
                       Integer.parseInt(node.split(":")[1])));
        }
    }
    
    public void removeNode(String node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            String virtualNode = node + "-" + i;
            int hash = hashFunction.hashString(virtualNode, StandardCharsets.UTF_8).asInt();
            circle.remove(hash);
        }
    }
    
    public JedisPool getNode(String key) {
        if (circle.isEmpty()) {
            return null;
        }
        
        int hash = hashFunction.hashString(key, StandardCharsets.UTF_8).asInt();
        
        // Walk clockwise to the first virtual node at or after the key's hash
        if (!circle.containsKey(hash)) {
            SortedMap<Integer, JedisPool> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        
        return circle.get(hash);
    }
    
    public String get(String key) {
        JedisPool pool = getNode(key);
        try (Jedis jedis = pool.getResource()) {
            return jedis.get(key);
        }
    }
    
    public void set(String key, String value) {
        JedisPool pool = getNode(key);
        try (Jedis jedis = pool.getResource()) {
            jedis.set(key, value);
        }
    }
}
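
A usage sketch, assuming the class above and placeholder host:port addresses: construction places replicas * nodes virtual points on the ring, and adding a node later only remaps the keys that fall onto its new ring segments.

// Placeholder node addresses for illustration only
List<String> nodes = Arrays.asList("10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379");
ConsistentHashSharding sharding = new ConsistentHashSharding(nodes, 160); // 160 virtual nodes per node

sharding.set("user:1000", "Zhang San");
String value = sharding.get("user:1000");

// Adding a node only takes over the ring segments directly before its virtual nodes;
// keys elsewhere on the ring keep their current placement.
sharding.addNode("10.0.0.4:6379");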

Pros and cons

Advantages

  • Minimizes data migration when nodes change
  • Relatively uniform data distribution
  • Suitable for dynamically scaled environments

Disadvantages

  • More complex to implement
  • Virtual nodes introduce additional memory overhead
  • Data distribution may still be somewhat imbalanced

Applicable scenarios

  • Environments where nodes are frequently added and removed
  • Large-scale applications that require dynamic scaling
  • Scenarios that are sensitive to data migration costs

5. Range-based Sharding

Range-based sharding assigns data to different nodes based on ranges of key values, which makes it especially suitable for ordered data sets.

How it works

  • Key ranges are defined in advance
  • The storage node is determined by the range a key falls into
  • Usually combined with ordered keys, such as time series data

Implementation example

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

import java.util.Map;
import java.util.TreeMap;

public class RangeSharding {
    private final TreeMap<Long, JedisPool> rangeMap = new TreeMap<>();
    
    public RangeSharding() {
        // Assume sharding by user ID range (host names below are placeholders)
        rangeMap.put(0L, new JedisPool("redis-shard-1", 6379));       // 0-999999
        rangeMap.put(1000000L, new JedisPool("redis-shard-2", 6379)); // 1000000-1999999
        rangeMap.put(2000000L, new JedisPool("redis-shard-3", 6379)); // 2000000-2999999
        // More ranges...
    }
    
    private JedisPool getShardForUserId(long userId) {
        // floorEntry finds the range whose lower bound is closest below the userId
        Map.Entry<Long, JedisPool> entry = rangeMap.floorEntry(userId);
        if (entry == null) {
            throw new IllegalArgumentException("No shard available for userId: " + userId);
        }
        return entry.getValue();
    }
    
    public String getUserData(long userId) {
        JedisPool pool = getShardForUserId(userId);
        try (Jedis jedis = pool.getResource()) {
            return jedis.get("user:" + userId);
        }
    }
    
    public void setUserData(long userId, String data) {
        JedisPool pool = getShardForUserId(userId);
        try (Jedis jedis = pool.getResource()) {
            jedis.set("user:" + userId, data);
        }
    }
}

Pros and cons

Advantages

  • Data within a given range lives on the same node, which makes range queries convenient
  • Simple and clear sharding strategy
  • The mapping between keys and nodes is easy to understand

Disadvantages

  • May cause uneven data distribution
  • Hot data may be concentrated in a single shard
  • Re-splitting ranges (re-sharding) is complicated

Applicable scenarios

  • Time series data storage
  • Geographic location data partitioning
  • Scenarios that need to support efficient range query

Conclusion

Redis sharding is an effective strategy for handling large data volumes. Each sharding method has its own strengths and applicable scenarios, and choosing the right one requires weighing data scale, access patterns, scaling requirements, and operational capabilities.

Whichever sharding strategy is chosen, best practices should be followed, including sound data model design, good monitoring, and proactive capacity planning, to keep the Redis cluster stable and performant.
