
5 Strategies for Redis Sharding under Large Data Volumes

1. Modulo Sharding

Modulo sharding is the most intuitive hash-based sharding method: the key's hash value is taken modulo the number of nodes to determine which shard stores the key.

How it works

  • Calculate the hash value of the key
  • Take that hash modulo the total number of nodes to get the node index
  • Route the operation to the corresponding node

Implementation example

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

import java.util.ArrayList;
import java.util.List;

public class ModuloSharding {
    private List<JedisPool> shards;
    
    public ModuloSharding(List<String> redisHosts, int port) {
        shards = new ArrayList<>();
        for (String host : redisHosts) {
            shards.add(new JedisPool(new JedisPoolConfig(), host, port));
        }
    }
    
    private int getShardIndex(String key) {
        // floorMod keeps the index non-negative even if hashCode() is negative
        return Math.floorMod(key.hashCode(), shards.size());
    }
    
    public String get(String key) {
        int index = getShardIndex(key);
        try (Jedis jedis = shards.get(index).getResource()) {
            return jedis.get(key);
        }
    }
    
    public void set(String key, String value) {
        int index = getShardIndex(key);
        try (Jedis jedis = shards.get(index).getResource()) {
            jedis.set(key, value);
        }
    }
    
    // All keys need to be remapped when the number of nodes changes
    public void reshardData(List<String> newHosts, int port) {
        List<JedisPool> newShards = new ArrayList<>();
        for (String host : newHosts) {
            newShards.add(new JedisPool(new JedisPoolConfig(), host, port));
        }
        
        // Data migration is needed here: traverse all keys and reassign them.
        // A real implementation needs much more logic to migrate large amounts of data safely.
        // ...
        
        shards = newShards;
    }
}
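
A brief usage sketch (the class above plus placeholder host addresses, which are assumptions for illustration) shows why node changes are so expensive with this scheme: the same hash lands on a different shard as soon as the divisor changes.

List<String> hosts = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"); // placeholder hosts
ModuloSharding sharding = new ModuloSharding(hosts, 6379);

sharding.set("user:1000", "Zhang San");
String value = sharding.get("user:1000");

// With 3 nodes, a key whose hash is 10 maps to node 10 % 3 = 1;
// after adding a 4th node the same key maps to node 10 % 4 = 2,
// so nearly every existing key has to be migrated.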

Pros and cons

Advantages

  • Extremely simple to implement
  • Data distribution is relatively uniform when the number of nodes is fixed
  • Small computational overhead

Disadvantages

  • A large amount of data migration is required when the number of nodes changes (almost all keys are remapped)
  • Potential hotspot (hot key) problems
  • Not suitable for scenarios that require frequent scaling out or in

Applicable scenarios

  • Scenarios with a relatively fixed number of nodes
  • Small applications that value implementation simplicity and rarely need to scale
  • Systems with small data volumes that can tolerate a full migration

2. Proxy-based Sharding

Proxy sharding manages sharding logic by introducing an intermediate proxy layer. Common proxies include Twemproxy (nutcracker) and Codis.

How it works

  • The proxy serves as an intermediate layer between the application and the Redis nodes
  • Clients connect to the proxy rather than directly to Redis
  • The proxy routes each request to the correct Redis node according to its internal hashing algorithm

Twemproxy configuration example

alpha:
  listen: 127.0.0.1:22121
  hash: fnv1a_64
  distribution: ketama
  auto_eject_hosts: true
  redis: true
  server_retry_timeout: 2000
  server_failure_limit: 3
  servers:
   - 127.0.0.1:6379:1
   - 127.0.0.1:6380:1
   - 127.0.0.1:6381:1
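
Because Twemproxy speaks the Redis protocol, the application connects to the proxy's listen port as if it were a single Redis instance. A minimal Jedis sketch against the configuration above (the proxy address comes from the listen setting; this is illustrative, not a full production setup):

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class TwemproxyClientExample {
    public static void main(String[] args) {
        // Connect to the Twemproxy listen address, not to an individual Redis node
        try (JedisPool pool = new JedisPool(new JedisPoolConfig(), "127.0.0.1", 22121)) {
            try (Jedis jedis = pool.getResource()) {
                // The proxy hashes the key (fnv1a_64 + ketama above) and routes the command
                jedis.set("user:1000", "Zhang San");
                System.out.println(jedis.get("user:1000"));
            }
        }
    }
}

Note that Twemproxy supports only a subset of Redis commands, so keep operations to simple single-key reads and writes through the proxy.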

Pros and cons

Advantages

  • Transparent to applications; clients do not need to be aware of sharding details
  • Reduces the number of connections between clients and Redis
  • Easy to manage and monitor

Disadvantages

  • Introduces a single point of failure risk
  • Adds extra network latency
  • Scaling usually requires manual intervention
  • The proxy layer may become a performance bottleneck

Applicable scenarios

  • Scenarios where minimal changes to existing systems are required
  • Unifying the sharding strategy across a multi-language environment
  • High-concurrency scenarios where the number of connections needs to be controlled

3. Redis Cluster

Redis Cluster is the official Redis clustering solution, supported since Redis 3.0.

How it works

  • Uses the concept of hash slots; there are 16384 slots in total
  • Each key is mapped to a slot by computing CRC16(key) mod 16384 (see the sketch below)
  • The slots are assigned to different nodes
  • Supports automatic data migration and replication between nodes
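
The routing rule itself is easy to reproduce in client code. A minimal sketch of the slot calculation, assuming a recent Jedis version where the CRC16 helper lives at redis.clients.jedis.util.JedisClusterCRC16:

import redis.clients.jedis.util.JedisClusterCRC16;

public class SlotExample {
    public static void main(String[] args) {
        // slot = CRC16(key) mod 16384
        String key = "user:1000";
        int slot = JedisClusterCRC16.getSlot(key);
        System.out.println(key + " -> slot " + slot); // some value in 0..16383
    }
}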

Configuration and setup

Node configuration example:

port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes

Create a cluster command:

redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1

Client support code example

import io.lettuce.core.RedisURI;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;
import io.lettuce.core.cluster.api.sync.RedisAdvancedClusterCommands;

import java.time.Duration;

// Connect to Redis Cluster using the Lettuce client
RedisURI redisUri = RedisURI.Builder
    .redis("127.0.0.1", 7000)
    .withTimeout(Duration.ofSeconds(60))
    .build();

RedisClusterClient clusterClient = RedisClusterClient.create(redisUri);
StatefulRedisClusterConnection<String, String> connection = clusterClient.connect();
RedisAdvancedClusterCommands<String, String> commands = connection.sync();

// Normal operations; the client handles cluster routing transparently
commands.set("user:1000", "Zhang San");
String value = commands.get("user:1000");

Pros and cons

Advantages

  • Official native support with continuous updates and maintenance
  • Decentralized architecture with no single point of failure
  • Automatic failure detection and failover
  • Automatically handles data sharding and slot migration between nodes

Disadvantages

  • Clients need to support the cluster protocol
  • Multi-key operations are limited by the slot mechanism (keys must be in the same slot; hash tags, shown in the sketch after this list, can work around this)
  • Higher resource consumption and inter-node communication overhead
  • Configuration management is relatively complex
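
The multi-key limitation can often be worked around with hash tags: when a key contains a {...} section, only that section is hashed, so related keys can be forced into the same slot. A small sketch reusing the commands object from the Lettuce example above (key names are illustrative; java.util.List and io.lettuce.core.KeyValue imports are assumed):

// Both keys hash only on "user:1000", so they map to the same slot
commands.set("{user:1000}:profile", "Zhang San");
commands.set("{user:1000}:orders", "order-123");

// Multi-key operations are allowed because the keys share a slot
// (in recent Lettuce versions mget returns a list of KeyValue entries)
List<KeyValue<String, String>> values =
        commands.mget("{user:1000}:profile", "{user:1000}:orders");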

Applicable scenarios

  • Large-scale Redis deployments
  • Deployments that need high availability and automatic failover
  • Data volume and load that grow dynamically over time
  • Environments that want to stay within the official Redis ecosystem

4. Consistent hashing

The consistent hashing algorithm minimizes the number of keys that need to be remapped when nodes change, making it suitable for environments where the node set changes frequently.

How it works

  • The hash space is mapped onto a ring (0 to 2^32 - 1)
  • Redis nodes are mapped to points on the ring
  • Each key is assigned to the first node encountered clockwise from its hash position
  • Adding or removing a node only affects the keys mapped to its neighboring segment of the ring

Implementation example

import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashSharding {
    private final SortedMap<Integer, JedisPool> circle = new TreeMap<>();
    private final int numberOfReplicas;
    private final HashFunction hashFunction;
    
    public ConsistentHashSharding(List<String> nodes, int replicas) {
        this.numberOfReplicas = replicas;
        this.hashFunction = Hashing.murmur3_32();
        
        // Each node is given as "host:port"
        for (String node : nodes) {
            addNode(node);
        }
    }
    
    public void addNode(String node) {
        // Place several virtual nodes on the ring for each physical node
        // (a production version would share one JedisPool per physical node)
        for (int i = 0; i < numberOfReplicas; i++) {
            String virtualNode = node + "-" + i;
            int hash = hashFunction.hashString(virtualNode, StandardCharsets.UTF_8).asInt();
            circle.put(hash, new JedisPool(new JedisPoolConfig(), node.split(":")[0],
                       Integer.parseInt(node.split(":")[1])));
        }
    }
    
    public void removeNode(String node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            String virtualNode = node + "-" + i;
            int hash = hashFunction.hashString(virtualNode, StandardCharsets.UTF_8).asInt();
            circle.remove(hash);
        }
    }
    
    public JedisPool getNode(String key) {
        if (circle.isEmpty()) {
            return null;
        }
        
        int hash = hashFunction.hashString(key, StandardCharsets.UTF_8).asInt();
        
        // Walk clockwise to the first virtual node at or after the key's hash
        if (!circle.containsKey(hash)) {
            SortedMap<Integer, JedisPool> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        
        return circle.get(hash);
    }
    
    public String get(String key) {
        JedisPool pool = getNode(key);
        try (Jedis jedis = pool.getResource()) {
            return jedis.get(key);
        }
    }
    
    public void set(String key, String value) {
        JedisPool pool = getNode(key);
        try (Jedis jedis = pool.getResource()) {
            jedis.set(key, value);
        }
    }
}
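
A usage sketch, assuming the class above and placeholder host:port addresses: construction places replicas * nodes virtual points on the ring, and adding a node later only remaps the keys that fall onto its new ring segments.

// Placeholder node addresses for illustration only
List<String> nodes = Arrays.asList("10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379");
ConsistentHashSharding sharding = new ConsistentHashSharding(nodes, 160); // 160 virtual nodes per node

sharding.set("user:1000", "Zhang San");
String value = sharding.get("user:1000");

// Adding a node only takes over the ring segments directly before its virtual nodes;
// keys elsewhere on the ring keep their current placement.
sharding.addNode("10.0.0.4:6379");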

Pros and cons

Advantages

  • Minimizes data migration when nodes change
  • Relatively uniform data distribution
  • Suitable for dynamically scaled environments

Disadvantages

  • More complex to implement
  • Virtual nodes introduce additional memory overhead
  • Data distribution may still be somewhat imbalanced

Applicable scenarios

  • Environments where nodes are frequently added and removed
  • Large-scale applications that require dynamic scaling
  • Scenarios that are sensitive to data migration costs

5. Range-based Sharding

Range-based sharding assigns data to different nodes based on ranges of key values, which makes it especially suitable for ordered data sets.

How it works

  • Key ranges are defined in advance
  • The storage node is determined by the range a key falls into
  • Usually combined with ordered keys, such as time series data

Implementation example

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

import java.util.Map;
import java.util.TreeMap;

public class RangeSharding {
    private final TreeMap<Long, JedisPool> rangeMap = new TreeMap<>();
    
    public RangeSharding() {
        // Assume sharding by user ID range (host names below are placeholders)
        rangeMap.put(0L, new JedisPool("redis-shard-1", 6379));       // 0-999999
        rangeMap.put(1000000L, new JedisPool("redis-shard-2", 6379)); // 1000000-1999999
        rangeMap.put(2000000L, new JedisPool("redis-shard-3", 6379)); // 2000000-2999999
        // More ranges...
    }
    
    private JedisPool getShardForUserId(long userId) {
        // floorEntry finds the range whose lower bound is closest below the userId
        Map.Entry<Long, JedisPool> entry = rangeMap.floorEntry(userId);
        if (entry == null) {
            throw new IllegalArgumentException("No shard available for userId: " + userId);
        }
        return entry.getValue();
    }
    
    public String getUserData(long userId) {
        JedisPool pool = getShardForUserId(userId);
        try (Jedis jedis = pool.getResource()) {
            return jedis.get("user:" + userId);
        }
    }
    
    public void setUserData(long userId, String data) {
        JedisPool pool = getShardForUserId(userId);
        try (Jedis jedis = pool.getResource()) {
            jedis.set("user:" + userId, data);
        }
    }
}

Pros and cons

Advantages

  • Data within a given range lives on the same node, which makes range queries convenient
  • Simple and clear sharding strategy
  • The mapping between keys and nodes is easy to understand

Disadvantages

  • May cause uneven data distribution
  • Hot data may be concentrated in a single shard
  • Re-splitting ranges (re-sharding) is complicated

Applicable scenarios

  • Time series data storage
  • Geographic location data partitioning
  • Scenarios that need to support efficient range query

Conclusion

Redis sharding is an effective strategy for handling large data volumes. Each sharding method has its own strengths and applicable scenarios, and choosing the right one requires weighing data scale, access patterns, scaling requirements, and operational capabilities.

Whichever sharding strategy is chosen, best practices should be followed, including sound data model design, good monitoring, and proactive capacity planning, to keep the Redis cluster stable and performant.
