1. Modulo Sharding
Modulo sharding is the most intuitive hash-based sharding method: the key's hash value is taken modulo the number of nodes to determine which shard the key lives on.
How it works
- Calculate the hash value of the key
- Take that hash modulo the total number of nodes to get the node index
- Route the operation to the corresponding node
Implementation example
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

import java.util.ArrayList;
import java.util.List;

public class ModuloSharding {

    private List<JedisPool> shards;

    public ModuloSharding(List<String> redisHosts, int port) {
        shards = new ArrayList<>();
        for (String host : redisHosts) {
            shards.add(new JedisPool(new JedisPoolConfig(), host, port));
        }
    }

    private int getShardIndex(String key) {
        // Use the absolute value so the index is never negative
        return Math.abs(key.hashCode() % shards.size());
    }

    public String get(String key) {
        int index = getShardIndex(key);
        try (Jedis jedis = shards.get(index).getResource()) {
            return jedis.get(key);
        }
    }

    public void set(String key, String value) {
        int index = getShardIndex(key);
        try (Jedis jedis = shards.get(index).getResource()) {
            jedis.set(key, value);
        }
    }

    // Almost all keys need to be remapped when the number of nodes changes
    public void reshardData(List<String> newHosts, int port) {
        List<JedisPool> newShards = new ArrayList<>();
        for (String host : newHosts) {
            newShards.add(new JedisPool(new JedisPoolConfig(), host, port));
        }
        // Data migration would happen here: traverse all keys and reassign them.
        // A real implementation needs more complex logic to migrate large amounts of data.
        // ...
        shards = newShards;
    }
}
Pros and cons
Advantages
- Extremely simple to implement
- Data distribution is relatively uniform when the number of nodes is fixed
- Low computational overhead
Disadvantages
- A large amount of data migration is required when the number of nodes changes (almost all keys are remapped; a quick estimate is sketched after this list)
- Possible hotspot issues
- Not suitable for scenarios that require frequent scaling out or in
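The migration cost of that last point is easy to estimate. The sketch below is a standalone calculation with a hypothetical key pattern and no Redis connection: it maps one million keys with the modulo rule for 3 nodes and again for 4 nodes, then counts how many keys land on a different node. Roughly three quarters of them typically move.

public class ModuloReshardCost {
    public static void main(String[] args) {
        int before = 3, after = 4;
        int total = 1_000_000, moved = 0;
        for (int i = 0; i < total; i++) {
            String key = "user:" + i; // hypothetical key pattern
            int hash = Math.abs(key.hashCode());
            if (hash % before != hash % after) {
                moved++;
            }
        }
        // Typically prints a figure around 75% of all keys
        System.out.printf("Keys remapped when going from %d to %d nodes: %d (%.1f%%)%n",
                before, after, moved, 100.0 * moved / total);
    }
}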
Applicable scenarios
- Scenarios with a relatively fixed number of nodes
- Small applications that want a simple implementation and have little need for scaling
- Systems with a small data volume that can accept a full migration
2. Proxy-based Sharding
Proxy sharding manages sharding logic by introducing an intermediate proxy layer. Common proxies include Twemproxy (nutcracker) and Codis.
How it works
- The proxy serves as an intermediate layer between the application and the Redis nodes
- Clients connect to the proxy rather than directly to Redis (a client-side sketch follows the configuration example below)
- The proxy routes each request to the correct Redis node according to its internal hashing algorithm
Twemproxy configuration example
alpha:
  listen: 127.0.0.1:22121
  hash: fnv1a_64
  distribution: ketama
  auto_eject_hosts: true
  redis: true
  server_retry_timeout: 2000
  server_failure_limit: 3
  servers:
   - 127.0.0.1:6379:1
   - 127.0.0.1:6380:1
   - 127.0.0.1:6381:1
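Because Twemproxy speaks the Redis protocol, application code talks to it exactly as it would to a single Redis instance. Below is a minimal Jedis sketch that assumes the proxy is listening on 127.0.0.1:22121 as in the configuration above:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class TwemproxyClientExample {
    public static void main(String[] args) {
        // Connect to the proxy's listen address, not to any individual Redis node
        try (JedisPool pool = new JedisPool(new JedisPoolConfig(), "127.0.0.1", 22121);
             Jedis jedis = pool.getResource()) {
            // The proxy decides which backend server this key is routed to
            jedis.set("user:1000", "Zhang San");
            System.out.println(jedis.get("user:1000"));
        }
    }
}

Note that Twemproxy supports only a subset of Redis commands; transactions and some multi-key commands are not available through the proxy.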
Pros and cons
Advantages
- Transparent to applications; clients do not need to be aware of sharding details
- Reduces the number of connections between clients and Redis
- Easy to manage and monitor
Disadvantages
- Introduces a single-point-of-failure risk
- Adds extra network latency
- Scaling usually requires manual operation
- Proxy layer may become a performance bottleneck
Applicable scenarios
- Scenarios where minimal changes to existing systems are required
- A unified sharding strategy in a multi-language environment
- High concurrency scenarios where the number of connections needs to be controlled
3. Redis Cluster
Redis Cluster is the official clustering solution provided by Redis, available since Redis 3.0.
How it works
- Uses the concept of hash slots; there are 16384 slots in total
- Each key is hashed with the CRC16 algorithm and taken modulo 16384 to map it to a slot (see the sketch below)
- The slots are assigned to different nodes
- Supports automatic data migration and replication between nodes
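As a rough sketch of the slot mapping, the snippet below uses the CRC16 utility bundled with the Jedis client (assumed here to be redis.clients.jedis.util.JedisClusterCRC16). It also shows hash tags: when a key contains a {...} section, only that section is hashed, which is how related keys are kept in the same slot so that multi-key operations on them remain possible.

import redis.clients.jedis.util.JedisClusterCRC16;

public class SlotCalculationExample {
    public static void main(String[] args) {
        // Plain keys: slot = CRC16(key) mod 16384
        System.out.println(JedisClusterCRC16.getSlot("user:1000")); // some slot in 0..16383
        System.out.println(JedisClusterCRC16.getSlot("user:1001")); // usually a different slot

        // Hash tags: only the part inside {} is hashed, so these two keys
        // always land in the same slot
        System.out.println(JedisClusterCRC16.getSlot("{user:1000}:profile"));
        System.out.println(JedisClusterCRC16.getSlot("{user:1000}:orders"));
    }
}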
Configuration and setup
Node configuration example:
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
Create a cluster command:
redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
  --cluster-replicas 1
Client code example
// Connect to Redis Cluster using the Lettuce client
RedisURI redisUri = RedisURI.Builder.redis("127.0.0.1", 7000)
        .withTimeout(Duration.ofSeconds(60))
        .build();

RedisClusterClient clusterClient = RedisClusterClient.create(redisUri);
StatefulRedisClusterConnection<String, String> connection = clusterClient.connect();
RedisAdvancedClusterCommands<String, String> commands = connection.sync();

// Normal operations; the client handles cluster routing automatically
commands.set("user:1000", "Zhang San");
String value = commands.get("user:1000");
Pros and cons
Advantages
- Official native support with continuous updates and maintenance
- Decentralized architecture, no single point of failure
- Automatic fault detection and failover
- Automatically handles data sharding and migration between nodes
Disadvantages
- The client needs to support the cluster protocol
- Multi-key operations are limited by the slot mechanism (keys must be in the same slot; hash tags can be used to keep related keys together)
- Higher resource consumption and inter-node communication overhead
- Configuration management is relatively complex
Applicable scenarios
- Large-scale Redis deployment
- Need for high availability and automatic failure recovery
- Data volume and load grow dynamically over time
- Environments that want to stay within the official Redis ecosystem
4. Consistent Hashing
The consistent hashing algorithm minimizes the number of keys that need to be remapped when nodes change, making it suitable for environments where nodes change frequently.
How it works
- Map hash space to a ring (0 to 2^32-1)
- Redis nodes are mapped to certain points on the ring
- Each key finds the first node encountered clockwise
- Adding or removing a node only affects the keys on the ring segment adjacent to that node
Implementation example
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hashing;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashSharding {

    private final SortedMap<Integer, JedisPool> circle = new TreeMap<>();
    private final int numberOfReplicas;
    private final HashFunction hashFunction;

    public ConsistentHashSharding(List<String> nodes, int replicas) {
        this.numberOfReplicas = replicas;
        this.hashFunction = Hashing.murmur3_32();
        for (String node : nodes) {
            addNode(node);
        }
    }

    public void addNode(String node) {
        // Each physical node ("host:port") is mapped to several virtual nodes on the ring.
        // (A production implementation would share one pool per physical node.)
        for (int i = 0; i < numberOfReplicas; i++) {
            String virtualNode = node + "-" + i;
            int hash = hashFunction.hashString(virtualNode, StandardCharsets.UTF_8).asInt();
            circle.put(hash, new JedisPool(new JedisPoolConfig(),
                    node.split(":")[0], Integer.parseInt(node.split(":")[1])));
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            String virtualNode = node + "-" + i;
            int hash = hashFunction.hashString(virtualNode, StandardCharsets.UTF_8).asInt();
            circle.remove(hash);
        }
    }

    public JedisPool getNode(String key) {
        if (circle.isEmpty()) {
            return null;
        }
        int hash = hashFunction.hashString(key, StandardCharsets.UTF_8).asInt();
        if (!circle.containsKey(hash)) {
            // Walk clockwise: take the first node at or after the key's position,
            // wrapping around to the start of the ring if necessary
            SortedMap<Integer, JedisPool> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        return circle.get(hash);
    }

    public String get(String key) {
        JedisPool pool = getNode(key);
        try (Jedis jedis = pool.getResource()) {
            return jedis.get(key);
        }
    }

    public void set(String key, String value) {
        JedisPool pool = getNode(key);
        try (Jedis jedis = pool.getResource()) {
            jedis.set(key, value);
        }
    }
}
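A brief usage sketch of the class above. The host:port pairs are placeholders, and 160 virtual nodes per physical node is a common choice rather than a requirement:

import java.util.Arrays;

public class ConsistentHashUsageExample {
    public static void main(String[] args) {
        // Placeholder addresses; replace with reachable Redis instances
        ConsistentHashSharding sharding = new ConsistentHashSharding(
                Arrays.asList("10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"), 160);

        sharding.set("user:1000", "Zhang San");
        System.out.println(sharding.get("user:1000"));

        // Adding a node only takes over the ring segments covered by its virtual nodes,
        // so only the keys on those segments are remapped; everything else stays put.
        sharding.addNode("10.0.0.4:6379");

        // Removing a node only redistributes the keys that were previously mapped to it.
        sharding.removeNode("10.0.0.2:6379");
    }
}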
Pros and cons
Advantages
- Minimizes data migration when nodes change
- Relatively uniform data distribution
- Suitable for dynamic scaling environments
Disadvantages
- More complex implementation
- Virtual nodes introduce additional memory overhead
- There may still be imbalance in the data distribution
Applicable scenarios
- Environments where nodes are frequently added and removed
- Large-scale applications that require dynamic scaling
- Scenarios that are sensitive to data migration costs
5. Range-based Sharding
Range-based sharding distributes data to different nodes based on ranges of key values, and is especially suitable for ordered data sets.
How it works
- Key ranges are divided up in advance
- The storage node is determined by the range a key falls into
- Usually used in combination with ordered keys, such as time series data
Implementation example
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;

import java.util.Map;
import java.util.TreeMap;

public class RangeSharding {

    private final TreeMap<Long, JedisPool> rangeMap = new TreeMap<>();

    public RangeSharding() {
        // Assume sharding by user ID range (hostnames are placeholders)
        rangeMap.put(0L, new JedisPool("redis-node-1", 6379));       // 0 - 999999
        rangeMap.put(1000000L, new JedisPool("redis-node-2", 6379)); // 1000000 - 1999999
        rangeMap.put(2000000L, new JedisPool("redis-node-3", 6379)); // 2000000 - 2999999
        // More ranges...
    }

    private JedisPool getShardForUserId(long userId) {
        // floorEntry returns the range whose lower bound is closest to (and not above) the userId
        Map.Entry<Long, JedisPool> entry = rangeMap.floorEntry(userId);
        if (entry == null) {
            throw new IllegalArgumentException("No shard available for userId: " + userId);
        }
        return entry.getValue();
    }

    public String getUserData(long userId) {
        JedisPool pool = getShardForUserId(userId);
        try (Jedis jedis = pool.getResource()) {
            return jedis.get("user:" + userId);
        }
    }

    public void setUserData(long userId, String data) {
        JedisPool pool = getShardForUserId(userId);
        try (Jedis jedis = pool.getResource()) {
            jedis.set("user:" + userId, data);
        }
    }
}
Pros and cons
Advantages
- Data in a specific range lives on the same node, which facilitates range queries
- Simple and clear sharding strategy
- The mapping relationship between keys and nodes is easy to understand
Disadvantages
- May cause uneven data distribution
- Hot data may be concentrated in a single shard
- Resharding operations are complicated
Applicable scenarios
- Time series data storage
- Geographic location data partitioning
- Scenarios that need to support efficient range queries
Conclusion
Redis sharding is an effective strategy for handling the challenges of large data volumes. Each sharding method has its own strengths and applicable scenarios, and choosing a suitable strategy requires weighing factors such as data scale, access patterns, scaling requirements, and operational capabilities.
Whichever sharding strategy is chosen, best practices should be followed, including sound data model design, good monitoring, and forward-looking capacity planning, to ensure the stability and high performance of the Redis cluster.