SoFunction
Updated on 2025-04-28

6 strategies for Redis to prevent cache penetration

In high-concurrency systems, Redis has become standard middleware for caching: it effectively reduces database pressure and improves response times. Caching is not a silver bullet, however. In practice we often face a serious problem - cache penetration.

When it occurs, the cache layer stops providing protection: large numbers of requests hit the database directly, causing a sharp decline in system performance or even an outage.

Analysis of cache penetration principle

What is cache penetration

Cache penetration refers to querying data that does not exist at all. Because the cache misses, the request passes through the cache layer and hits the database directly. The database finds no matching row, so no result can be written back to the cache, and every subsequent request for the same key hits the database again.

Typical scenarios and hazards

Client ---> Redis (miss) ---> Database (no result) ---> Cache not updated ---> Repeats on every request

The main hazards of cache penetration:

  • Database pressure surges: a large number of invalid queries falls directly on the database
  • Slow system response: excessive database load degrades overall performance
  • Wasted resources: invalid queries consume CPU and I/O
  • Security risk: the pattern can be exploited as a denial-of-service attack

There are usually two situations for cache penetration:

  • Normal business queries: the requested data simply does not exist
  • Malicious attacks: deliberately constructed non-existent keys sent in large volumes

The following are six effective prevention strategies.

Strategy 1: Null value cache

Principle

The null value cache is the simplest and most direct anti-penetration strategy. When the database finds no value for a key, we still cache an "empty result" (usually a null marker or a specific tag) with a relatively short expiration time. The next request for the same non-existent key then returns the empty result straight from the cache instead of querying the database again.

Implementation example

@Service
public class UserServiceImpl implements UserService {

    @Autowired
    private StringRedisTemplate redisTemplate;

    @Autowired
    private UserMapper userMapper;

    private static final String KEY_PREFIX = "user:";
    private static final String EMPTY_VALUE = "{}";              // null value marker
    private static final long EMPTY_VALUE_EXPIRE_SECONDS = 300;  // null value expiration time
    private static final long NORMAL_EXPIRE_SECONDS = 3600;      // normal value expiration time

    @Override
    public User getUserById(Long userId) {
        String redisKey = KEY_PREFIX + userId;

        // 1. Query the cache
        String userJson = redisTemplate.opsForValue().get(redisKey);

        // 2. Cache hit
        if (userJson != null) {
            // Check whether it is the null value marker
            if (EMPTY_VALUE.equals(userJson)) {
                return null;  // Return an empty result
            }
            // Normal cache entry: deserialize and return
            // (fastjson-style JSON serialization assumed here)
            return JSON.parseObject(userJson, User.class);
        }

        // 3. Cache miss: query the database
        User user = userMapper.selectById(userId);

        // 4. Write back to the cache
        if (user != null) {
            // Found in the database: write the normal cache entry
            redisTemplate.opsForValue().set(redisKey,
                                            JSON.toJSONString(user),
                                            NORMAL_EXPIRE_SECONDS,
                                            TimeUnit.SECONDS);
        } else {
            // Not found in the database: write the null value cache entry
            redisTemplate.opsForValue().set(redisKey,
                                            EMPTY_VALUE,
                                            EMPTY_VALUE_EXPIRE_SECONDS,
                                            TimeUnit.SECONDS);
        }

        return user;
    }
}

Pros and cons analysis

Advantages

  • Simple to implement, no additional components
  • Low invasiveness to the system
  • Takes effect immediately

Shortcomings

  • May take up extra cache space
  • Many cached empty values can reduce overall cache efficiency
  • Cannot withstand large-scale malicious attacks (attackers can simply vary the keys)
  • Short-term data inconsistency is possible (the cache still returns the empty value right after new data is added)
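The last shortcoming - stale empty values after new data is inserted - is usually mitigated by evicting the cached null marker whenever the corresponding row is written. A minimal sketch of the idea (the class and method names here are illustrative, and a plain HashMap stands in for Redis and the database):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: evict the cached null marker on every write, so a
// newly inserted row becomes visible immediately instead of after the
// empty value's TTL expires. HashMaps stand in for Redis and the database.
public class NullValueEviction {
    static final String EMPTY_VALUE = "{}";
    private final Map<String, String> cache = new HashMap<>(); // stand-in for Redis
    private final Map<Long, String> db = new HashMap<>();      // stand-in for the DB

    public String getUser(Long id) {
        String key = "user:" + id;
        String cached = cache.get(key);
        if (cached != null) {
            return EMPTY_VALUE.equals(cached) ? null : cached; // null marker hit
        }
        String row = db.get(id);
        cache.put(key, row != null ? row : EMPTY_VALUE);       // cache result or null marker
        return row;
    }

    public void insertUser(Long id, String json) {
        db.put(id, json);
        cache.remove("user:" + id); // evict a possibly cached null marker
    }
}
```

Once `insertUser` runs, the next `getUser` call misses the cache, reads the fresh row, and re-populates the cache, so the inconsistency window closes immediately rather than lasting until the empty value's TTL expires.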

Strategy 2: Bloom filter

Principle

A Bloom filter is a space-efficient probabilistic data structure used to test whether an element belongs to a set. Its defining trait is one-sided error: it may report a non-existent element as present (a false positive), but it will never report an existing element as absent (no false negatives).

A Bloom filter consists of a long bit vector and a series of hash functions. To insert an element, each hash function is applied to it and the corresponding bit positions in the vector are set to 1. To query, the same hashes are computed and the corresponding positions are checked: if any bit is 0, the element definitely does not exist; if all bits are 1, the element may exist.
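The insert/query mechanics described above can be sketched in a few lines of plain Java. This toy filter (the name and the hash choices are ours, not from any library) uses two simple hash functions over a BitSet; production code would instead use Guava's BloomFilter or RedisBloom:

```java
import java.util.BitSet;

// Toy Bloom filter: k = 2 hash functions over an m-bit BitSet.
// Illustrative only -- real implementations use far better hash mixing.
public class TinyBloomFilter {
    private final BitSet bits;
    private final int m; // number of bits in the vector

    public TinyBloomFilter(int m) {
        this.m = m;
        this.bits = new BitSet(m);
    }

    private int h1(String s) { return Math.floorMod(s.hashCode(), m); }
    private int h2(String s) { return Math.floorMod(s.hashCode() * 31 + s.length(), m); }

    public void add(String s) {
        // Set every hashed position to 1
        bits.set(h1(s));
        bits.set(h2(s));
    }

    public boolean mightContain(String s) {
        // Any 0 bit means the element is definitely absent;
        // all 1 bits mean it may be present (false positives possible)
        return bits.get(h1(s)) && bits.get(h2(s));
    }
}
```

Note the one-sided error: an added element is always reported present, while an element that was never added may still land on two set bits by coincidence - that is the false positive.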

Implementation example

Using Redis's Bloom filter module (Redis 4.0+ supports module extensions; the RedisBloom module is required):

@Slf4j
@Service
public class ProductServiceWithBloomFilter implements ProductService {

    @Autowired
    private StringRedisTemplate redisTemplate;

    @Autowired
    private ProductMapper productMapper;

    private static final String BLOOM_FILTER_NAME = "product_filter";
    private static final String CACHE_KEY_PREFIX = "product:";
    private static final long CACHE_EXPIRE_SECONDS = 3600;

    // Initialize the Bloom filter; executed when the application starts
    @PostConstruct
    public void initBloomFilter() {
        // Check whether the Bloom filter already exists
        Boolean exists = redisTemplate.execute((RedisCallback<Boolean>) connection ->
            connection.exists(BLOOM_FILTER_NAME.getBytes()));

        if (!Boolean.TRUE.equals(exists)) {
            // Create a Bloom filter with an error rate of 0.01 and an estimated capacity of 1 million
            redisTemplate.execute((RedisCallback<Object>) connection ->
                connection.execute("BF.RESERVE",
                                   BLOOM_FILTER_NAME.getBytes(),
                                   "0.01".getBytes(),
                                   "1000000".getBytes()));

            // Load all product IDs into the Bloom filter
            List<Long> allProductIds = productMapper.selectAllIds();
            for (Long id : allProductIds) {
                redisTemplate.execute((RedisCallback<Object>) connection ->
                    connection.execute("BF.ADD",
                                       BLOOM_FILTER_NAME.getBytes(),
                                       id.toString().getBytes()));
            }
        }
    }

    @Override
    public Product getProductById(Long productId) {
        String cacheKey = CACHE_KEY_PREFIX + productId;

        // 1. Use the Bloom filter to check whether the ID may exist
        Boolean mayExist = redisTemplate.execute((RedisCallback<Boolean>) connection -> {
            Object reply = connection.execute("BF.EXISTS",
                                              BLOOM_FILTER_NAME.getBytes(),
                                              productId.toString().getBytes());
            return reply != null && ((Long) reply) != 0;
        });

        // If the Bloom filter says the ID does not exist, return directly
        if (!Boolean.TRUE.equals(mayExist)) {
            return null;
        }

        // 2. Query the cache
        String productJson = redisTemplate.opsForValue().get(cacheKey);
        if (productJson != null) {
            return JSON.parseObject(productJson, Product.class);
        }

        // 3. Query the database
        Product product = productMapper.selectById(productId);

        // 4. Update the cache
        if (product != null) {
            redisTemplate.opsForValue().set(cacheKey,
                                            JSON.toJSONString(product),
                                            CACHE_EXPIRE_SECONDS,
                                            TimeUnit.SECONDS);
        } else {
            // Bloom filter false positive: the product does not exist in the database.
            // Consider recording such cases to tune the filter parameters.
            log.warn("Bloom filter false positive for productId: {}", productId);
        }

        return product;
    }

    // When adding a new product, its ID must also be added to the Bloom filter
    public void addProductToBloomFilter(Long productId) {
        redisTemplate.execute((RedisCallback<Object>) connection ->
            connection.execute("BF.ADD",
                               BLOOM_FILTER_NAME.getBytes(),
                               productId.toString().getBytes()));
    }
}

Pros and cons analysis

Advantages

  • High space efficiency and small memory footprint
  • Fast queries: time complexity O(k), where k is the number of hash functions
  • Effectively filters most queries for non-existent IDs
  • Can be combined with other strategies

Shortcomings

  • False positives are possible
  • Elements cannot be removed (in the standard implementation)
  • All data IDs must be loaded in advance, which does not suit frequently changing data sets
  • Relatively complex to implement; the filter needs additional maintenance
  • Periodic rebuilding may be required to keep up with data changes
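The false-positive rate is not arbitrary: for n inserted elements, m bits, and k hash functions it is approximately p = (1 - e^(-kn/m))^k. A quick helper (illustrative, not part of the article's code) shows how the rate behaves:

```java
// Estimate the Bloom filter false-positive rate p = (1 - e^(-k*n/m))^k,
// where n = inserted elements, m = bits, k = hash functions.
public class BloomMath {
    public static double falsePositiveRate(long n, long m, int k) {
        return Math.pow(1.0 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        // 1,000,000 elements, 10 bits per element, 7 hash functions
        System.out.printf("p = %.4f%n", falsePositiveRate(1_000_000, 10_000_000, 7));
    }
}
```

With 10 bits per element and k = 7 the estimate comes out to roughly 0.8%, which is why RedisBloom asks for the desired error rate and capacity up front in BF.RESERVE.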

Strategy 3: Request parameter verification

Principle

Request parameter verification prevents cache penetration at the business level. By validating request parameters, obviously unreasonable requests are filtered out before they ever reach the cache and database layers. This method is particularly suitable for blocking malicious attacks.

Implementation example

@RestController
@RequestMapping("/api/user")
public class UserController {

    @Autowired
    private UserService userService;

    @GetMapping("/{userId}")
    public ResponseEntity<?> getUserById(@PathVariable String userId) {
        // 1. Basic format verification
        if (!userId.matches("\\d+")) {
            return ResponseEntity.badRequest().body("UserId must be numeric");
        }

        // 2. Basic logical verification
        long id = Long.parseLong(userId);
        if (id <= 0 || id > 100000000) {  // Assumed valid ID range
            return ResponseEntity.badRequest().body("UserId out of valid range");
        }

        // 3. Call the business service
        User user = userService.getUserById(id);
        if (user == null) {
            return ResponseEntity.notFound().build();
        }

        return ResponseEntity.ok(user);
    }
}

Parameter verification can also be added at the service layer:

@Slf4j
@Service
public class UserServiceImpl implements UserService {

    // Whitelist: only these ID prefixes are allowed (for example)
    private static final Set<String> ID_PREFIXES = Set.of("100", "200", "300");

    @Override
    public User getUserById(Long userId) {
        // More complex business rule verification
        String idStr = userId.toString();
        boolean valid = false;

        for (String prefix : ID_PREFIXES) {
            if (idStr.startsWith(prefix)) {
                valid = true;
                break;
            }
        }

        if (!valid) {
            log.warn("Attempt to access invalid user ID pattern: {}", userId);
            return null;
        }

        // Normal business logic...
        return getUserFromCacheOrDb(userId);
    }
}

Pros and cons analysis

Advantages

  • Simple to implement, no additional components
  • Intercepts obviously unreasonable access early in the request path
  • Business rules allow fine-grained control
  • Reduces the overall load on the system

Shortcomings

  • Cannot cover every illegal request scenario
  • Designing reasonable validation rules requires a good understanding of the business
  • May introduce complex business logic
  • Overly strict validation may hurt the experience of legitimate users

Strategy 4: Interface rate limiting and circuit breaking

Principle

Rate limiting controls how frequently the system can be accessed and prevents traffic bursts from overwhelming it. Circuit breaking temporarily rejects some requests when the system load is too high, protecting the system. Combined, these two mechanisms effectively contain the systemic risk caused by cache penetration.

Implementation example

Using Spring Boot + Resilience4j to implement rate limiting and circuit breaking:

@Configuration
public class ResilienceConfig {

    @Bean
    public RateLimiterRegistry rateLimiterRegistry() {
        RateLimiterConfig config = RateLimiterConfig.custom()
            .limitRefreshPeriod(Duration.ofSeconds(1))
            .limitForPeriod(100)               // 100 requests per second
            .timeoutDuration(Duration.ofMillis(25))
            .build();

        return RateLimiterRegistry.of(config);
    }

    @Bean
    public CircuitBreakerRegistry circuitBreakerRegistry() {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
            .failureRateThreshold(50)          // 50% failure rate opens the circuit
            .slidingWindowSize(100)            // based on the last 100 calls
            .minimumNumberOfCalls(10)          // at least 10 calls before evaluating
            .waitDurationInOpenState(Duration.ofSeconds(10)) // wait time in open state
            .build();

        return CircuitBreakerRegistry.of(config);
    }
}

@Slf4j
@Service
public class ProductServiceWithResilience {

    private final ProductMapper productMapper;
    private final StringRedisTemplate redisTemplate;
    private final RateLimiter rateLimiter;
    private final CircuitBreaker circuitBreaker;

    public ProductServiceWithResilience(
            ProductMapper productMapper,
            StringRedisTemplate redisTemplate,
            RateLimiterRegistry rateLimiterRegistry,
            CircuitBreakerRegistry circuitBreakerRegistry) {
        this.productMapper = productMapper;
        this.redisTemplate = redisTemplate;
        this.rateLimiter = rateLimiterRegistry.rateLimiter("productService");
        this.circuitBreaker = circuitBreakerRegistry.circuitBreaker("productService");
    }

    public Product getProductById(Long productId) {
        // 1. Apply the rate limiter
        return rateLimiter.executeSupplier(() ->
            // 2. Apply the circuit breaker
            circuitBreaker.executeSupplier(() -> doGetProduct(productId)));
    }

    private Product doGetProduct(Long productId) {
        String cacheKey = "product:" + productId;

        // Query the cache
        String productJson = redisTemplate.opsForValue().get(cacheKey);
        if (productJson != null) {
            return JSON.parseObject(productJson, Product.class);
        }

        // Query the database
        Product product = productMapper.selectById(productId);

        // Update the cache
        if (product != null) {
            redisTemplate.opsForValue().set(cacheKey, JSON.toJSONString(product), 1, TimeUnit.HOURS);
        } else {
            // Null value cache, valid for a short time
            redisTemplate.opsForValue().set(cacheKey, "", 5, TimeUnit.MINUTES);
        }

        return product;
    }

    // Fallback method after the circuit breaker opens
    private Product fallbackMethod(Long productId, Throwable t) {
        log.warn("Circuit breaker triggered for productId: {}", productId, t);
        // Return a default product or fetch from a local cache
        return new Product(productId, "Temporary Unavailable", 0.0);
    }
}

Pros and cons analysis

Advantages

  • Provides system-level protection
  • Effectively handles traffic bursts and malicious attacks
  • Ensures system stability and availability
  • Can be adjusted dynamically together with the monitoring system

Shortcomings

  • May affect the experience of legitimate users
  • Tuning the configuration is difficult
  • Requires well-designed fallback strategies
  • Mitigates the impact of cache penetration rather than solving it completely

Strategy 5: Cache preheating

Principle

Cache warm-up means loading data that is likely to be queried into the cache in advance, at system startup or at specific times, so that user requests do not miss the cache and fall through to the database. For cache penetration, warm-up fills the cache with valid data ahead of time, reducing the chance of direct database queries.

Implementation example

@Slf4j
@Component
public class CacheWarmUpTask {

    @Autowired
    private ProductMapper productMapper;

    @Autowired
    private CategoryMapper categoryMapper;

    @Autowired
    private StringRedisTemplate redisTemplate;

    @Autowired
    private RedisBloomFilter bloomFilter;

    // Perform cache preheating when the system starts
    @PostConstruct
    public void warmUpCacheOnStartup() {
        // Run the warm-up task asynchronously to avoid blocking application startup
        CompletableFuture.runAsync(this::warmUpHotProducts);
    }

    // Refresh the cache of popular products every day at 2 a.m.
    @Scheduled(cron = "0 0 2 * * ?")
    public void scheduledWarmUp() {
        warmUpHotProducts();
    }

    private void warmUpHotProducts() {
        log.info("Start preheating the product cache...");
        long startTime = System.currentTimeMillis();

        try {
            // 1. Get the list of popular products (e.g. the top 5000 by sales)
            List<Product> hotProducts = productMapper.selectHotProducts(5000);

            // 2. Update the cache and the Bloom filter
            for (Product product : hotProducts) {
                String cacheKey = "product:" + product.getId();
                redisTemplate.opsForValue().set(
                    cacheKey,
                    JSON.toJSONString(product),
                    6, TimeUnit.HOURS);

                // Update the Bloom filter
                bloomFilter.add("product_filter", product.getId().toString());
            }

            // 3. Also preheat some necessary aggregate information
            List<Category> categories = categoryMapper.selectAll();
            for (Category category : categories) {
                String cacheKey = "category:" + category.getId();
                List<Long> productIds = productMapper.selectIdsByCategory(category.getId());
                redisTemplate.opsForValue().set(
                    cacheKey,
                    JSON.toJSONString(productIds),
                    12, TimeUnit.HOURS);
            }

            long duration = System.currentTimeMillis() - startTime;
            log.info("Cache warm-up completed, time consumed: {}ms, products preheated: {}",
                     duration, hotProducts.size());

        } catch (Exception e) {
            log.error("Cache warm-up failed", e);
        }
    }
}

Pros and cons analysis

Advantages

  • Improves access performance right after system startup
  • Reduces cache cold-start issues
  • Can be refreshed periodically to keep data fresh
  • Avoids making early users wait

Shortcomings

  • Cannot cover every possible data access
  • Consumes additional system resources
  • Ineffective for unpopular (long-tail) data
  • The warm-up data set must be chosen carefully to avoid wasting resources

Strategy 6: Graded Filtering Strategy

Principle

The graded filtering strategy combines multiple anti-penetration measures into a multi-layer protective net. By setting filtering conditions at different levels, cache penetration can be prevented to the greatest extent while preserving system performance. A typical graded filtering chain is: front-end filtering -> API gateway filtering -> application layer filtering -> cache layer filtering -> database protection.

Implementation example

Here is a comprehensive example of multi-layer protection:

// 1. Gateway layer filtering (using Spring Cloud Gateway)
@Configuration
public class GatewayFilterConfig {

    @Bean
    public RouteLocator customRouteLocator(RouteLocatorBuilder builder) {
        return builder.routes()
            .route("product_route", r -> r.path("/api/product/**")
                // Path format verification
                .and().predicate(exchange -> {
                    String path = exchange.getRequest().getURI().getPath();
                    // Check the /api/product/{id} path to make sure the id is numeric
                    if (path.matches("/api/product/\\d+")) {
                        String id = path.substring(path.lastIndexOf('/') + 1);
                        long productId = Long.parseLong(id);
                        return productId > 0 && productId < 10000000; // reasonable range check
                    }
                    return true;
                })
                // Rate limit filtering
                .filters(f -> f.requestRateLimiter()
                    .rateLimiter(RedisRateLimiter.class, c -> c.setReplenishRate(10).setBurstCapacity(20))
                    .and()
                    .circuitBreaker(c -> c.setName("productCB").setFallbackUri("forward:/fallback"))
                )
                .uri("lb://product-service")
            )
            .build();
    }
}

// 2. Application layer filtering (Resilience4j + Bloom filter)
@Slf4j
@Service
public class ProductServiceImpl implements ProductService {

    private final StringRedisTemplate redisTemplate;
    private final ProductMapper productMapper;
    private final RateLimiter rateLimiter;
    private final CircuitBreaker circuitBreaker;
    private BloomFilter<String> localBloomFilter;

    @Value("${cache.expire-seconds:3600}")
    private int cacheExpireSeconds;

    // Constructor injection omitted...

    @PostConstruct
    public void initLocalFilter() {
        // Create a local (Guava) Bloom filter as a second layer of protection
        localBloomFilter = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8),
            1000000,  // expected number of elements
            0.001     // false positive rate
        );

        // Initialize the local Bloom filter data
        List<String> allProductIds = productMapper.selectAllProductIds();
        for (String id : allProductIds) {
            localBloomFilter.put(id);
        }
    }

    @Override
    public Product getProductById(Long productId) {
        String productIdStr = productId.toString();

        // 1. Local Bloom filter pre-check
        if (!localBloomFilter.mightContain(productIdStr)) {
            log.debug("Product filtered by local bloom filter: {}", productId);
            return null;
        }

        // 2. Redis Bloom filter secondary check
        Boolean mayExist = redisTemplate.execute(
            (RedisCallback<Boolean>) connection -> {
                Object reply = connection.execute(
                    "BF.EXISTS",
                    "product_filter".getBytes(),
                    productIdStr.getBytes());
                return reply != null && ((Long) reply) != 0;
            }
        );

        if (!Boolean.TRUE.equals(mayExist)) {
            log.debug("Product filtered by Redis bloom filter: {}", productId);
            return null;
        }

        // 3. Apply rate limiting and circuit breaker protection
        try {
            return rateLimiter.executeSupplier(() ->
                circuitBreaker.executeSupplier(() ->
                    getProductFromCacheOrDb(productId)));
        } catch (RequestNotPermitted e) {
            log.warn("Request rate limited for product: {}", productId);
            throw new ServiceException("Service is busy, please try again later");
        } catch (CallNotPermittedException e) {
            log.warn("Circuit breaker open for product queries");
            throw new ServiceException("Service is temporarily unavailable");
        }
    }

    private Product getProductFromCacheOrDb(Long productId) {
        String cacheKey = "product:" + productId;

        // 4. Query the cache
        String cachedValue = redisTemplate.opsForValue().get(cacheKey);

        if (cachedValue != null) {
            // Handle the null value cache case
            if (cachedValue.isEmpty()) {
                return null;
            }
            return JSON.parseObject(cachedValue, Product.class);
        }

        // 5. Query the database (with DB protection)
        Product product = null;
        try {
            product = productMapper.selectById(productId);
        } catch (Exception e) {
            log.error("Database error when querying product: {}", productId, e);
            throw new ServiceException("System error, please try again later");
        }

        // 6. Update the cache (null values are also cached)
        if (product != null) {
            redisTemplate.opsForValue().set(
                cacheKey,
                JSON.toJSONString(product),
                cacheExpireSeconds,
                TimeUnit.SECONDS);

            // Make sure the Bloom filters contain this ID
            redisTemplate.execute(
                (RedisCallback<Object>) connection -> connection.execute(
                    "BF.ADD",
                    "product_filter".getBytes(),
                    productId.toString().getBytes()));

            localBloomFilter.put(productId.toString());
        } else {
            // Cache the null value with a short expiration
            redisTemplate.opsForValue().set(
                cacheKey,
                "",
                60,  // short-lived null value cache
                TimeUnit.SECONDS);
        }

        return product;
    }
}

Pros and cons analysis

Advantages

  • Provides all-round system protection
  • The layers complement each other to form a complete line of defense
  • Policies can be configured flexibly at each level
  • Minimizes resource waste and performance loss

Shortcomings

  • High complexity
  • The configuration of each layer must be kept consistent
  • May increase system response time
  • Relatively high maintenance cost

Summary

Preventing cache penetration is not only a technical issue but also an important part of system design and operations.

In practice, the appropriate combination of strategies should be chosen based on the specific business scenario and system scale. A single strategy rarely solves the problem completely, while a combined strategy provides more comprehensive protection. Regular monitoring and performance evaluation are necessary to keep the cache system running efficiently.
