1. Circuit Breaker
Basic Principles
The circuit breaker pattern borrows from electrical fuses: when a service or component is detected to fail frequently, calls to it are automatically "tripped", preventing cascading failures and giving the failing service time to recover. A circuit breaker has three states:
- Closed: requests pass through normally while the failure rate is monitored.
- Open: requests are rejected immediately, returning an error or executing fallback logic.
- Half-open: a limited number of trial requests are let through to test whether the service has recovered.
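The state machine described above can be sketched in a few lines of plain Java. This is a simplified illustration, not the Resilience4j implementation used below: a consecutive-failure counter stands in for the sliding-window failure rate, and the threshold and wait values are arbitrary.

```java
import java.util.concurrent.TimeUnit;

// Minimal circuit breaker sketch: CLOSED -> OPEN after repeated failures,
// OPEN -> HALF_OPEN after a cool-down, HALF_OPEN -> CLOSED on a successful probe.
class SimpleCircuitBreaker {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openDurationNanos;
    private int consecutiveFailures = 0;
    private long openedAt = 0;
    private State state = State.CLOSED;

    SimpleCircuitBreaker(int failureThreshold, long openDurationMillis) {
        this.failureThreshold = failureThreshold;
        this.openDurationNanos = TimeUnit.MILLISECONDS.toNanos(openDurationMillis);
    }

    synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            // After the wait period, let one probe request through (HALF_OPEN)
            if (System.nanoTime() - openedAt >= openDurationNanos) {
                state = State.HALF_OPEN;
                return true;
            }
            return false; // fail fast while open
        }
        return true; // CLOSED or HALF_OPEN
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    synchronized void recordFailure() {
        consecutiveFailures++;
        // A failed probe, or too many consecutive failures, opens the breaker
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = System.nanoTime();
        }
    }

    synchronized State getState() { return state; }
}
```

Resilience4j adds what this sketch omits: a sliding window of call outcomes, a configurable failure-rate threshold, and event publishing.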
Spring Boot Implementation and Integration
In Spring Boot, we can implement the circuit breaker pattern with Resilience4j, a lightweight alternative to Hystrix designed for Java 8 and functional programming.
First add dependencies:
```xml
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot2</artifactId>
    <version>1.7.0</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```
Then configure the circuit breaker parameters:
```yaml
resilience4j:
  circuitbreaker:
    instances:
      orderService:
        registerHealthIndicator: true
        slidingWindowSize: 10
        minimumNumberOfCalls: 5
        permittedNumberOfCallsInHalfOpenState: 3
        automaticTransitionFromOpenToHalfOpenEnabled: true
        waitDurationInOpenState: 5s
        failureRateThreshold: 50
        eventConsumerBufferSize: 10
```
Example usage of the circuit breaker:
```java
@Service
public class OrderService {

    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    private final PaymentServiceClient paymentServiceClient;

    public OrderService(PaymentServiceClient paymentServiceClient) {
        this.paymentServiceClient = paymentServiceClient;
    }

    @CircuitBreaker(name = "orderService", fallbackMethod = "processOrderFallback")
    public OrderResponse processOrder(OrderRequest orderRequest) {
        // Normal order processing, including the call to the payment service
        PaymentResponse paymentResponse = paymentServiceClient.processPayment(orderRequest.getPayment());
        return new OrderResponse(orderRequest.getOrderId(), "PROCESSED", paymentResponse.getTransactionId());
    }

    // Fallback method, executed when the circuit breaker is open
    public OrderResponse processOrderFallback(OrderRequest orderRequest, Exception e) {
        log.warn("Circuit breaker triggered for order: {}. Error: {}", orderRequest.getOrderId(), e.getMessage());
        // Return a degraded response, e.g. from a local cache or default values
        return new OrderResponse(orderRequest.getOrderId(), "PENDING", null);
    }
}
```
Best Practices
- Appropriate window size: set `slidingWindowSize` sensibly; too small makes the breaker overly sensitive, too large makes it slow to react.
- Reasonable threshold: set `failureRateThreshold` according to business needs; 50% to 60% is generally recommended.
- Monitor breaker state: integrate Spring Boot Actuator to expose circuit breaker status:
```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,circuitbreakers
  health:
    circuitbreakers:
      enabled: true
```
- Fine-grained breakers: configure a separate circuit breaker instance for each downstream dependency, so one failing service does not affect multiple business flows.
- Test breaker behavior: use chaos testing to verify that the breaker behaves as expected under failure conditions.
2. Rate Limiting Technology
Basic Principles
Rate limiting controls the rate at which the system accepts requests, protecting it from overload. Common rate-limiting algorithms include:
- Token bucket: tokens are added to a bucket at a fixed rate; a request must consume a token to be processed.
- Leaky bucket: requests are drained at a fixed rate; the excess is queued or rejected.
- Counter (fixed window): limit the number of requests within a fixed time window.
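The token-bucket idea can be illustrated in a few lines of plain Java. This is a simplified sketch with illustrative capacity and refill values; Bucket4j, used below, is a production-grade implementation.

```java
// Token bucket sketch: tokens accrue at a fixed rate up to a capacity;
// each request consumes one token or is rejected.
class TokenBucket {
    private final long capacity;
    private final double refillTokensPerNano;
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillTokensPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity; // start full, allowing an initial burst
        this.lastRefill = System.nanoTime();
    }

    synchronized boolean tryConsume() {
        refill();
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false; // rate limit exceeded
    }

    private void refill() {
        // Add tokens proportional to elapsed time, capped at capacity
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillTokensPerNano);
        lastRefill = now;
    }
}
```

Because the bucket starts full, the algorithm tolerates short bursts up to the capacity while enforcing the average rate over time.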
Spring Boot Implementation and Integration
In Spring Boot, we can implement API rate limiting with Bucket4j, a Java rate-limiting library based on the token bucket algorithm.
Add dependencies:
```xml
<dependency>
    <groupId>com.github.vladimir-bukhtoyarov</groupId>
    <artifactId>bucket4j-core</artifactId>
    <version>4.10.0</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-cache</artifactId>
</dependency>
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
</dependency>
```
Configure the cache that holds per-client buckets:
```java
@Configuration
public class RateLimitingConfig {

    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager cacheManager = new CaffeineCacheManager("rateLimit");
        cacheManager.setCaffeine(Caffeine.newBuilder()
                .expireAfterWrite(1, TimeUnit.HOURS)
                .maximumSize(1000));
        return cacheManager;
    }

    @Bean
    public Bucket4jCacheConfiguration bucket4jCacheConfiguration() {
        return new Bucket4jCacheConfiguration(cacheManager(), "rateLimit");
    }
}
```
Implement the rate-limiting interceptor:
```java
@Component
public class RateLimitingInterceptor implements HandlerInterceptor {

    private final Cache<String, Bucket> cache;

    public RateLimitingInterceptor() {
        this.cache = Caffeine.newBuilder()
                .expireAfterWrite(1, TimeUnit.HOURS)
                .maximumSize(1000)
                .build();
    }

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler)
            throws Exception {
        String apiKey = request.getHeader("X-API-KEY");
        if (apiKey == null || apiKey.isEmpty()) {
            response.sendError(HttpStatus.BAD_REQUEST.value(), "Missing API key");
            return false;
        }

        Bucket bucket = cache.get(apiKey, key -> createNewBucket());
        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);
        if (probe.isConsumed()) {
            response.addHeader("X-Rate-Limit-Remaining", String.valueOf(probe.getRemainingTokens()));
            return true;
        } else {
            long waitForRefill = probe.getNanosToWaitForRefill() / 1_000_000_000;
            response.addHeader("X-Rate-Limit-Retry-After-Seconds", String.valueOf(waitForRefill));
            response.sendError(HttpStatus.TOO_MANY_REQUESTS.value(), "Rate limit exceeded");
            return false;
        }
    }

    private Bucket createNewBucket() {
        // 100 requests per minute and 1000 requests per hour
        return Bucket4j.builder()
                .addLimit(Bandwidth.classic(100, Refill.intervally(100, Duration.ofMinutes(1))))
                .addLimit(Bandwidth.classic(1000, Refill.intervally(1000, Duration.ofHours(1))))
                .build();
    }
}

@Configuration
public class WebMvcConfig implements WebMvcConfigurer {

    @Autowired
    private RateLimitingInterceptor rateLimitingInterceptor;

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(rateLimitingInterceptor)
                .addPathPatterns("/api/**");
    }
}
```
Implement rate limiting in Spring Cloud Gateway:
```yaml
spring:
  cloud:
    gateway:
      routes:
        - id: order-service
          uri: lb://order-service
          predicates:
            - Path=/orders/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 10
                redis-rate-limiter.burstCapacity: 20
                redis-rate-limiter.requestedTokens: 1
                key-resolver: "#{@userKeyResolver}"
```
```java
@Configuration
public class GatewayConfig {

    @Bean
    public KeyResolver userKeyResolver() {
        return exchange -> {
            String userId = exchange.getRequest().getHeaders().getFirst("User-Id");
            if (userId == null) {
                userId = "anonymous";
            }
            return Mono.just(userId);
        };
    }
}
```
Best Practices
- Tiered rate limits: set different thresholds for different user types or API importance.
- Multi-level rate limits: apply user-level, IP-level, and global limits at the same time.
- Rate-limit responses: when a limit triggers, return an appropriate HTTP status code (usually 429) and a clear error message, including a retry suggestion.
- Monitor rate-limit metrics: collect rate-limiting metrics to analyze and tune the limiting strategy.
- Graceful degradation: when a limit is reached, consider serving a degraded response rather than rejecting the request outright.
3. Service Degradation and Fault Tolerance
Basic Principles
Service degradation is a strategy that keeps the system available as a whole by providing limited but acceptable service when load is high or some dependencies are unavailable. Fault tolerance is the system's ability to detect and handle errors while continuing to operate normally.
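At its core, degradation means "try the primary path, and on failure return a limited but acceptable answer". A minimal plain-Java sketch of that idea (the supplier values are illustrative):

```java
import java.util.function.Supplier;

// Degradation sketch: return the primary result when possible,
// otherwise a limited-but-acceptable fallback (e.g. cached data).
class Degradable {
    static <T> T withFallback(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            // In production, log the failure here; then degrade instead of propagating it
            return fallback.get();
        }
    }
}
```

The Resilience4j annotations shown below implement the same pattern declaratively, with the extra ability to trigger the fallback on timeouts and open circuit breakers, not just thrown exceptions.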
Spring Boot Implementation and Integration
In Spring Boot, service degradation can be achieved in several ways, including combining it with circuit breakers, using asynchronous fallbacks, and enforcing timeouts.
Use Resilience4j's fallback support to implement degradation:
```java
@Service
public class ProductService {

    private static final Logger log = LoggerFactory.getLogger(ProductService.class);

    private final ProductRepository productRepository;
    private final ProductCacheService productCacheService;
    private final InventoryClient inventoryClient; // real-time inventory lookups (illustrative)
    private final PricingClient pricingClient;     // real-time price lookups (illustrative)

    @Autowired
    public ProductService(ProductRepository productRepository,
                          ProductCacheService productCacheService,
                          InventoryClient inventoryClient,
                          PricingClient pricingClient) {
        this.productRepository = productRepository;
        this.productCacheService = productCacheService;
        this.inventoryClient = inventoryClient;
        this.pricingClient = pricingClient;
    }

    @CircuitBreaker(name = "productService", fallbackMethod = "getProductDetailsFallback")
    @Bulkhead(name = "productService", fallbackMethod = "getProductDetailsFallback")
    @TimeLimiter(name = "productService", fallbackMethod = "getProductDetailsFallback")
    public CompletableFuture<ProductDetails> getProductDetails(String productId) {
        return CompletableFuture.supplyAsync(() -> {
            // Fetch the product itself
            Product product = productRepository.findById(productId)
                    .orElseThrow(() -> new ProductNotFoundException(productId));

            // Fetch real-time inventory and pricing information
            InventoryInfo inventory = inventoryClient.getInventory(productId);
            PricingInfo pricing = pricingClient.getPricing(productId);

            return new ProductDetails(product, inventory, pricing);
        });
    }

    // Fallback: basic product information with cached or default inventory and pricing
    public CompletableFuture<ProductDetails> getProductDetailsFallback(String productId, Exception e) {
        log.warn("Fallback for product {}. Reason: {}", productId, e.getMessage());
        return CompletableFuture.supplyAsync(() -> {
            // Basic product information from the cache
            Product product = productCacheService.getProduct(productId)
                    .orElse(new Product(productId, "Unknown Product", "No details available"));

            // Default inventory and pricing
            InventoryInfo inventory = new InventoryInfo(productId, 0, false);
            PricingInfo pricing = new PricingInfo(productId, 0.0, false);

            return new ProductDetails(product, inventory, pricing, true);
        });
    }
}
```
Configure timeouts and bulkhead isolation:
```yaml
resilience4j:
  timelimiter:
    instances:
      productService:
        timeoutDuration: 2s
        cancelRunningFuture: true
  bulkhead:
    instances:
      productService:
        maxConcurrentCalls: 20
        maxWaitDuration: 500ms
```
A servlet filter can implement a graceful degradation strategy:
```java
@Component
public class GracefulDegradationFilter extends OncePerRequestFilter {

    private final HealthCheckService healthCheckService;

    @Autowired
    public GracefulDegradationFilter(HealthCheckService healthCheckService) {
        this.healthCheckService = healthCheckService;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
                                    FilterChain filterChain) throws ServletException, IOException {
        String path = request.getRequestURI();

        // Check the overall health of the system
        SystemHealth health = healthCheckService.getSystemHealth();

        if (health.isHighLoad() && isNonCriticalPath(path)) {
            // Degrade non-critical paths under high load
            sendDegradedResponse(response, "Service temporarily operating at reduced capacity");
            return;
        } else if (health.isMaintenanceMode() && !isAdminPath(path)) {
            // Only admin requests are allowed in maintenance mode
            sendMaintenanceResponse(response);
            return;
        } else if (health.hasFailedServices() && dependsOnFailedServices(path, health.getFailedServices())) {
            // Return a degraded response if the request depends on an unavailable service
            sendDependencyFailureResponse(response, health.getFailedServices());
            return;
        }

        // Process the request normally
        filterChain.doFilter(request, response);
    }

    private boolean isNonCriticalPath(String path) {
        // Decide whether the request targets a non-critical path
        return path.startsWith("/api/recommendations")
                || path.startsWith("/api/analytics")
                || path.startsWith("/api/marketing");
    }

    private boolean isAdminPath(String path) {
        return path.startsWith("/admin") || path.startsWith("/management");
    }

    private boolean dependsOnFailedServices(String path, List<String> failedServices) {
        // Map request paths to the services they depend on
        Map<String, List<String>> serviceDependencies = new HashMap<>();
        serviceDependencies.put("/api/orders", List.of("payment-service", "inventory-service"));
        serviceDependencies.put("/api/payments", List.of("payment-service"));
        // ... dependencies between other paths and services

        String matchingPath = findMatchingPath(path, serviceDependencies.keySet());
        if (matchingPath != null) {
            List<String> dependencies = serviceDependencies.get(matchingPath);
            return dependencies.stream().anyMatch(failedServices::contains);
        }
        return false;
    }

    private String findMatchingPath(String requestPath, Set<String> configuredPaths) {
        // Find the configured path that matches the request path
        return configuredPaths.stream()
                .filter(requestPath::startsWith)
                .findFirst()
                .orElse(null);
    }

    private void sendDegradedResponse(HttpServletResponse response, String message) throws IOException {
        response.setStatus(HttpStatus.SERVICE_UNAVAILABLE.value());
        response.setContentType(MediaType.APPLICATION_JSON_VALUE);
        Map<String, Object> responseBody = new HashMap<>();
        responseBody.put("status", "degraded");
        responseBody.put("message", message);
        responseBody.put("retry_after", 30); // suggest retrying after 30 seconds
        response.getWriter().write(new ObjectMapper().writeValueAsString(responseBody));
    }

    // Other response helpers (sendMaintenanceResponse, sendDependencyFailureResponse)...
}
```
Best Practices
- Tiered degradation strategy: define a hierarchy of degradation levels for different failure scenarios and service importance.
- Static degradation: prepare static resources or cached data in advance to serve when a service is unavailable.
- Feature degradation: temporarily disable non-core features to protect core business flows.
- Degradation by user group: under high load, prioritize the experience of VIP users.
- Service isolation: use the Bulkhead pattern to isolate resources per service, so one service's problems do not spread to others.
- Timeout control: set reasonable timeouts to prevent long waits from degrading the user experience.
4. Retry mechanism (Retry)
Basic Principles
The retry mechanism handles transient failures by automatically retrying failed operations, improving system resilience. It is especially effective for scenarios such as network jitter or a temporarily unavailable database.
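The core retry loop with exponential backoff can be sketched in plain Java; this is a simplified illustration of what Spring Retry automates below (the attempt and delay values are illustrative):

```java
import java.util.function.Supplier;

// Retry sketch: retry a failing operation up to maxAttempts times,
// doubling the wait between attempts (exponential backoff).
class Retrier {
    static <T> T retry(Supplier<T> op, int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delay); // back off before the next attempt
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException("Retry interrupted", ie);
                    }
                    delay *= 2; // exponential backoff
                }
            }
        }
        throw last; // all attempts exhausted
    }
}
```

A production version would also classify exceptions (retry only transient ones) and add jitter to the delay to avoid synchronized retry storms.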
Spring Boot Implementation and Integration
In Spring Boot, the Spring Retry library can be used to implement retries.
Add dependencies:
```xml
<dependency>
    <groupId>org.springframework.retry</groupId>
    <artifactId>spring-retry</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
```
Enable the retry function:
```java
@SpringBootApplication
@EnableRetry
public class MyApplication {
    public static void main(String[] args) {
        SpringApplication.run(MyApplication.class, args);
    }
}
```
Declarative retry with @Retryable:
```java
@Service
public class RemoteServiceClient {

    private static final Logger log = LoggerFactory.getLogger(RemoteServiceClient.class);

    private final RestTemplate restTemplate;

    @Autowired
    public RemoteServiceClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @Retryable(
            // Typical transient network/server errors (the original exception list was elided)
            value = {ResourceAccessException.class, HttpServerErrorException.class},
            maxAttempts = 3,
            backoff = @Backoff(delay = 1000, multiplier = 2)
    )
    public ResponseEntity<OrderData> getOrderDetails(String orderId) {
        log.info("Attempting to fetch order details for {}", orderId);
        return restTemplate.getForEntity("/api/orders/" + orderId, OrderData.class);
    }

    @Recover
    public ResponseEntity<OrderData> recoverGetOrderDetails(Exception e, String orderId) {
        log.error("All retries failed for order {}. Last error: {}", orderId, e.getMessage());
        // Return cached data or a default response
        return ResponseEntity.ok(new OrderData(orderId, "UNKNOWN", new Date(), Collections.emptyList()));
    }
}
```
Programmatic retry with RetryTemplate:
```java
@Service
public class PaymentService {

    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    private final RetryTemplate retryTemplate;
    private final PaymentGateway paymentGateway;

    @Autowired
    public PaymentService(RetryTemplate retryTemplate, PaymentGateway paymentGateway) {
        this.retryTemplate = retryTemplate;
        this.paymentGateway = paymentGateway;
    }

    public PaymentResult processPayment(PaymentRequest paymentRequest) {
        return retryTemplate.execute(context -> {
            // Current retry count
            int retryCount = context.getRetryCount();
            log.info("Processing payment attempt {} for order {}", retryCount + 1, paymentRequest.getOrderId());
            try {
                // Perform the payment
                return paymentGateway.charge(paymentRequest);
            } catch (PaymentGatewayException e) {
                // Inspect the exception and decide whether to retry
                if (e.isRetryable()) {
                    log.warn("Retryable payment error: {}. Will retry.", e.getMessage());
                    throw e; // rethrow to trigger a retry
                } else {
                    log.error("Non-retryable payment error: {}", e.getMessage());
                    throw new NonRetryableException("Payment failed with non-retryable error", e);
                }
            }
        }, context -> {
            // Recovery callback after all retries fail
            log.error("All payment retries failed for order {}", paymentRequest.getOrderId());
            // Return a failed result and flag it for follow-up processing
            return PaymentResult.failed(paymentRequest.getOrderId(), "Maximum retries exceeded");
        });
    }
}

@Configuration
public class RetryConfig {

    @Bean
    public RetryTemplate retryTemplate() {
        RetryTemplate retryTemplate = new RetryTemplate();

        // Retry policy: at most 3 attempts
        SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy();
        retryPolicy.setMaxAttempts(3);

        // Backoff policy: exponential
        ExponentialBackOffPolicy backOffPolicy = new ExponentialBackOffPolicy();
        backOffPolicy.setInitialInterval(1000); // 1 second
        backOffPolicy.setMultiplier(2.0);       // double the wait after each failure
        backOffPolicy.setMaxInterval(10000);    // wait at most 10 seconds

        retryTemplate.setRetryPolicy(retryPolicy);
        retryTemplate.setBackOffPolicy(backOffPolicy);
        return retryTemplate;
    }
}
```
Alternatively, use Resilience4j's retry support:
```yaml
resilience4j:
  retry:
    instances:
      paymentService:
        maxRetryAttempts: 3
        waitDuration: 1s
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2
        retryExceptions:
          # example retryable exceptions (the original list was elided)
          - java.io.IOException
          - java.util.concurrent.TimeoutException
```
```java
@Service
public class PaymentServiceWithResilience4j {

    private static final Logger log = LoggerFactory.getLogger(PaymentServiceWithResilience4j.class);

    private final PaymentGateway paymentGateway;

    @Autowired
    public PaymentServiceWithResilience4j(PaymentGateway paymentGateway) {
        this.paymentGateway = paymentGateway;
    }

    @Retry(name = "paymentService", fallbackMethod = "processPaymentFallback")
    public PaymentResult processPayment(PaymentRequest request) {
        return paymentGateway.charge(request);
    }

    public PaymentResult processPaymentFallback(PaymentRequest request, Exception e) {
        log.error("Payment processing failed after retries for order: {}", request.getOrderId());
        return PaymentResult.failed(request.getOrderId(), "Payment processing temporarily unavailable");
    }
}
```
Best Practices
- Distinguish transient from permanent failures: retry only transient faults; fail immediately on permanent ones.
- Exponential backoff: use an exponential backoff strategy to avoid retry storms.
- Reasonable retry count: set an appropriate maximum, usually 3 to 5 attempts.
- Monitor retries: record retry counts and outcomes to help identify problematic services.
- Idempotent operations: make sure retried operations are idempotent to avoid problems from duplicate processing.
- Set timeouts: every attempt should have a reasonable timeout.
- Combine with circuit breakers: use retries together with a circuit breaker so the system fails fast when failures persist.
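The idempotency practice above can be illustrated with a small sketch: deduplicate by request ID so a retried call returns the recorded result instead of repeating the side effect (the class and result format are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Idempotency sketch: remember each request ID's result so that a retried
// request returns the original result instead of re-executing the side effect.
class IdempotentProcessor {
    private final Map<String, String> processed = new ConcurrentHashMap<>();
    private int executions = 0;

    synchronized String process(String requestId, String payload) {
        return processed.computeIfAbsent(requestId, id -> {
            executions++; // the real side effect (e.g. charging a card) runs once
            return "charged:" + payload;
        });
    }

    synchronized int executions() { return executions; }
}
```

In a distributed system the request-ID store would live in a shared database or cache with an expiry, not in process memory.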
5. Health Checks and Monitoring
Basic Principles
Health checks and monitoring are the infrastructure of service availability: they reveal system state in real time so problems can be detected and resolved early. Through metric collection, health checks, and alerting, service failures can be prevented or resolved quickly.
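The aggregation rule behind a composite health check is simple: the system is UP only if every component is UP. A minimal sketch of that rule (simplified to boolean component states; Actuator's real model also supports DEGRADED and UNKNOWN statuses):

```java
import java.util.Map;

// Health aggregation sketch: overall status is UP only if every
// registered component reports UP (the idea behind Actuator's composite health).
class HealthAggregator {
    static String aggregate(Map<String, Boolean> componentUp) {
        boolean allUp = componentUp.values().stream().allMatch(Boolean::booleanValue);
        return allUp ? "UP" : "DOWN";
    }
}
```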
Spring Boot Implementation and Integration
Spring Boot Actuator provides rich monitoring and management capabilities that are easy to integrate into an application.
Add dependencies:
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
```
Configure Actuator endpoints:
```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus,loggers,env
  endpoint:
    health:
      show-details: always
      group:
        readiness:
          include: db,redis,rabbit,diskSpace
  health:
    circuitbreakers:
      enabled: true
    ratelimiters:
      enabled: true
  metrics:
    export:
      prometheus:
        enabled: true
    enable:
      jvm: true
      system: true
      process: true
      http: true
```
Custom Health Checker:
```java
@Component
public class ExternalServiceHealthIndicator implements HealthIndicator {

    private final RestTemplate restTemplate;

    @Autowired
    public ExternalServiceHealthIndicator(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @Override
    public Health health() {
        try {
            // Check the health endpoint of the external service
            ResponseEntity<Map> response = restTemplate.getForEntity("/health", Map.class);
            if (response.getStatusCode().is2xxSuccessful()) {
                return Health.up()
                        .withDetail("status", response.getBody().get("status"))
                        .withDetail("version", response.getBody().get("version"))
                        .build();
            } else {
                return Health.down()
                        .withDetail("statusCode", response.getStatusCode())
                        .withDetail("reason", "Unexpected status code")
                        .build();
            }
        } catch (Exception e) {
            return Health.down()
                    .withDetail("error", e.getMessage())
                    .build();
        }
    }
}
```
Configure readiness and liveness probes:
```java
@Configuration
public class HealthCheckConfig {

    @Bean
    public HealthContributorRegistry healthContributorRegistry(ApplicationAvailability availability) {
        HealthContributorRegistry registry = new DefaultHealthContributorRegistry();

        // Readiness: is the application ready to accept traffic?
        // (Spring Boot's built-in availability indicators)
        registry.registerContributor("readiness", new ReadinessStateHealthIndicator(availability));

        // Liveness: is the application running correctly?
        registry.registerContributor("liveness", new LivenessStateHealthIndicator(availability));

        return registry;
    }
}
```
Custom metric collection:
```java
@Component
public class OrderMetrics {

    private final MeterRegistry meterRegistry;
    private final Counter orderCounter;
    private final DistributionSummary orderAmountSummary;
    private final Timer orderProcessingTimer;

    public OrderMetrics(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        // Metric names below are illustrative (the originals were elided)
        this.orderCounter = Counter.builder("orders.created")
                .description("Number of orders created")
                .tag("application", "order-service")
                .register(meterRegistry);
        this.orderAmountSummary = DistributionSummary.builder("orders.amount")
                .description("Order amount distribution")
                .tag("application", "order-service")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(meterRegistry);
        this.orderProcessingTimer = Timer.builder("orders.processing.time")
                .description("Order processing time")
                .tag("application", "order-service")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(meterRegistry);
    }

    public void recordOrderCreated(String orderType) {
        orderCounter.increment();
        meterRegistry.counter("orders.created.by.type", "type", orderType).increment();
    }

    public void recordOrderAmount(double amount) {
        orderAmountSummary.record(amount);
    }

    public Timer.Sample startOrderProcessing() {
        return Timer.start(meterRegistry);
    }

    public void endOrderProcessing(Timer.Sample sample) {
        sample.stop(orderProcessingTimer);
    }
}

@Service
public class OrderServiceWithMetrics {

    private final OrderRepository orderRepository;
    private final OrderMetrics orderMetrics;

    @Autowired
    public OrderServiceWithMetrics(OrderRepository orderRepository, OrderMetrics orderMetrics) {
        this.orderRepository = orderRepository;
        this.orderMetrics = orderMetrics;
    }

    public Order createOrder(OrderRequest request) {
        Timer.Sample timer = orderMetrics.startOrderProcessing();
        try {
            Order order = new Order();
            // Populate the order
            order.setCustomerId(request.getCustomerId());
            order.setTotalAmount(calculateTotalAmount(request.getItems()));
            order.setItems(request.getItems());

            Order savedOrder = orderRepository.save(order);

            // Record metrics
            orderMetrics.recordOrderCreated(request.getOrderType());
            orderMetrics.recordOrderAmount(savedOrder.getTotalAmount());

            return savedOrder;
        } finally {
            orderMetrics.endOrderProcessing(timer);
        }
    }

    private double calculateTotalAmount(List<OrderItem> items) {
        return items.stream()
                .mapToDouble(item -> item.getPrice() * item.getQuantity())
                .sum();
    }
}
```
Integrate Grafana and Prometheus monitoring:
```yaml
# docker-compose.yml
version: '3.8'
services:
  app:
    image: my-spring-boot-app:latest
    ports:
      - "8080:8080"
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./:/etc/prometheus/
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:latest
    depends_on:
      - prometheus
    ports:
      - "3000:3000"
    volumes:
      - grafana-storage:/var/lib/grafana
volumes:
  grafana-storage:
```
```yaml
# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'spring-boot-app'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['app:8080']
```
Best Practices
- Multi-level health checks: implement both shallow and deep checks; the former responds quickly, the latter inspects thoroughly.
- Key business metrics: monitor key business indicators such as order volume and conversion rate.
- System resource monitoring: monitor CPU, memory, disk, network, and other system resources.
- Reasonable alert thresholds: set alert thresholds based on business importance and system characteristics.
- Correlation analysis: correlate metrics across services to support root-cause analysis.
- Combine logs and metrics: together they provide a more complete view of the system.
- Predictive monitoring: use trend analysis to anticipate problems, such as predicting disk-space exhaustion.
Summary
This article covered five core service-availability techniques in Spring Boot: the circuit breaker pattern, rate limiting, service degradation and fault tolerance, retry mechanisms, and health checks with monitoring. These techniques are not isolated; they complement one another to build a layered defense for an application.
For more on Spring Boot service-availability techniques, please see my other related articles!