1. Circuit Breaker
Basic Principles
The circuit breaker pattern borrows from electrical fuses: when a service or component is detected to fail frequently, calls to it are automatically "tripped", preventing cascading failures and giving the failing service time to recover. A circuit breaker has three states:
- Closed: requests pass through normally while the failure rate is monitored.
- Open: requests are rejected immediately, returning an error or executing fallback logic.
- Half-open: a limited number of trial requests are let through to test whether the service has recovered.
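The state machine described above can be sketched in a few lines of plain Java. This is a simplified illustration, not the Resilience4j implementation used below: a consecutive-failure counter stands in for the sliding-window failure rate, and the threshold and wait values are arbitrary.

```java
import java.util.concurrent.TimeUnit;

// Minimal circuit breaker sketch: CLOSED -> OPEN after repeated failures,
// OPEN -> HALF_OPEN after a cool-down, HALF_OPEN -> CLOSED on a successful probe.
class SimpleCircuitBreaker {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openDurationNanos;
    private int consecutiveFailures = 0;
    private long openedAt = 0;
    private State state = State.CLOSED;

    SimpleCircuitBreaker(int failureThreshold, long openDurationMillis) {
        this.failureThreshold = failureThreshold;
        this.openDurationNanos = TimeUnit.MILLISECONDS.toNanos(openDurationMillis);
    }

    synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            // After the wait period, let one probe request through (HALF_OPEN)
            if (System.nanoTime() - openedAt >= openDurationNanos) {
                state = State.HALF_OPEN;
                return true;
            }
            return false; // fail fast while open
        }
        return true; // CLOSED or HALF_OPEN
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    synchronized void recordFailure() {
        consecutiveFailures++;
        // A failed probe, or too many consecutive failures, opens the breaker
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = System.nanoTime();
        }
    }

    synchronized State getState() { return state; }
}
```

Resilience4j adds what this sketch omits: a sliding window of call outcomes, a configurable failure-rate threshold, and event publishing.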
Spring Boot Implementation and Integration
In Spring Boot, we can implement the circuit breaker pattern with Resilience4j, a lightweight alternative to Hystrix designed for Java 8 and functional programming.
First add dependencies:
```xml
<dependency>
    <groupId>io.github.resilience4j</groupId>
    <artifactId>resilience4j-spring-boot2</artifactId>
    <version>1.7.0</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
```
Then configure the circuit breaker parameters:
```yaml
resilience4j:
  circuitbreaker:
    instances:
      orderService:
        registerHealthIndicator: true
        slidingWindowSize: 10
        minimumNumberOfCalls: 5
        permittedNumberOfCallsInHalfOpenState: 3
        automaticTransitionFromOpenToHalfOpenEnabled: true
        waitDurationInOpenState: 5s
        failureRateThreshold: 50
        eventConsumerBufferSize: 10
```
Example usage of the circuit breaker:
```java
@Service
public class OrderService {

    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    private final PaymentServiceClient paymentServiceClient;

    public OrderService(PaymentServiceClient paymentServiceClient) {
        this.paymentServiceClient = paymentServiceClient;
    }

    @CircuitBreaker(name = "orderService", fallbackMethod = "processOrderFallback")
    public OrderResponse processOrder(OrderRequest orderRequest) {
        // Normal order processing, including the call to the payment service
        PaymentResponse paymentResponse = paymentServiceClient.processPayment(orderRequest.getPayment());
        return new OrderResponse(orderRequest.getOrderId(), "PROCESSED", paymentResponse.getTransactionId());
    }

    // Fallback method, executed when the circuit breaker is open
    public OrderResponse processOrderFallback(OrderRequest orderRequest, Exception e) {
        log.warn("Circuit breaker triggered for order: {}. Error: {}", orderRequest.getOrderId(), e.getMessage());
        // Return a degraded response, e.g. from a local cache or default values
        return new OrderResponse(orderRequest.getOrderId(), "PENDING", null);
    }
}
```
Best Practices
- Appropriate window size: set `slidingWindowSize` sensibly; too small makes the breaker overly sensitive, too large makes it slow to react.
- Reasonable threshold: set `failureRateThreshold` according to business needs; 50% to 60% is generally recommended.
- Monitor breaker state: integrate Spring Boot Actuator to expose circuit breaker status:
```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,circuitbreakers
  health:
    circuitbreakers:
      enabled: true
```
- Fine-grained breakers: configure a separate circuit breaker instance for each downstream dependency, so one failing service does not affect multiple business flows.
- Test breaker behavior: use chaos testing to verify that the breaker behaves as expected under failure conditions.
2. Rate Limiting Technology
Basic Principles
Rate limiting controls the rate at which the system accepts requests, protecting it from overload. Common rate-limiting algorithms include:
- Token bucket: tokens are added to a bucket at a fixed rate; a request must consume a token to be processed.
- Leaky bucket: requests are drained at a fixed rate; the excess is queued or rejected.
- Counter (fixed window): limit the number of requests within a fixed time window.
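The token-bucket idea can be illustrated in a few lines of plain Java. This is a simplified sketch with illustrative capacity and refill values; Bucket4j, used below, is a production-grade implementation.

```java
// Token bucket sketch: tokens accrue at a fixed rate up to a capacity;
// each request consumes one token or is rejected.
class TokenBucket {
    private final long capacity;
    private final double refillTokensPerNano;
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillTokensPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity; // start full, allowing an initial burst
        this.lastRefill = System.nanoTime();
    }

    synchronized boolean tryConsume() {
        refill();
        if (tokens >= 1) {
            tokens -= 1;
            return true;
        }
        return false; // rate limit exceeded
    }

    private void refill() {
        // Add tokens proportional to elapsed time, capped at capacity
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillTokensPerNano);
        lastRefill = now;
    }
}
```

Because the bucket starts full, the algorithm tolerates short bursts up to the capacity while enforcing the average rate over time.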
Spring Boot Implementation and Integration
In Spring Boot, we can implement API rate limiting with Bucket4j, a Java rate-limiting library based on the token bucket algorithm.
Add dependencies:
```xml
<dependency>
    <groupId>com.github.vladimir-bukhtoyarov</groupId>
    <artifactId>bucket4j-core</artifactId>
    <version>4.10.0</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-cache</artifactId>
</dependency>
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
</dependency>
```
Configure the cache that holds per-client buckets:
```java
@Configuration
public class RateLimitingConfig {

    @Bean
    public CacheManager cacheManager() {
        CaffeineCacheManager cacheManager = new CaffeineCacheManager("rateLimit");
        cacheManager.setCaffeine(Caffeine.newBuilder()
                .expireAfterWrite(1, TimeUnit.HOURS)
                .maximumSize(1000));
        return cacheManager;
    }

    @Bean
    public Bucket4jCacheConfiguration bucket4jCacheConfiguration() {
        return new Bucket4jCacheConfiguration(cacheManager(), "rateLimit");
    }
}
```
Implement the rate-limiting interceptor:
```java
@Component
public class RateLimitingInterceptor implements HandlerInterceptor {

    private final Cache<String, Bucket> cache;

    public RateLimitingInterceptor() {
        this.cache = Caffeine.newBuilder()
                .expireAfterWrite(1, TimeUnit.HOURS)
                .maximumSize(1000)
                .build();
    }

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler)
            throws Exception {
        String apiKey = request.getHeader("X-API-KEY");
        if (apiKey == null || apiKey.isEmpty()) {
            response.sendError(HttpStatus.BAD_REQUEST.value(), "Missing API key");
            return false;
        }

        Bucket bucket = cache.get(apiKey, key -> createNewBucket());
        ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);
        if (probe.isConsumed()) {
            response.addHeader("X-Rate-Limit-Remaining", String.valueOf(probe.getRemainingTokens()));
            return true;
        } else {
            long waitForRefill = probe.getNanosToWaitForRefill() / 1_000_000_000;
            response.addHeader("X-Rate-Limit-Retry-After-Seconds", String.valueOf(waitForRefill));
            response.sendError(HttpStatus.TOO_MANY_REQUESTS.value(), "Rate limit exceeded");
            return false;
        }
    }

    private Bucket createNewBucket() {
        // 100 requests per minute and 1000 requests per hour
        return Bucket4j.builder()
                .addLimit(Bandwidth.classic(100, Refill.intervally(100, Duration.ofMinutes(1))))
                .addLimit(Bandwidth.classic(1000, Refill.intervally(1000, Duration.ofHours(1))))
                .build();
    }
}

@Configuration
public class WebMvcConfig implements WebMvcConfigurer {

    @Autowired
    private RateLimitingInterceptor rateLimitingInterceptor;

    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        registry.addInterceptor(rateLimitingInterceptor)
                .addPathPatterns("/api/**");
    }
}
```
Implement rate limiting in Spring Cloud Gateway:
```yaml
spring:
  cloud:
    gateway:
      routes:
        - id: order-service
          uri: lb://order-service
          predicates:
            - Path=/orders/**
          filters:
            - name: RequestRateLimiter
              args:
                redis-rate-limiter.replenishRate: 10
                redis-rate-limiter.burstCapacity: 20
                redis-rate-limiter.requestedTokens: 1
                key-resolver: "#{@userKeyResolver}"
```
```java
@Configuration
public class GatewayConfig {

    @Bean
    public KeyResolver userKeyResolver() {
        return exchange -> {
            String userId = exchange.getRequest().getHeaders().getFirst("User-Id");
            if (userId == null) {
                userId = "anonymous";
            }
            return Mono.just(userId);
        };
    }
}
```
Best Practices
- Tiered rate limits: set different thresholds for different user types or API importance.
- Multi-level rate limits: apply user-level, IP-level, and global limits at the same time.
- Rate-limit responses: when a limit triggers, return an appropriate HTTP status code (usually 429) and a clear error message, including a retry suggestion.
- Monitor rate-limit metrics: collect rate-limiting metrics to analyze and tune the limiting strategy.
- Graceful degradation: when a limit is reached, consider serving a degraded response rather than rejecting the request outright.
3. Service Degradation and Fault Tolerance
Basic Principles
Service degradation is a strategy that keeps the system available as a whole by providing limited but acceptable service when load is high or some dependencies are unavailable. Fault tolerance is the system's ability to detect and handle errors while continuing to operate normally.
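At its core, degradation means "try the primary path, and on failure return a limited but acceptable answer". A minimal plain-Java sketch of that idea (the supplier values are illustrative):

```java
import java.util.function.Supplier;

// Degradation sketch: return the primary result when possible,
// otherwise a limited-but-acceptable fallback (e.g. cached data).
class Degradable {
    static <T> T withFallback(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            // In production, log the failure here; then degrade instead of propagating it
            return fallback.get();
        }
    }
}
```

The Resilience4j annotations shown below implement the same pattern declaratively, with the extra ability to trigger the fallback on timeouts and open circuit breakers, not just thrown exceptions.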
Spring Boot Implementation and Integration
In Spring Boot, service degradation can be achieved in several ways, including combining it with circuit breakers, using asynchronous fallbacks, and enforcing timeouts.
Use Resilience4j's fallback support to implement degradation:
```java
@Service
public class ProductService {

    private static final Logger log = LoggerFactory.getLogger(ProductService.class);

    private final ProductRepository productRepository;
    private final ProductCacheService productCacheService;
    private final InventoryClient inventoryClient; // real-time inventory lookups (illustrative)
    private final PricingClient pricingClient;     // real-time price lookups (illustrative)

    @Autowired
    public ProductService(ProductRepository productRepository,
                          ProductCacheService productCacheService,
                          InventoryClient inventoryClient,
                          PricingClient pricingClient) {
        this.productRepository = productRepository;
        this.productCacheService = productCacheService;
        this.inventoryClient = inventoryClient;
        this.pricingClient = pricingClient;
    }

    @CircuitBreaker(name = "productService", fallbackMethod = "getProductDetailsFallback")
    @Bulkhead(name = "productService", fallbackMethod = "getProductDetailsFallback")
    @TimeLimiter(name = "productService", fallbackMethod = "getProductDetailsFallback")
    public CompletableFuture<ProductDetails> getProductDetails(String productId) {
        return CompletableFuture.supplyAsync(() -> {
            // Fetch the product itself
            Product product = productRepository.findById(productId)
                    .orElseThrow(() -> new ProductNotFoundException(productId));

            // Fetch real-time inventory and pricing information
            InventoryInfo inventory = inventoryClient.getInventory(productId);
            PricingInfo pricing = pricingClient.getPricing(productId);

            return new ProductDetails(product, inventory, pricing);
        });
    }

    // Fallback: basic product information with cached or default inventory and pricing
    public CompletableFuture<ProductDetails> getProductDetailsFallback(String productId, Exception e) {
        log.warn("Fallback for product {}. Reason: {}", productId, e.getMessage());
        return CompletableFuture.supplyAsync(() -> {
            // Basic product information from the cache
            Product product = productCacheService.getProduct(productId)
                    .orElse(new Product(productId, "Unknown Product", "No details available"));

            // Default inventory and pricing
            InventoryInfo inventory = new InventoryInfo(productId, 0, false);
            PricingInfo pricing = new PricingInfo(productId, 0.0, false);

            return new ProductDetails(product, inventory, pricing, true);
        });
    }
}
```
Configure timeouts and bulkhead isolation:
```yaml
resilience4j:
  timelimiter:
    instances:
      productService:
        timeoutDuration: 2s
        cancelRunningFuture: true
  bulkhead:
    instances:
      productService:
        maxConcurrentCalls: 20
        maxWaitDuration: 500ms
```
A servlet filter can implement a graceful degradation strategy:
```java
@Component
public class GracefulDegradationFilter extends OncePerRequestFilter {

    private final HealthCheckService healthCheckService;

    @Autowired
    public GracefulDegradationFilter(HealthCheckService healthCheckService) {
        this.healthCheckService = healthCheckService;
    }

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
                                    FilterChain filterChain) throws ServletException, IOException {
        String path = request.getRequestURI();

        // Check the overall health of the system
        SystemHealth health = healthCheckService.getSystemHealth();

        if (health.isHighLoad() && isNonCriticalPath(path)) {
            // Degrade non-critical paths under high load
            sendDegradedResponse(response, "Service temporarily operating at reduced capacity");
            return;
        } else if (health.isMaintenanceMode() && !isAdminPath(path)) {
            // Only admin requests are allowed in maintenance mode
            sendMaintenanceResponse(response);
            return;
        } else if (health.hasFailedServices() && dependsOnFailedServices(path, health.getFailedServices())) {
            // Return a degraded response if the request depends on an unavailable service
            sendDependencyFailureResponse(response, health.getFailedServices());
            return;
        }

        // Process the request normally
        filterChain.doFilter(request, response);
    }

    private boolean isNonCriticalPath(String path) {
        // Decide whether the request targets a non-critical path
        return path.startsWith("/api/recommendations")
                || path.startsWith("/api/analytics")
                || path.startsWith("/api/marketing");
    }

    private boolean isAdminPath(String path) {
        return path.startsWith("/admin") || path.startsWith("/management");
    }

    private boolean dependsOnFailedServices(String path, List<String> failedServices) {
        // Map request paths to the services they depend on
        Map<String, List<String>> serviceDependencies = new HashMap<>();
        serviceDependencies.put("/api/orders", List.of("payment-service", "inventory-service"));
        serviceDependencies.put("/api/payments", List.of("payment-service"));
        // ... dependencies between other paths and services

        String matchingPath = findMatchingPath(path, serviceDependencies.keySet());
        if (matchingPath != null) {
            List<String> dependencies = serviceDependencies.get(matchingPath);
            return dependencies.stream().anyMatch(failedServices::contains);
        }
        return false;
    }

    private String findMatchingPath(String requestPath, Set<String> configuredPaths) {
        // Find the configured path that matches the request path
        return configuredPaths.stream()
                .filter(requestPath::startsWith)
                .findFirst()
                .orElse(null);
    }

    private void sendDegradedResponse(HttpServletResponse response, String message) throws IOException {
        response.setStatus(HttpStatus.SERVICE_UNAVAILABLE.value());
        response.setContentType(MediaType.APPLICATION_JSON_VALUE);
        Map<String, Object> responseBody = new HashMap<>();
        responseBody.put("status", "degraded");
        responseBody.put("message", message);
        responseBody.put("retry_after", 30); // suggest retrying after 30 seconds
        response.getWriter().write(new ObjectMapper().writeValueAsString(responseBody));
    }

    // Other response helpers (sendMaintenanceResponse, sendDependencyFailureResponse)...
}
```
Best Practices
- Tiered degradation strategy: define a hierarchy of degradation levels for different failure scenarios and service importance.
- Static degradation: prepare static resources or cached data in advance to serve when a service is unavailable.
- Feature degradation: temporarily disable non-core features to protect core business flows.
- Degradation by user group: under high load, prioritize the experience of VIP users.
- Service isolation: use the Bulkhead pattern to isolate resources per service, so one service's problems do not spread to others.
- Timeout control: set reasonable timeouts to prevent long waits from degrading the user experience.
4. Retry mechanism (Retry)
Basic Principles
The retry mechanism handles transient failures by automatically retrying failed operations, improving system resilience. It is especially effective for scenarios such as network jitter or a temporarily unavailable database.
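The core retry loop with exponential backoff can be sketched in plain Java; this is a simplified illustration of what Spring Retry automates below (the attempt and delay values are illustrative):

```java
import java.util.function.Supplier;

// Retry sketch: retry a failing operation up to maxAttempts times,
// doubling the wait between attempts (exponential backoff).
class Retrier {
    static <T> T retry(Supplier<T> op, int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delay); // back off before the next attempt
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException("Retry interrupted", ie);
                    }
                    delay *= 2; // exponential backoff
                }
            }
        }
        throw last; // all attempts exhausted
    }
}
```

A production version would also classify exceptions (retry only transient ones) and add jitter to the delay to avoid synchronized retry storms.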
Spring Boot Implementation and Integration
In Spring Boot, the Spring Retry library can be used to implement retries.
Add dependencies:
```xml
<dependency>
    <groupId>org.springframework.retry</groupId>
    <artifactId>spring-retry</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-aop</artifactId>
</dependency>
```
Enable the retry function:
```java
@SpringBootApplication
@EnableRetry
public class MyApplication {
    public static void main(String[] args) {
        SpringApplication.run(MyApplication.class, args);
    }
}
```
Declarative retry with @Retryable:
```java
@Service
public class RemoteServiceClient {

    private static final Logger log = LoggerFactory.getLogger(RemoteServiceClient.class);

    private final RestTemplate restTemplate;

    @Autowired
    public RemoteServiceClient(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @Retryable(
            // Typical transient network/server errors (the original exception list was elided)
            value = {ResourceAccessException.class, HttpServerErrorException.class},
            maxAttempts = 3,
            backoff = @Backoff(delay = 1000, multiplier = 2)
    )
    public ResponseEntity<OrderData> getOrderDetails(String orderId) {
        log.info("Attempting to fetch order details for {}", orderId);
        return restTemplate.getForEntity("/api/orders/" + orderId, OrderData.class);
    }

    @Recover
    public ResponseEntity<OrderData> recoverGetOrderDetails(Exception e, String orderId) {
        log.error("All retries failed for order {}. Last error: {}", orderId, e.getMessage());
        // Return cached data or a default response
        return ResponseEntity.ok(new OrderData(orderId, "UNKNOWN", new Date(), Collections.emptyList()));
    }
}
```
Programmatic retry with RetryTemplate:
```java
@Service
public class PaymentService {

    private static final Logger log = LoggerFactory.getLogger(PaymentService.class);

    private final RetryTemplate retryTemplate;
    private final PaymentGateway paymentGateway;

    @Autowired
    public PaymentService(RetryTemplate retryTemplate, PaymentGateway paymentGateway) {
        this.retryTemplate = retryTemplate;
        this.paymentGateway = paymentGateway;
    }

    public PaymentResult processPayment(PaymentRequest paymentRequest) {
        return retryTemplate.execute(context -> {
            // Current retry count
            int retryCount = context.getRetryCount();
            log.info("Processing payment attempt {} for order {}", retryCount + 1, paymentRequest.getOrderId());
            try {
                // Perform the payment
                return paymentGateway.charge(paymentRequest);
            } catch (PaymentGatewayException e) {
                // Inspect the exception and decide whether to retry
                if (e.isRetryable()) {
                    log.warn("Retryable payment error: {}. Will retry.", e.getMessage());
                    throw e; // rethrow to trigger a retry
                } else {
                    log.error("Non-retryable payment error: {}", e.getMessage());
                    throw new NonRetryableException("Payment failed with non-retryable error", e);
                }
            }
        }, context -> {
            // Recovery callback after all retries fail
            log.error("All payment retries failed for order {}", paymentRequest.getOrderId());
            // Return a failed result and flag it for follow-up processing
            return PaymentResult.failed(paymentRequest.getOrderId(), "Maximum retries exceeded");
        });
    }
}

@Configuration
public class RetryConfig {

    @Bean
    public RetryTemplate retryTemplate() {
        RetryTemplate retryTemplate = new RetryTemplate();

        // Retry policy: at most 3 attempts
        SimpleRetryPolicy retryPolicy = new SimpleRetryPolicy();
        retryPolicy.setMaxAttempts(3);

        // Backoff policy: exponential
        ExponentialBackOffPolicy backOffPolicy = new ExponentialBackOffPolicy();
        backOffPolicy.setInitialInterval(1000); // 1 second
        backOffPolicy.setMultiplier(2.0);       // double the wait after each failure
        backOffPolicy.setMaxInterval(10000);    // wait at most 10 seconds

        retryTemplate.setRetryPolicy(retryPolicy);
        retryTemplate.setBackOffPolicy(backOffPolicy);
        return retryTemplate;
    }
}
```
Alternatively, use Resilience4j's retry support:
```yaml
resilience4j:
  retry:
    instances:
      paymentService:
        maxRetryAttempts: 3
        waitDuration: 1s
        enableExponentialBackoff: true
        exponentialBackoffMultiplier: 2
        retryExceptions:
          # example retryable exceptions (the original list was elided)
          - java.io.IOException
          - java.util.concurrent.TimeoutException
```
```java
@Service
public class PaymentServiceWithResilience4j {

    private static final Logger log = LoggerFactory.getLogger(PaymentServiceWithResilience4j.class);

    private final PaymentGateway paymentGateway;

    @Autowired
    public PaymentServiceWithResilience4j(PaymentGateway paymentGateway) {
        this.paymentGateway = paymentGateway;
    }

    @Retry(name = "paymentService", fallbackMethod = "processPaymentFallback")
    public PaymentResult processPayment(PaymentRequest request) {
        return paymentGateway.charge(request);
    }

    public PaymentResult processPaymentFallback(PaymentRequest request, Exception e) {
        log.error("Payment processing failed after retries for order: {}", request.getOrderId());
        return PaymentResult.failed(request.getOrderId(), "Payment processing temporarily unavailable");
    }
}
```
Best Practices
- Distinguish transient from permanent failures: retry only transient faults; fail immediately on permanent ones.
- Exponential backoff: use an exponential backoff strategy to avoid retry storms.
- Reasonable retry count: set an appropriate maximum, usually 3 to 5 attempts.
- Monitor retries: record retry counts and outcomes to help identify problematic services.
- Idempotent operations: make sure retried operations are idempotent to avoid problems from duplicate processing.
- Set timeouts: every attempt should have a reasonable timeout.
- Combine with circuit breakers: use retries together with a circuit breaker so the system fails fast when failures persist.
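The idempotency practice above can be illustrated with a small sketch: deduplicate by request ID so a retried call returns the recorded result instead of repeating the side effect (the class and result format are illustrative):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Idempotency sketch: remember each request ID's result so that a retried
// request returns the original result instead of re-executing the side effect.
class IdempotentProcessor {
    private final Map<String, String> processed = new ConcurrentHashMap<>();
    private int executions = 0;

    synchronized String process(String requestId, String payload) {
        return processed.computeIfAbsent(requestId, id -> {
            executions++; // the real side effect (e.g. charging a card) runs once
            return "charged:" + payload;
        });
    }

    synchronized int executions() { return executions; }
}
```

In a distributed system the request-ID store would live in a shared database or cache with an expiry, not in process memory.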
5. Health Checks and Monitoring
Basic Principles
Health checks and monitoring are the infrastructure of service availability: they reveal system state in real time so problems can be detected and resolved early. Through metric collection, health checks, and alerting, service failures can be prevented or resolved quickly.
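The aggregation rule behind a composite health check is simple: the system is UP only if every component is UP. A minimal sketch of that rule (simplified to boolean component states; Actuator's real model also supports DEGRADED and UNKNOWN statuses):

```java
import java.util.Map;

// Health aggregation sketch: overall status is UP only if every
// registered component reports UP (the idea behind Actuator's composite health).
class HealthAggregator {
    static String aggregate(Map<String, Boolean> componentUp) {
        boolean allUp = componentUp.values().stream().allMatch(Boolean::booleanValue);
        return allUp ? "UP" : "DOWN";
    }
}
```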
Spring Boot Implementation and Integration
Spring Boot Actuator provides rich monitoring and management capabilities that are easy to integrate into an application.
Add dependencies:
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
```
Configure Actuator endpoints:
```yaml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus,loggers,env
  endpoint:
    health:
      show-details: always
      group:
        readiness:
          include: db,redis,rabbit,diskSpace
  health:
    circuitbreakers:
      enabled: true
    ratelimiters:
      enabled: true
  metrics:
    export:
      prometheus:
        enabled: true
    enable:
      jvm: true
      system: true
      process: true
      http: true
```
Custom Health Checker:
```java
@Component
public class ExternalServiceHealthIndicator implements HealthIndicator {

    private final RestTemplate restTemplate;

    @Autowired
    public ExternalServiceHealthIndicator(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @Override
    public Health health() {
        try {
            // Check the health endpoint of the external service
            ResponseEntity<Map> response = restTemplate.getForEntity("/health", Map.class);
            if (response.getStatusCode().is2xxSuccessful()) {
                return Health.up()
                        .withDetail("status", response.getBody().get("status"))
                        .withDetail("version", response.getBody().get("version"))
                        .build();
            } else {
                return Health.down()
                        .withDetail("statusCode", response.getStatusCode())
                        .withDetail("reason", "Unexpected status code")
                        .build();
            }
        } catch (Exception e) {
            return Health.down()
                    .withDetail("error", e.getMessage())
                    .build();
        }
    }
}
```
Configure readiness and liveness probes:
```java
@Configuration
public class HealthCheckConfig {

    @Bean
    public HealthContributorRegistry healthContributorRegistry(ApplicationAvailability availability) {
        HealthContributorRegistry registry = new DefaultHealthContributorRegistry();

        // Readiness: is the application ready to accept traffic?
        // (Spring Boot's built-in availability indicators)
        registry.registerContributor("readiness", new ReadinessStateHealthIndicator(availability));

        // Liveness: is the application running correctly?
        registry.registerContributor("liveness", new LivenessStateHealthIndicator(availability));

        return registry;
    }
}
```
Custom metric collection:
```java
@Component
public class OrderMetrics {

    private final MeterRegistry meterRegistry;
    private final Counter orderCounter;
    private final DistributionSummary orderAmountSummary;
    private final Timer orderProcessingTimer;

    public OrderMetrics(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        // Metric names below are illustrative (the originals were elided)
        this.orderCounter = Counter.builder("orders.created")
                .description("Number of orders created")
                .tag("application", "order-service")
                .register(meterRegistry);
        this.orderAmountSummary = DistributionSummary.builder("orders.amount")
                .description("Order amount distribution")
                .tag("application", "order-service")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(meterRegistry);
        this.orderProcessingTimer = Timer.builder("orders.processing.time")
                .description("Order processing time")
                .tag("application", "order-service")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(meterRegistry);
    }

    public void recordOrderCreated(String orderType) {
        orderCounter.increment();
        meterRegistry.counter("orders.created.by.type", "type", orderType).increment();
    }

    public void recordOrderAmount(double amount) {
        orderAmountSummary.record(amount);
    }

    public Timer.Sample startOrderProcessing() {
        return Timer.start(meterRegistry);
    }

    public void endOrderProcessing(Timer.Sample sample) {
        sample.stop(orderProcessingTimer);
    }
}

@Service
public class OrderServiceWithMetrics {

    private final OrderRepository orderRepository;
    private final OrderMetrics orderMetrics;

    @Autowired
    public OrderServiceWithMetrics(OrderRepository orderRepository, OrderMetrics orderMetrics) {
        this.orderRepository = orderRepository;
        this.orderMetrics = orderMetrics;
    }

    public Order createOrder(OrderRequest request) {
        Timer.Sample timer = orderMetrics.startOrderProcessing();
        try {
            Order order = new Order();
            // Populate the order
            order.setCustomerId(request.getCustomerId());
            order.setTotalAmount(calculateTotalAmount(request.getItems()));
            order.setItems(request.getItems());

            Order savedOrder = orderRepository.save(order);

            // Record metrics
            orderMetrics.recordOrderCreated(request.getOrderType());
            orderMetrics.recordOrderAmount(savedOrder.getTotalAmount());

            return savedOrder;
        } finally {
            orderMetrics.endOrderProcessing(timer);
        }
    }

    private double calculateTotalAmount(List<OrderItem> items) {
        return items.stream()
                .mapToDouble(item -> item.getPrice() * item.getQuantity())
                .sum();
    }
}
```
Integrate Grafana and Prometheus monitoring:
```yaml
# docker-compose.yml
version: '3.8'
services:
  app:
    image: my-spring-boot-app:latest
    ports:
      - "8080:8080"
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./:/etc/prometheus/
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:latest
    depends_on:
      - prometheus
    ports:
      - "3000:3000"
    volumes:
      - grafana-storage:/var/lib/grafana
volumes:
  grafana-storage:
```
```yaml
# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'spring-boot-app'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['app:8080']
```
Best Practices
- Multi-level health checks: implement both shallow and deep checks; the former responds quickly, the latter inspects thoroughly.
- Key business metrics: monitor key business indicators such as order volume and conversion rate.
- System resource monitoring: monitor CPU, memory, disk, network, and other system resources.
- Reasonable alert thresholds: set alert thresholds based on business importance and system characteristics.
- Correlation analysis: correlate metrics across services to support root-cause analysis.
- Combine logs and metrics: together they provide a more complete view of the system.
- Predictive monitoring: use trend analysis to anticipate problems, such as predicting disk-space exhaustion.
Summary
This article covered five core service-availability techniques in Spring Boot: the circuit breaker pattern, rate limiting, service degradation and fault tolerance, retry mechanisms, and health checks with monitoring. These techniques are not isolated; they complement one another to build a layered defense for an application.
For more on Spring Boot service-availability techniques, please see my other related articles!