A fallback policy is applied when a service request is abnormal.
There are three key concepts in fallback: isolation, circuit breaking, and fault tolerance:
- Isolation is an exception detection mechanism. There are two common "exception"s: timeout and overload, which can be controlled by timeout duration and max concurrent requests.
- Circuit breaking is an exception response mechanism which depends on isolation. Circuit breaking is triggered by the error rate, like the number of bad requests, or the rate of invalid calls.
- Fault tolerance is an exception handling mechanism that depends on circuit breaking. Fault tolerance is called after a circuit breaking is triggered. Users can set the number of fault tolerance calls in the configuration.
Let's combine the 3 concepts: the isolation mechanism detects there are M(the threshold) errors in N requests, the circuit breaking is triggered and make sure there are no more requests sent, and then fault tolerance method is called. Technically the concept definition is the same with Netflix Hystrix, making it easy to understand the config items(Reference: Hystrix Configuration)). ServiceComb provides 2 fault tolerance methods: returning null values and throwing exceptions.
Users configure a fallback policy to handle microservices' exceptions.
Configuration items can be set to be applied to all APIs or a particular method of a microservice.
- Configuration by type: items can be applied to Providers and Consumers
- Configuration by scope: items can be applied to a specific microservice, or [x-schema-id+operationId]
All the items in this chapter can be configured in the following format:
servicecomb.[namespace].[type].[MicroServiceName].[interface name].[property name]
The type can be Consumer or Provider. Specify the [MicroServiceName] to apply configuration to specific microservice. To make the configuration applied to API, we have to specify the API name in the format x-[schema-id+operationId]
The possible Isolation config items are as follows:
servicecomb.isolation.Consumer.timeout.enabled servicecomb.isolation.Consumer.DemoService.timeout.enabled servicecomb.isolation.Consumer.DemoService.hello.sayHello.timeout.enabled servicecomb.isolation.Provider.timeout.enabled servicecomb.isolation.Provider.DemoService.timeout.enabled servicecomb.isolation.Provider.DemoService.hello.sayHello.timeout.enabled
Tips: the items' type' field is omitted in the following table, while items can be applied to both Provider and Consumer.
For Providers, the configuration item should be: servicecomb.isolation.Consumer.timeout.enabled
For Consumers, the conriguration item should be servicecomb.isolation.Provider.timeout.enabled
Table 1-1 The fallback policy config items
|Configuration Item||Default value||Value Range||Required||Description||Tips|
|servicecomb.isolation.timeout.enabled||FALSE||-||No||Enable timeout detection or not.|
|servicecomb.isolation.timeoutInMilliseconds||30000||-||No||The timeout duration threshold.|
|servicecomb.isolation.maxConcurrentRequests||10||-||No||The maximum number of concurrent requests.|
|servicecomb.circuitBreaker.enabled||TRUE||-||No||Enable circuit breaking or not.|
|servicecomb.circuitBreaker.forceOpen||FALSE||-||No||Force circuit breaker to be enabled regardless of the number of errors.|
|servicecomb.circuitBreaker.forceClosed||FALSE||-||No||Force circuit breaker to be disabled.||When forceOpen and forceClose are set at the same time, forceOpen will take effect.|
|servicecomb.circuitBreaker.sleepWindowInMilliseconds||15000||-||No||How long to recover from a circuit breaking.||After the recovery, the number of failures will be reset. Note: If the call fails immediately after a recover, the circuit breaker is triggered immediately again.|
|servicecomb.circuitBreaker.requestVolumeThreshold||20||-||No||The threshold of failed requests within 10 seconds. If the threshold is reached, circuit breaker is triggered.||The 10 seconds duration is splitted evenly into 10 segments for error calculation. The calculation will start after 1 second. So circuit breakers are triggered after at least 1 second.|
|servicecomb.circuitBreaker.errorThresholdPercentage||50||-||No||The threshold of error rate. If the threshold is reached, circuit breaker is triggered.|
|servicecomb.fallback.enabled||TRUE||-||No||Enable fallback handling or not|
|servicecomb.fallback.maxConcurrentRequests||10||-||No||The max number of concurrent fallback(specified by servicecomb.fallbackpolicy.policy) calls. When the threshold is reached, the fallback method is not called by return exception directly.|
|servicecomb.fallbackpolicy.policy||throwexception||returnnulll \||throwexception||No||The fallback policy when errors occurred.|
Caution: Be cautious to set servicecomb.isolation.timeout.enabled to true. All handlers in the handler chain are asynchronously executed, the intermediate handlers' return will make the follow-up handlers processing abandoned. Therefore, we recommend to set servicecomb.isolation.timeout.enabled to be false(by default) and set the network timeout duration servicecomb.request.timeout to 30000.
servicecomb: handler: chain: Consumer: default: bizkeeper-consumer isolation: Consumer: timeout: enabled: true timeoutInMilliseconds: 30000 circuitBreaker: Consumer: sleepWindowInMilliseconds: 15000 requestVolumeThreshold: 20 fallback: Consumer: enabled: true fallbackpolicy: Consumer: policy: throwexception
You need to enable service governance for fallback. The corresponding provider handler
bizkeeper-provider, and the consumer handler is