English | [中文](README.zh_CN.md)

## Introduction

When a process starts, its services may not have finished initializing, for example services that require hot loading during startup.
Long-running services may eventually enter an inconsistent state and be unable to serve external requests normally until they are restarted.
Similar to the K8s [readiness](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes) and [liveness](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request) probes, tRPC also provides a health check function for services.

## Quick Start

The health check of tRPC-Go is built into the `admin` package and needs to be enabled in `trpc_go.yaml`:
```yaml
server:
  admin:
    port: 11014
```
You can then use `curl "http://localhost:11014/is_healthy/"` to determine the status of the service. HTTP status codes map to service status as follows:

| HTTP status code | Service status |
| :-: | :-: |
| `200` | Healthy |
| `404` | Unknown |
| `503` | Unhealthy |

## Detailed Introduction

In the "Quick Start" setup, as long as the `/is_healthy/` endpoint of admin can be called successfully, the entire server is considered healthy, and you do not need to care about which services run under the server. This is suitable for most default scenarios. For scenarios that require setting the status of specific services, we provide an API at the code level:
```go
// trpc.go
// GetAdminService gets admin service from server.Server.
func GetAdminService(s *server.Server) (*admin.TrpcAdminServer, error)

// admin/admin.go
// RegisterHealthCheck registers a new service and returns two functions, one for unregistering the service and one for
// updating the status of the service.
func (s *TrpcAdminServer) RegisterHealthCheck(serviceName string) (unregister func(), update func(healthcheck.Status), err error)
```
For example, in the following sample:
```go
func main() {
    s := trpc.NewServer()
    admin, err := trpc.GetAdminService(s)
    if err != nil { panic(err) }

    unregisterXxx, updateXxx, err := admin.RegisterHealthCheck("Xxx")
    if err != nil { panic(err) }
    _, updateYyy, err := admin.RegisterHealthCheck("Yyy")
    if err != nil { panic(err) }

    // When you no longer care about Xxx and want it to not affect the overall status of the server, you can call unregisterXxx.
    // In the implementation of Xxx/Yyy, updateXxx/updateYyy is called to update their health status.
    pb.RegisterXxxService(s, newXxxImpl(unregisterXxx, updateXxx))
    pb.RegisterYyyService(s, newYyyImpl(updateYyy))
    pb.RegisterZzzService(s, newZzzImpl()) // We don't care about Zzz

    log.Info(s.Serve())
}
```
This sample registers three services, but only `Xxx` and `Yyy` have registered health checks. You can now obtain the status of service `Xxx` separately by appending `Xxx` to the URL: `curl "http://localhost:11014/is_healthy/Xxx"`. For service `Zzz`, which has no registered health check, the HTTP status code is `404`.

Because health checks are registered for `Xxx` and `Yyy`, the status of the entire server (i.e., `curl "http://localhost:11014/is_healthy/"`) is jointly determined by `Xxx` and `Yyy`. The HTTP status code of the server is `200` only when both `Xxx` and `Yyy` are `healthcheck.Serving`. When at least one of `Xxx` and `Yyy` is `healthcheck.Unknown` (the default initial state of a service registered using `admin.RegisterHealthCheck`), the HTTP status code of the server is `404`. Otherwise, the HTTP status code of the server is `503`.

In short, you only need to remember that the entire server returns `200` only when all services with registered health checks are `healthcheck.Serving`.

## Cooperate with Polaris Heartbeat

The heartbeat of [`naming-polarismesh`](https://github.com/trpc-ecosystem/go-naming-polarismesh) can cooperate with the health check.

For any service that has not explicitly registered a health check, its heartbeat starts immediately after the server starts (the same as in older versions).
For any service that has explicitly registered a health check, the first heartbeat starts only when its status becomes `healthcheck.Serving`. If the status changes to `healthcheck.NotServing` or `healthcheck.Unknown`, the Polaris heartbeat is paused until the status changes back to `healthcheck.Serving` (a heartbeat is sent immediately upon this change).
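
The gating behavior described above can be pictured as a small state model. The sketch below is an assumption-laden illustration, not the `naming-polarismesh` code: `heartbeatGate`, `active`, `update`, and the local `Status` type are all hypothetical names, and the real plugin also handles timers and network calls that are left out here.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for healthcheck.Status.
type Status int

const (
	Unknown Status = iota // initial state of a registered service
	Serving
	NotServing
)

// heartbeatGate (hypothetical) decides whether heartbeats are sent for one service.
type heartbeatGate struct {
	registered bool   // whether the service registered a health check
	status     Status // last reported status; ignored when registered is false
}

// active reports whether periodic heartbeats should currently be sent.
func (g *heartbeatGate) active() bool {
	return !g.registered || g.status == Serving
}

// update records a new status and reports whether a heartbeat should be sent
// immediately, which is the case on a transition from paused to Serving.
func (g *heartbeatGate) update(s Status) (sendNow bool) {
	wasActive := g.active()
	g.status = s
	return !wasActive && g.active()
}

func main() {
	plain := &heartbeatGate{} // no health check registered
	fmt.Println(plain.active()) // true: heartbeat starts right away

	checked := &heartbeatGate{registered: true} // starts as Unknown
	fmt.Println(checked.active())                              // false: waiting for Serving
	fmt.Println(checked.update(Serving))                       // true: first heartbeat sent now
	fmt.Println(checked.update(NotServing), checked.active()) // false false: paused
	fmt.Println(checked.update(Serving))                       // true: resumed with an immediate heartbeat
}
```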