English | [中文](README.zh_CN.md)

## Introduction

When a process starts, its services may not have finished initializing yet, for example services that hot-load data during startup.  
Long-running services may also drift into an inconsistent state in which they can no longer serve requests until they are restarted.  
Similar to Kubernetes [readiness](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-readiness-probes) and [liveness](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#define-a-liveness-http-request) probes, tRPC provides a health check feature for services.

## Quick Start

The tRPC-Go health check is built into the `admin` package. Enable it by configuring the admin port in `trpc_go.yaml`:
```yaml
server:
  admin:
    port: 11014
```
You can then run `curl "http://localhost:11014/is_healthy/"` to check the status of the service. HTTP status codes map to service status as follows:

| HTTP status code | Service status |
| :-: | :-: |
| `200` | Healthy |
| `404` | Unknown |
| `503` | Unhealthy |

## Detailed Introduction

In the Quick Start setup, a single call to the admin `/is_healthy/` endpoint tells you whether the server as a whole is healthy, without your needing to know which services it hosts. This covers most default scenarios. For scenarios where the status of individual services must be set explicitly, tRPC-Go provides a code-level API:
```go
// trpc.go
// GetAdminService gets admin service from server.Server.
func GetAdminService(s *server.Server) (*admin.TrpcAdminServer, error)

// admin/admin.go
// RegisterHealthCheck registers a new service and returns two functions, one for unregistering the service and one for
// updating the status of the service.
func (s *TrpcAdminServer) RegisterHealthCheck(serviceName string) (unregister func(), update func(healthcheck.Status), err error)
```
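
To make the shape of this API concrete, here is a minimal, self-contained model of a registry that hands back per-service `unregister`/`update` closures. It is not the tRPC-Go implementation: the `registry` type, its methods, the local `Status` constants, and the rejection of duplicate names are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Status is a local stand-in for healthcheck.Status (hypothetical).
type Status int

const (
	Unknown Status = iota
	Serving
	NotServing
)

// registry is a hypothetical, simplified model of per-service health state.
type registry struct {
	mu       sync.RWMutex
	statuses map[string]Status
}

func newRegistry() *registry {
	return &registry{statuses: make(map[string]Status)}
}

// Register adds serviceName in the Unknown state and returns closures bound
// to that name, mimicking the shape of RegisterHealthCheck.
func (r *registry) Register(serviceName string) (unregister func(), update func(Status), err error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.statuses[serviceName]; ok {
		// Assumption: registering the same name twice is rejected.
		return nil, nil, errors.New("service already registered: " + serviceName)
	}
	r.statuses[serviceName] = Unknown
	unregister = func() {
		r.mu.Lock()
		defer r.mu.Unlock()
		delete(r.statuses, serviceName)
	}
	update = func(s Status) {
		r.mu.Lock()
		defer r.mu.Unlock()
		// Sketch's own choice: ignore updates after unregister so a stale
		// closure cannot bring the entry back.
		if _, ok := r.statuses[serviceName]; ok {
			r.statuses[serviceName] = s
		}
	}
	return unregister, update, nil
}

// Get reports a service's status and whether it is registered.
func (r *registry) Get(serviceName string) (Status, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	s, ok := r.statuses[serviceName]
	return s, ok
}

func main() {
	reg := newRegistry()
	unregister, update, err := reg.Register("Xxx")
	if err != nil {
		panic(err)
	}
	update(Serving)
	s, ok := reg.Get("Xxx")
	fmt.Println(s == Serving, ok) // true true
	unregister()
	_, ok = reg.Get("Xxx")
	fmt.Println(ok) // false
}
```

A likely benefit of returning closures bound to the service name is that each service implementation receives only the ability to change its own status, not access to the whole admin service.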

For example, the following server registers health checks for two of its three services:
```go
func main() {
	s := trpc.NewServer()
	admin, err := trpc.GetAdminService(s)
	if err != nil {
		panic(err)
	}

	unregisterXxx, updateXxx, err := admin.RegisterHealthCheck("Xxx")
	if err != nil {
		panic(err)
	}
	_, updateYyy, err := admin.RegisterHealthCheck("Yyy")
	if err != nil {
		panic(err)
	}

	// Call unregisterXxx once you no longer care about Xxx and want it to stop affecting the overall server status.
	// The Xxx/Yyy implementations call updateXxx/updateYyy to report their own health status.
	pb.RegisterXxxService(s, newXxxImpl(unregisterXxx, updateXxx))
	pb.RegisterYyyService(s, newYyyImpl(updateYyy))
	pb.RegisterZzzService(s, newZzzImpl()) // No health check is registered for Zzz.

	log.Info(s.Serve())
}
```
Three services are registered, but only `Xxx` and `Yyy` have health checks. You can query the status of a single service by appending its name to the URL, for example `curl "http://localhost:11014/is_healthy/Xxx"` for `Xxx`. For `Zzz`, which has no registered health check, the endpoint returns `404`.

Because health checks are registered for `Xxx` and `Yyy`, the status of the whole server (i.e., `curl "http://localhost:11014/is_healthy/"`) is determined jointly by the two. The server returns `200` only when both `Xxx` and `Yyy` are `healthcheck.Serving`. It returns `404` when at least one of them is `healthcheck.Unknown` (the initial state of any service registered with `admin.RegisterHealthCheck`). Otherwise, it returns `503`.
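
The aggregation rule can be written as a pure function. In this sketch, `serverStatus`, `httpCode`, and the local `Status` constants are hypothetical stand-ins for the `healthcheck` package. It also assumes that a server with no registered health checks counts as `Serving`, consistent with the Quick Start behavior.

```go
package main

import (
	"fmt"
	"net/http"
)

// Status is a local stand-in for healthcheck.Status (hypothetical).
type Status int

const (
	Unknown Status = iota
	Serving
	NotServing
)

// serverStatus combines per-service statuses: any Unknown takes precedence,
// then all-Serving yields Serving, and anything else is NotServing.
// An empty map yields Serving (assumption: no checks means healthy).
func serverStatus(statuses map[string]Status) Status {
	allServing := true
	for _, s := range statuses {
		if s == Unknown {
			return Unknown
		}
		if s != Serving {
			allServing = false
		}
	}
	if allServing {
		return Serving
	}
	return NotServing
}

// httpCode maps an aggregated status to the admin endpoint's response code.
func httpCode(s Status) int {
	switch s {
	case Serving:
		return http.StatusOK
	case NotServing:
		return http.StatusServiceUnavailable
	default:
		return http.StatusNotFound
	}
}

func main() {
	fmt.Println(httpCode(serverStatus(map[string]Status{"Xxx": Serving, "Yyy": Serving})))    // 200
	fmt.Println(httpCode(serverStatus(map[string]Status{"Xxx": Serving, "Yyy": Unknown})))    // 404
	fmt.Println(httpCode(serverStatus(map[string]Status{"Xxx": NotServing, "Yyy": Serving}))) // 503
}
```

Returning as soon as an `Unknown` is found encodes its precedence over `NotServing`, so the result does not depend on map iteration order.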

In short, the server as a whole returns `200` only when every service with a registered health check is `healthcheck.Serving`.

## Working with Polaris Heartbeats

The heartbeat of [`naming-polarismesh`](https://github.com/trpc-ecosystem/go-naming-polarismesh) works together with the health check.

For a service that has not explicitly registered a health check, heartbeats start immediately after the server starts, as in earlier versions.  
For a service that has explicitly registered a health check, the first heartbeat is sent only once its status becomes `healthcheck.Serving`. If the status later changes to `healthcheck.NotServing` or `healthcheck.Unknown`, Polaris heartbeats are paused until the status changes back to `healthcheck.Serving`, at which point a heartbeat is sent immediately.
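
These heartbeat rules amount to a small state machine. The sketch below models it without timers or network calls so that the logic can be checked directly; `heartbeatGate`, `Tick`, and `Update` are hypothetical names, and the real `naming-polarismesh` plugin may structure this differently.

```go
package main

import "fmt"

// Status is a local stand-in for healthcheck.Status (hypothetical).
type Status int

const (
	Unknown Status = iota
	Serving
	NotServing
)

// heartbeatGate decides when a heartbeat would be sent for one service.
type heartbeatGate struct {
	registered bool   // true if the service registered a health check
	status     Status // last known status; only consulted when registered
}

// Tick reports whether a periodic heartbeat should be sent now:
// always for unregistered services, only while Serving otherwise.
func (g *heartbeatGate) Tick() bool {
	return !g.registered || g.status == Serving
}

// Update records a status change and reports whether an immediate
// heartbeat should be sent, which happens only on a transition into Serving.
func (g *heartbeatGate) Update(s Status) bool {
	becameServing := s == Serving && g.status != Serving
	g.status = s
	return g.registered && becameServing
}

func main() {
	g := &heartbeatGate{registered: true} // starts as Unknown
	var sent []string
	record := func(event string, send bool) {
		if send {
			sent = append(sent, event)
		}
	}
	record("tick1", g.Tick())            // Unknown: paused
	record("serving", g.Update(Serving)) // first heartbeat, sent immediately
	record("tick2", g.Tick())            // periodic heartbeat
	record("down", g.Update(NotServing)) // heartbeats pause
	record("tick3", g.Tick())            // still paused
	record("back", g.Update(Serving))    // resumes with an immediate heartbeat
	fmt.Println(sent)                    // [serving tick2 back]
}
```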