---
menu:
  main:
    parent: "How-Tos"
    weight: 25
title: "External Probe"
date: 2017-10-08T17:24:32-07:00
---
The external probe type allows you to run arbitrary, complex probes through
Cloudprober. An external probe runs an independent external program to do the
actual probing. Cloudprober calculates probe metrics from the program's exit
status and the time it takes to execute.

Cloudprober also allows external programs to provide additional metrics.
Every message sent to `stdout` is parsed as a new metric to be emitted.
For general logging, use another I/O stream such as `stderr`.

## Sample Probe
To understand how it works, let's create a sample probe that sets and gets a key
in a redis server. Here is the `main` function of such a probe:

{{< highlight go >}}
func main() {
	var client redis.Client
	var key = "hello"

	// Time the SET and print the result to stdout, where Cloudprober reads metrics.
	startTime := time.Now()
	client.Set(key, []byte("world"))
	fmt.Printf("set_latency_ms %f\n", float64(time.Since(startTime).Nanoseconds())/1e6)

	// Time the GET. The log package writes to stderr, so the log line below
	// is not parsed as a metric.
	startTime = time.Now()
	val, _ := client.Get(key)
	log.Printf("%s=%s", key, string(val))
	fmt.Printf("get_latency_ms %f\n", float64(time.Since(startTime).Nanoseconds())/1e6)
}
{{< / highlight >}}

(Full listing: https://github.com/google/cloudprober/blob/master/examples/external/redis_probe.go)

This program sets and gets a key in redis and prints the time taken for both operations.
`set_latency_ms` and `get_latency_ms` will be emitted as metrics. You can also attach your own labels using this format:
```
fmt.Printf("get_latency_ms{region=%v, cluster=%v} %f\n", region, cluster, float64(time.Since(startTime).Nanoseconds())/1e6)
```

Cloudprober can use this program as an external probe, to verify
the availability and performance of the redis server.
This program assumes that the
redis server is running locally, at its default port. For the sake of demonstration, let's run a local redis server (you can also easily modify this program to use a different server).

{{< highlight bash >}}
#!bash
brew install redis
{{< / highlight >}}

Let's compile our probe program (redis_probe.go) and verify that it's working
as expected:

{{< highlight bash >}}
#!bash
CGO_ENABLED=0 go build -ldflags "-extldflags=-static" ./redis_probe.go
./redis_probe
set_latency_ms 22.656588
2018/02/26 15:16:14 hello=world
get_latency_ms 2.173560
{{< / highlight >}}


## Configuration
Here is the external probe configuration that makes use of this program:

Full example in [examples/external/cloudprober.cfg](https://github.com/google/cloudprober/blob/master/examples/external/cloudprober.cfg).

{{< highlight shell >}}
# Run an external probe that executes a command from the current working
# directory.
probe {
  name: "redis_probe"
  type: EXTERNAL
  targets { dummy_targets {} }
  external_probe {
    mode: ONCE
    command: "./redis_probe"
  }
}
{{< / highlight >}}

Note: To pass target information to your external program,
send it as command-line arguments using the `@label@` notation.
Supported fields are: target, address, port, probe, and target labels like target.label.fqdn.
```
command: "./redis_probe -host=@address@ -port=@port@"
```

Running it through cloudprober, you'll see the following output:

{{< highlight bash >}}
# Launch cloudprober
cloudprober --config_file=cloudprober.cfg

cloudprober 1519..0 1519583408 labels=ptype=external,probe=redis_probe,dst= success=1 total=1 latency=12143.765
cloudprober 1519..1 1519583408 labels=ptype=external,probe=redis_probe,dst= set_latency_ms=0.516 get_latency_ms=0.491
cloudprober 1519..2 1519583410 labels=ptype=external,probe=redis_probe,dst= success=2 total=2 latency=30585.915
cloudprober 1519..3 1519583410 labels=ptype=external,probe=redis_probe,dst= set_latency_ms=0.636 get_latency_ms=0.994
cloudprober 1519..4 1519583412 labels=ptype=external,probe=redis_probe,dst= success=3 total=3 latency=42621.871
{{< / highlight >}}

You can import this data into Prometheus by following the process outlined at:
[Running Prometheus]({{< ref "/getting-started.md#running-prometheus" >}}). Before doing that, let's make it more interesting.

## Distributions
It would be useful to see the distribution of the set and get latencies. If the tail latency is too high, it could explain random timeouts in your application. Fortunately, it's very easy to create distributions in Cloudprober. You just need to add the following section to your probe definition:

Full example in [examples/external/cloudprober_aggregate.cfg](https://github.com/google/cloudprober/blob/master/examples/external/cloudprober_aggregate.cfg).

{{< highlight shell >}}
# Run an external probe and aggregate metrics in cloudprober.
...
output_metrics_options {
  aggregate_in_cloudprober: true

  # Create distributions for get_latency_ms and set_latency_ms.
  dist_metric {
    key: "get_latency_ms"
    value: {
      explicit_buckets: "0.1,0.2,0.4,0.6,0.8,1.0,2.0"
    }
  }
  dist_metric {
    key: "set_latency_ms"
    value: {
      explicit_buckets: "0.1,0.2,0.4,0.6,0.8,1.0,2.0"
    }
  }
}
{{< / highlight >}}

This configuration adds options to aggregate the metrics in Cloudprober and configures "get\_latency\_ms" and "set\_latency\_ms" as distribution metrics with explicit buckets. Cloudprober will now build cumulative distributions
for these metrics. We can import this data into Stackdriver or Prometheus and get the percentiles of the "get" and "set" latencies. The following screenshot shows the
Grafana dashboard built using these metrics.

<a href="/diagrams/redis_probe_screenshot.png"><img style="float: center;" width=300px src="/diagrams/redis_probe_screenshot.png"></a>

## Server Mode

The probe that we created above forks a new `redis_probe` process for every
probe cycle. This can get expensive if the probe frequency is high and the process is big (e.g. a Java binary). Also, what if you want to keep some state across probes? For example, say you want to monitor performance over HTTP/2, where you keep using the same TCP connection for multiple HTTP requests. A new process
every time makes keeping state impossible.

The external probe's server mode provides a way to run the external probe process in daemon mode. Cloudprober communicates with this process over stdout/stdin (connected with OS pipes), using serialized protobuf messages. Cloudprober comes with a serverutils package that makes it easy to build external probe servers in Go.

Please see the code at
[examples/external/redis_probe.go](https://github.com/google/cloudprober/blob/master/examples/external/redis_probe.go) for the server mode implementation of the above probe.
Here is the corresponding
cloudprober config to run this probe in server mode: [examples/external/cloudprober_server.cfg](https://github.com/google/cloudprober/blob/master/examples/external/cloudprober_server.cfg).

In server mode, if the external probe process dies for any reason, it's restarted by Cloudprober.