---
menu:
    main:
        parent: "How-Tos"
        weight: 25
title: "External Probe"
date: 2017-10-08T17:24:32-07:00
---
The external probe type lets you run arbitrary, complex probes through
Cloudprober. An external probe runs an independent external program to do the
actual probing. Cloudprober calculates probe metrics from the program's exit
status and the time it takes to execute.

Cloudprober also allows external programs to provide additional metrics.
Each line written to `stdout` is parsed as a new metric to be emitted.
For general logging, use another I/O stream such as `stderr`.
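
As a minimal sketch of this contract, the program below reports one metric on `stdout`, logs to `stderr`, and signals failure through a non-zero exit status. The `checkService` function and the `check_latency_ms` metric name are hypothetical and only serve the illustration.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"os"
	"time"
)

// checkService is a hypothetical stand-in for the real check.
func checkService(healthy bool) error {
	if !healthy {
		return errors.New("service is unhealthy")
	}
	return nil
}

// metricLine formats one "name value" line intended for stdout.
func metricLine(name string, value float64) string {
	return fmt.Sprintf("%s %f", name, value)
}

func main() {
	start := time.Now()
	if err := checkService(true); err != nil {
		// The log package writes to stderr by default, so this line is
		// not mixed into the metrics on stdout.
		log.Printf("probe failed: %v", err)
		os.Exit(1) // a non-zero exit status reports the run as failed
	}
	elapsedMs := float64(time.Since(start).Nanoseconds()) / 1e6
	fmt.Println(metricLine("check_latency_ms", elapsedMs))
}
```

Keeping metrics and logs on separate streams means a stray debug message is never parsed as a metric.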

## Sample Probe
To understand how it works, let's create a sample probe that sets and gets a key
in a Redis server. Here is the `main` function of such a probe:

{{< highlight go >}}
func main() {
    // The client is used as-is; the program assumes a local Redis server
    // on its default port.
    var client redis.Client
    var key = "hello"

    // Time the SET operation and print it as a metric on stdout.
    startTime := time.Now()
    client.Set(key, []byte("world"))
    fmt.Printf("set_latency_ms %f\n", float64(time.Since(startTime).Nanoseconds())/1e6)

    // Time the GET operation. The log line goes to stderr, not stdout.
    startTime = time.Now()
    val, _ := client.Get(key)
    log.Printf("%s=%s", key, string(val))
    fmt.Printf("get_latency_ms %f\n", float64(time.Since(startTime).Nanoseconds())/1e6)
}
{{< / highlight >}}

(Full listing: https://github.com/google/cloudprober/blob/master/examples/external/redis_probe.go)

This program sets and gets a key in Redis and prints the time taken by each operation.
`set_latency_ms` and `get_latency_ms` will be emitted as metrics. You can also define your own labels using this format:
```
fmt.Printf("get_latency_ms{region=%v, cluster=%v} %f\n", region, cluster, float64(time.Since(startTime).Nanoseconds())/1e6)
```
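
If a probe emits several labeled metrics, a small helper keeps the lines consistent. The following is a sketch only: `labeledMetric` is a hypothetical helper, the label values are made up, and the output simply mirrors the `name{key=value, key=value} value` format shown above. Sorting the keys is a choice made here so that the output is stable across runs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labeledMetric builds a "name{k1=v1, k2=v2} value" line.
// Label keys are sorted so the same input always yields the same line.
func labeledMetric(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+"="+labels[k])
	}
	return fmt.Sprintf("%s{%s} %f", name, strings.Join(pairs, ", "), value)
}

func main() {
	// Made-up label values, for illustration only.
	labels := map[string]string{"region": "us-east1", "cluster": "c1"}
	fmt.Println(labeledMetric("get_latency_ms", labels, 0.491))
}
```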

Cloudprober can use this program as an external probe to verify
the availability and performance of the Redis server. The program assumes that
a Redis server is running locally on its default port. For the sake of
demonstration, let's run a local Redis server (you can also easily modify the
program to use a different server).

{{< highlight bash >}}
#!bash
brew install redis
{{< / highlight >}}

Let's compile our probe program (redis_probe.go) and verify that it's working
as expected:

{{< highlight bash >}}
#!bash
CGO_ENABLED=0 go build -ldflags "-extldflags=-static" ./redis_probe.go
./redis_probe
set_latency_ms 22.656588
2018/02/26 15:16:14 hello=world
get_latency_ms 2.173560
{{< / highlight >}}

## Configuration
Here is the external probe configuration that makes use of this program
(full example in [examples/external/cloudprober.cfg](https://github.com/google/cloudprober/blob/master/examples/external/cloudprober.cfg)):

{{< highlight shell >}}
# Run an external probe that executes a command from the current working
# directory.
probe {
  name: "redis_probe"
  type: EXTERNAL
  targets { dummy_targets {} }
  external_probe {
    mode: ONCE
    command: "./redis_probe"
  }
}
{{< / highlight >}}

Note: To pass target information to your external program, send it as
command-line arguments using the `@label@` notation.
Supported fields are `target`, `address`, `port`, `probe`, and target labels such as `target.label.fqdn`.
```
command: "./redis_probe -host=@address@ -port=@port@"
```
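
On the program side, the substituted values arrive as ordinary command-line arguments. The sketch below shows one way to read them with Go's standard `flag` package. The flag names match the `command` line above, but the `targetAddr` helper and the fallback values (`localhost` and Redis's default port `6379`) are assumptions made for this illustration and are not taken from the example program.

```go
package main

import (
	"flag"
	"fmt"
	"net"
	"os"
)

// targetAddr parses -host and -port from args and returns "host:port".
// The defaults are assumptions for this sketch.
func targetAddr(args []string) (string, error) {
	fs := flag.NewFlagSet("redis_probe", flag.ContinueOnError)
	host := fs.String("host", "localhost", "Redis server host")
	port := fs.String("port", "6379", "Redis server port")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return net.JoinHostPort(*host, *port), nil
}

func main() {
	addr, err := targetAddr(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	// Log to stderr so this line is not parsed as a metric.
	fmt.Fprintln(os.Stderr, "probing", addr)
}
```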

Running it through Cloudprober, you'll see the following output:

{{< highlight bash >}}
# Launch cloudprober
cloudprober --config_file=cloudprober.cfg

cloudprober 1519..0 1519583408 labels=ptype=external,probe=redis_probe,dst= success=1 total=1 latency=12143.765
cloudprober 1519..1 1519583408 labels=ptype=external,probe=redis_probe,dst= set_latency_ms=0.516 get_latency_ms=0.491
cloudprober 1519..2 1519583410 labels=ptype=external,probe=redis_probe,dst= success=2 total=2 latency=30585.915
cloudprober 1519..3 1519583410 labels=ptype=external,probe=redis_probe,dst= set_latency_ms=0.636 get_latency_ms=0.994
cloudprober 1519..4 1519583412 labels=ptype=external,probe=redis_probe,dst= success=3 total=3 latency=42621.871
{{< / highlight >}}
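
In the output above, `success`, `total`, and `latency` grow from one line to the next, which suggests they are cumulative counters. Under that assumption, the average latency per successful probe between two samples is the latency delta divided by the success delta. The sketch below applies this to the first two `success=` lines; the `sample` type and `avgLatency` helper are hypothetical, and the result is in whatever unit the `latency` field uses.

```go
package main

import "fmt"

// sample holds the counters taken from one "success=... latency=..." line.
type sample struct {
	success int64
	latency float64
}

// avgLatency returns the mean latency per successful probe between two
// samples, or 0 when there were no new successes.
func avgLatency(prev, cur sample) float64 {
	n := cur.success - prev.success
	if n <= 0 {
		return 0
	}
	return (cur.latency - prev.latency) / float64(n)
}

func main() {
	first := sample{success: 1, latency: 12143.765}
	second := sample{success: 2, latency: 30585.915}
	fmt.Printf("%.3f\n", avgLatency(first, second))
}
```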

You can import this data into Prometheus by following the process outlined at
[Running Prometheus]({{< ref "/getting-started.md#running-prometheus" >}}). Before doing that, let's make it more interesting.

## Distributions
Wouldn't it be nice to see the distribution of the set and get latencies? If tail latency is too high, it could explain the random timeouts in your application. Fortunately, it's very easy to create distributions in Cloudprober. You just need to add the following section to your probe definition
(full example in [examples/external/cloudprober_aggregate.cfg](https://github.com/google/cloudprober/blob/master/examples/external/cloudprober_aggregate.cfg)):

{{< highlight shell >}}
# Run an external probe and aggregate metrics in cloudprober.
...
output_metrics_options {
  aggregate_in_cloudprober: true

  # Create distributions for get_latency_ms and set_latency_ms.
  dist_metric {
    key: "get_latency_ms"
    value: {
      explicit_buckets: "0.1,0.2,0.4,0.6,0.8,1.0,2.0"
    }
  }
  dist_metric {
    key: "set_latency_ms"
    value: {
      explicit_buckets: "0.1,0.2,0.4,0.6,0.8,1.0,2.0"
    }
  }
}
{{< / highlight >}}

This configuration adds options to aggregate the metrics in Cloudprober and configures `get_latency_ms` and `set_latency_ms` as distribution metrics with explicit buckets. Cloudprober will now build cumulative distributions
for these metrics. We can import this data into Stackdriver or Prometheus and get the percentiles of the "get" and "set" latencies. The following screenshot shows the
Grafana dashboard built using these metrics.

<a href="/diagrams/redis_probe_screenshot.png"><img style="float: center;" width=300px src="/diagrams/redis_probe_screenshot.png"></a>

## Server Mode

The probe that we created above forks a new `redis_probe` process for every
probe cycle. This can get expensive if the probe frequency is high and the process is big (e.g. a Java binary). Also, what if you want to keep some state across probes? For example, say you want to monitor performance over HTTP/2, where you keep using the same TCP connection for multiple HTTP requests. Starting a new process
every time makes keeping state impossible.

The external probe's server mode provides a way to run the external probe process in daemon mode. Cloudprober communicates with this process over stdout/stdin (connected with OS pipes), using serialized protobuf messages. Cloudprober comes with a serverutils package that makes it easy to build external probe servers in Go.

![External Probe Server](/diagrams/external_probe_server.svg)

Please see the code at
[examples/external/redis_probe.go](https://github.com/google/cloudprober/blob/master/examples/external/redis_probe.go) for the server mode implementation of the above probe. Here is the corresponding
Cloudprober config to run this probe in server mode: [examples/external/cloudprober_server.cfg](https://github.com/google/cloudprober/blob/master/examples/external/cloudprober_server.cfg).

In server mode, if the external probe process dies for any reason, Cloudprober restarts it.