---
layout: default
title: Configuration (0.5.0 and earlier)
order: 3
---

<div class="row">
<div class="col-sm-3" >
<div class="sidebar" data-spy="affix" data-offset-top="0" data-offset-bottom="0" markdown="1">

* Some TOC
{:toc}

</div>
</div>

<div class="doc-body col-sm-9" markdown="1">

<p class="title h1">{{page.title}}</p>

<div class="admonition">
<p class="admonition-title">Attention</p>
<p>This documentation is for versions prior to 0.6.0. For 0.6.0 there are two different documentation sections that replace this section: <a href="/system_configuration">system configuration</a> and <a href="/definitions">definitions</a>.</p>
</div>

{% raw %}

Syntax is sectional, with each section having a type and a name, followed by `{` and ending with `}`. Key/value pairs follow, in the form `key = value`. Key names are the non-whitespace characters before the `=`. The value runs until the end of the line and is a string. Multi-line strings are supported using backticks to delimit the start and end of the string. Comments go from a `#` to the end of the line (unless the `#` appears in a backtick string). Whitespace is trimmed at the ends of values and keys. Files are UTF-8 encoded.

## Variables

Variables perform simple text replacement - they are not intelligent. They are any key whose name begins with `$`, and may also be surrounded by braces (`{`, `}`) to disambiguate between shorter keys (ex: `${var}`). Before an expression is evaluated, all variables in its text are expanded. Variables can be defined at any scope, and will shadow variables of the same name defined in a higher scope.

### Environment Variables

Environment variables may be used similarly to variables, but with `env.` preceding the name. For example: `tsdbHost = ${env.TSDBHOST}` (with or without braces).
It is an error to specify a non-existent or empty environment variable.

## Sections

### globals

Globals are all key=value pairs not in a section. These are generally placed at the top of the file.
Every variable is optional, though you should enable at least 1 backend.

#### backends

* tsdbHost: OpenTSDB host. Must be GZIP-aware (use the [next branch](https://github.com/opentsdb/opentsdb/tree/next)). Can specify both host and port: `tsdb-host:4242`. Defaults to port 4242 if no port is specified. If you use OpenTSDB without relaying the data through Bosun, the following currently won't work (and this setup isn't something we officially support):
    * Tag value glob matching, for example `avg:metric.name{tag=something-*}`. However, single asterisks like `tag=*` will still work.
    * The items page.
    * The graph page's tag list.
* tsdbVersion: Defaults to 2.1 if not present. Should always be specified as Number.Number. Various OpenTSDB features are added with newer versions.
* relayListen: Listen on the given address (e.g., `:4242`) and pass through all /api/X calls to your OpenTSDB server. This parameter is optional when using OpenTSDB; it is not required for any Bosun functionality.
* graphiteHost: an IP, hostname, ip:port, hostname:port or a URL. Defaults to the standard http/https ports and to the "/render" path. Any non-empty path (even "/") overrides the default path.
* graphiteHeader: an HTTP header to be sent to Graphite on each request, in 'key:value' format. Optional. Can be specified multiple times.
* logstashElasticHosts: Elasticsearch hosts populated by logstash. Must be a CSV list of URLs and only works with Elastic pre-v2. The hosts you list are used to discover all hosts in the cluster.
* elasticHosts: Elasticsearch hosts. This is not limited to logstash's schema. It must be a CSV list of URLs and only works with Elastic v2 and later. The hosts you list are used to discover all hosts in the cluster.
* annotateElasticHosts: Setting this enables annotations. It is a CSV list of URLs, like elasticHosts. More on annotations in the [usage documentation](http://bosun.org/usage#annotations). By default the index is named "annotate" and will be created if it doesn't exist. You can change which index to use/create with the annotateIndex setting.
* influxHost: InfluxDB host address as an ip:port pair.
* influxUsername: InfluxDB username. If empty, Bosun will attempt to connect without authentication.
* influxPassword: InfluxDB password. If empty, Bosun will attempt to connect without authentication.
* influxTLS: Whether to use TLS when connecting to InfluxDB. Default is false.
* influxTimeout: Timeout duration for connections to InfluxDB.

#### data storage

With Bosun v0.5.0, Bosun uses Redis as a storage mechanism for its internal state. You can either run a Redis instance to hold this data, or Bosun can use an embedded server if you would rather run standalone (using [ledisDb](http://ledisdb.com/)). Redis is recommended for production use. [This gist](https://gist.github.com/kylebrandt/3fdc97171b96ba46fd9e1d14abd03027) shows an example Redis config, the tested Redis version, and an example cron job for backing up the Redis data.

Config items:

* redisHost: redis server to use. Ex: `localhost:6379`. Redis 3.0 or greater is required.
* redisDb: redis database to use. Default is `0`.
* redisPassword: redis password.
* ledisDir: directory for ledisDb to store its data. Will default to `ledis_data` in the working dir if no redis host is provided.
* ledisBindAddr: Address and port for ledis to bind to, defaults to `127.0.0.1:9565`.

#### settings

* checkFrequency: time between alert checks, defaults to `5m`
* defaultRunEvery: default multiplier of check frequency to run alerts. Defaults to `1`.
* emailFrom: from address for notification emails, required for email notifications
* httpListen: HTTP listen address, defaults to `:8070`
* hostname: when generating links in templates, use this value as the hostname instead of using the system's hostname
* minGroupSize: minimum group size for alerts to be grouped together on dashboard. Default `5`.
* ping: if present, will ping all values tagged with host
* responseLimit: number of bytes to limit OpenTSDB responses, defaults to 1MB (`1048576`)
* searchSince: duration of time to filter by during certain searches, defaults to `3d`; currently used by the hosts list on the items page
* smtpHost: SMTP server, required for email notifications
* squelch: see [alert squelch](#squelch)
* stateFile: bosun state file, defaults to `bosun.state`
* unknownTemplate: name of the template for unknown alerts
* shortURLKey: goo.gl API key, needed if you hit usage limits when using the short link button
* timeAndDate: configures the worldclock links. For example, `timeAndDate = 202,75,179,136` adds Portland, Denver, New York, and London to the datetime links generated in alerts. See the [timeanddate.com documentation](http://www.timeanddate.com/worldclock/converter-about.html).

#### SMTP Authentication

These fields are optional. If either is specified, Bosun will authenticate with the SMTP server.

* smtpUsername: SMTP username
* smtpPassword: SMTP password

### macro

Macros are sections that can define anything (including variables). It is not an error to reference an unknown variable in a macro. Other sections can reference the macro with `macro = name`. The macro's data will be expanded with the current variable definitions and inserted at that point in the section. A section may reference multiple macros. Macros may reference other macros.
For example:

~~~
$default_time = "2m"

macro m1 {
    $w = 80
    warnNotification = default
}

macro m2 {
    macro = m1
    $c = 90
}

alert os.high_cpu {
    $q = avg(q("avg:rate:os.cpu{host=ny-nexpose01}", $default_time, ""))
    macro = m2
    warn = $q > $w
    crit = $q >= $c
}
~~~

Will yield a warn expression for the os.high_cpu alert:

~~~
avg(q("avg:rate:os.cpu{host=ny-nexpose01}", "2m", "")) > 80
~~~

and set `warnNotification = default` for that alert.

### template

Templates are the message body for emails that are sent when an alert is triggered. Syntax is that of the Go [text/template](http://golang.org/pkg/text/template/) package. Variable expansion is not performed on templates because `$` is used in the template language, but a `V()` function is provided instead. Email bodies are HTML, subjects are plaintext. Macro support is currently disabled for the same reason, due to implementation details.

* body: message body (HTML)
* subject: message subject (plaintext)

#### Variables available to alert templates:

* Ack: URL for alert acknowledgement
* Expr: string of evaluated expression
* Group: dictionary of tags for this alert (e.g., host=ny-redis01, db=42)
* History: array of Events. An Event has a `Status` field (an integer) with a textual string representation, and a `Time` field. Most recent last. The status fields have identification methods: `IsNormal()`, `IsWarning()`, `IsCritical()`, `IsUnknown()`, `IsError()`.
* Incident: URL for incident page
* IsEmail: true if template is being rendered for an email. Needed because email clients often modify HTML.
* Last: last Event of History array
* Subject: string of template subject
* Touched: time this alert was last updated
* Alert: dictionary of rule data (but the first letter of each is uppercase)
    * Crit
    * IncidentId
    * Name
    * Vars: alert variables, named without the `$` prefix. For example: `{{.Alert.Vars.q}}` to print `$q`.
    * Warn

#### Functions available to alert templates:

* Eval(string): executes the given expression and returns the first result with identical tags, or `nil` tags if none exists, otherwise `nil`.
* EvalAll(string): executes the given expression and returns all results. The `DescByValue` function may be called on the result of this to sort descending by value: `{{(.EvalAll .Alert.Vars.expr).DescByValue}}`.
* GetMeta(metric, name, tags): Returns metadata for the given combination of metric, metadata name, and tag. `metric` and `name` are strings. `tags` may be a tag string (`"tagk=tagv,tag2=val2"`) or a tag set (`.Group`). If `name` is the empty string, a slice of metadata matching the metric and tag is returned. Otherwise, only the metadata value is returned for the given name, or `nil` for no match.
* Graph(expression, y_label): returns an SVG graph of the expression with tags identical to the alert instance. `expression` is a string or an expression and `y_label` is a string. `y_label` is an optional argument.
* GraphLink(expression): returns a link to the graph tab of the expression page for the given expression. The time is set to the time of the alert. `expression` is a string.
* GraphAll(expression, y_label): returns an SVG graph of the expression. `expression` is a string or an expression and `y_label` is a string. `y_label` is an optional argument.
* LeftJoin(expr, expr[, expr...]): results of the first expression (which may be a string or an expression) are left joined to results from all following expressions.
* Lookup("table", "key"): Looks up the value for the key based on the tagset of the alert in the specified lookup table
* LookupAll("table", "key", "tag=val,tag2=val2"): Looks up the value for the key based on the tagset specified in the given lookup table
* HTTPGet("url"): Performs an HTTP GET and returns the raw text of the URL
* HTTPGetJSON("url"): Performs an HTTP GET for the URL and returns a [jsonq.JsonQuery object](https://godoc.org/github.com/jmoiron/jsonq)
* LSQuery("indexRoot", "filterString", "startDuration", "endDuration", nResults): Returns an array of up to nResults marshaled JSON documents (Go: marshaled to interface{}). This is like the lscount and lsstat functions. There is no `keyString` because the group (aka tags) of the alert is used.
* LSQueryAll("indexRoot", "keyString", "filterString", "startDuration", "endDuration", nResults): Like LSQuery, but you have to specify the `keyString` since it is not scoped to the alert.
* ESQuery(index ESIndexer, filter ESQuery, startDuration string, endDuration string, nResults Scalar): Returns an array of up to nResults marshaled JSON documents (Go: marshaled to interface{}). This is like the escount and esstat functions. The group (aka tags) of the alert is used to further filter the results.
* ESQueryAll(index ESIndexer, filter ESQuery, startDuration string, endDuration string, nResults Scalar): Like ESQuery, but the results are not filtered based on the tagset (aka group) of the alert.

As an example:

```
template test {
    subject = {{.Last.Status}}: {{.Alert.Name}} on {{.Group.host}}
    body = `
        {{ $filter := (.Eval .Alert.Vars.filter)}}
        {{ $index := (.Eval .Alert.Vars.index)}}
        {{range $i, $x := .ESQuery $index $filter "5m" "" 10 }}
            <p>{{$x.machinename}}</p>
        {{end}}
    `
}

alert test {
    template = test
    $index = esls("logstash")
    $filter = esand(esregexp("source", ".*"), esregexp("machinename", "ls-dc.*"))
    crit = avg(escount($index, "source,machinename", $filter, "2m", "10m", ""))
}
```

Global template functions:

* V: performs variable expansion on the argument and returns it. Needed since normal variable expansion is not done due to the `$` character being used by the Go template syntax.
* bytes: converts the string input into a human-readable number of bytes with extension KB, MB, GB, etc.
* pct: formats the float argument as a percentage. For example: `{{5.1 | pct}}` -> `5.10%`.
* replace: [strings.Replace](http://golang.org/pkg/strings/#Replace)
* short: Trims the string to everything before the first period. Useful for turning an FQDN into a shortname. For example: `{{short "foo.baz.com"}}` -> `foo`.
* parseDuration: [time.ParseDuration](http://golang.org/pkg/time/#ParseDuration). Useful when working with an alert's .Last.Time.Add method to generate URLs to other systems.
* html: takes a string and renders it as HTML. Useful when you have alert variables that contain HTML. For example, in the alert you may have `$notes = <a href="...">Foo</a>` and then in the template you can render it as HTML with `{{ html .Alert.Vars.notes }}`

All body templates are associated, and so may be executed from another. Use the name of the other template section for inclusion. Subject templates are similarly associated.

An example:

~~~
template name {
    body = Name: {{.Alert.Name}}
}
template ex {
    body = `Alert definition:
    {{template "name" .}}
    Crit: {{.Alert.Crit}}

    Tags:{{range $k, $v := .Group}}
    {{$k}}: {{$v}}{{end}}
    `
    subject = {{.Alert.Name}}: {{.Alert.Vars.q | .E}} on {{.Group.host}}
}
~~~

#### unknown template

The unknown template (set by the global option `unknownTemplate`) acts differently than alert templates. It receives groups of alerts since unknowns tend to happen in groups (e.g., a host stops reporting and all alerts for that host trigger unknown at the same time).

Variables and functions available to the unknown template:

* Group: list of names of alerts
* Name: group name
* Time: [time](http://golang.org/pkg/time/#Time) this group triggered unknown

Example:

~~~
template ut {
    subject = {{.Name}}: {{.Group | len}} unknown alerts
    body = `
    <p>Time: {{.Time}}
    <p>Name: {{.Name}}
    <p>Alerts:
    {{range .Group}}
        <br>{{.}}
    {{end}}`
}

unknownTemplate = ut
~~~

### alert

An alert is an evaluated expression which can trigger actions like emailing or logging. The expression must yield a scalar. The alert triggers if the result is not equal to zero. Alerts act on each tag set returned by the query. It is an error for alerts to specify start or end times. Those will be determined by the various functions and the alerting system.

* crit: expression of a critical alert (which will send an email)
* critNotification: comma-separated list of notifications to trigger on critical. This line may appear multiple times; duplicate notifications are merged so only one of each notification is triggered. Lookup tables may be used when `lookup("table", "key")` is an entire `critNotification` value. See example below.
* depends: expression that this alert depends on.
  If the expression is non-zero, this alert is unevaluated. Unevaluated alerts do not change state or become unknown.
* ignoreUnknown: if present, will prevent alert from becoming unknown
* unknownIsNormal: will convert unknown events into normal events. For example, if you are alerting for the existence of error log messages, when there are none, that means things are normal. Using `ignoreUnknown` with this setting would be unnecessary.
* runEvery: multiple of global `checkFrequency` at which to run this alert. If unspecified, the global `defaultRunEvery` will be used.
* squelch: <a name="squelch"></a> comma-separated list of `tagk=tagv` pairs. `tagv` is a regex. If the current tag group matches all values, the alert is squelched, and will not trigger as crit or warn. For example, `squelch = host=ny-web.*,tier=prod` will match any group that has at least that host and tier. Note that the group may have other tags assigned to it, but since all elements of the squelch list were met, it is considered a match. Multiple squelch lines may appear; a tag group matches if any of the squelch lines match.
* template: name of template
* unjoinedOk: if present, will ignore unjoined expression errors
* unknown: time at which to mark an alert unknown if it cannot be evaluated; defaults to global checkFrequency
* warn: expression of a warning alert (viewable on the web interface)
* warnNotification: identical to critNotification, but for warnings
* log: setting `log = true` will make the alert behave as a "log alert". It will never show up on the dashboard, but will execute notifications every check interval where the status is abnormal.
* maxLogFrequency: will throttle log notifications to the specified duration. `maxLogFrequency = 5m` will ensure that notifications only fire once every 5 minutes for any given alert key. Only valid on log alerts.

Example of notification lookups:

~~~
notification all {
    #...
}

notification n {
    #...
}

notification d {
    #...
}

lookup l {
    entry host=a {
        v = n
    }
    entry host=b* {
        v = d
    }
}

alert a {
    crit = 1
    critNotification = all # All alerts have the all notification.
    # Other alerts are passed through the l lookup table and may add n or d.
    # If the host tag does not match a or b*, no other notification is added.
    critNotification = lookup("l", "v")
    # Do not evaluate this alert if its host is down.
    depends = alert("host.down", "crit")
}
~~~

### notification

A notification is a chained action to perform. The chaining continues until the chain ends or the alert is acknowledged. At least one action must be specified. `next` and `timeout` are optional. Notifications are independent of each other and executed concurrently (if there are many notifications for an alert, one will not block another).

* body: overrides the default POST body. The alert subject is passed as the template's `.` variable. The `V` function is available as in other templates. Additionally, a `json` function will output JSON-encoded data.
* next: name of next notification to execute after timeout. Can be itself.
* timeout: duration to wait until next is executed. If not specified, will happen immediately.
* contentType: If your body for a POST notification requires a different Content-Type header than the default of `application/x-www-form-urlencoded`, you may set the contentType variable.
* runOnActions: Exclude this notification from action notifications. Notifications will be sent on ack/close/forget actions using a built-in template to all root level notifications for an alert, *unless* the notification specifies `runOnActions = false`.

#### actions

* email: list of email addresses of contacts. Comma separated. Supports formats `Person Name <addr@domain.com>` and `addr@domain.com`.
  The alert template's subject and body are used for the email.
* get: HTTP GET to the given URL
* post: HTTP POST to the given URL. The alert subject is sent as the request body. Content type is set to `application/x-www-form-urlencoded` by default, but may be overridden by setting the `contentType` variable for the notification.
* print: prints the template subject to stdout. The print value is ignored, so just use: `print = true`

Example:

~~~
# HTTP Post to a chatroom, email in 10m if not ack'd
notification chat {
    next = email
    timeout = 10m
    post = http://chat.meta.stackoverflow.com/room/318?key=KEY&message=whatever
}

# email sysadmins and Nick each day until ack'd
notification email {
    email = sysadmins@stackoverflow.com, nick@stackoverflow.com
    next = email
    timeout = 1d
}

# post to a slack.com chatroom via Incoming Webhooks integration
notification slack{
    post = https://hooks.slack.com/services/abcdef
    body = {"text": {{.|json}}}
}

#post json
notification json{
    post = https://someurl.com/submit
    body = {"text": {{.|json}}, "apiKey": "2847abc23"}
    contentType = application/json
}
~~~

### lookup

Lookups are used when different values are needed based on the group. For example, an alert for high CPU use may have a general setting, but need to be higher for known high-CPU machines. Lookups have subsections for lookup entries. Each entry subsection is named with an OpenTSDB tag group, and supports globbing. Entry subsections have arbitrary key/value pairs.

The `lookup` function can be used in expressions to query lookup data. It takes two arguments: the name of the lookup table and the key to be extracted. When the function is executed, all possible combinations of tags are fetched from the search service, matched to the correct rule, and returned. The first successful match is used. Unmatched groups are ignored.

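
As a rough illustration of the first-match rule described above, here is a minimal Python sketch. It is not Bosun's implementation: the `CPU_LOOKUP` structure and the `lookup` helper are hypothetical, the tag group is passed in directly instead of being fetched from the search service, entries that lack the requested key are assumed to be skipped, and Python's `fnmatch` globbing is only assumed to approximate Bosun's glob matching.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of a lookup table: an ordered list of
# (tag patterns, key/value pairs), in the order the entries are declared.
CPU_LOOKUP = [
    ({"host": "web-*"}, {"high": "0.5"}),
    ({"host": "sql-*"}, {"high": "0.8"}),
    ({"host": "*"}, {"high": "0.3"}),
]


def lookup(table, key, group):
    """Return the value of `key` from the first entry matching `group`.

    An entry matches when every tag it names is present in the group and
    the tag's value matches the entry's glob pattern. Extra tags in the
    group are ignored. Returns None when no entry matches.
    """
    for patterns, values in table:
        if key not in values:
            continue  # assumption: entries without the key are skipped
        if all(
            tag in group and fnmatchcase(group[tag], pattern)
            for tag, pattern in patterns.items()
        ):
            return values[key]
    return None  # unmatched groups are ignored


if __name__ == "__main__":
    print(lookup(CPU_LOOKUP, "high", {"host": "web-01"}))
    print(lookup(CPU_LOOKUP, "high", {"host": "db-01"}))
```

Under these assumptions, a group with `host=web-01` resolves `high` through the `host=web-*` entry, while `host=db-01` falls through to the catch-all `host=*` entry. Because the first match wins, more specific entries need to be declared before broader ones.
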

For example, to filter based on host:

~~~
lookup cpu {
    entry host=web-* {
        high = 0.5
    }
    entry host=sql-* {
        high = 0.8
    }
    entry host=* {
        high = 0.3
    }
}

alert cpu {
    crit = avg(q("avg:rate:os.cpu{host=*}", "5m", "")) > lookup("cpu", "high")
}
~~~

Multiple groups are supported and separated by commas. For example:

~~~
lookup cpu {
    entry host=web-*,dc=eu {
        high = 0.5
    }
    entry host=sql-*,dc=us {
        high = 0.8
    }
    entry host=*,dc=us {
        high = 0.3
    }
    entry host=*,dc=* {
        high = 0.4
    }
}

alert cpu {
    crit = avg(q("avg:rate:os.cpu{host=*,dc=*}", "5m", "")) > lookup("cpu", "high")
}
~~~

# Example File

~~~
tsdbHost = tsdb01.stackoverflow.com:4242
smtpHost = mail.stackoverflow.com:25

template cpu {
    body = `Alert definition:
    Name: {{.Alert.Name}}
    Crit: {{.Alert.Crit}}

    Tags:{{range $k, $v := .Group}}
    {{$k}}: {{$v}}{{end}}
    `
    subject = cpu idle at {{.Alert.Vars.q | .E}} on {{.Group.host}}
}

notification default {
    email = someone@domain.com
    next = default
    timeout = 1h
}

alert cpu {
    template = cpu
    $q = avg(q("sum:rate:linux.cpu{host=*,type=idle}", "1m", ""))
    crit = $q < 40
    critNotification = default
}
~~~

{% endraw %}

</div>
</div>