bosun.org@v0.0.0-20210513094433-e25bc3e69a1f/docs/configuration.md (about)

     1  ---
     2  layout: default
     3  title: Configuration (0.5.0 and earlier)
     4  order: 3
     5  ---
     6  
     7  <div class="row">
     8  <div class="col-sm-3" >
     9    <div class="sidebar" data-spy="affix" data-offset-top="0" data-offset-bottom="0" markdown="1">
    10   
    11   * Some TOC
    12   {:toc}
    13   
    14    </div>
    15  </div>
    16  
    17  <div class="doc-body col-sm-9" markdown="1">
    18  
    19  <p class="title h1">{{page.title}}</p>
    20  
    21  <div class="admonition">
    22  <p class="admonition-title">Attention</p>
    23  <p>This documentation is for versions prior to 0.6.0. For 0.6.0 there are two different documentation sections that replace this section: <a href="/system_configuration">system configuration</a> and <a href="/definitions">definitions</a>.</p>
    24  </div>
    25  
    26  {% raw %}
    27  
    28  Syntax is sectional, with each section having a type and a name, followed by `{` and ending with `}`. Key/value pairs follow of the form `key = value`. Key names are non-whitespace characters before the `=`. The value goes until end of line and is a string. Multi-line strings are supported using backticks to delimit start and end of string. Comments go from a `#` to end of line (unless the `#` appears in a backtick string). Whitespace is trimmed at ends of values and keys. Files are UTF-8 encoded.
    29  
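As a rough illustration of these rules, the sketch below combines a global key/value pair, comments, and a section containing a multi-line backtick string. The template name `syntaxdemo` and its contents are placeholders chosen for this example.

~~~
# A global key/value pair: the value is the rest of the line.
httpListen = :8070

# A section: type "template", name "syntaxdemo", delimited by { and }.
template syntaxdemo {
	subject = {{.Alert.Name}} on {{.Group.host}}
	# Backticks delimit a multi-line string. The # on the second
	# line of the body is inside the string, so it is not a comment.
	body = `<p>Alert: {{.Alert.Name}}</p>
	<p>Ticket queue: #ops</p>`
}
~~~
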
    30  ## Variables
    31  
    32  Variables perform simple text replacement; they are not intelligent. A variable is any key whose name begins with `$`. The name may also be surrounded by braces (`{`, `}`) to disambiguate it from shorter keys (ex: `${var}`). Before an expression is evaluated, all variables in its text are expanded. Variables can be defined at any scope, and shadow variables of the same name from a higher scope.
    33  
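A minimal sketch of scoping and shadowing, assuming a hypothetical `$limit` variable and placeholder alert names (the fragment omits templates and notifications for brevity):

~~~
$limit = 80

alert cpu.default {
	# No local definition, so $limit expands to the global value 80.
	crit = avg(q("avg:rate:os.cpu{host=*}", "5m", "")) > $limit
}

alert cpu.strict {
	# This definition shadows the global $limit inside this alert only.
	$limit = 60
	# Braces are optional here; both forms expand to 60.
	crit = avg(q("avg:rate:os.cpu{host=*}", "5m", "")) > ${limit}
}
~~~
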
    34  ### Environment Variables
    35  
    36  Environment variables may be used similarly to variables, but with `env.` preceding the name. For example: `tsdbHost = ${env.TSDBHOST}` (with or without braces). It is an error to specify a non-existent or empty environment variable.
    37  
    38  ## Sections
    39  
    40  ### globals
    41  
    42  Globals are all key=value pairs not in a section. These are generally placed at the top of the file.
    43  Every variable is optional, though you should enable at least 1 backend.
    44  
    45  #### backends
    46  
    47  * tsdbHost: OpenTSDB host. Must be GZIP-aware (use the [next branch](https://github.com/opentsdb/opentsdb/tree/next)). Can specify both host and port: `tsdb-host:4242`. Defaults to port 4242 if no port is specified. If you send data to OpenTSDB without relaying it through Bosun, the following currently won't work (and this setup isn't something we officially support):
    48    * Tag value glob matching, for example `avg:metric.name{tag=something-*}`. However single asterisks like `tag=*` will still work.
    49    * The items page.
    50    * The graph page's tag list.
    51  * tsdbVersion: Defaults to 2.1 if not present. Should always be specified as Number.Number. Various OpenTSDB features are added with newer versions.
    52  * relayListen: Listen on the given address (e.g., `:4242`) and pass all /api/X calls through to your OpenTSDB server. This parameter is optional when using OpenTSDB; it is not required for any Bosun functionality.
    53  * graphiteHost: an ip, hostname, ip:port, hostname:port or a URL. Defaults to the standard http/https ports and to the "/render" path. Any non-empty path (even "/") overrides the default path.
    54  * graphiteHeader: an HTTP header to be sent to graphite on each request, in 'key:value' format. Optional. Can be specified multiple times.
    55  * logstashElasticHosts: Elasticsearch hosts populated by logstash. Must be a CSV list of URLs and only works with elastic pre-v2. The hosts you list are used to discover all hosts in the cluster.
    56  * elasticHosts: Elasticsearch hosts. This is not limited to logstash's schema. It must be a CSV list of URLs and only works with elastic v2 and later. The hosts you list are used to discover all hosts in the cluster.
    57  * annotateElasticHosts: Setting this enables annotations. It is a CSV list of URLs like elasticHosts. More on annotations in the [usage documentation](http://bosun.org/usage#annotations). By default the index is named "annotate" and will be created if it doesn't exist. You can change which index to use/create with the annotateIndex setting.
    58  * influxHost: InfluxDB host address ip:port pair.
    59  * influxUsername: InfluxDB username. If empty will attempt to connect without authentication.
    60  * influxPassword: InfluxDB password. If empty will attempt to connect without authentication.
    61  * influxTLS: Whether to use TLS when connecting to InfluxDB. Default is false.
    62  * influxTimeout: Timeout duration for connections to InfluxDB.
    63  
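For instance, a backend block might look like the sketch below. The host names and the header value are placeholders; only the keys come from the list above.

~~~
# OpenTSDB backend; the port defaults to 4242 when omitted.
tsdbHost = tsdb-host:4242
tsdbVersion = 2.1
# Optional: relay /api/X calls through Bosun to OpenTSDB.
relayListen = :4242

# Graphite backend with an extra header sent on every request.
graphiteHost = graphite.example.com:8080
graphiteHeader = X-Example:value
~~~
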
    64  #### data storage
    65  
    66  With bosun v0.5.0, bosun uses redis as a storage mechanism for its internal state. You can either run a redis instance to hold this data, or bosun can use an embedded server if you would rather run standalone (using [ledisDb](http://ledisdb.com/)). Redis is recommended for production use. [This gist](https://gist.github.com/kylebrandt/3fdc97171b96ba46fd9e1d14abd03027) shows an example redis config, the tested redis version, and an example cron job for backing up the redis data.
    67  
    68  Config items:  
    69  
    70  * redisHost: redis server to use. Ex: `localhost:6379`. Redis 3.0 or greater is required.
    71  * redisDb: redis database to use. Default is `0`.
    72  * redisPassword: redis password.
    73  * ledisDir: directory for ledisDb to store its data. Will default to `ledis_data` in working dir if no redis host is provided.
    74  * ledisBindAddr: Address and port for ledis to bind to, defaults to `127.0.0.1:9565`.
    75  
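The two storage options could be configured roughly as follows. The values shown are the examples and defaults from the list above; the ledis lines are commented out because they only apply when no redis host is set.

~~~
# Option 1: external redis (recommended for production).
redisHost = localhost:6379
redisDb = 0

# Option 2: embedded ledis. Omit redisHost and optionally set:
# ledisDir = ledis_data
# ledisBindAddr = 127.0.0.1:9565
~~~
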
    76  #### settings
    77  
    78  * checkFrequency: time between alert checks, defaults to `5m`
    79  * defaultRunEvery: default multiplier of check frequency to run alerts. Defaults to `1`.
    80  * emailFrom: from address for notification emails, required for email notifications
    81  * httpListen: HTTP listen address, defaults to `:8070`
    82  * hostname: when generating links in templates, use this value as the hostname instead of using the system's hostname
    83  * minGroupSize: minimum group size for alerts to be grouped together on dashboard. Default `5`.
    84  * ping: if present, will ping all values tagged with host
    85  * responseLimit: number of bytes to limit OpenTSDB responses, defaults to 1MB (`1048576`)
    86  * searchSince: duration of time to filter by during certain searches, defaults to `3d`; currently used by the hosts list on the items page
    87  * smtpHost: SMTP server, required for email notifications
    88  * squelch: see [alert squelch](#squelch)
    89  * stateFile: bosun state file, defaults to `bosun.state`
    90  * unknownTemplate: name of the template for unknown alerts
    91  * shortURLKey: goo.gl API key, needed if you hit usage limits when using the short link button
    92  * timeAndDate: the configuration parameter for the worldclock links. For example, `timeAndDate = 202,75,179,136` adds Portland, Denver, New York, and London to the datetime links generated in alerts. See [timeanddate.com documentation](http://www.timeanddate.com/worldclock/converter-about.html)
    93  
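To make the relationship between `checkFrequency` and `defaultRunEvery` concrete, here is a sketch with illustrative values: checks happen every minute, and each alert runs on every 5th check by default, i.e. every 5 minutes unless the alert sets its own `runEvery`.

~~~
# Checks happen every minute...
checkFrequency = 1m
# ...and alerts run every 5 x 1m = 5m by default.
defaultRunEvery = 5
httpListen = :8070
minGroupSize = 5
searchSince = 3d
~~~
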
    94  #### SMTP Authentication
    95  
    96  These fields are optional. If either is specified, bosun will authenticate with the SMTP server.
    97  
    98  * smtpUsername: SMTP username
    99  * smtpPassword: SMTP password
   100  
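Put together, the email-related globals might look like this sketch. The host, address, and credentials are placeholders.

~~~
# Both of these are required for email notifications.
smtpHost = mail.example.com:25
emailFrom = bosun@example.com
# Optional: authenticate with the SMTP server.
smtpUsername = bosun
smtpPassword = secret
~~~
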
   101  ### macro
   102  
   103  Macros are sections that can define anything (including variables). It is not an error to reference an unknown variable in a macro. Other sections can reference the macro with `macro = name`. The macro's data will be expanded with the current variable definitions and inserted at that point in the section. A section may reference multiple macros, and macros may reference other macros. For example:
   104  
   105  ~~~
   106  $default_time = "2m"
   107  
   108  macro m1 {
   109  	$w = 80
   110  	warnNotification = default
   111  }
   112  
   113  macro m2 {
   114  	macro = m1
   115  	$c = 90
   116  }
   117  
   118  alert os.high_cpu {
   119  	$q = avg(q("avg:rate:os.cpu{host=ny-nexpose01}", $default_time, ""))
   120  	macro = m2
   121  	warn = $q > $w
   122  	crit = $q >= $c
   123  }
   124  ~~~
   125  
   126  Will yield a warn expression for the os.high_cpu alert:
   127  
   128  ~~~
   129  avg(q("avg:rate:os.cpu{host=ny-nexpose01}", "2m", "")) > 80
   130  ~~~
   131  
   132  and set `warnNotification = default` for that alert.
   133  
   134  ### template
   135  
   136  Templates are the message body for emails that are sent when an alert is triggered. Syntax is the golang [text/template](http://golang.org/pkg/text/template/) package. Variable expansion is not performed on templates because `$` is used in the template language, but a `V()` function is provided instead. Email bodies are HTML, subjects are plaintext. Macro support is currently disabled in templates for the same reason.
   137  
   138  * body: message body (HTML)
   139  * subject: message subject (plaintext)
   140  
   141  #### Variables available to alert templates:
   142  
   143  * Ack: URL for alert acknowledgement
   144  * Expr: string of evaluated expression
   145  * Group: dictionary of tags for this alert (i.e., host=ny-redis01, db=42)
   146  * History: array of Events. An Event has a `Status` field (an integer) with a textual string representation; and a `Time` field. Most recent last. The status fields have identification methods: `IsNormal()`, `IsWarning()`, `IsCritical()`, `IsUnknown()`, `IsError()`.
   147  * Incident: URL for incident page
   148  * IsEmail: true if template is being rendered for an email. Needed because email clients often modify HTML.
   149  * Last: last Event of History array
   150  * Subject: string of template subject
   151  * Touched: time this alert was last updated
   152  * Alert: dictionary of rule data (but the first letter of each is uppercase)
   153    * Crit
   154    * IncidentId
   155    * Name
   156    * Vars: alert variables, prefixed without the `$`. For example: `{{.Alert.Vars.q}}` to print `$q`.
   157    * Warn
   158  
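A sketch of a template that uses several of these variables. The template name `varsdemo` is a placeholder, and the alert using it is assumed to define a `$q` variable.

~~~
template varsdemo {
	subject = {{.Last.Status}}: {{.Alert.Name}} on {{.Group.host}}
	body = `<p><a href="{{.Ack}}">Acknowledge</a> | <a href="{{.Incident}}">Incident</a></p>
	<p>Warn: {{.Alert.Warn}}</p>
	<p>Crit: {{.Alert.Crit}}</p>
	<p>Query: {{.Alert.Vars.q}}</p>
	<p>Last updated: {{.Touched}}</p>
	<p>History (most recent last):</p>
	{{range .History}}
		<br>{{.Time}}: {{.Status}}{{if .Status.IsCritical}} (critical){{end}}
	{{end}}`
}
~~~
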
   159  #### Functions available to alert templates:
   160  
   161  * Eval(string): executes the given expression and returns the first result with identical tags, or `nil` tags if none exists, otherwise `nil`.
   162  * EvalAll(string): executes the given expression and returns all results. The `DescByValue` function may be called on the result of this to sort descending by value: `{{(.EvalAll .Alert.Vars.expr).DescByValue}}`.
   163  * GetMeta(metric, name, tags): Returns metadata for the given combination of metric, metadata name, and tag. `metric` and `name` are strings. `tags` may be a tag string (`"tagk=tagv,tag2=val2"`) or a tag set (`.Group`). If `name` is the empty string, a slice of metadata matching the metric and tag is returned. Otherwise, only the metadata value is returned for the given name, or `nil` for no match.
   164  * Graph(expression, y_label): returns an SVG graph of the expression with tags identical to the alert instance. `expression` is a string or an expression and `y_label` is a string. `y_label` is an optional argument.
   165  * GraphLink(expression): returns a link to the graph tab for the expression page for the given expression. The time is set to the time of the alert. `expression` is a string.
   166  * GraphAll(expression, y_label): returns an SVG graph of the expression. `expression` is a string or an expression and `y_label` is a string. `y_label` is an optional argument.
   167  * LeftJoin(expr, expr[, expr...]): results of the first expression (which may be a string or an expression) are left joined to results from all following expressions.
   168  * Lookup("table", "key"): Looks up the value for the key based on the tagset of the alert in the specified lookup table
   169  * LookupAll("table", "key", "tag=val,tag2=val2"): Looks up the value for the key based on the tagset specified in the given lookup table
   170  * HTTPGet("url"): Performs an http get and returns the raw text of the url
   171  * HTTPGetJSON("url"): Performs an http get for the url and returns a [jsonq.JsonQuery object](https://godoc.org/github.com/jmoiron/jsonq)
   172  * LSQuery("indexRoot", "filterString", "startDuration", "endDuration", nResults): Returns an array of up to nResults marshaled JSON documents (Go: marshaled to interface{}). This is like the lscount and lsstat functions. There is no `keyString` because the group (aka tags) of the alert is used.
   173  * LSQueryAll("indexRoot", "keyString", "filterString", "startDuration", "endDuration", nResults): Like LSQuery, but you have to specify the `keyString` since it is not scoped to the alert.
   174  * ESQuery(index ESIndexer, filter ESQuery, startDuration string, endDuration string, nResults Scalar): Returns an array of up to nResults marshaled JSON documents (Go: marshaled to interface{}). This is like the escount and esstat functions. The group (aka tags) of the alert is used to further filter the results.
   175  * ESQueryAll(index ESIndexer, filter ESQuery, startDuration string, endDuration string, nResults Scalar): Like ESQuery, but the results are not filtered based on the tagset (aka group) of the alert. As an example:
   176  
   177  ```
   178  template test {
   179  	subject = {{.Last.Status}}: {{.Alert.Name}} on {{.Group.host}}
   180  	body = `
   181  	    {{ $filter := (.Eval .Alert.Vars.filter)}}
   182  	    {{ $index := (.Eval .Alert.Vars.index)}}
   183  	    {{range $i, $x := .ESQuery $index $filter "5m" "" 10 }}
   184  	        <p>{{$x.machinename}}</p>
   185  	    {{end}}
   186  	`
   187  }
   188  
   189  alert test {
   190  	template = test
   191  	$index = esls("logstash")
   192  	$filter = esand(esregexp("source", ".*"), esregexp("machinename", "ls-dc.*"))
   193  	crit = avg(escount($index, "source,machinename", $filter, "2m", "10m", ""))
   194  }
   195  ```
   196  
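The OpenTSDB-oriented functions can be used in a similar way. The following is a sketch only: the names `funcsdemo`, `$series`, and `$q` are placeholders, and it assumes `Graph` and `GraphLink` are given an expression that returns a series while `Eval` is given one that returns a number.

~~~
template funcsdemo {
	subject = {{.Alert.Name}}: {{.Eval .Alert.Vars.q}} on {{.Group.host}}
	body = `<p>Current value: {{.Eval .Alert.Vars.q}}</p>
	<p><a href="{{.GraphLink .Alert.Vars.series}}">Open in graph tab</a></p>
	{{.Graph .Alert.Vars.series "cpu"}}`
}

alert funcsdemo {
	template = funcsdemo
	$series = q("avg:rate:os.cpu{host=*}", "1h", "")
	$q = avg(q("avg:rate:os.cpu{host=*}", "1h", ""))
	crit = $q > 90
}
~~~
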
   197  Global template functions:
   198  
   199  * V: performs variable expansion on the argument and returns it. Needed since normal variable expansion is not done due to the `$` character being used by the Go template syntax.
   200  * bytes: converts the string input into a human-readable number of bytes with extension KB, MB, GB, etc.
   201  * pct: formats the float argument as a percentage. For example: `{{5.1 | pct}}` -> `5.10%`.
   202  * replace: [strings.Replace](http://golang.org/pkg/strings/#Replace)
   203  * short: Trims the string to everything before the first period. Useful for turning a FQDN into a shortname. For example: `{{short "foo.baz.com"}}` -> `foo`.
   204  * parseDuration: [time.ParseDuration](http://golang.org/pkg/time/#ParseDuration). Useful when working with an alert's .Last.Time.Add method to generate urls to other systems.
   205  * html: takes a string and renders it as html. Useful for when you have alert variables that contain html. For example, in the alert you may have `$notes = <a href="...">Foo</a>` and then in the template you can render it as html with `{{ html .Alert.Vars.notes }}`
   206  
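A short sketch combining a few of these functions. It assumes a global variable `$dashboard` and an alert variable `$notes` exist; both names, like the template name `globalsdemo`, are made up for this example.

~~~
template globalsdemo {
	subject = {{.Alert.Name}} on {{.Group.host | short}}
	body = `<p>Notes: {{html .Alert.Vars.notes}}</p>
	<p>Dashboard: {{V "$dashboard"}}</p>`
}
~~~
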
   207  All body templates are associated, and so may be executed from another. Use the name of the other template section for inclusion. Subject templates are similarly associated.
   208  
   209  An example:
   210  
   211  ~~~
   212  template name {
   213  	body = Name: {{.Alert.Name}}
   214  }
   215  template ex {
   216  	body = `Alert definition:
   217  	{{template "name" .}}
   218  	Crit: {{.Alert.Crit}}
   219  
   220  	Tags:{{range $k, $v := .Group}}
   221  	{{$k}}: {{$v}}{{end}}
   222  	`
   223  	subject = {{.Alert.Name}}: {{.Alert.Vars.q | .E}} on {{.Group.host}}
   224  }
   225  ~~~
   226  
   227  #### unknown template
   228  
   229  The unknown template (set by the global option `unknownTemplate`) acts differently than alert templates. It receives groups of alerts since unknowns tend to happen in groups (i.e., a host stops reporting and all alerts for that host trigger unknown at the same time).
   230  
   231  Variables and functions available to the unknown template:
   232  
   233  * Group: list of names of alerts
   234  * Name: group name
   235  * Time: [time](http://golang.org/pkg/time/#Time) this group triggered unknown
   236  
   237  Example:
   238  
   239  ~~~
   240  template ut {
   241  	subject = {{.Name}}: {{.Group | len}} unknown alerts
   242  	body = `
   243  	<p>Time: {{.Time}}
   244  	<p>Name: {{.Name}}
   245  	<p>Alerts:
   246  	{{range .Group}}
   247  		<br>{{.}}
   248  	{{end}}`
   249  }
   250  
   251  unknownTemplate = ut
   252  ~~~
   253  
   254  ### alert
   255  
   256  An alert is an evaluated expression which can trigger actions like emailing or logging. The expression must yield a scalar. The alert triggers if not equal to zero. Alerts act on each tag set returned by the query. It is an error for alerts to specify start or end times. Those will be determined by the various functions and the alerting system.
   257  
   258  * crit: expression of a critical alert (which will send an email)
   259  * critNotification: comma-separated list of notifications to trigger on critical. This line may appear multiple times and duplicate notifications, which will be merged so only one of each notification is triggered. Lookup tables may be used when `lookup("table", "key")` is an entire `critNotification` value. See example below.
   260  * depends: expression that this alert depends on. If the expression is non-zero, this alert is unevaluated. Unevaluated alerts do not change state or become unknown.
   261  * ignoreUnknown: if present, will prevent alert from becoming unknown
   262  * unknownIsNormal: will convert unknown events into normal events. For example, if you are alerting for the existence of error log messages, when there are none, that means things are normal. Using `ignoreUnknown` with this setting would be unnecessary.
   263  * runEvery: multiple of global `checkFrequency` at which to run this alert. If unspecified, the global `defaultRunEvery` will be used.
   264  * squelch: <a name="squelch"></a> comma-separated list of `tagk=tagv` pairs. `tagv` is a regex. If the current tag group matches all values, the alert is squelched, and will not trigger as crit or warn. For example, `squelch = host=ny-web.*,tier=prod` will match any group that has at least that host and tier. Note that the group may have other tags assigned to it, but since all elements of the squelch list were met, it is considered a match. Multiple squelch lines may appear; a tag group matches if any of the squelch lines match.
   265  * template: name of template
   266  * unjoinedOk: if present, will ignore unjoined expression errors
   267  * unknown: time at which to mark an alert unknown if it cannot be evaluated; defaults to global checkFrequency
   268  * warn: expression of a warning alert (viewable on the web interface)
   269  * warnNotification: identical to critNotification, but for warnings
   270  * log: setting `log = true` will make the alert behave as a "log alert". It will never show up on the dashboard, but will execute notifications every check interval where the status is abnormal.
   271  * maxLogFrequency: will throttle log notifications to the specified duration. `maxLogFrequency = 5m` will ensure that notifications only fire once every 5 minutes for any given alert key. Only valid on log alerts.
   272  
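As a sketch of how several of these keys fit together, consider the two alerts below. The metric names, thresholds, and the `cpu` template and `default` notification they reference are assumptions made for this example.

~~~
alert os.high_cpu {
	template = cpu
	$q = avg(q("avg:rate:os.cpu{host=*,tier=*}", "5m", ""))
	warn = $q > 80
	crit = $q > 90
	warnNotification = default
	critNotification = default
	# Run on every 2nd check, i.e. every 2 x checkFrequency.
	runEvery = 2
	# Never warn or crit for hosts matching ny-web.* in the prod tier.
	squelch = host=ny-web.*,tier=prod
}

alert app.errors {
	template = cpu
	crit = avg(q("sum:rate:app.errors{host=*}", "5m", "")) > 0
	critNotification = default
	# Log alert: no dashboard entry, notify at most once per 5 minutes.
	log = true
	maxLogFrequency = 5m
	# Missing data means there were no errors, so treat unknown as normal.
	unknownIsNormal = true
}
~~~
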
   273  Example of notification lookups:
   274  
   275  ~~~
   276  notification all {
   277  	#...
   278  }
   279  
   280  notification n {
   281  	#...
   282  }
   283  
   284  notification d {
   285  	#...
   286  }
   287  
   288  lookup l {
   289  	entry host=a {
   290  		v = n
        	}
   291  	entry host=b* {
   292  		v = d
   293  	}
   294  }
   295  
   296  alert a {
   297  	crit = 1
   298  	critNotification = all # All alerts have the all notification.
   299  	# Other alerts are passed through the l lookup table and may add n or d.
   300  	# If the host tag does not match a or b*, no other notification is added.
   301  	critNotification = lookup("l", "v")
   302  	# Do not evaluate this alert if its host is down.
   303  	depends = alert("host.down", "crit")
   304  }
   305  ~~~
   306  
   307  ### notification
   308  
   309  A notification is a chained action to perform. The chaining continues until the chain ends or the alert is acknowledged. At least one action must be specified. `next` and `timeout` are optional. Notifications are independent of each other and executed concurrently (if there are many notifications for an alert, one will not block another).
   310  
   311  * body: overrides the default POST body. The alert subject is passed as the template's `.` variable. The `V` function is available as in other templates. Additionally, a `json` function will output JSON-encoded data.
   312  * next: name of next notification to execute after timeout. Can be itself.
   313  * timeout: duration to wait until next is executed. If not specified, will happen immediately.
   314  * contentType: If your body for a POST notification requires a different Content-Type header than the default of `application/x-www-form-urlencoded`, you may set the contentType variable. 
   315  * runOnActions: Exclude this notification from action notifications. Notifications will be sent on ack/close/forget actions using a built-in template to all root level notifications for an alert, *unless* the notification specifies `runOnActions = false`. 
   316  
   317  #### actions
   318  
   319  * email: list of email addresses of contacts. Comma separated. Supports formats `Person Name <addr@domain.com>` and `addr@domain.com`. The alert template's subject and body are used for the email.
   320  * get: HTTP get to given URL
   321  * post: HTTP post to given URL. Alert subject sent as request body. Content type is set as `application/x-www-form-urlencoded` by default, but may be overridden by setting the `contentType` variable for the notification.
   322  * print: prints template subject to stdout. print value is ignored, so just use: `print = true`
   323  
   324  Example:
   325  
   326  ~~~
   327  # HTTP Post to a chatroom, email in 10m if not ack'd
   328  notification chat {
   329  	next = email
   330  	timeout = 10m
   331  	post = http://chat.meta.stackoverflow.com/room/318?key=KEY&message=whatever
   332  }
   333  
   334  # email sysadmins and Nick each day until ack'd
   335  notification email {
   336  	email = sysadmins@stackoverflow.com, nick@stackoverflow.com
   337  	next = email
   338  	timeout = 1d
   339  }
   340  
   341  # post to a slack.com chatroom via Incoming Webhooks integration
   342  notification slack{
   343  	post = https://hooks.slack.com/services/abcdef
   344  	body = {"text": {{.|json}}}
   345  }
   346  
   347  #post json
   348  notification json{
   349  	post = https://someurl.com/submit
   350  	body = {"text": {{.|json}}, "apiKey": "2847abc23"}
   351  	contentType = application/json
   352  }
   353  ~~~
   354  
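The examples above do not use `print` or `runOnActions`. A minimal sketch, with a placeholder notification name, of a notification that prints the subject to stdout but is skipped for ack/close/forget actions:

~~~
notification console {
	print = true
	runOnActions = false
}
~~~
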
   355  ### lookup
   356  
   357  Lookups are used when different values are needed based on the group. For example, an alert for high CPU use may have a general setting, but need to be higher for known high-CPU machines. Lookups have subsections for lookup entries. Each entry subsection is named with an OpenTSDB tag group, and supports globbing. Entry subsections have arbitrary key/value pairs.
   358  
   359  The `lookup` function can be used in expressions to query lookup data. It takes two arguments: the name of the lookup table and the key to be extracted. When the function is executed, all possible combinations of tags are fetched from the search service, matched to the correct rule, and returned. The first successful match is used. Unmatched groups are ignored.
   360  
   361  For example, to filter based on host:
   362  
   363  ~~~
   364  lookup cpu {
   365  	entry host=web-* {
   366  		high = 0.5
   367  	}
   368  	entry host=sql-* {
   369  		high = 0.8
   370  	}
   371  	entry host=* {
   372  		high = 0.3
   373  	}
   374  }
   375  
   376  alert cpu {
   377  	crit = avg(q("avg:rate:os.cpu{host=*}", "5m", "")) > lookup("cpu", "high")
   378  }
   379  ~~~
   380  
   381  Multiple groups are supported and separated by commas. For example:
   382  
   383  ~~~
   384  lookup cpu {
   385  	entry host=web-*,dc=eu {
   386  		high = 0.5
   387  	}
   388  	entry host=sql-*,dc=us {
   389  		high = 0.8
   390  	}
   391  	entry host=*,dc=us {
   392  		high = 0.3
   393  	}
   394  	entry host=*,dc=* {
   395  		high = 0.4
   396  	}
   397  }
   398  
   399  alert cpu {
   400  	crit = avg(q("avg:rate:os.cpu{host=*,dc=*}", "5m", "")) > lookup("cpu", "high")
   401  }
   402  ~~~
   403  
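Because the first successful match is used, the order of entries matters. Assuming entries are tried in the order they are written, the table above would resolve as follows for a few hypothetical tag sets:

~~~
# Tag set              Matching entry        high
# host=web-01,dc=eu    host=web-*,dc=eu      0.5
# host=sql-01,dc=us    host=sql-*,dc=us      0.8
# host=web-01,dc=us    host=*,dc=us          0.3
# host=sql-01,dc=eu    host=*,dc=*           0.4
~~~

Under the same assumption, moving `entry host=*,dc=*` to the top would make it match every group, so the catch-all entry belongs last.
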
   404  # Example File
   405  
   406  ~~~
   407  tsdbHost = tsdb01.stackoverflow.com:4242
   408  smtpHost = mail.stackoverflow.com:25
   409  
   410  template cpu {
   411  	body = `Alert definition:
   412  	Name: {{.Alert.Name}}
   413  	Crit: {{.Alert.Crit}}
   414  	
   415  	Tags:{{range $k, $v := .Group}}
   416  	{{$k}}: {{$v}}{{end}}
   417  	`
   418  	subject = cpu idle at {{.Alert.Vars.q | .E}} on {{.Group.host}}
   419  }
   420  
   421  notification default {
   422  	email = someone@domain.com
   423  	next = default
   424  	timeout = 1h
   425  }
   426  
   427  alert cpu {
   428  	template = cpu
   429  	$q = avg(q("sum:rate:linux.cpu{host=*,type=idle}", "1m", ""))
   430  	crit = $q < 40
   431  	critNotification = default
   432  }
   433  ~~~
   434  
   435  {% endraw %}
   436  
   437  </div>
   438  </div>