/*

Scollector is a metric collection agent for OpenTSDB 2.0 and Bosun.

tcollector (https://github.com/OpenTSDB/tcollector) is OpenTSDB's data
collection framework built for OpenTSDB 1.0. scollector aims to be tcollector
for OpenTSDB 2.0 and is one method of sending data to Bosun (http://bosun.org/)
for monitoring.

Unlike tcollector, scollector is a single binary where all collectors are
compiled into scollector itself. scollector supports external collectors, but
your goal should be to use those temporarily until the Go version is written or
the target system sends data directly to OpenTSDB or Bosun. scollector has
native collectors for Linux, Darwin, and Windows and can pull data from other
systems such as AWS, SNMP, and vSphere.

Usage:
	scollector [flag]

The flags are:

	-h=""
		OpenTSDB or Bosun host. Overrides Host in conf file.
	-f=""
		Only include collectors matching these comma-separated terms. Prefix
		with - to invert match and exclude collectors matching those terms. Use
		*,-term,-anotherterm to include all collectors except excluded terms.
	-b=0
		OpenTSDB batch size. Default is 500.
	-conf=""
		Location of configuration file. Defaults to scollector.toml in directory of
		the scollector executable.
	-l
		List available collectors (after Filter is applied).
	-m
		Disable sending of metadata.
	-version
		Prints the version and exits.

Additional flags on Windows:
	-winsvc=""
		Windows Service management; can be: install, remove, start, stop

Debug flags:
	-d
		enables debug output
	-p
		print to screen instead of sending to a host
	-fake=0
		generates X fake data points per second on the test.fake metric

The only required parameter is the host, which may be specified in the conf
file or with -h.

Warning

scollector has not been tested outside of the Stack Exchange environment, and
thus may act incorrectly elsewhere.

scollector requires the new HTTP API of OpenTSDB 2.1 with gzip support. Ensure
that it is in use if you are not using the OpenTSDB Docker image.

Logs

If started with -p or -d, scollector logs to stdout. Otherwise, on Unixes,
scollector logs to syslog. On Windows when started as a service, the Event Log
is used.

External Collectors

See http://bosun.org/scollector/external-collectors for details about using
external scripts or programs to collect metrics.

Configuration File

If scollector.toml exists in the same directory as the scollector
executable or is specified via the -conf="" flag, its content
will be used to set configuration flags. The format is TOML
(https://github.com/toml-lang/toml/blob/master/versions/en/toml-v0.2.0.md).
Available keys are:

Host (string): the OpenTSDB or Bosun host to send data to. Supports TLS and
HTTP Basic Auth.

	Host = "https://user:password@example.com/"

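As a rough illustration of what a Host value like the one above carries, the
following sketch splits it into scheme, host, and Basic Auth credentials using
the standard net/url package. The helper name hostParts is hypothetical and is
not part of scollector.

```go
package main

import (
	"fmt"
	"net/url"
)

// hostParts is a hypothetical helper, not part of scollector: it splits a
// Host value into scheme, host, and optional Basic Auth credentials.
func hostParts(raw string) (scheme, host, user, pass string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", "", err
	}
	if u.User != nil {
		user = u.User.Username()
		pass, _ = u.User.Password()
	}
	return u.Scheme, u.Host, user, pass, nil
}

func main() {
	scheme, host, user, _, err := hostParts("https://user:password@example.com/")
	if err != nil {
		panic(err)
	}
	// An "https" scheme means TLS; a non-empty user means Basic Auth.
	fmt.Println(scheme, host, user)
}
```
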
FullHost (boolean): enables full hostnames; the hostname is not truncated at
the first ".".

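The truncation that FullHost turns off can be pictured with a small sketch.
The function shortHostname is a hypothetical name used only for illustration;
it is not taken from scollector.

```go
package main

import (
	"fmt"
	"strings"
)

// shortHostname is a hypothetical helper: it keeps everything before the
// first "." unless fullHost is set.
func shortHostname(name string, fullHost bool) string {
	if i := strings.IndexByte(name, '.'); i >= 0 && !fullHost {
		return name[:i]
	}
	return name
}

func main() {
	fmt.Println(shortHostname("ny-web01.example.com", false))
	fmt.Println(shortHostname("ny-web01.example.com", true))
}
```
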
ColDir (string): is the external collectors directory.

Tags (table of strings): are added to every datapoint. If a collector specifies
the same tag key, this one will be overwritten. The host tag is not supported.

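The precedence described for Tags can be sketched as a two-step map merge:
global tags first, then the collector's own tags on top. The helper mergeTags
and the tag names are hypothetical; the real merge logic may be organized
differently.

```go
package main

import "fmt"

// mergeTags is a hypothetical helper: global tags from the Tags table are
// applied first, then the collector's own tags overwrite matching keys.
func mergeTags(global, collector map[string]string) map[string]string {
	out := make(map[string]string, len(global)+len(collector))
	for k, v := range global {
		out[k] = v
	}
	for k, v := range collector {
		out[k] = v
	}
	return out
}

func main() {
	global := map[string]string{"dc": "ny", "env": "prod"}
	collector := map[string]string{"env": "staging", "iface": "eth0"}
	// "env" comes from the collector; "dc" survives from the global table.
	fmt.Println(mergeTags(global, collector))
}
```
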
Hostname (string): overrides the system hostname.

DisableSelf (boolean): disables sending of scollector self metrics.

Freq (integer): is the default frequency in seconds for most collectors.

BatchSize (integer): is the number of metrics that will be sent in each batch.
Default is 500.

MaxQueueLen (integer): is the number of metrics kept internally.
Default is 200000.

UserAgentMessage (string): is an optional message that will be appended to the
User Agent when making HTTP requests. This can be used to add contact details
so external services are aware of who is making the requests.
Example: Scollector/0.6.0 (UserAgentMessage added here)

Filter (array of string): Only include collectors matching these terms. Prefix
with - to invert match and exclude collectors matching those terms. Use
*,-term,-anotherterm to include all collectors except excluded terms.

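The include/exclude rules for Filter can be approximated as follows. This is
only a sketch under stated assumptions: terms are treated as case-insensitive
substrings of the collector name, "*" matches everything, and an exclusion
always wins. The helper keepCollector and the collector names are hypothetical
and scollector's own matching may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// keepCollector is a hypothetical helper, not scollector's implementation.
// Assumptions: terms match as case-insensitive substrings of the collector
// name, "*" matches every collector, and an exclusion always wins.
func keepCollector(name string, terms []string) bool {
	name = strings.ToLower(name)
	included, hasInclude := false, false
	for _, t := range terms {
		t = strings.ToLower(strings.TrimSpace(t))
		if t == "" {
			continue
		}
		if strings.HasPrefix(t, "-") {
			if len(t) > 1 && strings.Contains(name, t[1:]) {
				return false
			}
			continue
		}
		hasInclude = true
		if t == "*" || strings.Contains(name, t) {
			included = true
		}
	}
	// With no include terms at all, everything not excluded is kept.
	return included || !hasInclude
}

func main() {
	terms := strings.Split("*,-term,-anotherterm", ",")
	for _, name := range []string{"cpu", "term-collector", "anotherterm"} {
		fmt.Println(name, keepCollector(name, terms))
	}
}
```
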
MetricFilters (array of string): only send metrics matching these regular
expressions. Example ['^(win\.cpu|win\.system\..*)$', 'free']

IfaceExpr (string): Replaces the default regular expression for interface name
matching on Linux.

PProf (string): optional IP:Port binding to be used for debugging with pprof.
Examples: localhost:6060 for loopback or :6060 for all IP addresses.

MetricPrefix (string): optional prefix prepended to all metric paths.

Collector configuration keys

Following are configurations for collectors that do not autodetect.

KeepalivedCommunity (string): if not empty, enables the Keepalived collector
with the specified community.

	KeepalivedCommunity = "keepalivedcom"

HAProxy (array of table, keys are User, Password, Instances): HAProxy instances
to poll. The Instances key is an array of table with keys User, Password, Tier,
and URL. If User is specified for an instance, User and Password override the
common ones.

	[[HAProxy]]
	  User = "hauser"
	  Password = "hapass"
	  [[HAProxy.Instances]]
	    Tier = "1"
	    URL = "http://ny-host01:17/haproxy\;csv"
	  [[HAProxy.Instances]]
	    Tier = "2"
	    URL = "http://ny-host01:26/haproxy\;csv"
	  [[HAProxy.Instances]]
	    Tier = "3"
	    URL = "http://ny-host01:40/haproxy\;csv"
	  [[HAProxy.Instances]]
	    User = "hauser2"
	    Password = "hapass2"
	    Tier = "1"
	    URL = "http://ny-host01:80/haproxy\;csv"

SNMP (array of table, keys are Community, Host, and MIBs): SNMP hosts to connect
to at a 5 minute poll interval.

	[[SNMP]]
	  Community = "com"
	  Host = "host"
	  MIBs = ["cisco"]
	[[SNMP]]
	  Community = "com2"
	  Host = "host2"
	  # List of mibs to run for this host. Default is built-in set of ["ifaces","cisco"]
	  MIBs = ["custom", "ifaces"]

MIBs (map of string to table): Allows user-specified, custom SNMP configurations.

    [MIBs]
      [MIBs.cisco] #can name anything you want
        BaseOid = "1.3.6.1.4.1.9.9" # common base for all metrics in this mib

        # simple, single key metrics
        [[MIBs.cisco.Metrics]]
          Metric = "cisco.cpu"
          Oid = ".109.1.1.1.1.6"
          Unit = "percent"
          RateType = "gauge"
          Description = "cpu percent used by this device"

        # can also iterate over snmp tables
        [[MIBs.cisco.Trees]]
          BaseOid = ".48.1.1.1" #common base oid for this tree

          # tags to apply to metrics in this tree. Can come from another oid, or specify "idx" to use
          # the numeric index as the tag value. Can specify multiple tags, but must supply one.
          # all tags and metrics should have the same number of rows per query.
          [[MIBs.cisco.Trees.Tags]]
            Key = "name"
            Oid = ".2"
          [[MIBs.cisco.Trees.Metrics]]
            Metric = "cisco.mem.used"
            Oid = ".5"
          [[MIBs.cisco.Trees.Metrics]]
            Metric = "cisco.mem.free"
            Oid = ".6"

ICMP (array of table, keys are Host): ICMP hosts to ping.

	[[ICMP]]
	  Host = "internal-router"
	[[ICMP]]
	  Host = "backup-router"

Vsphere (array of table, keys are Host, User, Password): vSphere hosts to poll.

	[[Vsphere]]
	  Host = "vsphere01"
	  User = "vuser"
	  Password = "pass"

AWS (array of table, keys are AccessKey, SecretKey, Region, BillingProductCodesRegex,
BillingBucketName, BillingBucketPath, BillingPurgeDays): AWS hosts to poll, and associated
billing information.

To report AWS billing information to OpenTSDB or Bosun, you need to configure AWS to
generate billing reports, which will be put into an S3 bucket. For more detail, see:
http://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/detailed-billing-reports.html

Once the reports are going into the S3 bucket, the Bucket Name and the Prefix Path that
you entered during the report setup need to be entered below. Do not enter a blank bucket
path as this is not supported.

Reports that are over a certain number of days old are purged by scollector. Set the key
BillingPurgeDays to 0 to disable purging of old reports (note that this may increase your S3
usage costs as all reports are processed each time the collector runs).

Do not populate the Billing keys if you do not wish to load billing data into OpenTSDB or
Bosun.

Only products whose name matches the BillingProductCodesRegex key will have their billing
data sent to OpenTSDB or Bosun.

	[[AWS]]
	  AccessKey = "aoesnuth"
	  SecretKey = "snch0d"
	  Region = "somewhere"
	  BillingProductCodesRegex = "^Amazon(S3|Glacier|Route53)$"
	  BillingBucketName = "mybucket.billing"
	  BillingBucketPath = "reports"
	  BillingPurgeDays = 2

AzureEA (array of table, keys are EANumber, APIKey, LogBillingDetails, LogResourceDetails,
and LogExtraTags): Azure Enterprise Agreements to poll for billing information.

EANumber is your Enterprise Agreement number. You can find this in your Enterprise Agreement portal.

APIKey is the API key as provided by the Azure EA Portal. To generate your API key for this collector,
you will need to log into your Azure Enterprise Agreement portal (ea.azure.com), click the
"Download Usage" link, then choose "API Key" on the download page. You can then generate your API
key there. Keys are valid for 6 months, so this collector requires maintenance twice a year.

LogBillingDetails tells scollector to add the following tags to your metrics:
	- costcenter
	- accountname
	- subscription

LogResourceDetails tells scollector to add the following tags to your metrics:
	- resoucegroup
	- resourcelocation

LogExtraTags tells scollector to take resource tags and add them to your metrics. Careful: this will
add all tags as they exist in Azure, so you may end up with a large number of distinct tags if you
are not careful. It will not process any tags that begin with "hidden".

If you are a heavy Azure EA user, then these additional tags may be useful for breaking down costs.

	[[AzureEA]]
	  EANumber = "123456"
	  APIKey = "joiIiwiaXNzIjoiZWEubWljcm9zb2Z0YXp1cmUuY29tIiwiYXVkIjoiY2xpZW50LmVhLm1"
	  LogBillingDetails = false
	  LogResourceDetails = false
	  LogExtraTags = false

Process: processes to monitor.

ProcessDotNet: .NET processes to monitor on Windows.

See http://bosun.org/scollector/process-monitoring for details about Process and
ProcessDotNet.

HTTPUnit (array of table, keys are TOML, Hiera, Freq): httpunit TOML and Hiera
files to read and monitor. See https://github.com/StackExchange/httpunit
for documentation about the toml file. TOML and Hiera may both be specified,
or just one. Freq is collector frequency as a duration string (default 5m).

	[[HTTPUnit]]
	  TOML = "/path/to/httpunit.toml"
	  Hiera = "/path/to/listeners.json"
	[[HTTPUnit]]
	  TOML = "/some/other.toml"
	  Freq = "30s"

Riak (array of table, keys are URL): Riak hosts to poll.

	[[Riak]]
	  URL = "http://localhost:8098/stats"

RabbitMQ (array of table, keys are URL): RabbitMQ hosts to poll.
Regardless of configuration, the collector automatically polls the
management plugin at http://guest:guest@127.0.0.1:15672/ .

	[[RabbitMQ]]
	  URL = "https://user:password@hostname:15671"

Cadvisor: Cadvisor endpoints to poll.
Cadvisor collects system statistics about running containers.
See https://github.com/google/cadvisor/ for documentation about configuring
cadvisor. You can optionally enable per-CPU usage metric reporting with
PerCpuUsage, and use IsRemote to disable block device lookups.

	[[Cadvisor]]
		URL = "http://localhost:8080"
		PerCpuUsage = true
		IsRemote = false

RedisCounters: Reads a hash of metric/counters from a redis database.

    [[RedisCounters]]
        Server = "localhost:6379"
        Database = 2

Expects data populated via bosun's udp listener in the "scollectorCounters" hash.

ExtraHop (array of table): ExtraHop hosts to poll. The two filter options, FilterBy and
FilterPercent, specify how scollector should filter traffic before submitting it. The valid
FilterBy values are:

	- namedprotocols (Only protocols that have an explicit name are submitted. The rest of the
	                  traffic will be pushed into proto=unnamed, so any protocol that begins with
	                  "tcp", "udp" or "SSL" will not be submitted, with the exception of SSL443.)
	- toppercent     (The top n% of traffic by volume will be submitted. The rest of the traffic
	                  will be pushed into proto=otherproto.)
	- none           (All protocols of any size will be submitted.)

FilterPercent applies when the FilterBy option is set to "toppercent". Only protocols that account
for this much traffic will be logged. For example, if this is set to 90, then if the protocol
accounts for less than 10% of the traffic, it will be dropped. This is OK if your traffic is
heavily dominated by a small set of protocols, but if you have a fairly even spread of protocols
then this filtering loses its usefulness.

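The FilterPercent example above can be worked through in a short sketch. It
follows the example literally: with FilterPercent set to 90, any protocol that
carries less than 10% of the total is folded into proto=otherproto. The helper
foldSmallProtocols and the traffic figures are hypothetical, and the real
collector may compute the cut-off differently.

```go
package main

import "fmt"

// foldSmallProtocols is a hypothetical helper following the example above:
// with filterPercent = 90, any protocol carrying less than 10% of the total
// bytes is folded into "otherproto". The real collector may differ.
func foldSmallProtocols(bytesByProto map[string]float64, filterPercent float64) map[string]float64 {
	var total float64
	for _, b := range bytesByProto {
		total += b
	}
	out := make(map[string]float64)
	if total == 0 {
		return out
	}
	threshold := 100 - filterPercent
	for proto, b := range bytesByProto {
		if b/total*100 < threshold {
			out["otherproto"] += b
			continue
		}
		out[proto] = b
	}
	return out
}

func main() {
	// Made-up byte counts: HTTP is 70% of traffic, DNS 25%, NTP 3%, SSH 2%.
	traffic := map[string]float64{"HTTP": 700, "DNS": 250, "NTP": 30, "SSH": 20}
	fmt.Println(foldSmallProtocols(traffic, 90))
}
```

With an even spread of, say, twenty protocols at 5% each, every protocol would
fall under the 10% cut-off, which is the loss of usefulness the text warns
about.
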
AdditionalMetrics is formatted as such: [object_type].[object_id].[metric_category].[metric_spec_name]

    - object_type:  is one of: "network", "device", "application", "vlan", "device_group", "activity_group"
    - object_id:    can be found by querying the ExtraHop API (through the API Explorer) under the endpoint
                    for the object type. For example, for "application", you would query the "/applications/"
                    endpoint and locate the ID of the application you want to query.
    - metric_category:  can be found in the Metric Catalogue for the metric you want to query. e.g. for
                        custom metrics, this is always "custom_detail"
    - metric_spec_name: can be found in the Metric Catalogue for the metric you want to query. e.g. for
                        custom metrics, this is the name you specified in the metricAddDetailCount()
                        function in a trigger.

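The four-part layout above can be pulled apart with a bounded split, as in this
sketch. The helper parseAdditionalMetric is hypothetical; it splits on the
first three dots only, on the assumption that a metric spec name may itself
contain spaces or dots.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAdditionalMetric is a hypothetical helper: it splits on the first
// three dots only, so the metric spec name may itself contain dots or spaces.
func parseAdditionalMetric(s string) (objectType, objectID, category, specName string, err error) {
	parts := strings.SplitN(s, ".", 4)
	if len(parts) != 4 {
		return "", "", "", "", fmt.Errorf("expected 4 dot-separated fields, got %d", len(parts))
	}
	return parts[0], parts[1], parts[2], parts[3], nil
}

func main() {
	t, id, cat, name, err := parseAdditionalMetric("application.12.custom_detail.my trigger metric")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s | %s | %s | %s\n", t, id, cat, name)
}
```
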
For these additional metrics, the key for the metric is expected to be a comma-separated list of
key=value pairs. This key will be converted into an OpenTSDB tagset. For example, if you have a key of
"client=192.168.0.1,server=192.168.0.9,port=21441", this will be converted into an OpenTSDB tagset of the same
values.

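A minimal sketch of that conversion, using the example key above. The helper
keyToTags is a hypothetical name, and rejecting malformed pairs is an
assumption about how such input might be handled.

```go
package main

import (
	"fmt"
	"strings"
)

// keyToTags is a hypothetical helper: it turns "k1=v1,k2=v2" into a tag map
// and rejects pairs that lack a key, a value, or the "=" separator.
func keyToTags(key string) (map[string]string, error) {
	tags := make(map[string]string)
	for _, pair := range strings.Split(key, ",") {
		i := strings.IndexByte(pair, '=')
		if i <= 0 || i == len(pair)-1 {
			return nil, fmt.Errorf("malformed pair %q", pair)
		}
		tags[strings.TrimSpace(pair[:i])] = strings.TrimSpace(pair[i+1:])
	}
	return tags, nil
}

func main() {
	tags, err := keyToTags("client=192.168.0.1,server=192.168.0.9,port=21441")
	if err != nil {
		panic(err)
	}
	fmt.Println(tags)
}
```
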
CAUTION: Do not include unbounded values in your key if you can help it. Putting in something like client IP, or
source/destination port, which are out of your control and specified by people external to your network, could
end up putting millions of different keys into your Bosun instance - something you probably don't want.

CertificateSubjectMatch and CertificateActivityGroup are used for collecting SSL information from ExtraHop. The
key CertificateSubjectMatch is used to match against the certificate subject. If there is no match, we discard
the certificate record. This is important as certificate subjects are essentially unbounded, as ExtraHop returns
all certificates it sees, regardless of where they originated.

The key CertificateActivityGroup is the Activity Group you want to pass through to ExtraHop to pull the certificates
from. There is a group called "SSL Servers" which is most likely the group you want to use. You will need to discover
the group number for this group and put it in here.

	[[ExtraHop]]
	  Host = "extrahop01"
	  APIkey = "abcdef1234567890"
	  FilterBy = "toppercent"
	  FilterPercent = 75
	  AdditionalMetrics = [ "application.12.custom_detail.my trigger metric" ]
	  CertificateSubjectMatch = "example.(com|org|net)"
	  CertificateActivityGroup = 46

LocalListener (string): local_listener listens for HTTP requests and forwards
them to the configured OpenTSDB host while adding defined tags to
metrics.

	LocalListener = "localhost:4242"

TagOverride (array of tables, keys are CollectorExpr, MatchedTags and Tags): if a collector
name matches CollectorExpr, MatchedTags and Tags are merged, in that order, into all outgoing
messages produced by the collector. MatchedTags applies a regexp to the tag named by each key
and adds tags based on the named match groups defined in the regexp. The tags defined in Tags
are merged afterwards; defining a tag as an empty string deletes it.

	[[TagOverride]]
	  CollectorExpr = 'cadvisor'
	  [TagOverride.MatchedTags]
	    docker_name = 'k8s_(?P<container_name>[^\.]+)\.[0-9a-z]+_(?P<pod_name>[^-]+)'
	    docker_id = '^(?P<docker_id>.{12})'
	  [TagOverride.Tags]
	    docker_name = ''
	    source = 'kubelet'

Oracles (array of table, keys are ClusterName, Instances): Oracle database
instances to poll. The Instances key is an array of table with keys
ConnectionString and Role, which are the same as using sqlplus.

	[[Oracles]]
	  ClusterName = "oracle rac name"
	  [[Oracles.instances]]
	    ConnectionString = "/"
	    Role = "sysdba"
	  [[Oracles.instances]]
	    ConnectionString = "username/password@oraclehost/sid"
	  [[Oracles.instances]]
	    ConnectionString = "/@localnodevip/sid"
	    Role = "sysdba"

By default Elastic nodes are auto-detected on localhost:9200, but if you have a
node running on another network interface, a non-standard port or even multiple
nodes running on the same host you can use the Elastic configuration. It also
lets you specify basic auth credentials and enable TLS by setting Scheme to https:

	[[Elastic]]
	  Host = "192.168.1.1"
	  Port = 9201
	  ClusterInterval = "10s"
	  IndexInterval = "1m"
	  User = "user"
	  Password = "pass"
	  Scheme = "https"

	[[Elastic]]
	  Host = "192.168.1.1"
	  Port = 9202
	  ClusterInterval = "10s"
	  IndexInterval = "1m"

Windows

scollector has full Windows support. It can be run standalone, or installed as a
service (see -winsvc). The Event Log is used when installed as a service.

*/
package main