Tessera can be used with InfluxDB and Prometheus time-series databases to record API usage metrics.  The data recorded can be visualised either by creating a custom dashboard or by using an existing dashboarding tool such as Grafana.

In addition, Tessera logs can be searched, analyzed and monitored using Splunk.  Splunk can be set up in such a way that the logs for multiple Tessera nodes in a network are accessible from a single centralized Splunk instance.

## API Metrics
Tessera can record the following usage metrics for each endpoint of its API:

* Average Response Time
* Max Response Time
* Min Response Time
* Request Count
* Requests Per Second

These metrics can be stored in an InfluxDB or Prometheus time-series database for further analysis.

* [InfluxDB](https://www.influxdata.com/time-series-platform/influxdb/) should be used when metrics are to be "pushed" from Tessera to the DB (i.e. Tessera starts a service which periodically writes the latest metrics to the DB by calling the DB's API).
* [Prometheus](https://prometheus.io/) should be used when metrics are to be "pulled" from Tessera by the DB (i.e. Tessera exposes a `/metrics` API endpoint which the DB periodically calls to fetch the latest metrics).

Both databases integrate well with the open source dashboard editor [Grafana](https://grafana.com/) to allow for easy creation of dashboards to visualise the data being captured from Tessera.

### Using InfluxDB
See the [InfluxDB documentation](https://docs.influxdata.com/influxdb) for details on how to set up an InfluxDB database ready for use with Tessera.  A summary of the steps is as follows:

1. [Install InfluxDB](https://docs.influxdata.com/influxdb/v1.7/introduction/installation/)
1. Start the InfluxDB server
    ```bash
    influxd -config /path/to/influx.conf
    ```
    For local development/testing the default configuration file (Linux: `/etc/influxdb/influxdb.conf`, macOS: `/usr/local/etc/influxdb.conf`) should be sufficient.  For further configuration options see [Configuring InfluxDB](https://docs.influxdata.com/influxdb/v1.7/administration/config/).
1. Connect to the InfluxDB server using the [`influx` CLI](https://docs.influxdata.com/influxdb/v1.7/tools/shell/) and create a new DB.  If using the default config, this is simply:
    ```bash
    influx
    > CREATE DATABASE myDb
    ```
1. To view data stored in the database use the [Influx Query Language](https://docs.influxdata.com/influxdb/v1.7/query_language/)
    ```bash
    influx
    > USE myDb
    > SHOW MEASUREMENTS
    > SELECT * FROM <measurement>
    ```

!!! info
    The InfluxDB HTTP API can be called directly as an alternative to using the `influx` CLI.
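
As an illustration of calling the HTTP API directly, the following Python sketch builds and sends a read query to the InfluxDB 1.x `/query` endpoint.  It assumes an unsecured local server at `http://localhost:8086` and the `myDb` database created above; the helper names `build_query_url` and `run_query` are hypothetical and not part of Tessera or InfluxDB.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, db: str, query: str) -> str:
    """Return the URL for a read query against the InfluxDB 1.x /query endpoint."""
    params = urllib.parse.urlencode({"db": db, "q": query})
    return f"{base_url.rstrip('/')}/query?{params}"


def run_query(base_url: str, db: str, query: str) -> dict:
    """Send the query with an HTTP GET and decode the JSON response body."""
    url = build_query_url(base_url, db, query)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

With a server running, `run_query("http://localhost:8086", "myDb", "SHOW MEASUREMENTS")` should return the same information as `SHOW MEASUREMENTS` in the `influx` CLI.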

Each Tessera server type (i.e. `P2P`, `Q2T`, `ADMIN`, `THIRDPARTY`, `ENCLAVE`) can be configured to store API metrics in an InfluxDB.  These servers can be configured to store metrics to the same DB or separate ones.  Not all servers need to be configured to store metrics.

To configure a server to use an InfluxDB, add `influxConfig` to the server config.  For example:

```json
"serverConfigs": [
    {
        "app":"Q2T",
        "enabled": true,
        "serverAddress":"unix:/path/to/tm.ipc",
        "communicationType" : "REST",
        "influxConfig": {
            "serverAddress": "https://localhost:8086",  // InfluxDB server address
            "dbName": "myDb",                           // InfluxDB DB name (DB must already exist)
            "pushIntervalInSecs": 15,                   // How frequently Tessera will push new metrics to the DB
            "sslConfig": {                              // Config required if InfluxDB server is using TLS
                "tls": "STRICT",
                "sslConfigType": "CLIENT_ONLY",
                "clientTrustMode": "CA",
                "clientTrustStore": "/path/to/truststore.jks",
                "clientTrustStorePassword": "password",
                "clientKeyStore": "/path/to/truststore.jks",
                "clientKeyStorePassword": "password"
            }
        }
    },
    {
        "app":"P2P",
        "enabled": true,
        "serverAddress":"http://localhost:9001",
        "communicationType" : "REST",
        "influxConfig": {
            "serverAddress": "http://localhost:8087",
            "dbName": "anotherDb",
            "pushIntervalInSecs": 15
        }
    }
]
```
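
The `//` annotations in the sample above are explanatory and would be rejected by a strict JSON parser, so it can be convenient to generate the fragment programmatically.  The following is a minimal Python sketch that does so for the `P2P` server; the helper name `influx_config` is hypothetical, and the keys and values are those shown in the example.

```python
import json
from typing import Optional


def influx_config(server_address: str, db_name: str,
                  push_interval_secs: int = 15,
                  ssl_config: Optional[dict] = None) -> dict:
    """Build an `influxConfig` object using the keys from the example above."""
    config = {
        "serverAddress": server_address,
        "dbName": db_name,  # the DB must already exist
        "pushIntervalInSecs": push_interval_secs,
    }
    if ssl_config is not None:
        # Only needed when the InfluxDB server is secured with TLS
        config["sslConfig"] = ssl_config
    return config


p2p_server = {
    "app": "P2P",
    "enabled": True,
    "serverAddress": "http://localhost:9001",
    "communicationType": "REST",
    "influxConfig": influx_config("http://localhost:8087", "anotherDb"),
}

# Print comment-free JSON for the serverConfigs fragment
print(json.dumps({"serverConfigs": [p2p_server]}, indent=4))
```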

#### InfluxDB TLS Configuration
InfluxDB supports 1-way TLS.  This allows clients to validate the identity of the InfluxDB server and provides data encryption.

See [Enabling HTTPS with InfluxDB](https://docs.influxdata.com/influxdb/v1.7/administration/https_setup/) for details on how to secure an InfluxDB server with TLS.  A summary of the steps is as follows:

1. Obtain a CA-signed or self-signed certificate and key (either as separate `.crt` and `.key` files or as a combined `.pem` file)
1. Enable HTTPS in `influx.conf`:
    ``` bash
    # Determines whether HTTPS is enabled.
    https-enabled = true

    # The SSL certificate to use when HTTPS is enabled.
    https-certificate = "/path/to/certAndKey.pem"

    # Use a separate private key location.
    https-private-key = "/path/to/certAndKey.pem"
    ```
1. Restart the InfluxDB server to apply the config changes

To allow Tessera to communicate with a TLS-secured InfluxDB, `sslConfig` must be provided.  To configure Tessera as the client in 1-way TLS:
```json
"sslConfig": {
    "tls": "STRICT",
    "sslConfigType": "CLIENT_ONLY",
    "clientTrustMode": "CA",
    "clientTrustStore": "/path/to/truststore.jks",
    "clientTrustStorePassword": "password",
    "clientKeyStore": "/path/to/truststore.jks",
    "clientKeyStorePassword": "password",
    "environmentVariablePrefix": "INFLUX"
}
```
where `truststore.jks` is a Java KeyStore format file containing the trusted certificates for the Tessera client (e.g. the certificate of the CA used to create the InfluxDB certificate).

If the keystore is secured with a password, this password must be provided.  Passwords can be provided either in the config (e.g. `clientTrustStorePassword`) or as environment variables (using `environmentVariablePrefix` and setting `<PREFIX>_TESSERA_CLIENT_TRUSTSTORE_PWD`).  The [TLS Config](../../Configuration/TLS) documentation explains this in more detail.

As Tessera expects 2-way TLS, a `.jks` file for the `clientKeyStore` must also be provided.  This keystore is not used in 1-way TLS, so it can simply be set to the same file as the truststore.
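
To make the environment-variable naming concrete: with `environmentVariablePrefix` set to `INFLUX` as in the example above, the pattern `<PREFIX>_TESSERA_CLIENT_TRUSTSTORE_PWD` resolves to `INFLUX_TESSERA_CLIENT_TRUSTSTORE_PWD`.  The following Python sketch prepares such an environment for a launcher script; the helper name `truststore_password_env` is hypothetical, and how the environment is handed to the Tessera process is left as an assumption.

```python
import os
from typing import Dict


def truststore_password_env(prefix: str, password: str) -> Dict[str, str]:
    """Return a copy of the current environment with the truststore password set."""
    env = dict(os.environ)
    env[f"{prefix}_TESSERA_CLIENT_TRUSTSTORE_PWD"] = password
    return env


# "INFLUX" matches the environmentVariablePrefix in the sslConfig above
env = truststore_password_env("INFLUX", "password")
# The mapping could then be passed to e.g. subprocess.run(..., env=env)
# by whatever script starts Tessera.
```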

### Using Prometheus
The [Prometheus documentation](https://prometheus.io/docs/introduction/overview/) provides all the information needed to set up Prometheus and integrate it with Tessera.  [Prometheus First Steps](https://prometheus.io/docs/introduction/first_steps/) is a good starting point.  A summary of the steps to store Tessera metrics in a Prometheus DB is as follows:

1. Install Prometheus
1. Create a `prometheus.yml` configuration file to provide Prometheus with the necessary information to pull metrics from Tessera.  A simple Prometheus config for use with the [7nodes example network](../../../../Getting Started/7Nodes) is:
    ```yaml
    global:
      scrape_interval:     15s
      evaluation_interval: 15s

    scrape_configs:
      - job_name: tessera-7nodes
        static_configs:
          - targets: ['localhost:9001', 'localhost:9002', 'localhost:9003', 'localhost:9004', 'localhost:9005', 'localhost:9006', 'localhost:9007']
    ```
1. Start Tessera.  As Tessera always exposes the `/metrics` endpoint, no additional configuration of Tessera is required
1. Start Prometheus
    ```bash
    prometheus --config.file=prometheus.yml
    ```
1. To view data stored in the database, access the Prometheus UI (by default at `localhost:9090`; this address can be changed with the `--web.listen-address` flag) and use the [Prometheus Query Language](https://prometheus.io/docs/prometheus/latest/querying/basics/)
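
Before pointing Prometheus at a node, it can be useful to confirm that the node's `/metrics` endpoint responds.  The following Python sketch fetches the endpoint and parses the Prometheus text exposition format into a dictionary.  It assumes a node reachable at `http://localhost:9001` (the first target in the config above) and, as a simplification, that samples carry no trailing timestamp; the function names are hypothetical.

```python
import urllib.request
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Parse `name value` and `name{labels} value` lines, skipping comments.

    Simplification: assumes sample lines have no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a # HELP / # TYPE comment
        name, _, value = line.rpartition(" ")
        samples[name.strip()] = float(value)
    return samples


def fetch_metrics(base_url: str) -> Dict[str, float]:
    """Fetch and parse the metrics exposed by one Tessera node."""
    url = f"{base_url.rstrip('/')}/metrics"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

With the 7nodes network running, `fetch_metrics("http://localhost:9001")` should return the samples that Prometheus would scrape from node 1.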

### Creating a Grafana dashboard
Grafana can be used to create dashboards from data stored in InfluxDB or Prometheus databases.  See the [Grafana documentation](http://docs.grafana.org/) and [Grafana Getting Started](https://grafana.com/docs/guides/getting_started/) for details on how to set up a Grafana instance and integrate it with databases.  A summary of the steps is as follows:

1. [Install and start Grafana](https://grafana.com/docs/) as described for your OS (if using the default config, Grafana will start on port `3000` and require login/password `admin/admin` to access the dashboard)
1. Create a Data Source to provide the necessary details to connect to the database
1. Create a new Dashboard
1. Add panels to the dashboard.  Panels are the graphs, tables, statistics etc. that make up a dashboard.  The New Panel wizard allows the components of the panel to be configured:
    * Queries: The query used to retrieve data from the datasource.  See the following links for info on using the Query Editor for [InfluxDB](https://grafana.com/docs/features/datasources/influxdb/) and [Prometheus](https://grafana.com/docs/features/datasources/prometheus/)
    * Visualization: How to present the data queried, including panel type, axis headings etc.

#### Example dashboard
[![example-grafana-dashboard.png](../../../../images/tessera/monitoring/example-grafana-dashboard.png)](../../../../images/tessera/monitoring/example-grafana-dashboard.png)

To create this dashboard, a [7nodes example network](../../../../Getting Started/7Nodes) was started, with each Tessera node configured to store its `P2P` and `Q2T` metrics to the same InfluxDB.  The Quorum Acceptance Tests were then run several times against this network to simulate network activity.

As can be seen in the top-right corner, the dashboard was set to only show data collected in the past 15 mins.

To create a dashboard similar to this:

1. Create an InfluxDB datasource within Grafana:
    1. Hover over the cog icon in the left sidebar
    1. Select *Data Sources*
    1. Select *Add data source*
    1. Select the type of DB to connect to (e.g. InfluxDB or Prometheus)
    1. Fill out the form to provide all necessary DB connection information, e.g.:
    [![grafana-influxdb-datasource.png](../../../../images/tessera/monitoring/grafana-influxdb-datasource.png)](../../../../images/tessera/monitoring/grafana-influxdb-datasource.png)

1. Create a new dashboard
    1. Hover over the plus icon in the left sidebar
    1. Select *Dashboard*
    1. Select *Add Query* to configure the first panel
    1. Select *Add Panel* in the top-right to add additional panels
    [![grafana-new-dashboard.png](../../../../images/tessera/monitoring/grafana-new-dashboard.png)](../../../../images/tessera/monitoring/grafana-new-dashboard.png)

    !!! note
        For each of the following examples, additional options such as titles, axis labels and formatting can be configured by navigating the menus in the left-hand sidebar

        [![grafana-panel-sidebar.png](../../../../images/tessera/monitoring/grafana-panel-sidebar.png)](../../../../images/tessera/monitoring/grafana-panel-sidebar.png)

1. Create *sendRaw requests* panel
    1. Select the correct datasource from the *Queries to* dropdown list
    1. Construct the query as shown in the image below.  This retrieves the data for the `sendraw` API from the InfluxDB, finds the sum of the `RequestCount` for this data (i.e. the total number of requests) and groups by `instance` (i.e. each Tessera node).  `time($__interval)` automatically scales the graph resolution for the time range and graph width.
    [![grafana-send-raw-query.png](../../../../images/tessera/monitoring/grafana-send-raw-query.png)](../../../../images/tessera/monitoring/grafana-send-raw-query.png)

    This panel shows the number of private payloads sent to Tessera using the `sendraw` API over time.

1. Create *receiveRaw requests* panel
    1. Select the correct datasource from the *Queries to* dropdown list
    1. Construct the query as shown in the image below.  This retrieves the data for the `receiveraw` API from the InfluxDB, finds the sum of the `RequestCount` for this data (i.e. the total number of requests) and groups by `instance` (i.e. each Tessera node).  `time($__interval)` automatically scales the graph resolution for the time range and graph width.
    [![grafana-receive-raw-query.png](../../../../images/tessera/monitoring/grafana-receive-raw-query.png)](../../../../images/tessera/monitoring/grafana-receive-raw-query.png)

    This panel shows the number of private payloads retrieved from Tessera using the `receiveraw` API over time.

1. Create *partyinfo request rate (Tessera network health)* panel
    1. Select the correct datasource from the *Queries to* dropdown list
    1. Construct the query as shown in the image below.  This retrieves the data for the `partyinfo` API from the InfluxDB, finds the non-negative derivative of the `RequestCount` for this data and groups by `instance` (i.e. each Tessera node).  `non_negative_derivative(1s)` calculates the per-second change in `RequestCount` and ignores the negative values that occur when a node is stopped and restarted.
    [![grafana-partyinfo-rate.png](../../../../images/tessera/monitoring/grafana-partyinfo-rate.png)](../../../../images/tessera/monitoring/grafana-partyinfo-rate.png)

    This panel shows the rate of POST requests per second to `partyinfo`.  For this network of 7 healthy nodes, the rate fluctuates between 5.5 and 6.5 requests/sec.  At approx. 09:37 node 1 was killed and the `partyinfo` rate across all nodes dropped immediately, because the remaining nodes were no longer receiving requests to their `partyinfo` API from node 1.  At 09:41 node 1 was restarted and the rates returned to their original values.

    This metric can be used as an indirect method of monitoring the health of the network.  Using some of the more advanced InfluxDB query options available in Grafana and the other metrics measurements available, it may be possible to make this result more explicit.

    [Alerts and rules](https://grafana.com/docs/alerting/notifications/) can be configured to determine when a node has disconnected and send notifications to pre-configured channels (e.g. Slack, email, etc.).

1. Create *sendRaw rate* panel
    1. Select the correct datasource from the *Queries to* dropdown list
    1. Construct the query as shown in the image below.  This retrieves the data for the `sendraw` API from the InfluxDB, finds the sum of the `RequestRate` for this data and groups by `instance` (i.e. each Tessera node).  `time($__interval)` automatically scales the graph resolution for the time range and graph width.
    [![grafana-sendraw-rate-query.png](../../../../images/tessera/monitoring/grafana-sendraw-rate-query.png)](../../../../images/tessera/monitoring/grafana-sendraw-rate-query.png)

    The POST `sendraw` API is used by Quorum whenever a private transaction is sent using the `eth_sendTransaction` or `personal_sendTransaction` API.  This panel gives a good indication of the private tx throughput in Quorum.  Note that if the `sendraw` API is called by another process, the rate will not be a true representation of Quorum traffic.
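
To illustrate what `non_negative_derivative(1s)` does in the *partyinfo request rate* panel, the following Python sketch applies the same idea to a list of cumulative counter samples: it computes the per-second change between consecutive samples and drops any negative result, such as the one produced when a restarted node's counter resets.  This is a simplified model of the InfluxQL function, not its implementation, and the sample values are made up.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, cumulative request count)


def non_negative_derivative(samples: List[Sample], unit_secs: float = 1.0) -> List[Sample]:
    """Return (timestamp, rate per unit) pairs, omitting negative rates.

    Assumes samples are sorted by strictly increasing timestamp.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rate = (v1 - v0) / (t1 - t0) * unit_secs
        if rate >= 0:
            rates.append((t1, rate))
    return rates


# Made-up samples at a 15s push interval; the counter resets before t=45
samples = [(0, 0), (15, 90), (30, 180), (45, 6), (60, 96)]
print(non_negative_derivative(samples))
```

In this example the drop from 180 to 6 is discarded rather than plotted as a large negative spike, which is why the panel remains readable across node restarts.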

## Monitoring a Tessera network with Splunk
Splunk can be used to search, analyze and monitor the logs of Tessera nodes.

Consolidating the logs from multiple Tessera nodes in a network requires setting up Splunk and Splunk Universal Forwarders.  The following pages from the Splunk documentation are a good starting point for understanding how to achieve this:

* [Consolidate data from multiple hosts](http://docs.splunk.com/Documentation/Forwarder/7.1.2/Forwarder/Consolidatedatafrommultiplehosts)
* [Set up the Universal Forwarder](http://docs.splunk.com/Documentation/Splunk/7.1.2/Forwarding/EnableforwardingonaSplunkEnterpriseinstance#Set_up_the_universal_forwarder)
* [Configure the Universal Forwarder](http://docs.splunk.com/Documentation/Forwarder/7.1.2/Forwarder/Configuretheuniversalforwarder)
* [Enable a receiver](http://docs.splunk.com/Documentation/Forwarder/7.1.2/Forwarder/Enableareceiver)

The general steps to consolidate the logs for a Tessera network in Splunk are:

1. Set up a central Splunk instance if one does not already exist.  Typically this will be on a separate host to the hosts running the Tessera nodes.  This is known as the *Receiver*.
1. Configure the Tessera hosts to forward their node's logs to the *Receiver* by:
    1. Configuring the format and output location of the node's logs.  This is achieved by configuring logback (the logging framework used by Tessera) at node start-up.

        The following example XML configures logback to save Tessera's logs to a file.  See the [Logback documentation](https://logback.qos.ch/manual/configuration.html#syntax) for more information on configuring logback:
        ``` xml
        <?xml version="1.0" encoding="UTF-8"?>
        <configuration>
            <appender name="FILE" class="ch.qos.logback.core.FileAppender">
                <file>/path/to/file.log</file>
                <encoder>
                    <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
                </encoder>
            </appender>

            <logger name="org.glassfish.jersey.internal.inject.Providers" level="ERROR" />
            <logger name="org.hibernate.validator.internal.util.Version" level="ERROR" />
            <logger name="org.hibernate.validator.internal.engine.ConfigurationImpl" level="ERROR" />

            <root level="INFO">
                <appender-ref ref="FILE"/>
            </root>
        </configuration>
        ```

        To start Tessera with an XML configuration file:

        ``` bash
        java -Dlogback.configurationFile=/path/to/logback-config.xml -jar /path/to/tessera-app-<version>-app.jar -configfile /path/to/config.json
        ```

    1. Set up Splunk *Universal Forwarders* (lightweight Splunk clients) on each Tessera host to forward log data for their node to the *Receiver*
    1. Set up the Splunk *Receiver* to listen and receive logging data from the *Universal Forwarders*
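
When checking the log file before forwarding it, it can help to see which fields the logback pattern above produces.  The following Python sketch splits a line written with `%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n` into its parts.  The sample log line in the usage note is made up, the names `LOG_LINE` and `parse_log_line` are hypothetical, and the regular expression is an approximation of the pattern rather than anything provided by Tessera or Splunk.

```python
import re
from typing import Dict, Optional

# Approximates: %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n
LOG_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>[^\]]*)\] "
    r"(?P<level>[A-Z]+)\s+"  # %-5level pads short level names with spaces
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)$"
)


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None
```

For example, `parse_log_line("09:37:12.345 [main] INFO  com.example.Demo - node started")` yields the time, thread, level, logger and message as separate fields, while continuation lines such as stack traces return `None`.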