github.com/zorawar87/trillian@v1.2.1/quota/etcd/README.md

# Etcd quotas

Package etcd (and its subpackages) contains an etcd-based
[quota.Manager](https://github.com/google/trillian/blob/3cf59cdfd0/quota/quota.go#L101)
implementation, with a corresponding REST-based configuration service.

## Usage

First, ensure both `logserver` and `logsigner` are started with the
`--etcd_servers` and `--quota_system=etcd` flags, in addition to other flags.
`logserver` must also be started with a non-empty `--http_endpoint` flag, so the
REST quota API can be bound.

For example:

```bash
trillian_log_server \
  --etcd_servers=... \
  --http_endpoint=localhost:8091 \
  --quota_system=etcd

trillian_log_signer --etcd_servers=... --quota_system=etcd
```

If correctly started, the servers will be using etcd quotas. The default
configuration is empty, which means no quotas are enforced.

The REST quota API may be used to create and update configurations.

For example, the command below creates a sequencing-based `global/write` quota.
Assuming an expected sequencing performance of 50 QPS, the `max_tokens`
specified below allows a backlog of about 1.6h (288000 tokens / 50 QPS = 5760
seconds).

```bash
curl \
  -d '@-' \
  -s \
  -H 'Content-Type: application/json' \
  -X POST \
  'localhost:8091/v1beta1/quotas/global/write/config' <<EOF
{
  "name": "quotas/global/write/config",
  "config": {
    "state": "ENABLED",
    "max_tokens": 288000,
    "sequencing_based": {
    }
  }
}
EOF
```
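
The relationship between sequencing rate, tolerated backlog and `max_tokens` is
plain arithmetic: tokens = expected QPS x backlog in seconds. The helper below
is a minimal sketch of that calculation. The function name `max_tokens_for` is
hypothetical and not part of Trillian.

```bash
# Hypothetical helper (not part of Trillian): computes a max_tokens value
# from an expected sequencing rate (QPS) and a tolerated backlog in seconds.
max_tokens_for() {
  local qps="$1" backlog_seconds="$2"
  echo $(( qps * backlog_seconds ))
}

# Example: 50 QPS with a one-hour (3600 s) backlog.
max_tokens_for 50 3600
```

The printed number can be used as the `max_tokens` value in a configuration
payload such as the one shown above.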

To list all configured quotas, run:

```bash
curl 'localhost:8091/v1beta1/quotas?view=FULL'
```

Quotas may also be retrieved individually or through a series of filters,
updated, and deleted through the REST API. See
[quotapb.proto](https://github.com/google/trillian/blob/master/quota/etcd/quotapb/quotapb.proto)
for an in-depth description of entities and available methods.
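
As a sketch of individual retrieval, the helper below builds the config URL for
a given Spec, following the URL pattern used by the other commands in this
document. It assumes that a plain `GET` on that URL returns the stored
configuration and that the API is bound to `localhost:8091`; quotapb.proto
remains the authoritative reference. `quota_config_url` and `QUOTA_API` are
hypothetical names.

```bash
# Hypothetical helper: maps a quota Spec (e.g. global/read) to its config URL.
# QUOTA_API (an assumed variable) overrides the default endpoint, which
# matches the --http_endpoint value used in the examples above.
quota_config_url() {
  local spec="$1"
  echo "${QUOTA_API:-localhost:8091}/v1beta1/quotas/${spec}/config"
}

# Retrieve a single quota (assumes GET is supported on the config URL):
#   curl -s "$(quota_config_url global/write)"
```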

### Maintenance and token exhaustion

During regular system operation, no quota-related maintenance should be
required, as the system should generate at least as many tokens as it spends.

If token exhaustion occurs, there are a few built-in mechanisms that allow for
manual intervention. The question of whether intervention is needed, though, is
an important one and should be answered before any attempts are made to bypass
the system. For example:

* is the `logsigner` working properly and able to keep up with the current demand?
* is there a spike in requests that may justify the current token exhaustion?

For "genuine" token exhaustion (i.e. the system really is under a load it can't
cope with), it may be beneficial to let the quota system deny requests until
regular operation is resumed.

That said, the sections below describe actions that may be taken to deal with
token exhaustion. All examples use `global/read` as the quota in question;
substitute the name as appropriate.

#### Resetting quotas

Resetting a quota restores its current token count to the configured
`max_tokens` value.

```bash
curl -X PATCH \
  'localhost:8091/v1beta1/quotas/global/read/config?reset_quota=true'
```

#### Disabling quotas

Disabling a quota makes it inactive, effective immediately. Disabled quotas may
be enabled again with a similar update (changing "DISABLED" to "ENABLED").

```bash
curl \
  -d '@-' \
  -s \
  -H 'Content-Type: application/json' \
  -X PATCH \
  'localhost:8091/v1beta1/quotas/global/read/config' <<EOF
{
  "config": {
    "state": "DISABLED"
  },
  "update_mask": ["state"]
}
EOF
```
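
Since enabling and disabling differ only in the `state` value, the update body
can be generated rather than typed by hand. The sketch below mirrors the payload
shown above; `state_update_payload` is a hypothetical helper name, not something
provided by Trillian.

```bash
# Hypothetical helper: prints the PATCH payload that sets a quota's state.
# The payload layout is copied from the "Disabling quotas" example.
state_update_payload() {
  local state="$1"   # ENABLED or DISABLED
  cat <<EOF
{
  "config": {
    "state": "${state}"
  },
  "update_mask": ["state"]
}
EOF
}

# Re-enable global/read:
#   state_update_payload ENABLED | curl -d '@-' -s \
#     -H 'Content-Type: application/json' -X PATCH \
#     'localhost:8091/v1beta1/quotas/global/read/config'
```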

#### Deleting quotas

Deleting a quota removes it permanently. Consider disabling it instead if only
a temporary solution is needed.

```bash
curl -X DELETE 'localhost:8091/v1beta1/quotas/global/read/config'
```

### Flags

The following flags apply to etcd quotas:

* [--quota_dry_run](https://github.com/google/trillian/blob/3cf59cdfd0/server/trillian_log_server/main.go#L61)
  (log and map servers)
* [--quota_increase_factor](https://github.com/google/trillian/blob/3cf59cdfd0/server/trillian_log_signer/main.go#L60)
  (logsigner)
* [--quota_max_cache_entries](https://github.com/google/trillian/blob/c0a332878f/server/trillian_log_server/main.go#L71)
  (log and map servers)
* [--quota_min_batch_size](https://github.com/google/trillian/blob/c0a332878f/server/trillian_log_server/main.go#L69)
  (log and map servers)

`--quota_dry_run`, when set to true, stops quota depletion from blocking
requests. This applies to all quotas, so it's only recommended in early
evaluations of the quota system.

`--quota_increase_factor` is related to token leakage protection. It applies
only to sequencing-based quotas. If `--quota_increase_factor` is 1, each new
leaf sequenced by `logsigner` restores exactly one token. If it's higher than 1,
more tokens are restored per leaf batch. A value slightly higher than 1 (e.g.
1.1) is recommended, so there is some protection against token leakage without
too much compromise of the quota system in exceptional situations.
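
To make the effect of the factor concrete, the sketch below computes how many
tokens a sequenced batch would restore. It assumes the product of batch size and
factor is truncated to an integer, which is an assumption of this example rather
than a documented guarantee; `restored_tokens` is a hypothetical name.

```bash
# Hypothetical sketch: tokens restored after sequencing a batch of leaves.
# Truncating the product to an integer is an assumption of this sketch.
restored_tokens() {
  local leaves="$1" factor="$2"
  awk -v n="$leaves" -v f="$factor" 'BEGIN { printf "%d\n", n * f }'
}

restored_tokens 100 1     # factor 1: one token per leaf
restored_tokens 100 1.1   # factor 1.1: a 10% surplus per batch
```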

`--quota_max_cache_entries` and `--quota_min_batch_size` are related to token
caching. Some level of token caching (i.e. both flags having values > 0) is
recommended to lessen the latency impact of rate limiting.

`--quota_min_batch_size` is the minimum number of tokens acquired from etcd. If
a particular request demands fewer tokens than the minimal batch size, the
remaining tokens are kept in memory, potentially sparing further requests to
etcd until those tokens are consumed.
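
The batching rule can be sketched as follows: the number of tokens fetched is
the larger of the requested amount and the minimum batch size, and the surplus
stays cached. This is an illustration of the description above, not the actual
implementation; `tokens_fetched_and_cached` is a hypothetical name.

```bash
# Hypothetical sketch of the batching rule described above:
# fetch at least min_batch tokens from etcd and keep the surplus in memory.
tokens_fetched_and_cached() {
  local requested="$1" min_batch="$2"
  local fetched=$(( requested > min_batch ? requested : min_batch ))
  echo "fetched=${fetched} cached=$(( fetched - requested ))"
}

# A request for 1 token with a minimum batch size of 100.
tokens_fetched_and_cached 1 100
```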

`--quota_max_cache_entries` determines how many quota Specs are cached. Tokens
are cached per Spec using an LRU replacement policy. In systems with a high
number of trees or users, the least recently used Specs are evicted from the
cache (and their tokens returned).

### Monitoring

The following metrics are relevant when considering quota behavior:

* [interceptor_request_count](https://github.com/google/trillian/blob/3cf59cdfd0/server/interceptor/interceptor.go#L91)
* [interceptor_request_denied_count](https://github.com/google/trillian/blob/3cf59cdfd0/server/interceptor/interceptor.go#L95)
* [quota_acquired_tokens](https://github.com/google/trillian/blob/3cf59cdfd0/quota/metrics.go#L70)
* [quota_returned_tokens](https://github.com/google/trillian/blob/3cf59cdfd0/quota/metrics.go#L71)
* [quota_replenished_tokens](https://github.com/google/trillian/blob/3cf59cdfd0/quota/metrics.go#L71)

Requests denied due to token shortage are labeled on
**interceptor_request_denied_count** as
[insufficient_tokens](https://github.com/google/trillian/blob/3cf59cdfd0/server/interceptor/interceptor.go#L38).
The ratio between requests denied with **insufficient_tokens** and
**interceptor_request_count** is a strong indicator of token exhaustion.
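
Given two counter values read from whatever monitoring system is in use, the
ratio can be computed as in the sketch below. How the counters are exported and
queried is outside the scope of this example; `denied_ratio` is a hypothetical
helper name.

```bash
# Hypothetical helper: ratio of requests denied for insufficient tokens to
# total requests, given two counter values read from the monitoring system.
denied_ratio() {
  local denied="$1" total="$2"
  awk -v d="$denied" -v t="$total" \
    'BEGIN { if (t == 0) print "0.0000"; else printf "%.4f\n", d / t }'
}

# 25 denied requests out of 1000 total.
denied_ratio 25 1000
```

A ratio that stays well above zero over a sustained period suggests that the
questions in "Maintenance and token exhaustion" are worth revisiting.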

## General concepts

Trillian quotas have a finite number of tokens that get consumed by requests.
Once a quota reaches zero tokens, all requests that would otherwise consume a
token from it will fail with a **resource_exhausted** error. Tokens are
replenished by different mechanisms, depending on the quota configuration (e.g.,
X tokens every Y seconds).

Quotas are designed so that a set of quotas, at different levels of granularity,
applies to a single request.

A quota
[Spec](https://github.com/google/trillian/blob/3cf59cdfd0/quota/quota.go#L56)
identifies a particular quota and the requests to which it applies. Specs
contain a
[Group](https://github.com/google/trillian/blob/3cf59cdfd0/quota/quota.go#L27)
(`global`, `tree` or `user`) and a
[Kind](https://github.com/google/trillian/blob/3cf59cdfd0/quota/quota.go#L44)
(`read` or `write`).

A few Spec examples are:

* `global/read` (all read requests)
* `global/write` (all write requests)
* `trees/123/write` (write requests for tree 123)
* `users/alice/read` (read requests made by user "alice")

Each request, depending on whether it's a read or write request, subtracts
tokens from the following Specs:

| read requests  | write requests  |
| -------------- | --------------- |
| users/$id/read | users/$id/write |
| trees/$id/read | trees/$id/write |
| global/read    | global/write    |
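
The table above can be expressed as a small helper that lists the Specs charged
by one request. This is only an illustration of the naming scheme;
`charged_specs` is a hypothetical function, not part of Trillian.

```bash
# Hypothetical helper: lists the Specs charged by a single request.
# Arguments: kind (read|write), quota user ID, tree ID.
charged_specs() {
  local kind="$1" user="$2" tree="$3"
  printf '%s\n' "users/${user}/${kind}" "trees/${tree}/${kind}" "global/${kind}"
}

# A write request made by user "alice" against tree 123.
charged_specs write alice 123
```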

Quotas that aren't explicitly configured are considered infinite and won't block
requests.

## Etcd quotas

Etcd quotas implement the concepts described above by storing the quota
configuration and token count in etcd.

Two replenishment mechanisms are available: sequencing-based and time-based.

Sequencing-based replenishment is tied to `logsigner`'s progress. A token is
restored for each leaf sequenced from the `Unsequenced` table. As such, it's
only applicable to `global/write` and `trees/$id/write` quotas.

Time-based replenishment restores X tokens every Y seconds. It may be applied to
all quotas.
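
A time-based configuration might be created with a payload like the one
generated below. The overall layout follows the sequencing-based example in the
Usage section, but the `time_based` field names (`tokens_to_replenish`,
`replenish_interval_seconds`) are assumptions that should be checked against
quotapb.proto. `time_based_config` is a hypothetical helper name.

```bash
# Hypothetical helper: prints a create payload for a time-based quota.
# Arguments: Spec, max_tokens, tokens per interval (X), interval seconds (Y).
# The "time_based" field names are assumptions; verify them in quotapb.proto.
time_based_config() {
  local spec="$1" max_tokens="$2" tokens="$3" interval_seconds="$4"
  cat <<EOF
{
  "name": "quotas/${spec}/config",
  "config": {
    "state": "ENABLED",
    "max_tokens": ${max_tokens},
    "time_based": {
      "tokens_to_replenish": ${tokens},
      "replenish_interval_seconds": ${interval_seconds}
    }
  }
}
EOF
}

# Example: trees/123/read capped at 1000 tokens, 100 tokens restored per second.
#   time_based_config trees/123/read 1000 100 1 | curl -d '@-' -s \
#     -H 'Content-Type: application/json' -X POST \
#     'localhost:8091/v1beta1/quotas/trees/123/read/config'
```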

### MMD protection

Sequencing-based quotas may be used as a form of MMD protection. If the number
of write requests accepted by Trillian goes beyond `logsigner`'s configured
processing capability, tokens will eventually get exhausted and the system will
fail new write requests with a **resource_exhausted** error. While not ideal,
this helps avoid an eventual MMD loss, which may be a graver offense than
temporary loss of availability.

Both `global/write` and `trees/$id/write` quotas may be used for MMD protection
purposes. It's strongly recommended that `global/write` is set up as a last line
of defense for all systems.

### QPS limits

Time-based quotas effectively work as QPS (queries-per-second) limits (X tokens
in Y seconds is roughly equivalent to X/Y QPS).

All quotas may be configured as time-based, but they may be particularly useful
as per-tree (e.g. limiting test or archival trees) or per-user quotas.

### Default quotas

Default quotas are pre-configured limits that get automatically applied to new
trees or users.

TODO(codingllama): Default quotas are not yet implemented.

### Quota users

User-level quotas are applied to "quota users". Trillian makes no assumptions
about what a quota user is. Therefore, initially, there's a single default user
that is charged for all requests (note that, since no quotas are created by
default, this user charges quotas that are effectively infinite).