# How To Freeze a Log (CT Example)

## Prerequisites

Some of the tools and metrics that will be used were added in the `v1.0.8`
release. Ensure that you have upgraded to this release or later. If
using MySQL storage, ensure that the database schema is updated to at least
that of `v1.0.8`.

The `log_signer` process(es) must be exporting metrics so the queue state
and sequencing can be monitored to [check](#monitor-queue--integration)
that all pending entries have been integrated. Check that their
`--http_endpoint` flag is set to an appropriate value. If it's empty then
update the configuration appropriately and restart them before proceeding.

We will assume that the log tree to be frozen is one that's being used
by Trillian CTFE to serve a Certificate Transparency log. If this is
not the case then consult the documentation for the appropriate application.

## Preparation

### Find the Log ID

Obtain the ID of the tree that is backing the log that is to be frozen. This
can be found in the CTFE config file. Locate the section of the config
file that matches the log to be frozen and pull out the value of `log_id`.
For example with the following config the `log_id` to use would be `987654321`.

```
config {
	log_id: 987654321
	prefix: "the_name_of_the_log"
	roots_pem_file: "... roots file name ...."
	public_key: {
		der: ".... bytes of the public key"
	}
	private_key: {
		[type.googleapis.com/keyspb.PrivateKey] {
			der: ".... bytes of the private key ...."
		}
	}
}
```
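
When the config file defines several logs, it can help to list every `log_id`
next to its `prefix` before picking one. The following is a minimal sketch and
not part of the Trillian tooling: `list_ctfe_logs` is a hypothetical helper
name, and it assumes the text layout shown above, with `log_id` appearing
before `prefix` inside each `config` block.

```shell
# Print "<log_id> <prefix>" for every log found in a CTFE config file.
# Assumption: each config block lists log_id before prefix, one per line.
# $1 = path to the CTFE config file.
list_ctfe_logs() {
  awk '
    $1 == "log_id:" { id = $2 }
    $1 == "prefix:" { gsub(/"/, "", $2); print id, $2 }
  ' "$1"
}
```

For the example config above, `list_ctfe_logs /path/to/ctfe.cfg` (a placeholder
path) would print `987654321 the_name_of_the_log`.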

### Setup Environment

Build the `updatetree` command if this hasn't already been done and ensure
that it is on your `PATH`.

```
go install github.com/google/trillian/cmd/updatetree
export PATH=${GOPATH}/bin:$PATH
```

Set environment variables to the correct log_id and metrics HTTP endpoint.
For example:

```
LOG_ID=987654321
METRICS_URI=http://signer-1:8091/metrics
```
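
A typo in either variable would only surface later as a confusing failure, so
a quick check can be run first. This sketch assumes that tree IDs are written
as plain decimal numbers and that the metrics endpoint is served over HTTP or
HTTPS; `check_freeze_env` is a hypothetical helper name.

```shell
# Return non-zero if LOG_ID or METRICS_URI is unset or looks malformed.
# Assumptions: LOG_ID is a decimal number, METRICS_URI is an http(s) URL.
check_freeze_env() {
  case "${LOG_ID:-}" in
    ''|*[!0-9]*) echo "LOG_ID must be a decimal tree ID" >&2; return 1 ;;
  esac
  case "${METRICS_URI:-}" in
    http://*|https://*) ;;
    *) echo "METRICS_URI must be an http(s) URL" >&2; return 1 ;;
  esac
}
```

Running `check_freeze_env && echo ok` in the same shell session confirms that
both variables are set before any tree state is changed.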

## Set Log Tree To Draining State

Use `updatetree` to set the log tree to a `DRAINING` state.

`updatetree --tree_id=${LOG_ID} --tree_state=DRAINING`

Make sure the above command succeeds. At this point the log will not
accept new entries, but there may be some that have already been
submitted but not yet integrated.

## Monitor Queue / Integration

If you have monitoring dashboards showing signer mastership, e.g. in
Prometheus, then this information might be easily available and you
may already have a global view of the state of all the Trillian
servers in the etcd cluster. For the rest of the document we will assume
that this is not the case.

The necessary information can be obtained from the raw metrics
that the server exports. Note that elections, resignations or cluster
operations can change the signer responsible for a tree during the
following process, so if you are reading metrics directly from servers
be aware that this could happen while you're watching the queue.

Wait until you're sure that the log has finished integrating the
queued leaves. This will be indicated by an incrementing count of
signer runs for the tree, no increase in errors for the tree and zero
leaves being processed for the tree by the signer. The following example
should make this clear.

### Find The Signer

Monitor the statistics available on `${METRICS_URI}`. For example:

`curl ${METRICS_URI} | grep ${LOG_ID} | grep -v delay | grep -v latency | grep -v quota`

This might produce output similar to this:

```
entries_added{logid="987654321"} 54
is_master{logid="987654321"} 1
known_logs{logid="987654321"} 1
master_resignations{logid="987654321"} 7
sequencer_batches{logid="987654321"} 54
sequencer_sequenced{logid="987654321"} 54
sequencer_tree_size{logid="987654321"} 7.095373e+07
signing_runs{logid="987654321"} 54
```

First check that `is_master` is not zero. If it is zero then one of the other
signers is currently handling the tree. Try the command on `signer-2`
or whatever the next cluster member is called until you find the right one.
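
Checking each signer by hand can be scripted. The sketch below assumes the
metrics are exposed in the text format shown above; `metric_value` and
`find_master` are hypothetical helper names, and the signer URIs in the usage
line are placeholders for your own cluster members.

```shell
# Print the value of one per-log metric from metrics text on stdin.
# Prints nothing if the metric is absent for that log.
# $1 = metric name, $2 = log ID.
metric_value() {
  awk -v key="$1{logid=\"$2\"}" '$1 == key { print $2 }'
}

# Print the first metrics URI whose signer reports a non-zero is_master
# for the log; return non-zero if none of the candidates does.
# $1 = log ID, remaining arguments = candidate metrics URIs.
find_master() {
  _id="$1"; shift
  for _uri in "$@"; do
    _v=$(curl -s "$_uri" | metric_value is_master "$_id")
    if [ -n "$_v" ] && [ "$_v" != "0" ]; then
      echo "$_uri"
      return 0
    fi
  done
  return 1
}
```

A possible invocation is
`METRICS_URI=$(find_master "${LOG_ID}" http://signer-1:8091/metrics http://signer-2:8091/metrics)`.
Because mastership can change at any time, the result is only a snapshot and
should be re-checked while the queue is being watched.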

### Wait For The Queue To Drain

Next check that there is no entry present for `failed_signing_runs` for
the tree. If there is, do not proceed until you understand the cause and
confirm that it has been fixed. If signing is failing and this number is
incrementing then the other metrics will not be reliable.

Then check that `signing_runs` is incrementing for the log along with
`sequencer_batches`, and that `entries_added` and `sequencer_sequenced`
remain static. The latter two count the leaves integrated into the log
by the signing runs: while they are still increasing the queue is still
being drained, and once they stop changing while runs continue there is
nothing left to integrate.

Continue to monitor the output from accessing `${METRICS_URI}` until you
are sure that the queue has been drained for the log. Remember to ensure
that `is_master` remains non-zero during this time. If not you may have
to go back and find the currently active signer.

For additional safety keep watching the metrics for several further
signer runs until you are sure that there is no further sequencing activity
for the log. Because some of the available storage options use queue
sharding (e.g. CloudSpanner) it is not sufficient to rely on no activity
in a single signer run.
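
The checks in this section can be sketched as a comparison between two saved
copies of the metrics output. This is an illustration under stated
assumptions, not a Trillian tool: `run_was_idle` and `wait_for_drain` are
hypothetical helper names, any `failed_signing_runs` entry is treated as a
reason to stop, and the number of quiet comparisons to require is left to the
operator.

```shell
# Succeeds if, between an older and a newer snapshot of one signer's metrics,
# the signer ran at least once for the log without integrating any leaves.
# $1 = log ID, $2 = older snapshot file, $3 = newer snapshot file.
run_was_idle() {
  awk -v tag="{logid=\"$1\"}" '
    FNR == 1 { snap++ }                    # 1 = older file, 2 = newer file
    index($1, tag) {
      name = $1; sub(/[{].*/, "", name)    # strip the label part
      val[snap, name] = $2 + 0; seen[snap, name] = 1
    }
    END {
      ok = seen[2, "is_master"] && val[2, "is_master"] != 0 &&
           !seen[2, "failed_signing_runs"] &&
           val[2, "signing_runs"] > val[1, "signing_runs"] &&
           val[2, "entries_added"] == val[1, "entries_added"] &&
           val[2, "sequencer_sequenced"] == val[1, "sequencer_sequenced"]
      exit !ok
    }
  ' "$2" "$3"
}

# Poll the metrics until several consecutive comparisons are idle.
# $1 = log ID, $2 = metrics URI, $3 = idle comparisons required,
# $4 = seconds between polls (should exceed the signer run interval).
wait_for_drain() {
  _old=$(mktemp); _new=$(mktemp); _idle=0
  curl -s "$2" > "$_old"
  while [ "$_idle" -lt "$3" ]; do
    sleep "$4"
    curl -s "$2" > "$_new"
    if run_was_idle "$1" "$_old" "$_new"; then
      _idle=$((_idle + 1))
    else
      _idle=0                              # any activity restarts the count
    fi
    cp "$_new" "$_old"
  done
  rm -f "$_old" "$_new"
}
```

For example, `wait_for_drain "${LOG_ID}" "${METRICS_URI}" 5 30` would return
after five consecutive quiet comparisons taken 30 seconds apart. The loop does
not return while the signer is not master or while failures are reported, so
if it runs for much longer than expected, interrupt it and inspect the metrics
by hand.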

## Set Log Tree To Frozen State

**Warning**: Be sure to have completed the queue monitoring process set out
in the previous section. If there are still queued leaves that have not been
integrated then setting the tree to frozen will put the log on a path to
exceeding its maximum merge delay (MMD).

Use `updatetree` to set the log tree to a `FROZEN` state.

`updatetree --tree_id=${LOG_ID} --tree_state=FROZEN`

Make sure the above command succeeds. The log is now frozen.