
- Feature Name: Store Pool
- Status: completed
- Start Date: 2015-08-20
- RFC PR: [#2286](https://github.com/cockroachdb/cockroach/pull/2286),
          [#2336](https://github.com/cockroachdb/cockroach/pull/2336)
- Cockroach Issue: [#2149](https://github.com/cockroachdb/cockroach/issues/2149),
                   [#620](https://github.com/cockroachdb/cockroach/issues/620)

# Summary

Add a new `StorePool` service on each node that monitors all the stores and
reports on their current status and health. Given only a store ID, the pool
will report the health of that store. Initially this health will only
indicate whether the store is dead or alive, but it will expand to include
other factors in the future. The pool will also be the ideal place to add
any calculations about which store would be best suited to take on a new
replica, subsuming some of that work from the allocator.

This new service will work well with RFCs #2153 and #2171.

# Motivation

Decisions about when to add or remove replicas for rebalancing and repair
require knowledge of the health of other stores. There needs to be a local
source of truth for those decisions.

# Detailed design

## Configuration
Add a new configuration setting called `TimeUntilStoreDead`, the interval
after which a store that has not been heard from is considered dead. The
default value will be 5 minutes.

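As a rough sketch (the `Config` type, field names, and constructor here are
hypothetical, not the actual CockroachDB configuration code), the setting
could be represented like this:

```go
package storepool

import "time"

// defaultTimeUntilStoreDead is the proposed default: a store that has not
// been heard from for this long is considered dead.
const defaultTimeUntilStoreDead = 5 * time.Minute

// Config holds the StorePool settings (illustrative shape only).
type Config struct {
	// TimeUntilStoreDead is the interval after which a silent store is
	// treated as dead.
	TimeUntilStoreDead time.Duration
}

// DefaultConfig returns a Config with the 5-minute default described above.
func DefaultConfig() Config {
	return Config{TimeUntilStoreDead: defaultTimeUntilStoreDead}
}
```
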
## Monitor
Add a new service called `StorePool` that starts when the node starts. The
service will run until the stopper is triggered and will have access to
gossip.

`StorePool` will maintain a map of store IDs to store descriptors along with
a variety of health statistics about each store. It will also maintain a
`lastUpdatedTime` per store, set whenever that store's descriptor is
updated; if the store was previously marked as dead, it will be restored at
that point. To maintain this map, a callback from gossip for store
descriptors will be added. When the time since `lastUpdatedTime` exceeds
`TimeUntilStoreDead`, the store is considered dead and any replicas on it
may be removed. Note that the work to remove replicas is performed
elsewhere.

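A minimal sketch of this bookkeeping, assuming a simplified
`StoreDescriptor` and a hand-rolled `updateStore` callback in place of the
real gossip API:

```go
package storepool

import (
	"sync"
	"time"
)

// StoreDescriptor stands in for the gossiped store descriptor.
type StoreDescriptor struct {
	StoreID int32
	// Capacity, attributes, etc. would live here.
}

// storeDetail tracks the health state of a single store.
type storeDetail struct {
	desc            StoreDescriptor
	dead            bool
	lastUpdatedTime time.Time
}

// StorePool maintains a map from store ID to health details.
type StorePool struct {
	mu                 sync.Mutex
	timeUntilStoreDead time.Duration
	stores             map[int32]*storeDetail
}

// NewStorePool returns an empty pool with the given dead-store timeout.
func NewStorePool(timeUntilStoreDead time.Duration) *StorePool {
	return &StorePool{
		timeUntilStoreDead: timeUntilStoreDead,
		stores:             make(map[int32]*storeDetail),
	}
}

// updateStore is the gossip callback: it refreshes the descriptor and
// lastUpdatedTime, and restores a store that was previously marked dead.
func (sp *StorePool) updateStore(desc StoreDescriptor, now time.Time) {
	sp.mu.Lock()
	defer sp.mu.Unlock()
	detail, ok := sp.stores[desc.StoreID]
	if !ok {
		detail = &storeDetail{}
		sp.stores[desc.StoreID] = detail
	}
	detail.desc = desc
	detail.lastUpdatedTime = now
	detail.dead = false // a store that gossips again is no longer dead
}
```
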
The monitor will maintain a timespan `timeUntilNextDead`, calculated by
taking the oldest `lastUpdatedTime` across all stores and adding
`TimeUntilStoreDead`, together with the store ID associated with that
deadline.

The monitor will wake up when `timeUntilNextDead` elapses and check whether
that store has been updated in the meantime. If it has not, the store is
marked as dead. The monitor then calculates the next `timeUntilNextDead`
before going back to sleep.

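Continuing the sketch above, the wake-up logic might look like the
following: `nextDeadline` finds the store with the oldest
`lastUpdatedTime`, and the monitor marks that store dead only if it still
has not been refreshed when the timer fires. The method names and the
stopper channel are illustrative.

```go
// nextDeadline returns the store that will be declared dead soonest and the
// time at which that happens (oldest lastUpdatedTime + TimeUntilStoreDead).
func (sp *StorePool) nextDeadline() (storeID int32, deadline time.Time, ok bool) {
	sp.mu.Lock()
	defer sp.mu.Unlock()
	for id, detail := range sp.stores {
		if detail.dead {
			continue
		}
		d := detail.lastUpdatedTime.Add(sp.timeUntilStoreDead)
		if !ok || d.Before(deadline) {
			storeID, deadline, ok = id, d, true
		}
	}
	return storeID, deadline, ok
}

// monitor sleeps until the next deadline, marks the corresponding store dead
// if it has not been updated in the meantime, and then recomputes the next
// deadline. It exits when the stopper channel is closed.
func (sp *StorePool) monitor(stopper <-chan struct{}) {
	for {
		storeID, deadline, ok := sp.nextDeadline()
		if !ok {
			// Nothing to watch yet; check again after a full interval.
			deadline = time.Now().Add(sp.timeUntilStoreDead)
		}
		select {
		case <-time.After(time.Until(deadline)):
			if !ok {
				continue
			}
			sp.mu.Lock()
			if detail := sp.stores[storeID]; detail != nil &&
				time.Since(detail.lastUpdatedTime) >= sp.timeUntilStoreDead {
				detail.dead = true
			}
			sp.mu.Unlock()
		case <-stopper:
			return
		}
	}
}
```
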
# Drawbacks

None come to mind right now. Perhaps the fact that we are adding a new
service, but it should be very lightweight.

# Alternatives

1. Instead of creating a new store monitoring service, add all of this into
   gossip. Gossip already has most of the store information in it.
   - Gossip could use a good refactoring, and this can be seen as one of the
   first steps toward it: store lists would be available from somewhere
   other than gossip and would include more details as well. Adding these
   calculations into gossip itself seems cumbersome.
2. Instead of creating a store pool, create a node pool. This would allow each
   node to choose which of its stores a new range should be assigned to and
   give the nodes more control over their internal systems.
   - Right now, this is the wrong direction, but in the longer term, giving
   the node more control might simplify the decision making for allocations,
   repairs, and rebalances, and each node could report its own view of
   capacity and free space.

# Unresolved questions

If RFCs #2153 and #2171 aren't implemented, should we consider another option?