This document describes how Vitess routes queries to healthy tablets. It is
meant to stay up to date with the current state of the code, and to describe
possible extensions we are working on.

# Current architecture and concepts

## Vtgate and SrvKeyspace

Vtgate receives queries from the client. Depending on the API, it has:

* The shard to execute the query on.
* The Keyspace Id to use.
* Enough routing information with the VSchema to figure out the Keyspace Id.

At this point, vtgate has a cache of the SrvKeyspace object for each keyspace,
which contains the shard map. Each SrvKeyspace is specific to a cell. Vtgate
retrieves the SrvKeyspace for the current cell, and uses it to find out the
shard to use.

The SrvKeyspace object contains the following information about a Keyspace, for
a given cell:

* The served_from field: for a given tablet type (primary/master, replica, rdonly),
  another keyspace is used to serve the data. This is used for vertical
  resharding.
* The partitions field: for a given tablet type (primary/master, replica, rdonly),
  the list of shards to use. This can change when we perform horizontal
  resharding.

Both these fields are cell and tablet type specific, since when we reshard
(horizontally or vertically), we have the option to migrate serving of any type
in any cell forward or backward (with some constraints).
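
To make the shape of this data concrete, here is a minimal Go sketch of an
SrvKeyspace-like structure and a shard lookup. The authoritative message
definitions live in proto/topodata.proto; the type and field names below are
paraphrased for illustration, not the real generated code.

```go
package main

import "fmt"

// Illustrative shapes only: the authoritative definitions live in
// proto/topodata.proto, and these names are paraphrased.
type ServedFrom struct {
	TabletType string // e.g. "primary", "replica", "rdonly"
	Keyspace   string // the keyspace that actually serves this type
}

type Partition struct {
	TabletType string
	Shards     []string // e.g. ["-80", "80-"]
}

type SrvKeyspace struct {
	ServedFrom []ServedFrom // vertical resharding redirects
	Partitions []Partition  // shard map per tablet type
}

// shardsFor returns the shards serving a given tablet type.
func (sk *SrvKeyspace) shardsFor(tabletType string) []string {
	for _, p := range sk.Partitions {
		if p.TabletType == tabletType {
			return p.Shards
		}
	}
	return nil
}

func main() {
	sk := &SrvKeyspace{
		Partitions: []Partition{
			{TabletType: "replica", Shards: []string{"-80", "80-"}},
		},
	}
	fmt.Println(sk.shardsFor("replica")) // [-80 80-]
}
```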

Note that the VSchema query engine uses this information in slightly different
ways than the older API, as it needs to know, for instance, whether two queries
land on the same shard.

## Tablet health

Each tablet is responsible for figuring out its own status and health. When a
tablet is considered for serving, a StreamHealth RPC is sent to the tablet. The
tablet in turn returns:

* Its keyspace, shard and tablet type.
* Whether it is serving or not (typically, whether the query service is running).
* Realtime stats about the tablet, including the replication lag.
* The last time it received a 'TabletExternallyReparented' event (used to break
  ties during reparents).
* The tablet alias of this tablet (so the source of the StreamHealth RPC can
  check it is the right tablet, and not another tablet that restarted on the same
  host / port, as can happen in container environments).

That information is updated on a regular basis (every health check interval, and
when something changes), and streamed back.
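
A consumer of these updates might filter them as sketched below. The real
message is StreamHealthResponse in proto/query.proto; the types and field names
here are simplified stand-ins, not the actual generated structs.

```go
package main

import "fmt"

// Simplified stand-ins for the StreamHealth payload described above;
// the real message lives in proto/query.proto.
type Target struct {
	Keyspace, Shard, TabletType string
}

type RealtimeStats struct {
	ReplicationLagSeconds uint32
	HealthError           string
}

type StreamHealthUpdate struct {
	Target       Target
	Serving      bool
	Stats        RealtimeStats
	ReparentedAt int64  // last TabletExternallyReparented timestamp
	TabletAlias  string // lets the receiver verify the sender's identity
}

// usable shows the kind of filtering a consumer of these updates does.
func usable(u StreamHealthUpdate, maxLag uint32) bool {
	return u.Serving && u.Stats.HealthError == "" &&
		u.Stats.ReplicationLagSeconds <= maxLag
}

func main() {
	u := StreamHealthUpdate{
		Target:  Target{Keyspace: "commerce", Shard: "-80", TabletType: "replica"},
		Serving: true,
		Stats:   RealtimeStats{ReplicationLagSeconds: 5},
	}
	fmt.Println(usable(u, 30)) // true
}
```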

## Discovery module

The go/vt/discovery module provides libraries to keep track of the health of a
group of tablets.

As an input, it needs the list of tablets to watch. The TopologyWatcher object
is responsible for finding the tablets to watch. It can watch:

* All the tablets in a cell. It polls the `tablets/` directory of the
  topo service in that cell to figure out a list.
* The tablets for a given cell / keyspace / shard. It polls the ShardReplication
  object.

Tablets to watch are added to / removed from the main list kept by the
TabletStatsCache object, which maintains a map, indexed by cell, keyspace,
shard and tablet type, of all the tablets it is in contact with.
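
A minimal sketch of that index, assuming illustrative types (the real
go/vt/discovery code is considerably richer than this):

```go
package main

import "fmt"

// key mirrors the index the text describes: cell, keyspace, shard and
// tablet type. These are illustrative types, not the discovery module's.
type key struct {
	Cell, Keyspace, Shard, TabletType string
}

type tabletHealth struct {
	Alias   string
	Serving bool
	LagSecs uint32
}

type statsCache map[key][]tabletHealth

func main() {
	c := statsCache{}
	k := key{Cell: "zone1", Keyspace: "commerce", Shard: "-80", TabletType: "replica"}
	c[k] = append(c[k], tabletHealth{Alias: "zone1-101", Serving: true, LagSecs: 3})
	fmt.Println(len(c[k])) // 1 tablet known for this target
}
```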

When a tablet is in the list of tablets to watch, this module maintains a
StreamHealth streaming connection to the tablet, as described in the previous
section.

This module also provides helper methods to find a tablet to route traffic
to. Since it knows the health status of all tablets for a given keyspace / shard
/ tablet type, and their replication lag, it can provide a good list of tablets.
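
For instance, a lag-based filter in the spirit of those helpers could look like
the sketch below; the types and the 30s threshold are illustrative, not the
module's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

type tablet struct {
	Alias   string
	Serving bool
	LagSecs uint32
}

// healthy filters serving tablets by replication lag and orders them by
// freshness, in the spirit of the discovery helpers.
func healthy(all []tablet, maxLag uint32) []tablet {
	var out []tablet
	for _, t := range all {
		if t.Serving && t.LagSecs <= maxLag {
			out = append(out, t)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].LagSecs < out[j].LagSecs })
	return out
}

func main() {
	got := healthy([]tablet{
		{Alias: "zone1-101", Serving: true, LagSecs: 3},
		{Alias: "zone1-102", Serving: true, LagSecs: 90}, // too far behind
		{Alias: "zone1-103", Serving: false, LagSecs: 0}, // not serving
	}, 30)
	fmt.Println(got) // [{zone1-101 true 3}]
}
```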

## Vtgate TabletGateway

An important component inside vtgate is the TabletGateway. It can send
queries to a tablet by keyspace, shard, and tablet type.

As mentioned previously, the higher levels inside vtgate can resolve queries to
a keyspace, shard and tablet type. The queries are then passed to the
TabletGateway, which routes them to the right tablet.

TabletGateway combines a set of TopologyWatchers (described in the
discovery section, one per cell) as a source of tablets, a HealthCheck module
to watch their health, and a tabletHealthCheck per tablet to collect all the health
information. Based on this data, it can find the best tablet to use.
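
A toy version of that routing step, with entirely hypothetical types (the real
TabletGateway also handles buffering, retries and load balancing), might look
like:

```go
package main

import (
	"errors"
	"fmt"
)

type target struct {
	Keyspace, Shard, TabletType string
}

// gateway is a toy stand-in for TabletGateway: it maps a resolved target
// to the healthy candidates collected from the health checks.
type gateway struct {
	healthy map[target][]string // target -> candidate tablet aliases
}

func (g *gateway) pick(t target) (string, error) {
	candidates := g.healthy[t]
	if len(candidates) == 0 {
		return "", errors.New("no healthy tablet for target")
	}
	// The real code load-balances across candidates; taking the first
	// is enough for this sketch.
	return candidates[0], nil
}

func main() {
	g := &gateway{healthy: map[target][]string{
		{Keyspace: "commerce", Shard: "-80", TabletType: "replica"}: {"zone1-101", "zone1-102"},
	}}
	alias, err := g.pick(target{Keyspace: "commerce", Shard: "-80", TabletType: "replica"})
	fmt.Println(alias, err) // zone1-101 <nil>
}
```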

# Extensions, work in progress

## Config-based routing

A possible extension would be to group all routing options for vtgate in a
configuration (and possibly distribute that configuration in the topology
service). The following parameters now exist in different places:

* vttablet has a replication delay threshold after which it reports
  unhealthy, `-unhealthy_threshold`. It is unrelated to vtgate's
  `-discovery_high_replication_lag_minimum_serving` parameter.
* vttablet has a `-degraded_threshold` parameter after which it shows as
  unhealthy in its status page. It has no impact on serving, and is independent
  from vtgate's `-discovery_low_replication_lag` parameter, although the user
  experience is better if they match.
* vtgate also has a `-min_number_serving_vttablets` parameter, used to avoid
  returning a single tablet and overwhelming it.

We also now have the capability to route to different cells in the same
region. Configuring when to use a different cell in corner cases is hard. If
there is only one tablet with somewhat high replication lag in the current cell,
is it better than up-to-date tablets in other cells?

A possible solution would be a configuration file that lists what to do, in
order of preference, something like:

* Use local tablets if more than two have replication lag smaller than 30s.
* Use remote tablets if all local tablets are above 30s of lag.
* Use local tablets if their lag is lower than 2h, with a minimum of 2 tablets.
* ...
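
One hypothetical encoding of such a preference list, purely to illustrate the
proposal (nothing like this exists in the code today):

```go
package main

import "fmt"

// rule is a hypothetical encoding of one line of the preference list
// above; it is not part of any existing Vitess configuration.
type rule struct {
	LocalOnly  bool   // restrict to the local cell?
	MaxLagSecs uint32 // accept tablets at or below this lag
	MinCount   int    // only apply the rule if at least this many match
}

var preferences = []rule{
	{LocalOnly: true, MaxLagSecs: 30, MinCount: 2},   // local and fresh
	{LocalOnly: false, MaxLagSecs: 30, MinCount: 1},  // else go remote
	{LocalOnly: true, MaxLagSecs: 7200, MinCount: 2}, // else stale local
}

func main() {
	for i, r := range preferences {
		fmt.Printf("rule %d: localOnly=%v maxLag=%ds min=%d\n",
			i, r.LocalOnly, r.MaxLagSecs, r.MinCount)
	}
}
```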