## Planning Your Cluster

To get started with Kubernetes quickly, you can use Kismatic to stand up a small cluster in AWS or virtualized on a personal computer.

But setting up a proper cluster takes a little forethought. Depending on your intent, you may need to engage multiple teams within your organization to correctly provision the required infrastructure. Planning will also help you identify provisioning tasks and know what information will be needed to proceed with installation.

Planning focuses mainly on three areas of concern:

* The machines that will form a cluster
* The network the cluster will operate on
* Other services the cluster will be interacting with

## <a name="compute"></a>Compute resources

<table>
  <tr>
    <td>Etcd Nodes <br />
    Suggested: 3</td>
    <td>1 <b>3</b> 5 7</td>
  </tr>
  <tr>
    <td>Master Nodes <br />
    Suggested: 2</td>
    <td>1 <b>2</b></td>
  </tr>
</table>

Kubernetes is installed on multiple physical or virtual machines running Linux. These machines become **nodes** of the Kubernetes **cluster**.

In a Kismatic installation of Kubernetes, nodes are specialized to one of three distinct roles within the cluster: **etcd**, **master** or **worker**.

* etcd
  * These nodes provide data storage for the master.
* master
  * These nodes provide API endpoints and manage the Pods installed on workers.
* worker
  * These nodes are where your Pods are instantiated.

Nodes within a cluster should have latencies between them of 10ms or lower to prevent instability. If you would like to host workloads at multiple data centers, or in a hybrid cloud scenario, you should expect to set up at least one cluster in each geographically separated region.

### Hardware & Operating System

Infrastructure supported:

* bare metal
* virtual machines
* AWS EC2
* Packet.net

If using VMs or IaaS, we suggest avoiding virtualization strategies that rely on the assignment of partial CPUs to your VM. This includes avoiding AWS T2 instances and CPU oversubscription with VMs.

Operating systems supported:

* RHEL 7
* CentOS 7
* Ubuntu 16.04

Minimum hardware requirements:

<table>
  <tr>
    <td>Node Role</td>
    <td>CPU</td>
    <td>RAM</td>
    <td>Disk (Prototyping<sup>1</sup>)</td>
    <td>Disk (Production<sup>1</sup>)</td>
  </tr>
  <tr>
    <td>etcd</td>
    <td>1 CPU core, 2 GHz</td>
    <td>1 GB</td>
    <td>8 GB</td>
    <td>50 GB</td>
  </tr>
  <tr>
    <td>master</td>
    <td>1 CPU core, 2 GHz</td>
    <td>2 GB</td>
    <td>8 GB</td>
    <td>50 GB</td>
  </tr>
  <tr>
    <td>worker</td>
    <td>1 CPU core, 2 GHz</td>
    <td>1 GB</td>
    <td>8 GB</td>
    <td>200 GB</td>
  </tr>
</table>

<sup>1</sup>A prototype cluster is one you build for a short-term use case (less than a week or so). It can have smaller drives, but you wouldn't want to run like this for extended use.

[Recommended master sizing:](http://kubernetes.io/docs/admin/cluster-large/#size-of-master-and-master-components)

Worker Count | CPUs | RAM (GB)
--- | --- | ---
< 5 | 1 | 3.75
< 10 | 2 | 7.5
< 100 | 4 | 15
< 250 | 8 | 30
< 500 | 16 | 30
< 1000 | 32 | 60

### Swap Memory

Kubernetes nodes must have swap memory disabled; otherwise, the Kubelet will fail to start.
If you want to run your Kubernetes nodes with swap memory enabled, you must override the Kubelet configuration to disable the swap check:

```
cluster:
  # ...
  kubelet:
    option_overrides:
      fail-swap-on: false
```

### Planning for etcd nodes:

Each etcd node receives all the data for a cluster to help protect against data loss in the event that something happens to one of the nodes. A Kubernetes cluster is able to operate as long as more than 50% of its etcd nodes are online. Always use an odd number of etcd nodes. The etcd node count is primarily an availability concern, as adding etcd nodes can decrease Kubernetes performance.

<table>
  <tr>
    <td>Node Count</td>
    <td>Safe for</td>
  </tr>
  <tr>
    <td>1</td>
    <td>Unsafe. Use only for small development clusters.</td>
  </tr>
  <tr>
    <td>3</td>
    <td>Failure of any one node.</td>
  </tr>
  <tr>
    <td>5</td>
    <td>Simultaneous failure of two nodes.</td>
  </tr>
  <tr>
    <td>7</td>
    <td>Simultaneous failure of three nodes.</td>
  </tr>
</table>

### Planning for master nodes:

Master nodes provide API endpoints and keep Kubernetes workloads running. A Kubernetes cluster is able to operate as long as at least one of its master nodes is online. We suggest at least two master nodes for availability.

<table>
  <tr>
    <td>Node Count</td>
    <td>Safe for</td>
  </tr>
  <tr>
    <td>1</td>
    <td>Unsafe. Use only for small development clusters.</td>
  </tr>
  <tr>
    <td>2</td>
    <td>Failure of any one node.</td>
  </tr>
</table>

Both users of Kubernetes and Kubernetes itself occasionally need to communicate with a master via a URL. With two or more masters, we suggest introducing a load-balanced URL (via a virtual IP or DNS CNAME). This allows clients and Kubernetes components to balance requests across the masters and provides uninterrupted operation in the event that a master node goes offline.

### Planning for worker nodes:

Worker nodes are where your applications will run. Your initial worker count should be large enough to hold all the workloads you intend to deploy to the cluster, plus enough slack to handle a partial failure. You can add more workers as necessary after the initial setup without interrupting operation of the cluster.
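When you move on to installation, these node counts and the node inventory are recorded in the top-level `etcd`, `master`, and `worker` sections of the Kismatic plan file. The following is a minimal sketch, assuming the layout and field names (`expected_count`, `nodes`, `host`, `ip`, `internalip`) produced by `kismatic install plan`; the hostnames and addresses are placeholders, so verify the exact schema against the plan file generated for your version.

```
etcd:
  expected_count: 3        # always an odd number
  nodes:
  - host: etcd01           # placeholder hostname
    ip: 10.0.0.11          # placeholder address
    internalip: 10.0.0.11
master:
  expected_count: 2        # two or more, fronted by a load-balanced URL
  nodes:
  - host: master01
    ip: 10.0.0.21
    internalip: 10.0.0.21
worker:
  expected_count: 3
  nodes:
  - host: worker01
    ip: 10.0.0.31
    internalip: 10.0.0.31
```

The load-balanced alias for the masters is covered under DNS & Load Balancing later in this document.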
## Network

<table>
  <tr>
    <td>Networking Technique</td>
    <td>Routed<br />
    Overlay</td>
  </tr>
  <tr>
    <td>How hostnames will be resolved for nodes</td>
    <td>Use DNS<br />
    Let Kismatic manage hosts files on nodes</td>
  </tr>
  <tr>
    <td>Network Policy Control</td>
    <td>No network policy<br />
    Calico-managed network policy</td>
  </tr>
  <tr>
    <td>Pod Network CIDR Block</td>
    <td></td>
  </tr>
  <tr>
    <td>Services Network CIDR Block</td>
    <td></td>
  </tr>
  <tr>
    <td>Load-Balanced URL for Master Nodes</td>
    <td></td>
  </tr>
</table>

Kubernetes allocates a unique IP address for every Pod created on a cluster. Within a cluster, all Pods are visible to all other Pods and directly addressable by IP, simplifying point-to-point communications.

Similarly, Kubernetes uses a special network for Services, allowing them to talk to each other via an address that is stable even as the underlying cluster topology changes.

For this to work, Kubernetes makes use of technologies built in to Docker and Linux (including iptables, bridge networking and the Container Network Interface). We tie these together with a network technology from Tigera called Calico.

### Pod and Service CIDR blocks

To provide these behaviors, Kubernetes needs to be able to issue IP addresses from two IP ranges: a **pod network** and a **services network**. This is in addition to the IP addresses nodes will be assigned on their **local network**.

The pod and service networks each need to be assigned a single contiguous CIDR block large enough to handle your workloads and any future scaling. With Calico, worker and master nodes are assigned IP addresses for allocation in blocks of 64 IPs; newly created pods receive an address from this block until all of its IPs are consumed, at which point an additional block is allocated to the node.

Thus, your pod network must be sized so that:

`Pod Network IP Block Size >= (Worker Node Count + Master Node Count) * 64`

Our default pod network CIDR block is **172.16.0.0/16**, which allows for a maximum of roughly 65,000 pods in total, or roughly 1,000 nodes each running 64 pods or fewer.

Similarly, the service network needs to be large enough to handle all of the Services that might be created on the cluster. Our default is **172.20.0.0/16**, which allows for roughly 65,000 services -- and that ought to be enough for anybody.

Care should be taken that the IP addresses under management by Kubernetes do not collide with IP addresses on the local network, including omitting these ranges from the control of DHCP.

### Pod Networking

There are two techniques we support for pod networking on Kubernetes: **overlay** and **routed**.

In an **overlay** network, communications between pods happen on a virtual network that is only visible to machines that are running an agent. This agent communicates with other agents via the node's local network and establishes IP-over-IP tunnels through which Kubernetes Pod traffic is routed.

In this model, no work has to be done to allow pods to communicate with each other (other than ensuring that you are not blocking IP-over-IP traffic). Two or more Kubernetes clusters may even operate on the same pod and services IP ranges without being able to see each other's traffic.

However, work does need to be done to expose pods to the local network. This role is usually filled by a Kubernetes Ingress Controller.

Overlay networks work best for development clusters.

In a **routed** network, communications between pods happen on a network that is accessible to all machines on their local network. In this model, each node acts as a router for the IP ranges of the pods that it hosts. The cluster communicates with existing network routers via BGP to establish the responsibility of nodes for routing these addresses. Once routing is in place, a request to a pod or service IP is treated the same as any other request on the network. There is no tunnel or IP wrapping involved. This may also make it easier to inspect traffic with tools like Wireshark and tcpdump.

In a routed model, cluster communications often work out of the box. Sometimes routers need to be configured to expect and acknowledge BGP messages from the cluster.

Routed networks work best when you want a majority of your workloads to be automatically visible to clients that aren't on Kubernetes, including other systems on the local network.

Sometimes it is valuable to peer nodes in the cluster with a network router that is physically near them. For this purpose, the cluster announces its BGP messages with an **AS Number** that may be specified when Kubernetes is installed. Our default AS Number is 64511.
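These networking choices are captured in the `cluster.networking` section of the plan file. The following is a minimal sketch, assuming the `type`, `pod_cidr_block`, and `service_cidr_block` field names used by Kismatic-generated plan files, shown with the defaults discussed above; double-check the names against your generated plan file.

```
cluster:
  # ...
  networking:
    type: overlay                      # or "routed"
    pod_cidr_block: 172.16.0.0/16      # pod network; size using the formula above
    service_cidr_block: 172.20.0.0/16  # services network
```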
### Pod Network Policy Enforcement

By default, a Pod can talk to any port on any other Pod, Service or node on its network. Pod-to-pod network access is a requirement of Kubernetes, but this degree of openness is not.

When policy is enabled, access to all Pods is restricted and managed in part by Kubernetes and the Calico networking plugin. When adding new Pods, any ports that are identified within the definition will be made accessible to other pods. Access can be further opened or closed using the Calico command line tools installed on every master node -- for example, you may grant a developer's machine access to a pod, or to a namespace of pods.

Network policy is an experimental feature that can make prototyping the cluster more difficult. It's turned off by default.

### DNS & Load Balancing

All nodes in the cluster will need a short name with which they can communicate with each other. DNS is one way to provide this.

It's also valuable to have a load-balanced alias for the master servers in your cluster, allowing for transparent failover if a master node goes offline. This can be provided either via DNS load balancing or via a virtual IP if your network has a load balancer already. Pick an FQDN and a short name for this master alias that defines your cluster's intent -- for example, if this is the only Kubernetes cluster on your network, [kubernetes.yourdomain.com](http://kubernetes.yourdomain.com) would be ideal.

If you do not wish to run DNS, you may optionally allow the Kismatic installer to manage hosts files on all of your nodes. Be aware that this option will not scale beyond a few dozen nodes, as adding or removing nodes through the installer will force a hosts file update on every node in the cluster.
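Hostname handling and the master alias also end up in the plan file. Below is a minimal sketch, assuming the `update_hosts_files` and `load_balanced_*` field names found in Kismatic-generated plan files; the FQDN shown is just the example alias from this section.

```
cluster:
  # ...
  networking:
    update_hosts_files: true    # let Kismatic manage /etc/hosts; leave false when DNS is available
master:
  load_balanced_fqdn: kubernetes.yourdomain.com   # example alias from this section
  load_balanced_short_name: kubernetes
```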
### Firewall Rules

Kubernetes must be allowed to manage network policy for any IP range it manages.

The firewall rules below, which govern the local network on which nodes reside, will need to be in place prior to construction of the cluster, or installation will fail.

<table>
  <tr>
    <td><b>Purpose for rule</b></td>
    <td><b>Target node types</b></td>
    <td><b>Source IP range</b></td>
    <td><b>Allow Rules</b></td>
  </tr>
  <tr>
    <td>To allow communication with the Kismatic inspector</td>
    <td>all</td>
    <td>installer node</td>
    <td>tcp:8888</td>
  </tr>
  <tr>
    <td>To allow access to the API server</td>
    <td>master</td>
    <td>worker nodes<br/>
    master nodes<br/>
    The IP ranges of any machines you want to be able to manage Kubernetes workloads</td>
    <td>tcp:6443</td>
  </tr>
  <tr>
    <td>To allow all internal traffic between Kubernetes nodes</td>
    <td>all</td>
    <td>All nodes in the Kubernetes cluster</td>
    <td>tcp:0-65535<br />
    udp:0-65535</td>
  </tr>
  <tr>
    <td>To allow SSH</td>
    <td>all</td>
    <td>worker nodes<br/>
    master nodes<br/>
    The IP ranges of any machines you want to be able to manage Kubernetes nodes</td>
    <td>tcp:22</td>
  </tr>
  <tr>
    <td>To allow communications between etcd nodes</td>
    <td>etcd</td>
    <td>etcd nodes</td>
    <td>tcp:2380<br/>
    tcp:6660</td>
  </tr>
  <tr>
    <td>To allow communications between Kubernetes nodes and etcd</td>
    <td>etcd</td>
    <td>master nodes</td>
    <td>tcp:2379</td>
  </tr>
  <tr>
    <td>To allow communications between Calico networking and etcd</td>
    <td>etcd</td>
    <td>etcd nodes</td>
    <td>tcp:6666</td>
  </tr>
</table>

## Certificates and Keys

<table>
  <tr>
    <td>Expiration period for certificates<br/>
    <i>default 17520h</i></td>
    <td></td>
  </tr>
</table>

Kismatic automates the generation and installation of the TLS certificates and keys used for intra-cluster security. It does this using CloudFlare's open source SSL library (cfssl). These certificates and keys are used exclusively to encrypt and authorize traffic between Kubernetes components; they are not presented to end users.

The default expiry period for certificates is **17520h** (2 years). Certificates must be updated prior to expiration or the cluster will cease to operate without warning. As of Kubernetes 1.4, replacing certificates causes momentary downtime; future versions should allow certificates to be "rolled" without downtime.
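If the two-year default does not suit your environment, the expiration period can be set in the plan file. The following is a minimal sketch, assuming the `certificates.expiry` field used by Kismatic-generated plan files; the value is a Go-style duration string.

```
cluster:
  # ...
  certificates:
    expiry: 17520h    # default; 2 years -- regenerate certificates before this lapses
```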