---
title: Add support for Spot Instances
authors:
  - "@JoelSpeed"
reviewers:
  - "@enxebre"
  - "@vincepri"
  - "@detiber"
  - "@ncdc"
  - "@CecileRobertMichon"
  - "@randomvariable"
creation-date: 2020-03-30
last-updated: 2020-03-30
status: provisional
see-also:
replaces:
superseded-by:
---

# Add support for Spot Instances

## Table of contents

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->

- [Glossary](#glossary)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals/Future Work](#non-goalsfuture-work)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
  - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    - [Cloud Provider Implementation Specifics](#cloud-provider-implementation-specifics)
      - [AWS](#aws)
        - [Launching instances](#launching-instances)
      - [GCP](#gcp)
        - [Launching instances](#launching-instances-1)
      - [Azure](#azure)
        - [Launching Instances](#launching-instances-2)
        - [Deallocation](#deallocation)
  - ['Interruptible' label](#interruptible-label)
  - [Future Work](#future-work)
    - [Termination handler](#termination-handler)
    - [Support for MachinePools](#support-for-machinepools)
  - [Risks and Mitigations](#risks-and-mitigations)
    - [Control-Plane instances](#control-plane-instances)
    - [Cloud Provider rate limits](#cloud-provider-rate-limits)
- [Alternatives](#alternatives)
  - [Reserved Instances](#reserved-instances)
- [Upgrade Strategy](#upgrade-strategy)
- [Additional Details](#additional-details)
  - [Non-Guaranteed instances](#non-guaranteed-instances)
    - [AWS Spot Instances](#aws-spot-instances)
      - [Spot backed Autoscaling Groups](#spot-backed-autoscaling-groups)
      - [Spot Fleet](#spot-fleet)
      - [Singular Spot Instances](#singular-spot-instances)
      - [Other AWS Spot features of note](#other-aws-spot-features-of-note)
        - [Stop/Hibernate](#stophibernate)
        - [Termination Notices](#termination-notices)
        - [Persistent Requests](#persistent-requests)
    - [GCP Preemptible instances](#gcp-preemptible-instances)
      - [Instance Groups](#instance-groups)
      - [Single Instance](#single-instance)
      - [Limitations of Preemptible](#limitations-of-preemptible)
        - [24 Hour limitation](#24-hour-limitation)
        - [Shutdown warning](#shutdown-warning)
    - [Azure Spot VMs](#azure-spot-vms)
      - [Scale Sets](#scale-sets)
      - [Single Instances](#single-instances)
      - [Important Spot VM notes](#important-spot-vm-notes)
        - [Termination Notices](#termination-notices-1)
        - [Eviction Policies](#eviction-policies)
- [Implementation History](#implementation-history)

<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Glossary

Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).

## Summary

Enable Cluster API users to leverage cheaper, non-guaranteed instances to back Cluster API Machines across multiple cloud providers.

## Motivation

Allow users to cut the costs of running Kubernetes clusters on cloud providers by moving interruptible workloads onto non-guaranteed instances.

### Goals

- Provide sophisticated provider-specific automation for running Machines on non-guaranteed instances

- Utilise as much of the existing Cluster API as possible

### Non-Goals/Future Work

- Any logic for choosing instance types based on availability from the cloud provider

- A one-to-one mapping to every mechanism each provider offers for deploying spot instances, e.g. AWS Spot Fleet

- Support Spot instances via MachinePool for any cloud provider that doesn't already support MachinePool

- Ensure graceful shutdown of pods is attempted on non-guaranteed instances

## Proposal

To provide consistent behaviour using non-guaranteed instances (Spot on AWS and Azure, Preemptible on GCP)
across cloud providers, we must define a common behaviour based on the features common to each provider.

Based on the research on [non-guaranteed instances](#non-guaranteed-instances),
the following requirements for integration will work for each of AWS, Azure and GCP:

- Required configuration for enabling spot/preemptible instances should be added to the Infrastructure MachineSpec
  - No configuration should be required outside of this scope
  - MachineSpecs are part of the Infrastructure Templates used to create new Machines and as such, consistency is guaranteed across all instances built from this Template
  - All instances created by a MachineSet/MachinePool will be either spot/preemptible or on-demand instances

- A Machine should be paired 1:1 with an instance on the cloud provider
  - If the instance is preempted/terminated, the Infrastructure controller should not replace it
  - If the instance is preempted/terminated, the cloud provider should not replace it

- The Infrastructure controller is responsible for creation of the instance only and should not attempt to remediate problems

- The Infrastructure controller should not attempt to verify that an instance can be created before attempting to create the instance
  - If the cloud provider does not have capacity, a MachineHealthCheck (given one is configured) can remove the Machine after a period.
    MachineSets will ensure the correct number of Machines are created.

- Initially, support will focus on Machines/MachineSets, with MachinePool support being added at a later date

### User Stories

#### Story 1

As an operator of a Management Cluster, I want to reduce costs where possible by leveraging cheaper nodes for interruptible workloads on my Workload Clusters.

#### Story 2

As a user of a Workload Cluster, when a spot/preemptible node is due for termination, I want my workloads to be gracefully moved onto other nodes to minimise interruptions to my service.

### Implementation Details/Notes/Constraints

#### Cloud Provider Implementation Specifics

##### AWS

###### Launching instances

To launch an instance as a Spot instance on AWS, a [SpotMarketOptions](https://docs.aws.amazon.com/sdk-for-go/api/service/ec2/#SpotMarketOptions)
needs to be added to the `RunInstancesInput`. Within this there are three options that matter:

- InstanceInterruptionBehavior (default: `terminate`): This must be set to `terminate`, otherwise the SpotInstanceType cannot be `one-time`

- SpotInstanceType (default: `one-time`): This must be set to `one-time` to ensure that each Machine creates only one EC2 instance and that the spot request is not persistent

- MaxPrice (default: On-Demand price): This can be **optionally** set to a string representation of the hourly maximum spot price.
  If not set, the option will default to the On-Demand price of the EC2 instance type requested

The only option that needs exposing to the user is `MaxPrice`. This option should live in an optional struct: if the struct is not nil,
spot instances should be used, and if `MaxPrice` is set, it should be used instead of the default On-Demand price.

```
type SpotMarketOptions struct {
	MaxPrice *string `json:"maxPrice,omitempty"`
}

type AWSMachineSpec struct {
	...

	SpotMarketOptions *SpotMarketOptions `json:"spotMarketOptions,omitempty"`
}
```

##### GCP

###### Launching instances

To launch an instance as Preemptible on GCP, the `Preemptible` field must be set:

```
&compute.Instance{
	...
	Scheduling: &compute.Scheduling{
		...
		Preemptible: true,
	},
}
```

Therefore, to make the choice up to the user, this field should be added to the `GCPMachineSpec`:

```
type GCPMachineSpec struct {
	...
	Preemptible bool `json:"preemptible"`
}
```

##### Azure

###### Launching Instances

To launch a VM as a Spot VM on Azure, the following three options need to be set within the [VirtualMachineProperties](https://github.com/Azure/azure-sdk-for-go/blob/8d7ac6eb6a149f992df6f0392eebf48544e2564a/services/compute/mgmt/2019-07-01/compute/models.go#L10274-L10309)
when the instance is created:

- Priority: This must be set to `Spot` to request a Spot VM

- Eviction Policy: This has two options, `Deallocate` or `Delete`.
  Only `Deallocate` is valid when using singular Spot VMs and as such, this must be set to `Deallocate`.
  (`Delete` is supported only for VMs that are part of a VMSS.)

- BillingProfile (default: -1): This is a struct containing a single field, `MaxPrice`.
  This is a string representation of the maximum price the user wishes to pay for their VM.
  A string representation is used because [floats are disallowed](https://github.com/kubernetes-sigs/controller-tools/issues/245#issuecomment-518465214) in Kubernetes APIs.
  This defaults to -1, which makes the maximum price the On-Demand price for the instance type.
  This also means the instance will never be evicted for price reasons, as Azure caps Spot market prices at the On-Demand price.
  (Note: instances may still be evicted based on resource pressure within a region.)
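As a rough sketch of how the three options above fit together, the snippet below maps a user-supplied maximum price onto the Spot-related fields. The types here are minimal local stand-ins for the Azure SDK structs (the real SDK uses enum types and pointer fields), so this is illustrative rather than the actual CAPZ implementation:

```go
package main

import (
	"fmt"
	"strconv"
)

// Minimal local stand-ins for the Azure SDK compute types referenced above,
// redefined here for illustration only.
type BillingProfile struct{ MaxPrice float64 }

type VirtualMachineProperties struct {
	Priority       string
	EvictionPolicy string
	BillingProfile BillingProfile
}

// spotVMProperties fills in the three Spot-related fields described above.
// A nil maxPrice falls back to -1, capping the bid at the On-Demand price.
func spotVMProperties(maxPrice *string) (VirtualMachineProperties, error) {
	price := -1.0
	if maxPrice != nil {
		p, err := strconv.ParseFloat(*maxPrice, 64)
		if err != nil {
			return VirtualMachineProperties{}, fmt.Errorf("invalid maxPrice %q: %v", *maxPrice, err)
		}
		price = p
	}
	return VirtualMachineProperties{
		Priority:       "Spot",       // request a Spot VM
		EvictionPolicy: "Deallocate", // the only valid policy for single Spot VMs
		BillingProfile: BillingProfile{MaxPrice: price},
	}, nil
}

func main() {
	props, _ := spotVMProperties(nil)
	fmt.Println(props.Priority, props.EvictionPolicy, props.BillingProfile.MaxPrice)
}
```

The string-to-float conversion mirrors the constraint noted above: the Kubernetes API carries the price as a string, while the cloud API expects a number.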

The only option that a user needs to interact with is the `MaxPrice` field within the `BillingProfile`; the other fields only have one valid choice and as such can be inferred.
Similar to AWS, we can make an optional struct SpotVMOptions, which, if present, implies the priority is `Spot`.

```
type SpotVMOptions struct {
	MaxPrice *string `json:"maxPrice,omitempty"`
}

type AzureMachineSpec struct {
	...

	SpotVMOptions *SpotVMOptions `json:"spotVMOptions,omitempty"`
}
```

###### Deallocation

Since Spot VMs are not deleted when they are preempted, but instead are deallocated,
users should utilise a MachineHealthCheck to monitor for preempted instances and replace them once they are stopped.
If they are left deallocated, their Disks and Networking are still active and chargeable by Azure.

When the MachineHealthCheck triggers a delete on the Machine,
the VM will be deleted, which in turn will delete the other resources created as part of the VM.

**Note**: Because the instance is stopped, its Node is not removed from the API.
The Node will transition to an unready state, which would be detected by a MachineHealthCheck,
though there may be some delay depending on the configuration of the MachineHealthCheck.
In the future, a termination handler could trigger the Machine to be deleted sooner.

### 'Interruptible' label

In order to deploy the termination handler, we'll need to create a DaemonSet that runs it on each spot instance node.

Having a `"cluster.x-k8s.io/interruptible"` label on Nodes that run on interruptible instances should help us with this.

Based on the discussion in https://github.com/kubernetes-sigs/cluster-api/pull/3668 ([1](https://github.com/kubernetes-sigs/cluster-api/pull/3668#issuecomment-696143653), [2](https://github.com/kubernetes-sigs/cluster-api/pull/3668#issuecomment-696862994)),
we can do the following:

1. The user creates an InfraMachine with whatever spec field(s) are required for that provider to indicate it's interruptible.
2. The infra provider sets `InfraMachine.status.interruptible=true`.
3. The Machine controller looks at `InfraMachine.status.interruptible` and ensures a label is set on the Node if it is true.
4. The Machine controller ensures the interruptible label is always present on the Node if `InfraMachine.status.interruptible` is true.

This snippet should work, and it's similar to what is currently done to set the node reference:

```
// Get and set the interruptible status from the infrastructure provider.
var interruptible bool
err = util.UnstructuredUnmarshalField(infraConfig, &interruptible, "status", "interruptible")
switch {
case err == util.ErrUnstructuredFieldNotFound: // no-op
case err != nil:
	return errors.Wrapf(err, "failed to get interruptible status from infrastructure provider for Machine %q in namespace %q", m.Name, m.Namespace)
}

if !interruptible {
	return nil
}

// Here goes the logic for assigning a label to the node
```

### Future Work

#### Termination handler

To enable graceful termination of workloads running on non-guaranteed instances,
a DaemonSet will need to be deployed to watch for termination notices and gracefully move workloads.

Alternatively, on AWS, termination events can be sourced via CloudWatch.
This would be preferable, as a DaemonSet would not be required on workload clusters.

Since this is not essential for running on non-guaranteed instances, and existing solutions exist for each provider,
users can deploy these existing solutions until CAPI has capacity to implement a solution.
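As an illustration of how the interruptible label and the termination handler fit together, a DaemonSet along the lines of the sketch below could be pinned to interruptible Nodes via a `nodeSelector`. The namespace, image, and empty label value are placeholder assumptions, not part of this proposal:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: termination-handler        # hypothetical name
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: termination-handler
  template:
    metadata:
      labels:
        app: termination-handler
    spec:
      nodeSelector:
        cluster.x-k8s.io/interruptible: ""      # only schedule onto interruptible Nodes
      tolerations:
        - operator: Exists                      # tolerate any taints on spot nodes
      containers:
        - name: termination-handler
          image: example.com/termination-handler:latest   # placeholder image
```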

#### Support for MachinePools

While MachinePools are being implemented across the three cloud providers that this project covers,
we will not be focusing on supporting non-guaranteed instances within MachinePools.

Once initial support for non-guaranteed instances has been tested and implemented within the providers,
we will investigate supporting non-guaranteed instances within MachinePools in a follow-up proposal.

### Risks and Mitigations

#### Control-Plane instances

Since control-plane instances typically host etcd for the cluster,
running them on top of spot instances, where termination is more likely,
could introduce instability to the cluster or even result in a loss of quorum for the etcd cluster.
Running control-plane instances on top of spot instances should therefore be forbidden.

There may also be limitations within cloud providers that restrict the usage of spot instances within the control-plane,
e.g. Azure Spot VMs do not support [ephemeral disks](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/spot-vms#limitations), which may be desired for control-plane instances.

This risk will be documented and it will be strongly advised that users do not attempt to create control-plane instances on spot instances.
To prevent it completely, an admission controller could be used to verify that Infrastructure Machines carrying the control-plane label are not created with spot options set.

#### Cloud Provider rate limits

Currently, if there is an issue creating the Infrastructure instance for any reason,
the request to create the instance will be requeued.
When the issue is persistent (e.g. a Spot bid that is too low on AWS),
this could lead to the Infrastructure controller attempting to create machines and failing in a loop.

To prevent this, Machines could enter a failed state if persistent errors such as this occur.
This also has the added benefit of being more visible to a user, as currently no error is reported other than in logs.

Failing the Machine would allow a MachineHealthCheck to be used to clean up the failed Machines.
The MachineHealthCheck controller could handle the looping by applying backoff to the deletion of failed Machines for a particular MachineHealthCheck,
which would be useful for MachineHealthCheck in general and would keep this logic central in a non-cloud-provider-specific component of Cluster API.

## Alternatives

### Reserved Instances

Reserved instances offer cheaper compute costs by charging for the capacity up front for larger time periods.
Typically this is a yearly commitment to spending a certain amount.

While this would also allow users to save money on their compute,
it commits them to large up-front spends, the savings are not as high, and it could also be implemented tangentially to this proposal.

## Upgrade Strategy

This proposal only adds new features and should not affect existing clusters.
No special upgrade considerations should be required.

## Additional Details

### Non-Guaranteed instances

The behaviour of non-guaranteed instances varies from provider to provider,
with each provider offering different ways to create the instances and different guarantees for them.
Each of the following sections details how non-guaranteed instances work for each provider.

#### AWS Spot Instances

Amazon's Spot instances are available to customers via three different mechanisms.
Each mechanism requires the user to set a maximum price (a bid) they are willing to pay for the instances and,
until either no capacity is left or the market price exceeds their bid, the user will retain access to the machine.

###### Spot backed Autoscaling Groups

Spot backed Autoscaling Groups are identical to other Autoscaling Groups, other than that they use Spot instances instead of On-Demand instances.

Autoscaling Groups are not currently supported within Cluster API, though adding support could be part of the MachinePool efforts.
If support were added, enabling Spot backed Autoscaling Groups would be a case of modifying the launch configuration to provide the relevant Spot options.

###### Spot Fleet

Spot Fleets are similar to Spot backed Autoscaling Groups, but they differ in that there is no dedicated instance type for the group.
They can launch both On-Demand and Spot instances from a range of instance types, based on the market prices and the bid put forward by the user.

Similarly to Spot backed Autoscaling Groups, there is currently no support within Cluster API.
Spot Fleet could become part of the MachinePool effort; however, this would require considerable effort to design and implement and as such,
support should not be considered a goal within this proposal.

###### Singular Spot Instances

Singular Spot instances are created using the same API as singular On-Demand instances.
By providing a single additional parameter, the API will instead launch a Spot Instance.

Given that Cluster API currently implements Machines using singular On-Demand instances,
adding singular Spot Instance support via this mechanism should be trivial.

##### Other AWS Spot features of note

###### Stop/Hibernate

Instead of terminating an instance when it is being interrupted,
Spot instances can be "stopped" or "hibernated" so that they can resume their workloads when new capacity becomes available.

Using this feature would contradict the functionality of the Machine Health Check remediation of failed nodes.
In cloud environments, it is expected that if a node is being switched off or taken away, a new one will replace it.
This option should not be made available to users, to avoid conflicts within the Cluster API ecosystem.

###### Termination Notices

Amazon provides a 2 minute notice of termination for Spot instances via its instance metadata service.
Each instance can poll the metadata service to see if it has been marked for termination.
There are [existing solutions](https://github.com/kube-aws/kube-spot-termination-notice-handler)
that run DaemonSets on Spot instances to gracefully drain workloads when the termination notice is given.
This is something that should be provided as part of the spot instance availability within Cluster API.

###### Persistent Requests

Persistent requests allow users to ask that a Spot instance, once terminated, be replaced by another instance when new capacity is available.

Using this feature would break assumptions in Cluster API, since the instance ID for the Machine would change during its lifecycle.
The usage of this feature should be explicitly forbidden so that we do not break existing assumptions.

#### GCP Preemptible instances

GCP's Preemptible instances are available to customers via two mechanisms.
For each, the instances are available at a fixed price and will be made available to users whenever there is capacity.

###### Instance Groups

GCP Instance Groups can leverage Preemptible instances by modifying the instance template and setting the Preemptible option.

Instance Groups are not currently supported within Cluster API, though adding support could be part of the MachinePool efforts.
If support were added, enabling Preemptible Instance Groups would be a case of modifying the configuration to provide the relevant Preemptible option.

###### Single Instance

GCP Single Instances can run as Preemptible instances, given that the launch request specifies the preemptible option.

Given that Cluster API currently implements Machines using single instances, adding singular Preemptible Instance support via this mechanism should be trivial.

##### Limitations of Preemptible

###### 24 Hour limitation

Preemptible instances will, if not terminated earlier, be terminated after 24 hours.
This means that the instances will be cycled regularly, and as such, good handling of shutdown events should be implemented.

###### Shutdown warning

GCP gives a 30 second warning for termination of Preemptible instances.
This signal comes via an ACPI G2 soft-off signal to the machine, which could be intercepted to start a graceful termination of pods on the machine.
There are [existing projects](https://github.com/GoogleCloudPlatform/k8s-node-termination-handler) that already do this.

In the case that a node is reaching its 24 hour termination mark,
it may be safer to pre-empt this warning and shut down the node before the 30s shutdown signal, to provide adequate time for workloads to be moved gracefully.

#### Azure Spot VMs

Azure recently announced Spot VMs as a replacement for their Low-Priority VMs, which were in customer preview through the latter half of 2019.
Spot VMs work in a similar manner to AWS Spot Instances. A maximum price is set on the instance when it is created, and, until that price is reached,
the instance will be given to you and you will be charged the market rate. Should the price go above your maximum price, the instance will be preempted.
Additionally, at any point in time when Azure needs the capacity back, the Azure infrastructure will evict the Spot instance.

Spot VMs are available in two forms in Azure.

###### Scale Sets

Scale Sets include support for Spot VMs by indicating, when they are created, that they should be backed by Spot VMs.
At this point, an eviction policy and the maximum price you wish to pay should be set.
Alternatively, you can also choose to only be preempted in the case that there are capacity constraints,
in which case you will pay whatever the market rate is, but will be preempted less often.

Scale Sets are not currently supported within Cluster API, though they are being added as part of the MachinePool efforts.
Once support is added, enabling Spot backed Scale Sets would be a case of modifying the configuration to provide the relevant Spot options.

###### Single Instances

Azure supports Spot VMs on single VM instances by indicating, when the VM is created, that it should be a Spot VM.
At this point, an eviction policy and the maximum price you wish to pay should be set.
Alternatively, you can also choose to only be preempted in the case that there are capacity constraints,
in which case you will pay whatever the market rate is, but will be preempted less often.

Given that Cluster API currently implements Machines using single instances, adding singular Spot VM support via this mechanism should be trivial.

##### Important Spot VM notes

###### Termination Notices

Azure uses its Scheduled Events API to notify Spot VMs that they are due to be preempted.
This is a similar service to the AWS metadata service, in that each machine can poll it to see events for itself.
However, Azure only gives 30 seconds warning for nodes being preempted.

A DaemonSet solution similar to the AWS termination handlers could be implemented to provide graceful shutdown with Azure Spot VMs.
For example, see this [existing solution](https://github.com/awesomenix/drainsafe).
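To sketch what such a handler might do, the snippet below parses a Scheduled Events payload and picks out the VMs affected by `Preempt` events. The payload shape follows the Azure Scheduled Events documentation; the helper name and the trimmed field set are assumptions for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Shape of the Scheduled Events payload, trimmed to the fields a
// termination handler would need.
type scheduledEvents struct {
	DocumentIncarnation int
	Events              []struct {
		EventId   string
		EventType string // e.g. "Preempt" for Spot evictions
		Resources []string
		NotBefore string
	}
}

// preemptTargets returns the resources named in any "Preempt" events,
// i.e. the VMs that are about to be evicted.
func preemptTargets(payload []byte) ([]string, error) {
	var se scheduledEvents
	if err := json.Unmarshal(payload, &se); err != nil {
		return nil, err
	}
	var targets []string
	for _, e := range se.Events {
		if e.EventType == "Preempt" {
			targets = append(targets, e.Resources...)
		}
	}
	return targets, nil
}

func main() {
	// Example payload in the shape served by the instance metadata endpoint
	// (polled with a "Metadata: true" header).
	sample := []byte(`{
	  "DocumentIncarnation": 1,
	  "Events": [{
	    "EventId": "A123BC45-1234-5678-AB90-ABCDEF123456",
	    "EventType": "Preempt",
	    "Resources": ["spot-vm-0"],
	    "NotBefore": "Mon, 19 Sep 2016 18:29:47 GMT"
	  }]
	}`)
	targets, err := preemptTargets(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(targets) // [spot-vm-0]
}
```

A real handler would poll the metadata endpoint on an interval, cordon and drain the affected Node within the 30 second window, and then acknowledge the event.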

###### Eviction Policies

Azure Spot VMs support two types of eviction policy:

- Deallocate: This stops the VM but keeps disks and networking ready to be restarted.
  In this state, VMs maintain usage of the CPU quota and as such are effectively just paused or hibernating.
  This is the *only* supported eviction policy for Single Instance Spot VMs.

- Delete: This deletes the VM and all associated disks and networking when the node is preempted.
  This is *only* supported on Scale Sets backed by Spot VMs.

## Implementation History

- [x] 12/11/2019: Proposed idea in an [issue](https://github.com/kubernetes-sigs/cluster-api/issues/1876)
- [x] 02/25/2020: Compile a Google Doc following the CAEP template (https://docs.google.com/document/d/1naxBVVlI_O-u6TchvQyZFbIaKrwU9qAzYD4akyV68nQ)
- [ ] MM/DD/YYYY: First round of feedback from community
- [ ] MM/DD/YYYY: Present proposal at a [community meeting]
- [x] 03/30/2020: Open proposal PR

<!-- Links -->
[community meeting]: https://docs.google.com/document/d/1Ys-DOR5UsgbMEeciuG0HOgDQc8kZsaWIWJeKJ1-UfbY