# Load Tests for gRPC Allocation Service

[Allocation Load Test](#allocation-load-test) and [Scenario Tests](#scenario-tests)
for testing the performance of the gRPC allocation service.

## Prerequisites

1. A [Kubernetes cluster](https://agones.dev/site/docs/installation/creating-cluster/) with [Agones](https://agones.dev/site/docs/installation/install-agones/)
   - We recommend installing Agones using the [Helm](https://agones.dev/site/docs/installation/install-agones/helm/) package manager.
   - If you are running in GCP, use a regional cluster instead of a zonal cluster to ensure high availability of the cluster control plane.
   - Use a dedicated node pool for the Agones controllers with multiple CPUs per node, e.g. `e2-standard-4`.
   - For the Allocation Load Test:
     - In the default node pool (where the game server pods are created), 75 nodes are required to make sure there is enough capacity for all game servers to move into the `Ready` state. When using a regional GKE cluster with three zones, that requires a configuration of 25 nodes per zone.
   - For the Scenario Tests:
     - See [Kubernetes Cluster Setup for Scenario Tests](#kubernetes-cluster-setup-for-scenario-tests)
2. A configured [Allocator Service](https://agones.dev/site/docs/advanced/allocator-service/)
   - The allocator service uses gRPC. To be able to call the service, TLS and mTLS have to be set up on both the server and the client.
3. (Optional) [Metrics](https://agones.dev/site/docs/guides/metrics/) for monitoring Agones workloads

# Allocation Load Test

This load test aims to validate the performance of the gRPC allocation service.

## Fleet Setting

We used the sample [fleet configuration](./fleet.yaml). We set the `automaticShutdownDelaySec` parameter to 600 so the simple-game-server game servers shut down 10 minutes after allocation.
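For reference, the relevant part of such a fleet spec looks roughly like the sketch below. The image reference and the exact way the shutdown delay is passed are assumptions here; [fleet.yaml](./fleet.yaml) is the authoritative configuration.

```yaml
apiVersion: agones.dev/v1
kind: Fleet
metadata:
  name: load-test-fleet
spec:
  replicas: 4000
  template:
    spec:
      ports:
      - name: default
        containerPort: 7654
      template:
        spec:
          containers:
          - name: simple-game-server
            # placeholder image reference; see fleet.yaml for the real one
            image: example.com/simple-game-server:latest
            # shut the game server down 600 seconds (10 minutes) after allocation
            args: ["--automaticShutdownDelaySec=600"]
```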
This makes it easy to re-run the test without having to delete the game servers manually, and allows tests to run continuously.

## Running the test

```
kubectl apply -f ./fleet.yaml
```

Wait until the fleet shows 4000 ready game servers before running the allocation script:

```
kubectl get fleet
NAME              SCHEDULING   DESIRED   CURRENT   ALLOCATED   READY   AGE
load-test-fleet   Packed       4000      4000      0           4000    2m38s
```

You can use the provided runAllocation.sh script by providing two parameters:
- the number of clients (to perform allocations in parallel)
- the number of allocations per client

To make 4000 allocation calls, you can provide 40 and 100:

```
./runAllocation.sh 40 100
```

The script will print out the start and end date/time:

```
started: 2020-10-22 23:33:25.828724579 -0700 PDT m=+0.005921014
finished: 2020-10-22 23:34:18.381396416 -0700 PDT m=+52.558592912
```

If any errors occurred, the error messages are printed as well:

```
started: 2020-10-22 22:16:47.322731849 -0700 PDT m=+0.002953843
(failed(client=3,allocation=43): rpc error: code = Unknown desc = error updating allocated gameserver: Operation cannot be fulfilled on gameservers.agones.dev "simple-game-server-mlljx-g9crp": the object has been modified; please apply your changes to the latest version and try again
(failed(client=2,allocation=47): rpc error: code = Unknown desc = error updating allocated gameserver: Operation cannot be fulfilled on gameservers.agones.dev "simple-game-server-mlljx-rxflv": the object has been modified; please apply your changes to the latest version and try again
(failed(client=7,allocation=45): rpc error: code = Unknown desc = error updating allocated gameserver: Operation cannot be fulfilled on gameservers.agones.dev "simple-game-server-mlljx-x4khw": the object has been modified; please apply your changes to the latest version and try again
finished: 2020-10-22 22:17:18.822039094 -0700 PDT m=+31.502261092
```

You can use environment variables to override the defaults. For example, to run only a single test round:

```
TESTRUNSCOUNT=1 ./runAllocation.sh 40 10
```

# Scenario Tests

The scenario test allows you to generate a variable number of allocations to
your cluster over time, simulating a game where clients arrive in an unsteady
pattern. The game servers used in the test are configured to shut down after
being allocated, simulating the GameServer churn that is expected during
normal game play.

## Kubernetes Cluster Setup for Scenario Tests

For the scenario test to achieve high throughput, you can create multiple groups
of nodes in your cluster. During testing (on GKE), we created a node pool for
the Kubernetes system components (such as the metrics server and DNS servers), a
node pool for the Agones system components (as recommended in the installation
guide), and a node pool for the game servers.

On GKE, to restrict the Kubernetes system components to their own set of nodes,
you can create a node pool with the taint
`components.gke.io/gke-managed-components=true:NoExecute`.

To prevent the Kubernetes system components from running on the game server
node pool, that node pool was created with the taint
`scenario-test.io/game-servers=true:NoExecute`,
and the Agones system node pool used the standard taint
`agones.dev/agones-system=true:NoExecute`.

In addition, the GKE cluster was configured as a regional cluster to ensure high
availability of the cluster control plane.
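One operational detail implied by the taints above: any pod that should run on the tainted game-servers pool must carry a matching toleration, or Kubernetes will refuse to schedule it there (and `NoExecute` will evict it). A sketch of what that looks like in a pod spec; the actual fleet configuration may express this differently:

```yaml
# Toleration matching the scenario-test.io/game-servers=true:NoExecute taint;
# without it, game server pods cannot be scheduled onto (or remain on) that pool.
tolerations:
- key: "scenario-test.io/game-servers"
  operator: "Equal"
  value: "true"
  effect: "NoExecute"
```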
The following commands were used to construct a cluster for testing:

```bash
export REGION="us-west1"
export VERSION="1.23"

gcloud container clusters create scenario-test --cluster-version=$VERSION \
  --tags=game-server --scopes=gke-default --num-nodes=2 \
  --no-enable-autoupgrade --machine-type=n2-standard-2 \
  --region=$REGION --enable-ip-alias

gcloud container node-pools create kube-system --cluster=scenario-test \
  --no-enable-autoupgrade \
  --node-taints components.gke.io/gke-managed-components=true:NoExecute \
  --num-nodes=1 --machine-type=n2-standard-16 --region $REGION

gcloud container node-pools create agones-system --cluster=scenario-test \
  --no-enable-autoupgrade --node-taints agones.dev/agones-system=true:NoExecute \
  --node-labels agones.dev/agones-system=true --num-nodes=1 \
  --machine-type=n2-standard-16 --region $REGION

gcloud container node-pools create game-servers --cluster=scenario-test \
  --node-taints scenario-test.io/game-servers=true:NoExecute --num-nodes=1 \
  --machine-type n2-standard-2 --no-enable-autoupgrade \
  --region $REGION --tags=game-server --scopes=gke-default \
  --enable-autoscaling --max-nodes=300 --min-nodes=175
```

## Agones Modifications

For the scenario tests, we modified the Agones installation in a number of ways.

First, we made sure that the Agones pods would _only_ run in the Agones node
pool by changing the node affinity in the deployments for the controller,
allocator service, and ping service to
`requiredDuringSchedulingIgnoredDuringExecution`.

We also increased the resources for the controller and allocator service pods,
and made sure to specify both requests and limits (with equal values) so that
the pods were given the highest quality of service class, `Guaranteed`.
These configuration changes are captured in
[scenario-values.yaml](scenario-values.yaml) and can be applied during
installation using Helm:

```bash
helm install my-release --namespace agones-system -f scenario-values.yaml agones/agones --create-namespace
```

Alternatively, these changes can be applied to an existing Agones installation
by running [`./configure-agones.sh`](configure-agones.sh).

## Fleet Setting

We used the sample [fleet configuration](./scenario-fleet.yaml) and [fleet autoscaler configuration](./autoscaler.yaml).

To reduce pod churn in the system, the simple game servers are configured to
return themselves to `Ready` the first 10 times they are allocated, following
the [Reusing Allocated GameServers for more than one game
session](https://agones.dev/site/docs/integration-patterns/reusing-gameservers/)
integration pattern. After 10 simulated game sessions, the simple game servers
exit automatically. The fleet configuration above sets each game session to
last for 1 minute, representing a short game.

## Running the test

You can use the provided runScenario.sh script by providing one parameter: a
scenario file. The scenario file is a plain text file in which each line
represents a "scenario" that the program executes before moving on to the next
one. A scenario is a comma-separated triple of a duration, the number of
concurrent clients to use, and the interval (in milliseconds) between the
allocation requests submitted by each client. The program creates the desired
number of clients, and those clients send allocation requests to the allocator
service at the defined cadence for the scenario's duration. At the end of each
scenario, the program prints out statistics for that scenario.
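As a concrete illustration of that format, a scenario file can be written as below. The durations, client counts, and intervals are made-up values, and the exact duration syntax accepted by runScenario.sh should be checked against the script itself:

```shell
# Each line is: duration,concurrentClients,requestIntervalMs (values illustrative)
cat > my-scenario.txt <<'EOF'
10m,10,1000
10m,50,200
10m,10,1000
EOF

# The file is then passed as the single argument:
# ./runScenario.sh my-scenario.txt
wc -l < my-scenario.txt
```

The middle line simulates a spike: five times the clients, each sending five times as often.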
Two sample scenario files are included in this directory: one sends a constant
rate of allocations for the duration of the test, and the other sends a
variable number of allocations.

Since error counts are gathered per scenario, it's recommended to keep each
scenario short (e.g. 10 minutes) to narrow down the window in which errors
occurred, even if the allocation rate stays at the same level for longer than
10 minutes at a time.
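Following that recommendation, even a constant load can be expressed as several consecutive identical scenarios rather than one long one, so each 10-minute slice gets its own error counts. A hypothetical example (file name, client count, and interval are illustrative):

```shell
# Three identical 10-minute scenarios instead of a single 30-minute one:
# if errors appear, the per-scenario statistics localize them to a 10-minute window.
cat > constant-rate-split.txt <<'EOF'
10m,50,200
10m,50,200
10m,50,200
EOF
sort -u constant-rate-split.txt   # all three lines are identical
```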