+++
date = "2017-03-20T22:25:17+11:00"
title = "Deploy"
+++

This page covers running Dgraph in a distributed fashion, i.e. running multiple Dgraph instances
over multiple servers in a cluster, in various deployment modes.

{{% notice "tip" %}}
For a single server setup, recommended for new users, please see the [Get Started](/get-started) page.
{{% /notice %}}

## Install Dgraph
#### Docker

```sh
docker pull dgraph/dgraph:latest

# You can test that it worked fine, by running:
docker run -it dgraph/dgraph:latest dgraph
```

#### Automatic download

Running
```sh
curl https://get.dgraph.io -sSf | bash

# Test that it worked fine, by running:
dgraph
```
installs the `dgraph` binary on your system.

#### Manual download [optional]

If you don't want to follow the automatic installation method, you can manually download the appropriate tar for your platform from **[Dgraph releases](https://github.com/dgraph-io/dgraph/releases)**. After downloading the tar for your platform from GitHub, extract the binary to `/usr/local/bin` like so.

```sh
# For Linux
$ sudo tar -C /usr/local/bin -xzf dgraph-linux-amd64-VERSION.tar.gz

# For Mac
$ sudo tar -C /usr/local/bin -xzf dgraph-darwin-amd64-VERSION.tar.gz

# Test that it worked fine, by running:
dgraph
```

#### Building from Source

{{% notice "note" %}}
You can build the Ratel UI from source separately following its build
[instructions](https://github.com/dgraph-io/ratel/blob/master/INSTRUCTIONS.md).
Ratel UI is distributed via Dgraph releases using any of the download methods
listed above.
{{% /notice %}}

Make sure you have [Go](https://golang.org/dl/) v1.11+ installed.

You'll need the following dependencies to install Dgraph using `make`:
```bash
sudo apt-get update
sudo apt-get install gcc make
```

After installing Go, run
```sh
# This should install the dgraph binary in your $GOPATH/bin.

git clone https://github.com/dgraph-io/dgraph.git
cd ./dgraph
make install
```

If you get errors related to `grpc` while building, your
`go-grpc` version might be outdated. We don't vendor in `go-grpc` (because it
causes issues while using the Go client). Update your `go-grpc` by running:
```sh
go get -u -v google.golang.org/grpc
```

#### Config

The full set of Dgraph's configuration options (along with brief descriptions)
can be viewed by invoking Dgraph with the `--help` flag. For example, to see
the options available for `dgraph alpha`, run `dgraph alpha --help`.

The options can be configured in multiple ways (from highest precedence to
lowest precedence):

- Using command line flags (as described in the help output).

- Using environment variables.

- Using a configuration file.

If no configuration for an option is used, then the default value as described
in the `--help` output applies.

Multiple configuration methods can be used at the same time. For example, a core
set of options could be set in a config file, and instance-specific options
could be set using environment variables or flags.

The environment variable names mirror the flag names as seen in the `--help`
output.
They are the concatenation of `DGRAPH`, the subcommand invoked
(`ALPHA`, `ZERO`, `LIVE`, or `BULK`), and then the name of the flag (in
uppercase). For example, instead of using `dgraph alpha --lru_mb=8096`, you
could use `DGRAPH_ALPHA_LRU_MB=8096 dgraph alpha`.

Configuration file formats supported are JSON, TOML, YAML, HCL, and Java
properties (detected via file extension). The file extensions are .json, .toml,
.yml or .yaml, .hcl, and .properties for each format.

A configuration file can be specified using the `--config` flag, or an
environment variable. E.g. `dgraph zero --config my_config.json` or
`DGRAPH_ZERO_CONFIG=my_config.json dgraph zero`.

The config file structure is just simple key/value pairs (mirroring the flag
names).

Example JSON config file (config.json):

```json
{
  "my": "localhost:7080",
  "zero": "localhost:5080",
  "lru_mb": 4096,
  "postings": "/path/to/p",
  "wal": "/path/to/w"
}
```

Example TOML config file (config.toml):

```toml
my = "localhost:7080"
zero = "localhost:5080"
lru_mb = 4096
postings = "/path/to/p"
wal = "/path/to/w"
```

Example YAML config file (config.yml):

```yaml
my: "localhost:7080"
zero: "localhost:5080"
lru_mb: 4096
postings: "/path/to/p"
wal: "/path/to/w"
```

Example HCL config file (config.hcl):

```hcl
my = "localhost:7080"
zero = "localhost:5080"
lru_mb = 4096
postings = "/path/to/p"
wal = "/path/to/w"
```

Example Java properties config file (config.properties):
```text
my=localhost:7080
zero=localhost:5080
lru_mb=4096
postings=/path/to/p
wal=/path/to/w
```

## Cluster Setup

### Understanding Dgraph cluster

Dgraph is a truly distributed graph database, not a master-slave replication of
a universal dataset. It shards by predicate and replicates predicates across the
cluster; queries can be run on any node, and joins are handled over the
distributed data. A query is resolved locally for predicates the node stores,
and via distributed joins for predicates stored on other nodes.

To run a Dgraph cluster effectively, it's important to understand how
sharding, replication, and rebalancing work.

**Sharding**

Dgraph colocates data per predicate (*P*, in RDF terminology), thus the
smallest unit of data is one predicate. To shard the graph, one or many
predicates are assigned to a group. Each Alpha node in the cluster serves a
single group. Dgraph Zero assigns a group to each Alpha node.

**Shard rebalancing**

Dgraph Zero tries to rebalance the cluster based on the disk usage in each
group. If Zero detects an imbalance, it tries to move a predicate along with
its indices to the group that has the minimum disk usage. This can make the predicate
temporarily read-only. Queries for the predicate will still be serviced, but any
mutations for the predicate will be rejected and should be retried after the
move is finished.

Zero continuously tries to keep the amount of data on each server even,
typically running this check every 10 minutes. Thus, each additional
Dgraph Alpha instance allows Zero to further split the predicates from existing
groups and move them to the new node.
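
Besides this automatic rebalancing, a predicate move can be triggered manually through Zero's `/moveTablet` HTTP endpoint (described under **More about Dgraph Zero** below). A minimal sketch, assuming Zero's default HTTP port 6080 and a predicate named `name`:

```sh
# Ask Zero to move the "name" predicate (and its indices) to group 2.
# The predicate stays readable but rejects mutations while the move runs.
curl "localhost:6080/moveTablet?tablet=name&group=2"
```
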

**Consistent Replication**

If the `--replicas` flag is set to something greater than one, Zero would assign the
same group to multiple nodes. These nodes would then form a Raft group, aka
quorum. Every write would be consistently replicated to the quorum. To achieve
consensus, it's important that the size of the quorum be an odd number. Therefore, we
recommend setting `--replicas` to 1, 3 or 5 (not 2 or 4). This allows 0, 1, or 2
nodes serving the same group to be down, respectively, without affecting the
overall health of that group.

## Ports Usage

Dgraph cluster nodes use different ports to communicate over gRPC and HTTP. Choose these ports carefully based on your topology and deployment mode, as each port needs different access security rules or firewall settings.

### Types of ports

- **gRPC-internal:** Port used between the cluster nodes for internal communication and message exchange.
- **gRPC-external:** Port used by Dgraph clients, Dgraph Live Loader, and Dgraph Bulk Loader to access APIs over gRPC.
- **HTTP-external:** Port used by clients to access APIs over HTTP and for other monitoring & administrative tasks.

### Ports used by different nodes

 Dgraph Node Type | gRPC-internal | gRPC-external | HTTP-external
------------------|---------------|---------------|---------------
 zero             | --Not Used--  | 5080          | 6080
 alpha            | 7080          | 9080          | 8080
 ratel            | --Not Used--  | --Not Used--  | 8000

Users have to modify security rules or open the firewall, depending on their underlying network, to allow communication between cluster nodes and between a server and a client. During development, a general rule could be to leave the *-external (gRPC/HTTP) ports open to the public and keep the gRPC-internal port open only within the cluster nodes.

**Ratel UI** accesses Dgraph Alpha on the HTTP-external port (default localhost:8080) and can be configured to talk to a remote Dgraph cluster. This way you can run Ratel on your local machine and point it to a remote cluster. But if you are deploying Ratel along with the Dgraph cluster, then you may have to expose 8000 to the public.

**Port Offset** To make it easier to set up a cluster, Dgraph uses default ports for each node type and lets you provide an offset (through the command option `--port_offset`) that defines the actual ports used by the node. An offset can also be used when starting multiple Zero nodes in an HA setup.

For example, when a user runs a Dgraph Alpha with `--port_offset 2`, the Alpha node binds to 7082 (gRPC-internal), 8082 (HTTP-external) and 9082 (gRPC-external) respectively.

**Ratel UI** by default listens on port 8000. You can use the `-port` flag to make it listen on any other port.

{{% notice "tip" %}}
**For Dgraph v1.0.2 (or older)**

Zero's default ports are 7080 and 8080. When following instructions for the different setup guides below, override the Zero ports using `--port_offset` to match the current default ports.

```sh
# Run Zero with ports 5080 and 6080
dgraph zero --idx=1 --port_offset -2000
# Run Zero with ports 5081 and 6081
dgraph zero --idx=2 --port_offset -1999
```
Likewise, Ratel's default port is 8081, so override it using `--port` to match the current default port.

```sh
dgraph-ratel --port 8080
```
{{% /notice %}}

### HA Cluster Setup

In a high-availability setup, we need to run 3 or 5 replicas for Zero, and similarly, 3 or 5 replicas for Alpha.
{{% notice "note" %}}
If the number of replicas is 2K + 1, up to **K servers** can be down without any impact on reads or writes.

Avoid setting the number of replicas to 2K (an even number). If K servers go down, this would block reads and writes, due to lack of consensus.
{{% /notice %}}

**Dgraph Zero**
Run three Zero instances, assigning a unique integer ID to each via the `--idx` flag, and
passing the address of any healthy Zero instance via the `--peer` flag.

To run three replicas for the Alphas, set `--replicas=3`. Every time a new
Dgraph Alpha is added, Zero checks the existing groups and assigns the Alpha to
one that doesn't yet have three replicas.

**Dgraph Alpha**
Run as many Dgraph Alphas as you want. You can manually set the `--idx` flag, or you
can leave that flag empty, and Zero would auto-assign an ID to the Alpha. This
ID is persisted in the write-ahead log, so be careful not to delete that directory.

The new Alphas will automatically detect each other by communicating with
Dgraph Zero and will establish connections to each other. You can provide a list of
Zero addresses to Alpha using the `--zero` flag. Alpha will try to connect to
one of the Zeros, starting from the first Zero address in the list. For example:
`--zero=zero1,zero2,zero3`, where `zero1` is the `host:port` of a Zero instance.

Typically, Zero would first attempt to replicate a group, by assigning a new
Dgraph Alpha to serve the same group as another. Once the group has
been replicated as per the `--replicas` flag, Zero would create a new group.

Over time, the data would be evenly split across all the groups. So, it's
important to ensure that the number of Dgraph Alphas is a multiple of the
replication setting. For example, if you set `--replicas=3` in Zero, then run three
Dgraph Alphas for no sharding but 3x replication, or run six Dgraph Alphas for
sharding the data into two groups with 3x replication.

## Single Host Setup

### Run directly on the host

**Run dgraph zero**

```sh
dgraph zero --my=IPADDR:5080
```
The `--my` flag is the address that Dgraph Alphas would dial to talk to
Zero. So, the port `5080` and the IP address must be visible to all the Dgraph Alphas.

For all other flags, run `dgraph zero --help`.

**Run dgraph alpha**

```sh
dgraph alpha --lru_mb=<typically one-third the RAM> --my=IPADDR:7080 --zero=localhost:5080
dgraph alpha --lru_mb=<typically one-third the RAM> --my=IPADDR:7081 --zero=localhost:5080 -o=1
```

Notice the use of `-o` for the second Alpha to offset the default ports. Zero automatically assigns a unique ID to each Alpha, which is persisted in the write-ahead log (wal) directory; users can also specify the index using the `--idx` option. Dgraph Alphas use two directories to persist data and
WAL logs, and these directories must be different for each Alpha if they are running on the same host. You can use `-p` and `-w` to change the location of the data and WAL directories. For all other flags, run `dgraph alpha --help`.
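
For instance, a sketch of two Alphas sharing one host, each with its own postings and WAL directory (the flag values and directory names are illustrative):

```sh
# Two Alphas on the same host: separate -p (postings) and -w (WAL) directories,
# and a port offset for the second one so the default ports don't clash.
dgraph alpha --lru_mb=2048 --my=IPADDR:7080 --zero=localhost:5080 -p p1 -w w1
dgraph alpha --lru_mb=2048 --my=IPADDR:7081 --zero=localhost:5080 -p p2 -w w2 -o=1
```
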

**Run dgraph UI**

```sh
dgraph-ratel
```

### Run using Docker

A Dgraph cluster can be set up to run as containers on a single host. First, you'd want to figure out the host IP address. You can typically do that via

```sh
ip addr  # On Arch Linux
ifconfig # On Ubuntu/Mac
```
We'll refer to the host IP address as `HOSTIPADDR`.

**Run dgraph zero**

```sh
mkdir ~/zero # Or any other directory where data should be stored.

docker run -it -p 5080:5080 -p 6080:6080 -v ~/zero:/dgraph dgraph/dgraph:latest dgraph zero --my=HOSTIPADDR:5080
```

**Run dgraph alpha**
```sh
mkdir ~/server1 # Or any other directory where data should be stored.

docker run -it -p 7080:7080 -p 8080:8080 -p 9080:9080 -v ~/server1:/dgraph dgraph/dgraph:latest dgraph alpha --lru_mb=<typically one-third the RAM> --zero=HOSTIPADDR:5080 --my=HOSTIPADDR:7080

mkdir ~/server2 # Or any other directory where data should be stored.

docker run -it -p 7081:7081 -p 8081:8081 -p 9081:9081 -v ~/server2:/dgraph dgraph/dgraph:latest dgraph alpha --lru_mb=<typically one-third the RAM> --zero=HOSTIPADDR:5080 --my=HOSTIPADDR:7081 -o=1
```
Notice the use of `-o` for server2 to offset its default ports.

**Run dgraph UI**
```sh
docker run -it -p 8000:8000 dgraph/dgraph:latest dgraph-ratel
```

### Run using Docker Compose (On single AWS instance)

We will use [Docker Machine](https://docs.docker.com/machine/overview/). It is a tool that lets you install Docker Engine on virtual machines and easily deploy applications.

* [Install Docker Machine](https://docs.docker.com/machine/install-machine/) on your machine.

{{% notice "note" %}}These instructions are for running Dgraph Alpha without TLS config.
For instructions on running with TLS, refer to [TLS instructions](#tls-configuration).{{% /notice %}}

Here we'll go through an example of deploying Dgraph Zero, Alpha and Ratel on an AWS instance.

* Make sure you have Docker Machine installed by following the [instructions](https://docs.docker.com/machine/install-machine/); provisioning an instance on AWS is then just one step away. You'll have to [configure your AWS credentials](http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html) for programmatic access to the Amazon API.

* Create a new Docker machine.

```sh
docker-machine create --driver amazonec2 aws01
```

Your output should look like

```sh
Running pre-create checks...
Creating machine...
(aws01) Launching instance...
...
...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env aws01
```

The command would provision a `t2-micro` instance with a security group called `docker-machine`
(allowing inbound access on 2376 and 22). You can either edit the security group to allow inbound access to `5080`, `8080`, `9080` (default ports for Dgraph Zero & Alpha) or you can provide your own security
group which allows inbound access on ports 22, 2376 (required by Docker Machine), 5080, 8080 and 9080. Remember port *5080* is only required if you are running Dgraph Live Loader or Dgraph Bulk Loader from outside.
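
If you go the pre-created security group route, the `amazonec2` driver lets you name the group at creation time. A minimal sketch, assuming a hypothetical group called `dgraph-sg` that already allows the ports listed above:

```sh
# Provision the machine into an existing security group instead of the default
# "docker-machine" group; "dgraph-sg" is a placeholder for your own group name.
docker-machine create --driver amazonec2 --amazonec2-security-group dgraph-sg aws01
```
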
402 403 [Here](https://docs.docker.com/machine/drivers/aws/#options) is a list of full options for the `amazonec2` driver which allows you choose the instance type, security group, AMI among many other things. 404 405 {{% notice "tip" %}}Docker machine supports [other drivers](https://docs.docker.com/machine/drivers/gce/) like GCE, Azure etc.{{% /notice %}} 406 407 * Install and run Dgraph using docker-compose 408 409 Docker Compose is a tool for running multi-container Docker applications. You can follow the 410 instructions [here](https://docs.docker.com/compose/install/) to install it. 411 412 Copy the file below in a directory on your machine and name it `docker-compose.yml`. 413 414 ```sh 415 version: "3.2" 416 services: 417 zero: 418 image: dgraph/dgraph:latest 419 volumes: 420 - /data:/dgraph 421 ports: 422 - 5080:5080 423 - 6080:6080 424 restart: on-failure 425 command: dgraph zero --my=zero:5080 426 server: 427 image: dgraph/dgraph:latest 428 volumes: 429 - /data:/dgraph 430 ports: 431 - 8080:8080 432 - 9080:9080 433 restart: on-failure 434 command: dgraph alpha --my=server:7080 --lru_mb=2048 --zero=zero:5080 435 ratel: 436 image: dgraph/dgraph:latest 437 ports: 438 - 8000:8000 439 command: dgraph-ratel 440 ``` 441 442 {{% notice "note" %}}The config mounts `/data`(you could mount something else) on the instance to `/dgraph` within the 443 container for persistence.{{% /notice %}} 444 445 * Connect to the Docker Engine running on the machine. 446 447 Running `docker-machine env aws01` tells us to run the command below to configure 448 our shell. 449 ``` 450 eval $(docker-machine env aws01) 451 ``` 452 This configures our Docker client to talk to the Docker engine running on the AWS Machine. 453 454 Finally run the command below to start the Zero and Alpha. 455 ``` 456 docker-compose up -d 457 ``` 458 This would start 3 Docker containers running Dgraph Zero, Alpha and Ratel on the same machine. Docker would restart the containers in case there is any error. 459 You can look at the logs using `docker-compose logs`. 460 461 ## Multi Host Setup 462 463 ### Using Docker Swarm 464 465 #### Cluster Setup Using Docker Swarm 466 467 {{% notice "note" %}}These instructions are for running Dgraph Alpha without TLS config. 468 Instructions for running with TLS refer [TLS instructions](#tls-configuration).{{% /notice %}} 469 470 Here we'll go through an example of deploying 3 Dgraph Alpha nodes and 1 Zero on three different AWS instances using Docker Swarm with a replication factor of 3. 471 472 * Make sure you have Docker Machine installed by following [instructions](https://docs.docker.com/machine/install-machine/). 473 474 ```sh 475 docker-machine --version 476 ``` 477 478 * Create 3 instances on AWS and [install Docker Engine](https://docs.docker.com/engine/installation/) on them. This can be done manually or by using `docker-machine`. 479 You'll have to [configure your AWS credentials](http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html) to create the instances using Docker Machine. 480 481 Considering that you have AWS credentials setup, you can use the below commands to start 3 AWS 482 `t2-micro` instances with Docker Engine installed on them. 483 484 ```sh 485 docker-machine create --driver amazonec2 aws01 486 docker-machine create --driver amazonec2 aws02 487 docker-machine create --driver amazonec2 aws03 488 ``` 489 490 Your output should look like 491 492 ```sh 493 Running pre-create checks... 494 Creating machine... 495 (aws01) Launching instance... 
496 ... 497 ... 498 Docker is up and running! 499 To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env aws01 500 ``` 501 502 The command would provision a `t2-micro` instance with a security group called `docker-machine` 503 (allowing inbound access on 2376 and 22). 504 505 You would need to edit the `docker-machine` security group to open inbound traffic on the following ports. 506 507 1. Allow all inbound traffic on all ports with Source being `docker-machine` 508 security ports so that Docker related communication can happen easily. 509 510 2. Also open inbound TCP traffic on the following ports required by Dgraph: 511 `5080`, `6080`, `8000`, `808[0-2]`, `908[0-2]`. Remember port *5080* is only 512 required if you are running Dgraph Live Loader or Dgraph Bulk Loader from 513 outside. You need to open `7080` to enable Alpha-to-Alpha communication in 514 case you have not opened all ports in #1. 515 516 If you are on AWS, below is the security group (**docker-machine**) after 517 necessary changes. 518 519 {{% load-img "/images/aws.png" "AWS Security Group" %}} 520 521 [Here](https://docs.docker.com/machine/drivers/aws/#options) is a list of full options for the `amazonec2` driver which allows you choose the 522 instance type, security group, AMI among many other 523 things. 524 525 {{% notice "tip" %}}Docker machine supports [other drivers](https://docs.docker.com/machine/drivers/gce/) like GCE, Azure etc.{{% /notice %}} 526 527 Running `docker-machine ps` shows all the AWS EC2 instances that we started. 528 ```sh 529 ➜ ~ docker-machine ls 530 NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS 531 aws01 - amazonec2 Running tcp://34.200.239.30:2376 v17.11.0-ce 532 aws02 - amazonec2 Running tcp://54.236.58.120:2376 v17.11.0-ce 533 aws03 - amazonec2 Running tcp://34.201.22.2:2376 v17.11.0-ce 534 ``` 535 536 * Start the Swarm 537 538 Docker Swarm has manager and worker nodes. Swarm can be started and updated on manager nodes. We 539 will setup `aws01` as swarm manager. You can first run the following commands to initialize the 540 swarm. 541 542 We are going to use the internal IP address given by AWS. Run the following command to get the 543 internal IP for `aws01`. Lets assume `172.31.64.18` is the internal IP in this case. 544 ``` 545 docker-machine ssh aws01 ifconfig eth0 546 ``` 547 548 Now that we have the internal IP, let's initiate the Swarm. 549 550 ```sh 551 # This configures our Docker client to talk to the Docker engine running on the aws01 host. 552 eval $(docker-machine env aws01) 553 docker swarm init --advertise-addr 172.31.64.18 554 ``` 555 556 Output: 557 ``` 558 Swarm initialized: current node (w9mpjhuju7nyewmg8043ypctf) is now a manager. 559 560 To add a worker to this swarm, run the following command: 561 562 docker swarm join \ 563 --token SWMTKN-1-1y7lba98i5jv9oscf10sscbvkmttccdqtkxg478g3qahy8dqvg-5r5cbsntc1aamsw3s4h3thvgk \ 564 172.31.64.18:2377 565 566 To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions. 567 ``` 568 569 Now we will make other nodes join the swarm. 570 571 ```sh 572 eval $(docker-machine env aws02) 573 docker swarm join \ 574 --token SWMTKN-1-1y7lba98i5jv9oscf10sscbvkmttccdqtkxg478g3qahy8dqvg-5r5cbsntc1aamsw3s4h3thvgk \ 575 172.31.64.18:2377 576 ``` 577 578 Output: 579 ``` 580 This node joined a swarm as a worker. 
581 ``` 582 583 Similarly, aws03 584 ```sh 585 eval $(docker-machine env aws03) 586 docker swarm join \ 587 --token SWMTKN-1-1y7lba98i5jv9oscf10sscbvkmttccdqtkxg478g3qahy8dqvg-5r5cbsntc1aamsw3s4h3thvgk \ 588 172.31.64.18:2377 589 ``` 590 591 On the Swarm manager `aws01`, verify that your swarm is running. 592 ```sh 593 docker node ls 594 ``` 595 596 Output: 597 ```sh 598 ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS 599 ghzapjsto20c6d6l3n0m91zev aws02 Ready Active 600 rb39d5lgv66it1yi4rto0gn6a aws03 Ready Active 601 waqdyimp8llvca9i09k4202x5 * aws01 Ready Active Leader 602 ``` 603 604 * Start the Dgraph cluster 605 606 Copy the following file on your host machine and name it as `docker-compose.yml` 607 608 ```sh 609 version: "3" 610 networks: 611 dgraph: 612 services: 613 zero: 614 image: dgraph/dgraph:latest 615 volumes: 616 - data-volume:/dgraph 617 ports: 618 - 5080:5080 619 - 6080:6080 620 networks: 621 - dgraph 622 deploy: 623 placement: 624 constraints: 625 - node.hostname == aws01 626 command: dgraph zero --my=zero:5080 --replicas 3 627 alpha1: 628 image: dgraph/dgraph:latest 629 hostname: "alpha1" 630 volumes: 631 - data-volume:/dgraph 632 ports: 633 - 8080:8080 634 - 9080:9080 635 networks: 636 - dgraph 637 deploy: 638 placement: 639 constraints: 640 - node.hostname == aws01 641 command: dgraph alpha --my=alpha1:7080 --lru_mb=2048 --zero=zero:5080 642 alpha2: 643 image: dgraph/dgraph:latest 644 hostname: "alpha2" 645 volumes: 646 - data-volume:/dgraph 647 ports: 648 - 8081:8081 649 - 9081:9081 650 networks: 651 - dgraph 652 deploy: 653 placement: 654 constraints: 655 - node.hostname == aws02 656 command: dgraph alpha --my=alpha2:7081 --lru_mb=2048 --zero=zero:5080 -o 1 657 alpha3: 658 image: dgraph/dgraph:latest 659 hostname: "alpha3" 660 volumes: 661 - data-volume:/dgraph 662 ports: 663 - 8082:8082 664 - 9082:9082 665 networks: 666 - dgraph 667 deploy: 668 placement: 669 constraints: 670 - node.hostname == aws03 671 command: dgraph alpha --my=alpha3:7082 --lru_mb=2048 --zero=zero:5080 -o 2 672 ratel: 673 image: dgraph/dgraph:latest 674 hostname: "ratel" 675 ports: 676 - 8000:8000 677 networks: 678 - dgraph 679 command: dgraph-ratel 680 volumes: 681 data-volume: 682 ``` 683 Run the following command on the Swarm leader to deploy the Dgraph Cluster. 684 685 ```sh 686 eval $(docker-machine env aws01) 687 docker stack deploy -c docker-compose.yml dgraph 688 ``` 689 690 This should run three Dgraph Alpha services (one on each VM because of the 691 constraint we have), one Dgraph Zero service on aws01 and one Dgraph Ratel. 692 693 These placement constraints (as seen in the compose file) are important so that 694 in case of restarting any containers, swarm places the respective Dgraph Alpha 695 or Zero containers on the same hosts to re-use the volumes. Also, if you are 696 running fewer than three hosts, make sure you use either different volumes or 697 run Dgraph Alpha with `-p p1 -w w1` options. 698 699 {{% notice "note" %}} 700 701 1. This setup would create and use a local volume called `dgraph_data-volume` on 702 the instances. If you plan to replace instances, you should use remote 703 storage like 704 [cloudstore](https://docs.docker.com/docker-for-aws/persistent-data-volumes) 705 instead of local disk. 
{{% /notice %}} 706 707 You can verify that all services were created successfully by running: 708 709 ```sh 710 docker service ls 711 ``` 712 713 Output: 714 ``` 715 ID NAME MODE REPLICAS IMAGE PORTS 716 vp5bpwzwawoe dgraph_ratel replicated 1/1 dgraph/dgraph:latest *:8000->8000/tcp 717 69oge03y0koz dgraph_alpha2 replicated 1/1 dgraph/dgraph:latest *:8081->8081/tcp,*:9081->9081/tcp 718 kq5yks92mnk6 dgraph_alpha3 replicated 1/1 dgraph/dgraph:latest *:8082->8082/tcp,*:9082->9082/tcp 719 uild5cqp44dz dgraph_zero replicated 1/1 dgraph/dgraph:latest *:5080->5080/tcp,*:6080->6080/tcp 720 v9jlw00iz2gg dgraph_alpha1 replicated 1/1 dgraph/dgraph:latest *:8080->8080/tcp,*:9080->9080/tcp 721 ``` 722 723 To stop the cluster run 724 725 ``` 726 docker stack rm dgraph 727 ``` 728 729 ### HA Cluster setup using Docker Swarm 730 731 Here is a sample swarm config for running 6 Dgraph Alpha nodes and 3 Zero nodes on 6 different 732 ec2 instances. Setup should be similar to [Cluster setup using Docker Swarm]({{< relref "#cluster-setup-using-docker-swarm" >}}) apart from a couple of differences. This setup would ensure replication with sharding of data. The file assumes that there are six hosts available as docker-machines. Also if you are running on fewer than six hosts, make sure you use either different volumes or run Dgraph Alpha with `-p p1 -w w1` options. 733 734 You would need to edit the `docker-machine` security group to open inbound traffic on the following ports. 735 736 1. Allow all inbound traffic on all ports with Source being `docker-machine` security ports so that 737 docker related communication can happen easily. 738 739 2. Also open inbound TCP traffic on the following ports required by Dgraph: `5080`, `8000`, `808[0-5]`, `908[0-5]`. Remember port *5080* is only required if you are running Dgraph Live Loader or Dgraph Bulk Loader from outside. You need to open `7080` to enable Alpha-to-Alpha communication in case you have not opened all ports in #1. 740 741 If you are on AWS, below is the security group (**docker-machine**) after necessary changes. 
742 743 {{% load-img "/images/aws.png" "AWS Security Group" %}} 744 745 Copy the following file on your host machine and name it as docker-compose.yml 746 747 ```sh 748 version: "3" 749 networks: 750 dgraph: 751 services: 752 zero1: 753 image: dgraph/dgraph:latest 754 volumes: 755 - data-volume:/dgraph 756 ports: 757 - 5080:5080 758 - 6080:6080 759 networks: 760 - dgraph 761 deploy: 762 placement: 763 constraints: 764 - node.hostname == aws01 765 command: dgraph zero --my=zero1:5080 --replicas 3 --idx 1 766 zero2: 767 image: dgraph/dgraph:latest 768 volumes: 769 - data-volume:/dgraph 770 ports: 771 - 5081:5081 772 - 6081:6081 773 networks: 774 - dgraph 775 deploy: 776 placement: 777 constraints: 778 - node.hostname == aws02 779 command: dgraph zero -o 1 --my=zero2:5081 --replicas 3 --peer zero1:5080 --idx 2 780 zero_3: 781 image: dgraph/dgraph:latest 782 volumes: 783 - data-volume:/dgraph 784 ports: 785 - 5082:5082 786 - 6082:6082 787 networks: 788 - dgraph 789 deploy: 790 placement: 791 constraints: 792 - node.hostname == aws03 793 command: dgraph zero -o 2 --my=zero_3:5082 --replicas 3 --peer zero1:5080 --idx 3 794 alpha1: 795 image: dgraph/dgraph:latest 796 hostname: "alpha1" 797 volumes: 798 - data-volume:/dgraph 799 ports: 800 - 8080:8080 801 - 9080:9080 802 networks: 803 - dgraph 804 deploy: 805 replicas: 1 806 placement: 807 constraints: 808 - node.hostname == aws01 809 command: dgraph alpha --my=alpha1:7080 --lru_mb=2048 --zero=zero1:5080 810 alpha2: 811 image: dgraph/dgraph:latest 812 hostname: "alpha2" 813 volumes: 814 - data-volume:/dgraph 815 ports: 816 - 8081:8081 817 - 9081:9081 818 networks: 819 - dgraph 820 deploy: 821 replicas: 1 822 placement: 823 constraints: 824 - node.hostname == aws02 825 command: dgraph alpha --my=alpha2:7081 --lru_mb=2048 --zero=zero1:5080 -o 1 826 alpha3: 827 image: dgraph/dgraph:latest 828 hostname: "alpha3" 829 volumes: 830 - data-volume:/dgraph 831 ports: 832 - 8082:8082 833 - 9082:9082 834 networks: 835 - dgraph 836 deploy: 837 replicas: 1 838 placement: 839 constraints: 840 - node.hostname == aws03 841 command: dgraph alpha --my=alpha3:7082 --lru_mb=2048 --zero=zero1:5080 -o 2 842 alpha_4: 843 image: dgraph/dgraph:latest 844 hostname: "alpha_4" 845 volumes: 846 - data-volume:/dgraph 847 ports: 848 - 8083:8083 849 - 9083:9083 850 networks: 851 - dgraph 852 deploy: 853 placement: 854 constraints: 855 - node.hostname == aws04 856 command: dgraph alpha --my=alpha_4:7083 --lru_mb=2048 --zero=zero1:5080 -o 3 857 alpha_5: 858 image: dgraph/dgraph:latest 859 hostname: "alpha_5" 860 volumes: 861 - data-volume:/dgraph 862 ports: 863 - 8084:8084 864 - 9084:9084 865 networks: 866 - dgraph 867 deploy: 868 placement: 869 constraints: 870 - node.hostname == aws05 871 command: dgraph alpha --my=alpha_5:7084 --lru_mb=2048 --zero=zero1:5080 -o 4 872 alpha_6: 873 image: dgraph/dgraph:latest 874 hostname: "alpha_6" 875 volumes: 876 - data-volume:/dgraph 877 ports: 878 - 8085:8085 879 - 9085:9085 880 networks: 881 - dgraph 882 deploy: 883 placement: 884 constraints: 885 - node.hostname == aws06 886 command: dgraph alpha --my=alpha_6:7085 --lru_mb=2048 --zero=zero1:5080 -o 5 887 ratel: 888 image: dgraph/dgraph:latest 889 hostname: "ratel" 890 ports: 891 - 8000:8000 892 networks: 893 - dgraph 894 command: dgraph-ratel 895 volumes: 896 data-volume: 897 ``` 898 {{% notice "note" %}} 899 1. 
This setup assumes that you are using 6 hosts, but if you are running fewer than 6 hosts then you have to either use different volumes between Dgraph alphas or use `-p` & `-w` to configure data directories. 900 2. This setup would create and use a local volume called `dgraph_data-volume` on the instances. If you plan to replace instances, you should use remote storage like [cloudstore](https://docs.docker.com/docker-for-aws/persistent-data-volumes) instead of local disk. {{% /notice %}} 901 902 ## Using Kubernetes 903 904 The following section covers running Dgraph with Kubernetes. We have tested Dgraph with Kubernetes 1.14 to 1.15 on [GKE](https://cloud.google.com/kubernetes-engine) and [EKS](https://aws.amazon.com/eks/). 905 906 {{% notice "note" %}}These instructions are for running Dgraph Alpha without TLS configuration. 907 Instructions for running with TLS refer [TLS instructions](#tls-configuration).{{% /notice %}} 908 909 * Install [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) which is used to deploy 910 and manage applications on kubernetes. 911 * Get the Kubernetes cluster up and running on a cloud provider of your choice. 912 * For Amazon [EKS](https://aws.amazon.com/eks/), you can use [eksctl](https://eksctl.io/) to quickly provision a new cluster. If you are new to this, Amazon has an article [Getting started with eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html). 913 * For Google Cloud [GKE](https://cloud.google.com/kubernetes-engine), you can use [Google Cloud SDK](https://cloud.google.com/sdk/install) and the `gcloud container clusters create` command to quickly provision a new cluster. 914 915 Verify that you have your cluster up and running using `kubectl get nodes`. If you used `eksctl` or `gcloud container clusters create` with the default options, you should have 2-3 worker nodes ready. 916 917 On Amazon [EKS](https://aws.amazon.com/eks/), you would see something like this: 918 919 ```sh 920 ➜ kubernetes git:(master) ✗ kubectl get nodes 921 NAME STATUS ROLES AGE VERSION 922 <aws-ip-hostname>.<region>.compute.internal Ready <none> 1m v1.15.11-eks-af3caf 923 <aws-ip-hostname>.<region>.compute.internal Ready <none> 1m v1.15.11-eks-af3caf 924 ``` 925 926 On Google Cloud [GKE](https://cloud.google.com/kubernetes-engine), you would see something like this: 927 928 ```sh 929 ➜ kubernetes git:(master) ✗ kubectl get nodes 930 NAME STATUS ROLES AGE VERSION 931 gke-<cluster-name>-default-pool-<gce-id> Ready <none> 41s v1.14.10-gke.36 932 gke-<cluster-name>-default-pool-<gce-id> Ready <none> 40s v1.14.10-gke.36 933 gke-<cluster-name>-default-pool-<gce-id> Ready <none> 41s v1.14.10-gke.36 934 ``` 935 936 ### Single Server 937 938 Once your Kubernetes cluster is up, you can use [dgraph-single.yaml](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-single/dgraph-single.yaml) to start a Zero, Alpha, and Ratel UI services. 939 940 #### Deploy Single Server 941 942 From your machine, run the following command to start a StatefulSet that creates a single Pod with Zero, Alpha, and Ratel UI running in it. 943 944 ```sh 945 kubectl create --filename https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-single/dgraph-single.yaml 946 ``` 947 948 Output: 949 ``` 950 service/dgraph-public created 951 statefulset.apps/dgraph created 952 ``` 953 954 #### Verify Single Server 955 956 Confirm that the pod was created successfully. 
957 958 ```sh 959 kubectl get pods 960 ``` 961 962 Output: 963 ``` 964 NAME READY STATUS RESTARTS AGE 965 dgraph-0 3/3 Running 0 1m 966 ``` 967 968 {{% notice "tip" %}} 969 You can check the logs for the containers in the pod using 970 `kubectl logs --follow dgraph-0 <container_name>`. For example, try 971 `kubectl logs --follow dgraph-0 alpha` for server logs. 972 {{% /notice %}} 973 974 #### Test Single Server Setup 975 976 Port forward from your local machine to the pod 977 978 ```sh 979 kubectl port-forward pod/dgraph-0 8080:8080 980 kubectl port-forward pod/dgraph-0 8000:8000 981 ``` 982 983 Go to `http://localhost:8000` and verify Dgraph is working as expected. 984 985 #### Remove Single Server Resources 986 987 Delete all the resources 988 989 ```sh 990 kubectl delete --filename https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-single/dgraph-single.yaml 991 kubectl delete persistentvolumeclaims --selector app=dgraph 992 ``` 993 994 ### HA Cluster Setup Using Kubernetes 995 996 This setup allows you to run 3 Dgraph Alphas and 3 Dgraph Zeros. We start Zero with `--replicas 997 3` flag, so all data would be replicated on 3 Alphas and form 1 alpha group. 998 999 {{% notice "note" %}} Ideally you should have at least three worker nodes as part of your Kubernetes 1000 cluster so that each Dgraph Alpha runs on a separate worker node.{{% /notice %}} 1001 1002 #### Validate Kubernetes Cluster for HA 1003 1004 Check the nodes that are part of the Kubernetes cluster. 1005 1006 ```sh 1007 kubectl get nodes 1008 ``` 1009 1010 Output for Amazon [EKS](https://aws.amazon.com/eks/): 1011 1012 ```sh 1013 NAME STATUS ROLES AGE VERSION 1014 <aws-ip-hostname>.<region>.compute.internal Ready <none> 1m v1.15.11-eks-af3caf 1015 <aws-ip-hostname>.<region>.compute.internal Ready <none> 1m v1.15.11-eks-af3caf 1016 <aws-ip-hostname>.<region>.compute.internal Ready <none> 1m v1.15.11-eks-af3caf 1017 ``` 1018 1019 Output for Google Cloud [GKE](https://cloud.google.com/kubernetes-engine) 1020 1021 ```sh 1022 NAME STATUS ROLES AGE VERSION 1023 gke-<cluster-name>-default-pool-<gce-id> Ready <none> 41s v1.14.10-gke.36 1024 gke-<cluster-name>-default-pool-<gce-id> Ready <none> 40s v1.14.10-gke.36 1025 gke-<cluster-name>-default-pool-<gce-id> Ready <none> 41s v1.14.10-gke.36 1026 ``` 1027 1028 Once your Kubernetes cluster is up, you can use [dgraph-ha.yaml](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml) to start the cluster. 1029 1030 #### Deploy Dgraph HA Cluster 1031 1032 From your machine, run the following command to start the cluster. 1033 1034 ```sh 1035 kubectl create --filename https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml 1036 ``` 1037 1038 Output: 1039 ```sh 1040 service/dgraph-zero-public created 1041 service/dgraph-alpha-public created 1042 service/dgraph-ratel-public created 1043 service/dgraph-zero created 1044 service/dgraph-alpha created 1045 statefulset.apps/dgraph-zero created 1046 statefulset.apps/dgraph-alpha created 1047 deployment.apps/dgraph-ratel created 1048 ``` 1049 1050 #### Verify Dgraph HA Cluster 1051 1052 Confirm that the pods were created successfully. 1053 1054 It may take a few minutes for the pods to come up. 

```sh
kubectl get pods
```

Output:
```sh
NAME                    READY   STATUS    RESTARTS   AGE
dgraph-alpha-0          1/1     Running   0          6m24s
dgraph-alpha-1          1/1     Running   0          5m42s
dgraph-alpha-2          1/1     Running   0          5m2s
dgraph-ratel-<pod-id>   1/1     Running   0          6m23s
dgraph-zero-0           1/1     Running   0          6m24s
dgraph-zero-1           1/1     Running   0          5m41s
dgraph-zero-2           1/1     Running   0          5m6s
```

{{% notice "tip" %}}You can check the logs for the containers in the pod using `kubectl logs --follow dgraph-alpha-0` and `kubectl logs --follow dgraph-zero-0`.{{% /notice %}}

#### Test Dgraph HA Cluster Setup

Port forward from your local machine to the services

```sh
kubectl port-forward service/dgraph-alpha-public 8080:8080
kubectl port-forward service/dgraph-ratel-public 8000:8000
```

Go to `http://localhost:8000` and verify Dgraph is working as expected.

{{% notice "note" %}} You can also access the service on its External IP address.{{% /notice %}}

#### Delete Dgraph HA Cluster Resources

Delete all the resources

```sh
kubectl delete --filename https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml
kubectl delete persistentvolumeclaims --selector app=dgraph-zero
kubectl delete persistentvolumeclaims --selector app=dgraph-alpha
```

### Using Helm Chart

Once your Kubernetes cluster is up, you can use the Helm chart from
[our official Helm repository](https://github.com/dgraph-io/charts/) to bring
up a Dgraph cluster.

{{% notice "note" %}}The instructions below are for Helm versions >= 3.x.{{% /notice %}}

#### Installing the Chart

To add the Dgraph Helm repository:

```sh
helm repo add dgraph https://charts.dgraph.io
```

To install the chart with the release name `my-release`:

```sh
helm install my-release dgraph/dgraph
```

The above command installs the latest available Dgraph Docker image. To install an older version, set the image tag explicitly, e.g.:

```sh
helm install my-release dgraph/dgraph --set image.tag="v1.1.1"
```

By default, the Zero and Alpha services are exposed only within the Kubernetes cluster as
Kubernetes service type `ClusterIP`. In order to expose the Alpha service publicly
you can use Kubernetes service type `LoadBalancer`:

```sh
helm install my-release dgraph/dgraph --set alpha.service.type="LoadBalancer"
```

Similarly, you can expose both the Alpha and Ratel services to the internet as follows:

```sh
helm install my-release dgraph/dgraph --set alpha.service.type="LoadBalancer" --set ratel.service.type="LoadBalancer"
```

#### Upgrading the Chart

You can update your cluster configuration by updating the configuration of the
Helm chart. Dgraph is a stateful database, so some care is needed when
upgrading the configuration in order to move your cluster to the
desired configuration.

In general, you can use [`helm upgrade`][helm-upgrade] to update the
configuration values of the cluster. Depending on your change, you may need to
apply the upgrade in multiple steps, following the steps below.
1150 1151 [helm-upgrade]: https://helm.sh/docs/helm/helm_upgrade/ 1152 1153 **Upgrade to HA cluster setup** 1154 1155 To upgrade to an [HA cluster setup]({{< relref "#ha-cluster-setup" >}}), ensure 1156 that the shard replication setting is more than 1. When `zero.shardReplicaCount` 1157 is not set to an HA configuration (3 or 5), follow the steps below: 1158 1159 1. Set the shard replica flag on the Zero node group. For example: `zero.shardReplicaCount=3`. 1160 2. Next, run the Helm upgrade command to restart the Zero node group: 1161 ```sh 1162 helm upgrade my-release dgraph/dgraph [options] 1163 ``` 1164 3. Now set the Alpha replica count flag. For example: `alpha.replicaCount=3`. 1165 4. Finally, run the Helm upgrade command again: 1166 ```sh 1167 helm upgrade my-release dgraph/dgraph [options] 1168 ``` 1169 1170 1171 #### Deleting the Chart 1172 1173 Delete the Helm deployment as normal 1174 1175 ```sh 1176 helm delete my-release 1177 ``` 1178 Deletion of the StatefulSet doesn't cascade to deleting associated PVCs. To delete them: 1179 1180 ```sh 1181 kubectl delete pvc -l release=my-release,chart=dgraph 1182 ``` 1183 1184 #### Configuration 1185 1186 The following table lists the configurable parameters of the dgraph chart and their default values. 1187 1188 | Parameter | Description | Default | 1189 | ------------------------------------ | ------------------------------------------------------------------- | --------------------------------------------------- | 1190 | `image.registry` | Container registry name | `docker.io` | 1191 | `image.repository` | Container image name | `dgraph/dgraph` | 1192 | `image.tag` | Container image tag | `latest` | 1193 | `image.pullPolicy` | Container pull policy | `Always` | 1194 | `zero.name` | Zero component name | `zero` | 1195 | `zero.updateStrategy` | Strategy for upgrading zero nodes | `RollingUpdate` | 1196 | `zero.monitorLabel` | Monitor label for zero, used by prometheus. 
| `zero-dgraph-io` | 1197 | `zero.rollingUpdatePartition` | Partition update strategy | `nil` | 1198 | `zero.podManagementPolicy` | Pod management policy for zero nodes | `OrderedReady` | 1199 | `zero.replicaCount` | Number of zero nodes | `3` | 1200 | `zero.shardReplicaCount` | Max number of replicas per data shard | `5` | 1201 | `zero.terminationGracePeriodSeconds` | Zero server pod termination grace period | `60` | 1202 | `zero.antiAffinity` | Zero anti-affinity policy | `soft` | 1203 | `zero.podAntiAffinitytopologyKey` | Anti affinity topology key for zero nodes | `kubernetes.io/hostname` | 1204 | `zero.nodeAffinity` | Zero node affinity policy | `{}` | 1205 | `zero.service.type` | Zero node service type | `ClusterIP` | 1206 | `zero.securityContext.enabled` | Security context for zero nodes enabled | `false` | 1207 | `zero.securityContext.fsGroup` | Group id of the zero container | `1001` | 1208 | `zero.securityContext.runAsUser` | User ID for the zero container | `1001` | 1209 | `zero.persistence.enabled` | Enable persistence for zero using PVC | `true` | 1210 | `zero.persistence.storageClass` | PVC Storage Class for zero volume | `nil` | 1211 | `zero.persistence.accessModes` | PVC Access Mode for zero volume | `ReadWriteOnce` | 1212 | `zero.persistence.size` | PVC Storage Request for zero volume | `8Gi` | 1213 | `zero.nodeSelector` | Node labels for zero pod assignment | `{}` | 1214 | `zero.tolerations` | Zero tolerations | `[]` | 1215 | `zero.resources` | Zero node resources requests & limits | `{}` | 1216 | `zero.livenessProbe` | Zero liveness probes | `See values.yaml for defaults` | 1217 | `zero.readinessProbe` | Zero readiness probes | `See values.yaml for defaults` | 1218 | `alpha.name` | Alpha component name | `alpha` | 1219 | `alpha.updateStrategy` | Strategy for upgrading alpha nodes | `RollingUpdate` | 1220 | `alpha.monitorLabel` | Monitor label for alpha, used by prometheus. 
| `alpha-dgraph-io` | 1221 | `alpha.rollingUpdatePartition` | Partition update strategy | `nil` | 1222 | `alpha.podManagementPolicy` | Pod management policy for alpha nodes | `OrderedReady` | 1223 | `alpha.replicaCount` | Number of alpha nodes | `3` | 1224 | `alpha.terminationGracePeriodSeconds`| Alpha server pod termination grace period | `60` | 1225 | `alpha.antiAffinity` | Alpha anti-affinity policy | `soft` | 1226 | `alpha.podAntiAffinitytopologyKey` | Anti affinity topology key for zero nodes | `kubernetes.io/hostname` | 1227 | `alpha.nodeAffinity` | Alpha node affinity policy | `{}` | 1228 | `alpha.service.type` | Alpha node service type | `ClusterIP` | 1229 | `alpha.securityContext.enabled` | Security context for alpha nodes enabled | `false` | 1230 | `alpha.securityContext.fsGroup` | Group id of the alpha container | `1001` | 1231 | `alpha.securityContext.runAsUser` | User ID for the alpha container | `1001` | 1232 | `alpha.persistence.enabled` | Enable persistence for alpha using PVC | `true` | 1233 | `alpha.persistence.storageClass` | PVC Storage Class for alpha volume | `nil` | 1234 | `alpha.persistence.accessModes` | PVC Access Mode for alpha volume | `ReadWriteOnce` | 1235 | `alpha.persistence.size` | PVC Storage Request for alpha volume | `8Gi` | 1236 | `alpha.nodeSelector` | Node labels for alpha pod assignment | `{}` | 1237 | `alpha.tolerations` | Alpha tolerations | `[]` | 1238 | `alpha.resources` | Alpha node resources requests & limits | `{}` | 1239 | `alpha.livenessProbe` | Alpha liveness probes | `See values.yaml for defaults` | 1240 | `alpha.readinessProbe` | Alpha readiness probes | `See values.yaml for defaults` | 1241 | `ratel.name` | Ratel component name | `ratel` | 1242 | `ratel.replicaCount` | Number of ratel nodes | `1` | 1243 | `ratel.service.type` | Ratel service type | `ClusterIP` | 1244 | `ratel.securityContext.enabled` | Security context for ratel nodes enabled | `false` | 1245 | `ratel.securityContext.fsGroup` | Group id of the ratel container | `1001` | 1246 | `ratel.securityContext.runAsUser` | User ID for the ratel container | `1001` | 1247 | `ratel.livenessProbe` | Ratel liveness probes | `See values.yaml for defaults` | 1248 | `ratel.readinessProbe` | Ratel readiness probes | `See values.yaml for defaults` | 1249 1250 ### Monitoring in Kubernetes 1251 1252 Dgraph exposes prometheus metrics to monitor the state of various components involved in the cluster, this includes dgraph alpha and zero. 1253 1254 Follow the below mentioned steps to setup prometheus monitoring for your cluster: 1255 1256 * Install Prometheus operator: 1257 1258 ```sh 1259 kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.34/bundle.yaml 1260 ``` 1261 1262 * Ensure that the instance of `prometheus-operator` has started before continuing. 1263 1264 ```sh 1265 $ kubectl get deployments prometheus-operator 1266 NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE 1267 prometheus-operator 1 1 1 1 3m 1268 ``` 1269 1270 * Apply prometheus manifest present [here](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/monitoring/prometheus/prometheus.yaml). 
1271 1272 ```sh 1273 $ kubectl apply -f prometheus.yaml 1274 1275 serviceaccount/prometheus-dgraph-io created 1276 clusterrole.rbac.authorization.k8s.io/prometheus-dgraph-io created 1277 clusterrolebinding.rbac.authorization.k8s.io/prometheus-dgraph-io created 1278 servicemonitor.monitoring.coreos.com/alpha.dgraph-io created 1279 servicemonitor.monitoring.coreos.com/zero-dgraph-io created 1280 prometheus.monitoring.coreos.com/dgraph-io created 1281 ``` 1282 1283 To view prometheus UI locally run: 1284 1285 ```sh 1286 kubectl port-forward prometheus-dgraph-io-0 9090:9090 1287 ``` 1288 1289 The UI is accessible at port 9090. Open http://localhost:9090 in your browser to play around. 1290 1291 To register alerts from dgraph cluster with your prometheus deployment follow the steps below: 1292 1293 * Create a kubernetes secret containing alertmanager configuration. Edit the configuration file present [here](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/monitoring/prometheus/alertmanager-config.yaml) 1294 with the required reciever configuration including the slack webhook credential and create the secret. 1295 1296 You can find more information about alertmanager configuration [here](https://prometheus.io/docs/alerting/configuration/). 1297 1298 ```sh 1299 $ kubectl create secret generic alertmanager-alertmanager-dgraph-io --from-file=alertmanager.yaml=alertmanager-config.yaml 1300 1301 $ kubectl get secrets 1302 NAME TYPE DATA AGE 1303 alertmanager-alertmanager-dgraph-io Opaque 1 87m 1304 ``` 1305 1306 * Apply the [alertmanager](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/monitoring/prometheus/alertmanager.yaml) along with [alert-rules](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/monitoring/prometheus/alert-rules.yaml) manifest 1307 to use the default configured alert configuration. You can also add custom rules based on the metrics exposed by dgraph cluster similar to [alert-rules](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/monitoring/prometheus/alert-rules.yaml) 1308 manifest. 1309 1310 ```sh 1311 $ kubectl apply -f alertmanager.yaml 1312 alertmanager.monitoring.coreos.com/alertmanager-dgraph-io created 1313 service/alertmanager-dgraph-io created 1314 1315 $ kubectl apply -f alert-rules.yaml 1316 prometheusrule.monitoring.coreos.com/prometheus-rules-dgraph-io created 1317 ``` 1318 1319 ### Kubernetes Storage 1320 1321 The Kubernetes configurations in the previous sections were configured to run 1322 Dgraph with any storage type (`storage-class: anything`). On the common cloud 1323 environments like AWS, GCP, and Azure, the default storage type are slow disks 1324 like hard disks or low IOPS SSDs. We highly recommend using faster disks for 1325 ideal performance when running Dgraph. 1326 1327 #### Local storage 1328 1329 The AWS storage-optimized i-class instances provide locally attached NVMe-based 1330 SSD storage which provide consistent very high IOPS. The Dgraph team uses 1331 i3.large instances on AWS to test Dgraph. 1332 1333 You can create a Kubernetes `StorageClass` object to provision a specific type 1334 of storage volume which you can then attach to your Dgraph pods. You can set up 1335 your cluster with local SSDs by using [Local Persistent 1336 Volumes](https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/). 1337 This Kubernetes feature is in beta at the time of this writing (Kubernetes 1338 v1.13.1). You can first set up an EC2 instance with locally attached storage. 
1339 Once it is formatted and mounted properly, then you can create a StorageClass to 1340 access it.: 1341 1342 ```yaml 1343 apiVersion: storage.k8s.io/v1 1344 kind: StorageClass 1345 metadata: 1346 name: <your-local-storage-class-name> 1347 provisioner: kubernetes.io/no-provisioner 1348 volumeBindingMode: WaitForFirstConsumer 1349 ``` 1350 1351 Currently, Kubernetes does not allow automatic provisioning of local storage. So 1352 a PersistentVolume with a specific mount path should be created: 1353 1354 ```yaml 1355 apiVersion: v1 1356 kind: PersistentVolume 1357 metadata: 1358 name: <your-local-pv-name> 1359 spec: 1360 capacity: 1361 storage: 475Gi 1362 volumeMode: Filesystem 1363 accessModes: 1364 - ReadWriteOnce 1365 persistentVolumeReclaimPolicy: Delete 1366 storageClassName: <your-local-storage-class-name> 1367 local: 1368 path: /data 1369 nodeAffinity: 1370 required: 1371 nodeSelectorTerms: 1372 - matchExpressions: 1373 - key: kubernetes.io/hostname 1374 operator: In 1375 values: 1376 - <node-name> 1377 ``` 1378 1379 Then, in the StatefulSet configuration you can claim this local storage in 1380 .spec.volumeClaimTemplate: 1381 1382 ``` 1383 kind: StatefulSet 1384 ... 1385 volumeClaimTemplates: 1386 - metadata: 1387 name: datadir 1388 spec: 1389 accessModes: 1390 - ReadWriteOnce 1391 storageClassName: <your-local-storage-class-name> 1392 resources: 1393 requests: 1394 storage: 500Gi 1395 ``` 1396 1397 You can repeat these steps for each instance that's configured with local 1398 node storage. 1399 1400 #### Non-local persistent disks 1401 1402 EBS volumes on AWS and PDs on GCP are persistent disks that can be configured 1403 with Dgraph. The disk performance is much lower than locally attached storage 1404 but can be sufficient for your workload such as testing environments. 1405 1406 When using EBS volumes on AWS, we recommend using Provisioned IOPS SSD EBS 1407 volumes (the io1 disk type) which provide consistent IOPS. The available IOPS 1408 for AWS EBS volumes is based on the total disk size. With Kubernetes, you can 1409 request io1 disks to be provisioned with this config with 50 IOPS/GB using the 1410 `iopsPerGB` parameter: 1411 1412 ``` 1413 kind: StorageClass 1414 apiVersion: storage.k8s.io/v1 1415 metadata: 1416 name: <your-storage-class-name> 1417 provisioner: kubernetes.io/aws-ebs 1418 parameters: 1419 type: io1 1420 iopsPerGB: "50" 1421 fsType: ext4 1422 ``` 1423 1424 Example: Requesting a disk size of 250Gi with this storage class would provide 1425 12.5K IOPS. 1426 1427 ### Removing a Dgraph Pod 1428 1429 In the event that you need to completely remove a pod (e.g., its disk got 1430 corrupted and data cannot be recovered), you can use the `/removeNode` API to 1431 remove the node from the cluster. With a Kubernetes StatefulSet, you'll need to 1432 remove the node in this order: 1433 1434 1. Call `/removeNode` to remove the Dgraph instance from the cluster (see [More 1435 about Dgraph Zero]({{< relref "#more-about-dgraph-zero" >}})). The removed 1436 instance will immediately stop running. Any further attempts to join the 1437 cluster will fail for that instance since it has been removed. 1438 2. Remove the PersistentVolumeClaim associated with the pod to delete its data. 1439 This prepares the pod to join with a clean state. 1440 3. Restart the pod. This will create a new PersistentVolumeClaim to create new 1441 data directories. 
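
For step 1 above, the `/removeNode` call is a plain HTTP request to Zero (see the endpoint description under **More about Dgraph Zero** below). A minimal sketch, assuming Zero's default HTTP port 6080 and an instance with Raft ID 3 in group 2:

```sh
# Remove the Dgraph instance with Raft ID 3 from group 2 via Zero's HTTP port.
# To remove a dead Zero node instead, pass group=0 and the Zero node's ID.
curl "localhost:6080/removeNode?id=3&group=2"
```
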

When an Alpha pod restarts in a replicated cluster, it will join as a new member
of the cluster, be assigned a group and an unused index from Zero, and receive
the latest snapshot from the Alpha leader of the group.

When a Zero pod restarts, it must join the existing group with an unused index
ID. The index ID is set with the `--idx` flag. This may require the StatefulSet
configuration to be updated.

### Kubernetes and Bulk Loader

You may want to initialize a new cluster with an existing data set such as data
from the [Dgraph Bulk Loader]({{< relref "#bulk-loader" >}}). You can use [Init
Containers](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
to copy the data to the pod volume before the Alpha process runs.

See the `initContainers` configuration in
[dgraph-ha.yaml](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml)
to learn more.

## More about Dgraph Alpha

On its HTTP port, a Dgraph Alpha exposes a number of admin endpoints.

* `/health` returns HTTP status code 200 if the worker is running, HTTP 503 otherwise.
* `/admin/shutdown` initiates a proper [shutdown]({{< relref "#shutdown">}}) of the Alpha.
* `/admin/export` initiates a data [export]({{< relref "#export">}}).

By default, the Alpha listens on `localhost` for admin actions (the loopback address, which is only accessible from the same machine). The `--bindall=true` option binds to `0.0.0.0` and thus allows external connections.

{{% notice "tip" %}}Set max file descriptors to a high value like 10000 if you are going to load a lot of data.{{% /notice %}}

### More about /health endpoint

The `/health` endpoint of Dgraph Alpha returns HTTP status 200 with a JSON body consisting of basic information about the running worker.

Here's an example of the JSON returned from the `/health` endpoint:

```json
{
  "version": "v1.1.1",
  "instance": "alpha",
  "uptime": 75011100974
}
```

- `version`: Version of Dgraph running the Alpha server.
- `instance`: Name of the instance. Always set to `alpha`.
- `uptime`: Time in nanoseconds since the Alpha server came up.

## More about Dgraph Zero

Dgraph Zero controls the Dgraph cluster. It automatically moves data between
different Dgraph Alpha instances based on the size of the data served by each Alpha instance.

It is mandatory to run at least one `dgraph zero` node before running any `dgraph alpha`.
Options present for `dgraph zero` can be seen by running `dgraph zero --help`.

* Zero stores information about the cluster.
* `--replicas` is the option that controls the replication factor, i.e. the number of replicas per data shard, including the original shard.
* When a new Alpha joins the cluster, it is assigned a group based on the replication factor. If the replication factor is 1, then each Alpha node will serve a different group. If the replication factor is 2 and you launch 4 Alphas, then the first two Alphas would serve group 1 and the next two machines would serve group 2.
* Zero also monitors the space occupied by predicates in each group and moves them around to rebalance the cluster.

Like Alpha, Zero also exposes HTTP on 6080 (+ any `--port_offset`).
You can query it (**GET** request)
to see useful information, like the following:

* `/state` Information about the nodes that are part of the cluster. Also contains information about
  the size of predicates and the groups they belong to.
* `/assign?what=uids&num=100` This would allocate `num` uids and return a JSON map
  containing `startId` and `endId`, both inclusive. This id range can be safely assigned
  externally to new nodes during data ingestion.
* `/assign?what=timestamps&num=100` This would request timestamps from Zero.
  This is useful to fast forward Zero state when starting from a postings
  directory which already has commits higher than Zero's leased timestamp.
* `/removeNode?id=3&group=2` If a replica goes down and can't be recovered, you
  can remove it and add a new node to the quorum. This endpoint can be used to
  remove a dead Zero or Dgraph Alpha node. To remove dead Zero nodes, pass
  `group=0` and the id of the Zero node.

{{% notice "note" %}}
Before using the API, ensure that the node is down and that it doesn't come back up ever again.

You should not reuse the `idx` of a node that was removed earlier.
{{% /notice %}}

* `/moveTablet?tablet=name&group=2` This endpoint can be used to move a tablet to a group. Zero
  already rebalances shards every 8 mins; this endpoint can be used to force-move a tablet.


These are the **POST** endpoints available:

* `/enterpriseLicense` Use this endpoint to apply an enterprise license to the cluster by supplying it
  as part of the body.

### More about /state endpoint

The `/state` endpoint of Dgraph Zero returns a JSON document of the current group membership info:

- Instances which are part of the cluster.
- Number of instances in the Zero group and each Alpha group.
- Current leader of each group.
- Predicates that belong to a group.
- Estimated size in bytes of each predicate.
- Enterprise license information.
- Max Leased transaction ID.
- Max Leased UID.
- CID (Cluster ID).
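
Assuming Zero is reachable locally on its default HTTP port 6080 (adjust the host and any `--port_offset` for your deployment), you can fetch this document with a plain GET, for example:

```sh
# Query Zero's membership state; the JSON below is the kind of document returned.
curl -s localhost:6080/state
```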

Here's an example of the JSON returned from the `/state` endpoint for a 6-node Dgraph cluster with three replicas:

```json
{
  "counter": "15",
  "groups": {
    "1": {
      "members": {
        "1": {
          "id": "1",
          "groupId": 1,
          "addr": "alpha1:7080",
          "leader": true,
          "lastUpdate": "1576112366"
        },
        "2": {
          "id": "2",
          "groupId": 1,
          "addr": "alpha2:7080"
        },
        "3": {
          "id": "3",
          "groupId": 1,
          "addr": "alpha3:7080"
        }
      },
      "tablets": {
        "counter.val": {
          "groupId": 1,
          "predicate": "counter.val"
        },
        "dgraph.type": {
          "groupId": 1,
          "predicate": "dgraph.type"
        }
      },
      "checksum": "1021598189643258447"
    }
  },
  "zeros": {
    "1": {
      "id": "1",
      "addr": "zero1:5080",
      "leader": true
    },
    "2": {
      "id": "2",
      "addr": "zero2:5080"
    },
    "3": {
      "id": "3",
      "addr": "zero3:5080"
    }
  },
  "maxLeaseId": "10000",
  "maxTxnTs": "10000",
  "cid": "3602537a-ee49-43cb-9792-c766eea683dc",
  "license": {
    "maxNodes": "18446744073709551615",
    "expiryTs": "1578704367",
    "enabled": true
  }
}
```

Here's the information the above JSON document provides:

- Group 0
  - members
    - zero1:5080, id: 1, leader
    - zero2:5080, id: 2
    - zero3:5080, id: 3
- Group 1
  - members
    - alpha1:7080, id: 1, leader
    - alpha2:7080, id: 2
    - alpha3:7080, id: 3
  - predicates
    - dgraph.type
    - counter.val
- Enterprise license
  - Enabled
  - maxNodes: unlimited
  - License expires on Friday, January 10, 2020 4:59:27 PM GMT-08:00 (converted from the epoch timestamp)
- Other data:
  - maxTxnTs
    - The current max lease of transaction timestamps used to hand out start timestamps
      and commit timestamps.
    - This increments in batches of 10,000 IDs. Once the max lease is reached, another
      10,000 IDs are leased. In the event that the Zero leader is lost, the new
      leader starts a brand new lease from maxTxnTs+1. Any lost transaction IDs
      in-between will never be used.
    - An admin can use the Zero endpoint HTTP GET `/assign?what=timestamps&num=1000` to
      increase the current transaction timestamp (in this case, by 1000). This is mainly
      useful in special-case scenarios, e.g., using an existing p directory in a fresh
      cluster in order to be able to query the latest data in the DB.
  - maxLeaseId
    - The current max lease of UIDs used for blank node UID assignment.
    - This increments in batches of 10,000 IDs. Once the max lease is reached, another
      10,000 IDs are leased. In the event that the Zero leader is lost, the new leader
      starts a brand new lease from maxLeaseId+1. Any UIDs lost in-between will never
      be used for blank-node UID assignment.
    - An admin can use the Zero endpoint HTTP GET `/assign?what=uids&num=1000` to
      reserve a range of UIDs (in this case, 1000) to use externally (Zero will NEVER
      use these UIDs for blank node UID assignment, so the user can use the range
      to assign UIDs manually to their own data sets).
  - CID
    - This is a unique UUID representing the *cluster-ID* for this cluster. It is generated
      during the initial DB startup and is retained across restarts.
  - Group checksum
    - This is the checksum verification of the data per Alpha group.
      This is used internally
      to verify group memberships in the event of a tablet move.

{{% notice "note" %}}
"tablet", "predicate", and "edge" are synonymous terms today. The future plan to
improve data scalability is to shard a predicate into separate tablets that could
be assigned to different groups.
{{% /notice %}}

## TLS configuration

{{% notice "note" %}}
This section refers to the `dgraph cert` command, which was introduced in v1.0.9. For previous releases, see the previous [TLS configuration documentation](https://docs.dgraph.io/v1.0.7/deploy/#tls-configuration).
{{% /notice %}}


Connections between client and server can be secured with TLS. Password-protected private keys are **not supported**.

{{% notice "tip" %}}If you're generating encrypted private keys with `openssl`, be sure to specify the encryption algorithm explicitly (like `-aes256`). This will force `openssl` to include the `DEK-Info` header in the private key, which is required by Dgraph to decrypt the key. When default encryption is used, `openssl` doesn't write that header and the key can't be decrypted.{{% /notice %}}

### Dgraph Certificate Management Tool

The `dgraph cert` program creates and manages CA-signed certificates and private keys using a generated Dgraph Root CA. The `dgraph cert` command simplifies certificate management for you.

```sh
# To see the available flags.
$ dgraph cert --help

# Create Dgraph Root CA, used to sign all other certificates.
$ dgraph cert

# Create node certificate and private key
$ dgraph cert -n localhost

# Create client certificate and private key for mTLS (mutual TLS)
$ dgraph cert -c dgraphuser

# Combine all in one command
$ dgraph cert -n localhost -c dgraphuser

# List all your certificates and keys
$ dgraph cert ls
```

#### File naming conventions

To enable TLS you must specify the directory path to find certificates and keys. The default location where the _cert_ command stores certificates (and keys) is `tls` under the Dgraph working directory, where the data files are found. The default dir path can be overridden using the `--dir` option.

```sh
$ dgraph cert --dir ~/mycerts
```

The following file naming conventions are used by Dgraph for proper TLS setup.

| File name | Description | Use |
|-----------|-------------|-------|
| ca.crt | Dgraph Root CA certificate | Verify all certificates |
| ca.key | Dgraph CA private key | Validate CA certificate |
| node.crt | Dgraph node certificate | Shared by all nodes for accepting TLS connections |
| node.key | Dgraph node private key | Validate node certificate |
| client._name_.crt | Dgraph client certificate | Authenticate a client _name_ |
| client._name_.key | Dgraph client private key | Validate _name_ client certificate |

The Root CA certificate is used for verifying node and client certificates; if it is changed, you must regenerate all certificates.

For client authentication, each client must have their own certificate and key. These are then used to connect to the Dgraph node(s).

The node certificate `node.crt` can support multiple node names using multiple host names and/or IP addresses. Just separate the names with commas when generating the certificate.

```sh
$ dgraph cert -n localhost,104.25.165.23,dgraph.io,2400:cb00:2048:1::6819:a417
```

{{% notice "tip" %}}You must delete the old node cert and key before you can generate a new pair.{{% /notice %}}

{{% notice "note" %}}When using host names for node certificates, including _localhost_, your clients must connect to the matching host name -- such as _localhost_ not 127.0.0.1. If you need to use IP addresses, then add them to the node certificate.{{% /notice %}}

#### Certificate inspection

The command `dgraph cert ls` lists all certificates and keys in the `--dir` directory (default 'tls'), along with details to inspect and validate cert/key pairs.

Example of command output:

```sh
-rw-r--r-- ca.crt - Dgraph Root CA certificate
    Issuer: Dgraph Labs, Inc.
       S/N: 043c4d8fdd347f06
Expiration: 02 Apr 29 16:56 UTC
SHA-256 Digest: 4A2B0F0F 716BF5B6 C603E01A 6229D681 0B2AFDC5 CADF5A0D 17D59299 116119E5

-r-------- ca.key - Dgraph Root CA key
SHA-256 Digest: 4A2B0F0F 716BF5B6 C603E01A 6229D681 0B2AFDC5 CADF5A0D 17D59299 116119E5

-rw-r--r-- client.admin.crt - Dgraph client certificate: admin
    Issuer: Dgraph Labs, Inc.
 CA Verify: PASSED
       S/N: 297e4cb4f97c71f9
Expiration: 03 Apr 24 17:29 UTC
SHA-256 Digest: D23EFB61 DE03C735 EB07B318 DB70D471 D3FE8556 B15D084C 62675857 788DF26C

-rw------- client.admin.key - Dgraph Client key
SHA-256 Digest: D23EFB61 DE03C735 EB07B318 DB70D471 D3FE8556 B15D084C 62675857 788DF26C

-rw-r--r-- node.crt - Dgraph Node certificate
    Issuer: Dgraph Labs, Inc.
 CA Verify: PASSED
       S/N: 795ff0e0146fdb2d
Expiration: 03 Apr 24 17:00 UTC
     Hosts: 104.25.165.23, 2400:cb00:2048:1::6819:a417, localhost, dgraph.io
SHA-256 Digest: 7E243ED5 3286AE71 B9B4E26C 5B2293DA D3E7F336 1B1AFFA7 885E8767 B1A84D28

-rw------- node.key - Dgraph Node key
SHA-256 Digest: 7E243ED5 3286AE71 B9B4E26C 5B2293DA D3E7F336 1B1AFFA7 885E8767 B1A84D28
```

Important points:

* The cert/key pairs should always have matching SHA-256 digests. Otherwise, the cert(s) must be
  regenerated. If the Root CA pair differs, all certs/keys must be regenerated; the flag `--force`
  can help.
* All certificates must pass Dgraph CA verification.
* All key files should have the most restrictive access permissions possible, especially `ca.key`, while still being readable.
* Key files won't be overwritten if they have limited access, even with `--force`.
* Node certificates are only valid for the hosts listed.
* Client certificates are only valid for the named client/user.

### TLS Options

The following configuration options are available for Alpha:

* `--tls_dir string` - TLS dir path; this enables TLS connections (usually 'tls').
* `--tls_use_system_ca` - Include System CA with Dgraph Root CA.
* `--tls_client_auth string` - TLS client authentication used to validate client connections. See [Client Authentication Options](#client-authentication-options) for details.

Dgraph Live Loader can be configured with the following options:

* `--tls_cacert string` - Dgraph Root CA, such as `./tls/ca.crt`
* `--tls_use_system_ca` - Include System CA with Dgraph Root CA.
* `--tls_cert` - User cert file provided by the client to Alpha
* `--tls_key` - User private key file provided by the client to Alpha
* `--tls_server_name string` - Server name, used for validating the server's TLS host name.


#### Using TLS without Client Authentication

For TLS without client authentication, you can configure certificates and run the Alpha server using the following:

```sh
# First, create rootca and node certificates and private keys
$ dgraph cert -n localhost
# Default use for enabling TLS server (after generating certificates and private keys)
$ dgraph alpha --tls_dir tls
```

You can then run Dgraph live loader using the following:

```sh
# Now, connect to server using TLS
$ dgraph live --tls_cacert ./tls/ca.crt --tls_server_name "localhost" -s 21million.schema -f 21million.rdf.gz
```

#### Using TLS with Client Authentication

If you do require Client Authentication (Mutual TLS), you can configure certificates and run the Alpha server using the following:

```sh
# First, create rootca, node, and client certificates and private keys
$ dgraph cert -n localhost -c dgraphuser
# Default use for enabling TLS server with client authentication (after generating certificates and private keys)
$ dgraph alpha --tls_dir tls --tls_client_auth="REQUIREANDVERIFY"
```

You can then run Dgraph live loader using the following:

```sh
# Now, connect to server using mTLS (mutual TLS)
$ dgraph live \
   --tls_cacert ./tls/ca.crt \
   --tls_cert ./tls/client.dgraphuser.crt \
   --tls_key ./tls/client.dgraphuser.key \
   --tls_server_name "localhost" \
   -s 21million.schema \
   -f 21million.rdf.gz
```

#### Client Authentication Options

The server will always **request** Client Authentication. There are four different values for the `--tls_client_auth` option that change the security policy of the client certificate.

| Value | Client Cert/Key | Client Certificate Verified |
|--------------------|-----------------|--------------------|
| `REQUEST` | optional | Client certificate is not VERIFIED if provided. (least secure) |
| `REQUIREANY` | required | Client certificate is never VERIFIED |
| `VERIFYIFGIVEN` | optional | Client certificate is VERIFIED if provided (default) |
| `REQUIREANDVERIFY` | required | Client certificate is always VERIFIED (most secure) |

{{% notice "note" %}}REQUIREANDVERIFY is the most secure but also the most difficult to configure for remote clients. When using this value, the value of `--tls_server_name` is matched against the certificate SANs values and the connection host.{{% /notice %}}

### Using Ratel UI with Client authentication

Ratel UI (and any other JavaScript clients built on top of `dgraph-js-http`)
connect to Dgraph servers via HTTP; when TLS is enabled, servers begin to expect
HTTPS requests only. Therefore some adjustments need to be made.

If the `--tls_client_auth` option is set to `REQUEST` or
`VERIFYIFGIVEN` (the default):

1. Change the connection URL from `http://` to `https://` (e.g. `https://127.0.0.1:8080`).
2. Install / make trusted the certificate of the Dgraph certificate authority `ca.crt`. Refer to the documentation of your OS / browser for instructions.
   (E.g.
   on Mac OS this means adding `ca.crt` to the KeyChain and making it trusted
   for `Secure Socket Layer`).

For `REQUIREANY` and `REQUIREANDVERIFY` you need to follow the steps above and
also need to install a client certificate on your OS / browser:

1. Generate a client certificate: `dgraph cert -c MyLaptop`.
2. Convert it to a `.p12` file:
   `openssl pkcs12 -export -out MyLaptopCert.p12 -in tls/client.MyLaptop.crt -inkey tls/client.MyLaptop.key`. Use any password you like for export.
3. Install the generated `MyLaptopCert.p12` file on the client system
   (on Mac OS this means simply double-clicking the file in Finder).
4. Next time you use Ratel to connect to an alpha with Client authentication
   enabled, the browser will prompt you for a client certificate to use. Select the
   certificate you've just installed in the step above and queries/mutations will
   succeed.

### Using Curl with Client authentication

When TLS is enabled, `curl` requests to Dgraph will need some specific options to work. For instance (for an export request):

```
curl --silent --cacert ./tls/ca.crt https://localhost:8080/admin/export
```

If you are using `curl` with [Client Authentication](#client-authentication-options) set to `REQUIREANY` or `REQUIREANDVERIFY`, you will need to provide the client certificate and private key. For instance (for an export request):

```
curl --silent --cacert ./tls/ca.crt --cert ./tls/client.dgraphuser.crt --key ./tls/client.dgraphuser.key https://localhost:8080/admin/export
```

Refer to the `curl` documentation for further information on its TLS options.

### Access Data Using a Client

Some examples of connecting via a [Client](/clients) when TLS is in use can be found below:

- [dgraph4j](https://github.com/dgraph-io/dgraph4j#creating-a-secure-client-using-tls)
- [dgraph-js](https://github.com/dgraph-io/dgraph-js/tree/master/examples/tls)
- [dgo](https://github.com/dgraph-io/dgraph/blob/master/tlstest/acl/acl_over_tls_test.go)
- [pydgraph](https://github.com/dgraph-io/pydgraph/tree/master/examples/tls)

### Troubleshooting Ratel's Client authentication

If you are getting errors in Ratel when the server's TLS is enabled, try opening
your alpha URL as a webpage.

Assuming you are running Dgraph on your local machine, opening
`https://localhost:8080/` in a browser should produce the message `Dgraph browser is available for running separately using the dgraph-ratel binary`.

In case you are getting a connection error, try not passing the
`--tls_client_auth` flag when starting an alpha. If you are still getting an
error, check that your hostname is correct and the port is open; then make sure
that the "Dgraph Root CA" certificate is installed and trusted correctly.

After that, if things work without `--tls_client_auth` but stop working when
`REQUIREANY` or `REQUIREANDVERIFY` is set, make sure the `.p12` file is
installed correctly.

## Cluster Checklist

In setting up a cluster, be sure to check the following.

* Is at least one Dgraph Zero node running?
* Is each Dgraph Alpha instance in the cluster set up correctly?
* Will each Dgraph Alpha instance be accessible to all peers on 7080 (+ any port offset)?
* Does each instance have a unique ID on startup?
* Has `--bindall=true` been set for networked communication?

## Fast Data Loading

There are two different tools that can be used for fast data loading:

- `dgraph live` runs the Dgraph Live Loader
- `dgraph bulk` runs the Dgraph Bulk Loader

{{% notice "note" %}} Both tools only accept [RDF N-Quad/Triple
data](https://www.w3.org/TR/n-quads/) or JSON in plain or gzipped format. Data
in other formats must be converted.{{% /notice %}}

### Live Loader

Dgraph Live Loader (run with `dgraph live`) is a small helper program which reads RDF N-Quads from a gzipped file, batches them up, creates mutations (using the Go client) and sends them off to Dgraph.

Dgraph Live Loader correctly handles assigning unique IDs to blank nodes across multiple files, and can optionally persist them to disk to save memory, in case the loader is re-run.

{{% notice "note" %}} Dgraph Live Loader can optionally write the xid->uid mapping to a directory specified using the `-x` flag, which can be reused
given that the live loader completed successfully in the previous run.{{% /notice %}}

```sh
$ dgraph live --help # To see the available flags.

# Read RDFs or JSON from the passed file, and send them to Dgraph on localhost:9080.
$ dgraph live -f <path-to-gzipped-RDF-or-JSON-file>

# Read multiple RDFs or JSON from the passed path, and send them to Dgraph on localhost:9080.
$ dgraph live -f <./path-to-gzipped-RDF-or-JSON-files>

# Read multiple files strictly by name.
$ dgraph live -f <file1.rdf, file2.rdf>

# Use compressed gRPC connections to and from Dgraph.
$ dgraph live -C -f <path-to-gzipped-RDF-or-JSON-file>

# Read RDFs and a schema file and send to Dgraph running at given address.
$ dgraph live -f <path-to-gzipped-RDF-or-JSON-file> -s <path-to-schema-file> -a <dgraph-alpha-address:grpc_port> -z <dgraph-zero-address:grpc_port>
```

#### Encrypted imports via Live Loader

A new flag, `keyfile`, is added to the Live Loader. This option is required to decrypt the encrypted export data and schema files. Once the export files are decrypted, the Live Loader streams the data to a live Alpha instance.

{{% notice "note" %}}
If the live Alpha instance has encryption turned on, the `p` directory will be encrypted. Otherwise, the `p` directory is unencrypted.
{{% /notice %}}

#### Encrypted RDF/JSON file and schema via Live Loader
`dgraph live -f <path-to-encrypted-gzipped-RDF-or-JSON-file> -s <path-to-encrypted-schema> -keyfile <path-to-keyfile-to-decrypt-files>`

#### Other Live Loader options

`--new_uids` (default: false): Assign new UIDs instead of using the existing
UIDs in data files. This is useful to avoid overriding the data in a DB already
in operation.

`-f, --files`: Location of *.rdf(.gz) or *.json(.gz) file(s) to load. It can
load multiple files in a given path. If the path is a directory, then all files
ending in .rdf, .rdf.gz, .json, and .json.gz will be loaded.

`--format`: Specify file format (rdf or json) instead of getting it from
filenames. This is useful if you need to define a strict format manually.

`-b, --batch` (default: 1000): Number of N-Quads to send as part of a mutation.

`-c, --conc` (default: 10): Number of concurrent requests to make to Dgraph.
Do not confuse this with `-C`.

`-C, --use_compression` (default: false): Enable compression for connections to and from the
Alpha server.

`-a, --alpha` (default: `localhost:9080`): Dgraph Alpha gRPC server address to connect to for live loading. This can be a comma-separated list of Alpha addresses in the same cluster to distribute the load, e.g., `"alpha:grpc_port,alpha2:grpc_port,alpha3:grpc_port"`.

### Bulk Loader

{{% notice "note" %}}
It's crucial to tune the bulk loader's flags to get good performance. See the
section below for details.
{{% /notice %}}

Dgraph Bulk Loader serves a similar purpose to the Dgraph Live Loader, but can
only be used to load data into a new cluster. It cannot be run on an existing
Dgraph cluster. Dgraph Bulk Loader is **considerably faster** than the Dgraph
Live Loader and is the recommended way to perform the initial import of large
datasets into Dgraph.

Only Dgraph Zeros (one or more) should be running for bulk loading. Dgraph Alphas
will be started later.

{{% notice "warning" %}}
Don't use the bulk loader once the Dgraph cluster is up and running. Use it to import
your existing data to a new cluster.
{{% /notice %}}

You can [read some technical details](https://blog.dgraph.io/post/bulkloader/)
about the bulk loader on the blog.

See [Fast Data Loading]({{< relref "#fast-data-loading" >}}) for more info about
the expected N-Quads format.

**Reduce shards**: Before running the bulk load, you need to decide how many
Alpha groups will be running when the cluster starts. The number of Alpha groups
will be the same as the number of reduce shards you set with the `--reduce_shards`
flag. For example, if your cluster will run 3 Alphas with 3 replicas per group,
then there is 1 group and `--reduce_shards` should be set to 1. If your cluster
will run 6 Alphas with 3 replicas per group, then there are 2 groups and
`--reduce_shards` should be set to 2.

**Map shards**: The `--map_shards` option must be set to at least what's set for
`--reduce_shards`. A higher number helps the bulk loader evenly distribute
predicates between the reduce shards.

```sh
$ dgraph bulk -f goldendata.rdf.gz -s goldendata.schema --map_shards=4 --reduce_shards=2 --http localhost:8000 --zero=localhost:5080
```
```
{
  "DataFiles": "goldendata.rdf.gz",
  "DataFormat": "",
  "SchemaFile": "goldendata.schema",
  "DgraphsDir": "out",
  "TmpDir": "tmp",
  "NumGoroutines": 4,
  "MapBufSize": 67108864,
  "ExpandEdges": true,
  "SkipMapPhase": false,
  "CleanupTmp": true,
  "NumShufflers": 1,
  "Version": false,
  "StoreXids": false,
  "ZeroAddr": "localhost:5080",
  "HttpAddr": "localhost:8000",
  "IgnoreErrors": false,
  "MapShards": 4,
  "ReduceShards": 2
}
The bulk loader needs to open many files at once. This number depends on the size of the data set loaded, the map file output size, and the level of indexing. 100,000 is adequate for most data set sizes. See `man ulimit` for details of how to change the limit.
Current max open files limit: 1024
MAP 01s rdf_count:176.0 rdf_speed:174.4/sec edge_count:564.0 edge_speed:558.8/sec
MAP 02s rdf_count:399.0 rdf_speed:198.5/sec edge_count:1.291k edge_speed:642.4/sec
MAP 03s rdf_count:666.0 rdf_speed:221.3/sec edge_count:2.164k edge_speed:718.9/sec
MAP 04s rdf_count:952.0 rdf_speed:237.4/sec edge_count:3.014k edge_speed:751.5/sec
MAP 05s rdf_count:1.327k rdf_speed:264.8/sec edge_count:4.243k edge_speed:846.7/sec
MAP 06s rdf_count:1.774k rdf_speed:295.1/sec edge_count:5.720k edge_speed:951.5/sec
MAP 07s rdf_count:2.375k rdf_speed:338.7/sec edge_count:7.607k edge_speed:1.085k/sec
MAP 08s rdf_count:3.697k rdf_speed:461.4/sec edge_count:11.89k edge_speed:1.484k/sec
MAP 09s rdf_count:71.98k rdf_speed:7.987k/sec edge_count:225.4k edge_speed:25.01k/sec
MAP 10s rdf_count:354.8k rdf_speed:35.44k/sec edge_count:1.132M edge_speed:113.1k/sec
MAP 11s rdf_count:610.5k rdf_speed:55.39k/sec edge_count:1.985M edge_speed:180.1k/sec
MAP 12s rdf_count:883.9k rdf_speed:73.52k/sec edge_count:2.907M edge_speed:241.8k/sec
MAP 13s rdf_count:1.108M rdf_speed:85.10k/sec edge_count:3.653M edge_speed:280.5k/sec
MAP 14s rdf_count:1.121M rdf_speed:79.93k/sec edge_count:3.695M edge_speed:263.5k/sec
MAP 15s rdf_count:1.121M rdf_speed:74.61k/sec edge_count:3.695M edge_speed:246.0k/sec
REDUCE 16s [1.69%] edge_count:62.61k edge_speed:62.61k/sec plist_count:29.98k plist_speed:29.98k/sec
REDUCE 17s [18.43%] edge_count:681.2k edge_speed:651.7k/sec plist_count:328.1k plist_speed:313.9k/sec
REDUCE 18s [33.28%] edge_count:1.230M edge_speed:601.1k/sec plist_count:678.9k plist_speed:331.8k/sec
REDUCE 19s [45.70%] edge_count:1.689M edge_speed:554.4k/sec plist_count:905.9k plist_speed:297.4k/sec
REDUCE 20s [60.94%] edge_count:2.252M edge_speed:556.5k/sec plist_count:1.278M plist_speed:315.9k/sec
REDUCE 21s [93.21%] edge_count:3.444M edge_speed:681.5k/sec plist_count:1.555M plist_speed:307.7k/sec
REDUCE 22s [100.00%] edge_count:3.695M edge_speed:610.4k/sec plist_count:1.778M plist_speed:293.8k/sec
REDUCE 22s [100.00%] edge_count:3.695M edge_speed:584.4k/sec plist_count:1.778M plist_speed:281.3k/sec
Total: 22s
```

The output will be generated in the `out` directory by default. Here's the bulk
load output from the example above:

```sh
$ tree ./out
```
```
./out
├── 0
│   └── p
│       ├── 000000.vlog
│       ├── 000002.sst
│       └── MANIFEST
└── 1
    └── p
        ├── 000000.vlog
        ├── 000002.sst
        └── MANIFEST

4 directories, 6 files
```

Because `--reduce_shards` was set to 2, there are two sets of p directories: one
in the `./out/0` directory and another in the `./out/1` directory.

Once the output is created, it can be copied to all the servers that will run
Dgraph Alphas. Each Dgraph Alpha must have its own copy of the group's p
directory output. Each replica of the first group should have its own copy of
`./out/0/p`, each replica of the second group should have its own copy of
`./out/1/p`, and so on.

```sh
$ dgraph bulk --help # To see the available flags.

# Read RDFs or JSON from the passed file.
$ dgraph bulk -f <path-to-gzipped-RDF-or-JSON-file> ...

# Read multiple RDFs or JSON from the passed path.
$ dgraph bulk -f <./path-to-gzipped-RDF-or-JSON-files> ...

# Read multiple files strictly by name.
$ dgraph bulk -f <file1.rdf, file2.rdf> ...

```

#### Encryption at rest with Bulk Loader

Even before the Dgraph cluster starts, we can load data using Bulk Loader with the encryption feature turned on. Later we can point the generated `p` directory to a new Alpha server.

Here's an example of running Bulk Loader with a key used to write encrypted data:

```bash
dgraph bulk --encryption_key_file ./enc_key_file -f data.json.gz -s data.schema --map_shards=1 --reduce_shards=1 --http localhost:8000 --zero=localhost:5080
```

#### Encrypting imports via Bulk Loader

The Bulk Loader's `encryption_key_file` option was previously used to encrypt the output `p` directory. This same option will also be used to decrypt the encrypted export data and schema files.

Another option, `--encrypted`, indicates whether the input `rdf`/`json` data and schema files are encrypted or not. With this switch, we support the use case of migrating data from unencrypted exports to encrypted imports.

So, with the above two options we have 4 cases:

1. `--encrypted=true` and no `encryption_key_file`.

   Error: If the input is encrypted, a key file must be provided.

2. `--encrypted=true` and `encryption_key_file`=`path to key`.

   Input is encrypted and the output `p` dir is encrypted as well.

3. `--encrypted=false` and no `encryption_key_file`.

   Input is not encrypted and the output `p` dir is also not encrypted.

4. `--encrypted=false` and `encryption_key_file`=`path to key`.

   Input is not encrypted but the output is encrypted. (This is the migration use case mentioned above.)

#### Other Bulk Loader options

`--new_uids` (default: false): Assign new UIDs instead of using the existing
UIDs in data files. This is useful to avoid overriding the data in a DB already
in operation.

`-f, --files`: Location of *.rdf(.gz) or *.json(.gz) file(s) to load. It can
load multiple files in a given path. If the path is a directory, then all files
ending in .rdf, .rdf.gz, .json, and .json.gz will be loaded.

`--format`: Specify file format (rdf or json) instead of getting it from
filenames. This is useful if you need to define a strict format manually.

#### Tuning & monitoring

##### Performance Tuning

{{% notice "tip" %}}
We highly recommend [disabling swap
space](https://askubuntu.com/questions/214805/how-do-i-disable-swap) when
running Bulk Loader. It is better to fix the parameters to decrease memory
usage than to have swapping grind the loader down to a halt.
{{% /notice %}}

Flags can be used to control the behaviour and performance characteristics of
the bulk loader. You can see the full list by running `dgraph bulk --help`. In
particular, **the flags should be tuned so that the bulk loader doesn't use more
memory than is available as RAM**. If it starts swapping, it will become
incredibly slow.

**In the map phase**, tweaking the following flags can reduce memory usage (see the sketch after this list for an example invocation):

- The `--num_go_routines` flag controls the number of worker threads. Lowering reduces memory
  consumption.

- The `--mapoutput_mb` flag controls the size of the map output files. Lowering
  reduces memory consumption.
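
For illustration only, a lower-memory map phase might be requested with something like the sketch below; the specific values are assumptions and should be tuned against the RAM actually available on your machine:

```sh
# Hypothetical low-memory bulk load: fewer worker goroutines and smaller map output files.
dgraph bulk -f data.rdf.gz -s data.schema \
  --num_go_routines=2 \
  --mapoutput_mb=32 \
  --map_shards=2 --reduce_shards=1 \
  --zero=localhost:5080
```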

For bigger datasets and machines with many cores, gzip decoding can be a
bottleneck during the map phase. Performance improvements can be obtained by
first splitting the RDFs up into many `.rdf.gz` files (e.g. 256MB each). This
has a negligible impact on memory usage.

**The reduce phase** is less memory heavy than the map phase, although it can still
use a lot. Some flags may be increased to improve performance, *but only if
you have large amounts of RAM*:

- The `--reduce_shards` flag controls the number of resultant Dgraph Alpha instances.
  Increasing this increases memory consumption, but in exchange allows for
  higher CPU utilization.

- The `--map_shards` flag controls the number of separate map output shards.
  Increasing this increases memory consumption but balances the resultant
  Dgraph Alpha instances more evenly.

- The `--shufflers` flag controls the level of parallelism in the shuffle/reduce
  stage. Increasing this increases memory consumption.

## Monitoring

Dgraph exposes metrics via the `/debug/vars` endpoint in JSON format and the `/debug/prometheus_metrics` endpoint in Prometheus's text-based format. Dgraph doesn't store the metrics and only exposes the value of the metrics at that instant. You can either poll this endpoint to get the data into your monitoring systems or install **[Prometheus](https://prometheus.io/docs/introduction/install/)**. Replace the targets in the config file below with the IP addresses of your Dgraph instances and run Prometheus using the command `prometheus -config.file my_config.yaml`.

```yaml
scrape_configs:
  - job_name: "dgraph"
    metrics_path: "/debug/prometheus_metrics"
    scrape_interval: "2s"
    static_configs:
    - targets:
      - 172.31.9.133:6080 # For Dgraph Zero, 6080 is the HTTP endpoint exposing metrics.
      - 172.31.15.230:8080
      - 172.31.0.170:8080
      - 172.31.8.118:8080
```

{{% notice "note" %}}
Raw data exported by Prometheus is available via the `/debug/prometheus_metrics` endpoint on Dgraph Alphas.
{{% /notice %}}

Install **[Grafana](http://docs.grafana.org/installation/)** to plot the metrics. Grafana runs at port 3000 in default settings. Create a Prometheus datasource by following these **[steps](https://prometheus.io/docs/visualization/grafana/#creating-a-prometheus-data-source)**. Import **[grafana_dashboard.json](https://github.com/dgraph-io/benchmarks/blob/master/scripts/grafana_dashboard.json)** by following this **[link](http://docs.grafana.org/reference/export_import/#importing-a-dashboard)**.

## Metrics

Dgraph metrics follow the [metric and label conventions for
Prometheus](https://prometheus.io/docs/practices/naming/).

### Disk Metrics

The disk metrics let you track the disk activity of the Dgraph process. Dgraph does not interact
directly with the filesystem. Instead it relies on [Badger](https://github.com/dgraph-io/badger) to
read from and write to disk.

Metrics | Description
------- | -----------
`badger_v2_disk_reads_total` | Total count of disk reads in Badger.
`badger_v2_disk_writes_total` | Total count of disk writes in Badger.
`badger_v2_gets_total` | Total count of calls to Badger's `get`.
`badger_v2_memtable_gets_total` | Total count of memtable accesses to Badger's `get`.
`badger_v2_puts_total` | Total count of calls to Badger's `put`.
`badger_v2_read_bytes` | Total bytes read from Badger.
`badger_v2_written_bytes` | Total bytes written to Badger.

### Memory Metrics

The memory metrics let you track the memory usage of the Dgraph process. The idle and inuse metrics
give you a better sense of the active memory usage of the Dgraph process. The process memory metric
shows the memory usage as measured by the operating system.

By looking at all three metrics you can see how much memory a Dgraph process is holding from the
operating system and how much is actively in use.

Metrics | Description
------- | -----------
`dgraph_memory_idle_bytes` | Estimated amount of memory that is being held idle and could be reclaimed by the OS.
`dgraph_memory_inuse_bytes` | Total memory usage in bytes (sum of heap usage and stack usage).
`dgraph_memory_proc_bytes` | Total memory usage in bytes of the Dgraph process. On Linux/macOS, this metric is equivalent to resident set size. On Windows, this metric is equivalent to [Go's runtime.ReadMemStats](https://golang.org/pkg/runtime/#ReadMemStats).

### Activity Metrics

The activity metrics let you track the mutations, queries, and proposals of a Dgraph instance.

Metrics | Description
------- | -----------
`dgraph_goroutines_total` | Total number of Goroutines currently running in Dgraph.
`dgraph_active_mutations_total` | Total number of mutations currently running.
`dgraph_pending_proposals_total` | Total pending Raft proposals.
`dgraph_pending_queries_total` | Total number of queries in progress.
`dgraph_num_queries_total` | Total number of queries run in Dgraph.

### Health Metrics

The health metrics let you check the availability of a Dgraph Alpha instance.

Metrics | Description
------- | -----------
`dgraph_alpha_health_status` | **Only applicable to Dgraph Alpha**. Value is 1 when the Alpha is ready to accept requests; otherwise 0.

### Go Metrics

Go's built-in metrics may also be useful to measure memory usage and garbage collection time.

Metrics | Description
------- | -----------
`go_memstats_gc_cpu_fraction` | The fraction of this program's available CPU time used by the GC since the program started.
`go_memstats_heap_idle_bytes` | Number of heap bytes waiting to be used.
`go_memstats_heap_inuse_bytes` | Number of heap bytes that are in use.

## Tracing

Dgraph is integrated with [OpenCensus](https://opencensus.io/zpages/) to collect distributed traces from the Dgraph cluster.

Trace data is always collected within Dgraph. You can adjust the trace sampling rate for Dgraph queries with the `--trace` option for Dgraph Alphas. By default, `--trace` is set to 1 to trace 100% of queries.

### Examining Traces with zPages

The most basic way to view traces is with the integrated trace pages.

OpenCensus's [zPages](https://opencensus.io/zpages/) are accessible via the Zero or Alpha HTTP port at `/z/tracez`.

### Examining Traces with Jaeger

Jaeger collects distributed traces and provides a UI to view and query traces across different services. This provides the necessary observability to figure out what is happening in the system.

Dgraph can be configured to send traces directly to a Jaeger collector with the `--jaeger.collector` flag.
For example, if the Jaeger collector is running on `http://localhost:14268`, then pass the flag to the Dgraph Zero and Dgraph Alpha instances as `--jaeger.collector=http://localhost:14268`.

See [Jaeger's Getting Started docs](https://www.jaegertracing.io/docs/getting-started/) to get up and running with Jaeger.

#### Setting up multiple Dgraph clusters with Jaeger

Jaeger allows you to examine traces from multiple Dgraph clusters. To do this, use the `--collector.tags` flag on a Jaeger collector to set custom trace tags. For example, run one collector with `--collector.tags env=qa` and then another collector with `--collector.tags env=dev`. In Dgraph, set the `--jaeger.collector` flag in the Dgraph QA cluster to the first collector and the flag in the Dgraph Dev cluster to the second collector.
You can run multiple Jaeger collector components for the same single Jaeger backend (e.g., many Jaeger collectors to a single Cassandra backend). This is still a single Jaeger installation but with different collectors customizing the tags per environment.

Once you have this configured, you can filter by tags in the Jaeger UI. Filter traces by tags matching `env=dev`:

{{% load-img "/images/jaeger-ui.png" "Jaeger UI" %}}

Every trace has your custom tags set under the "Process" section of each span:

{{% load-img "/images/jaeger-server-query.png" "Jaeger Query" %}}

Filter traces by tags matching `env=qa`:

{{% load-img "/images/jaeger-json.png" "Jaeger JSON" %}}

{{% load-img "/images/jaeger-server-query-2.png" "Jaeger Query Result" %}}

For more information, check out [Jaeger's Deployment Guide](https://www.jaegertracing.io/docs/deployment/).

## Dgraph Administration

Each Dgraph Alpha exposes administrative operations over HTTP to export data and to perform a clean shutdown.

### Whitelist Admin Operations

By default, admin operations can only be initiated from the machine on which the Dgraph Alpha runs.
You can use the `--whitelist` option to specify whitelisted IP addresses and ranges for hosts from which admin operations can be initiated.

```sh
dgraph alpha --whitelist 172.17.0.0:172.20.0.0,192.168.1.1 --lru_mb <one-third RAM> ...
```
This would allow admin operations from hosts with an IP address between `172.17.0.0` and `172.20.0.0`, along with
the server that has the IP address `192.168.1.1`.

### Restrict Mutation Operations

By default, you can perform mutation operations for any predicate.
If the predicate in a mutation doesn't exist in the schema,
the predicate gets added to the schema with an appropriate
[Dgraph Type](https://docs.dgraph.io/master/query-language/#schema-types).

You can use `--mutations disallow` to disable all mutations,
which is set to `allow` by default.

```sh
dgraph alpha --mutations disallow
```

Enforce a strict schema by setting `--mutations strict`.
This mode allows mutations only on predicates already in the schema.
Before performing a mutation on a predicate that doesn't exist in the schema,
you need to perform an alter operation with that predicate and its schema type.

```sh
dgraph alpha --mutations strict
```

### Secure Alter Operations

Clients can use alter operations to apply schema updates and drop particular or all predicates from the database.
By default, all clients are allowed to perform alter operations.
You can configure Dgraph to only allow alter operations when the client provides a specific token.
This can be used to prevent clients from making unintended or accidental schema updates or predicate drops.

You can specify the auth token with the `--auth_token` option for each Dgraph Alpha in the cluster.
Clients must include the same auth token to make alter requests.

```sh
$ dgraph alpha --lru_mb=2048 --auth_token=<authtokenstring>
```

```sh
$ curl -s localhost:8080/alter -d '{ "drop_all": true }'
# Permission denied. No token provided.
```

```sh
$ curl -s -H 'X-Dgraph-AuthToken: <wrongsecret>' localhost:8180/alter -d '{ "drop_all": true }'
# Permission denied. Incorrect token.
```

```sh
$ curl -H 'X-Dgraph-AuthToken: <authtokenstring>' localhost:8180/alter -d '{ "drop_all": true }'
# Success. Token matches.
```

{{% notice "note" %}}
To fully secure alter operations in the cluster, the auth token must be set for every Alpha.
{{% /notice %}}


### Export Database

An export of all nodes is started by locally accessing the export endpoint of any Alpha in the cluster.

```sh
$ curl localhost:8080/admin/export
```
{{% notice "warning" %}}By default, this won't work if called from outside the server where the Dgraph Alpha is running.
You can specify a list or range of whitelisted IP addresses from which export or other admin operations
can be initiated using the `--whitelist` flag on `dgraph alpha`.
{{% /notice %}}

This also works from a browser, provided the HTTP GET is being run from the same server where the Dgraph Alpha instance is running.

This triggers an export for all Alpha groups of the cluster. The data is exported from the following Dgraph instances:

1. For the Alpha instance that receives the GET request, the group's export data is stored with this Alpha.
2. For every other group, its group's export data is stored with the Alpha leader of that group.

It is up to the user to retrieve the right export files from the Alphas in the
cluster. Dgraph does not copy all files to the Alpha that initiated the export.
The user must also ensure that there is sufficient space on disk to store the
export.

Each Alpha leader for a group writes output as a gzipped file to the export
directory specified via the `--export` flag (defaults to a directory called `"export"`). If any of the groups fail, the
entire export process is considered failed and an error is returned.

The data is exported in RDF format by default. A different output format may be specified with the
`format` URL parameter. For example:

```sh
$ curl 'localhost:8080/admin/export?format=json'
```

Currently, "rdf" and "json" are the only formats supported.

#### Encrypting Exports

Export is available wherever an Alpha is running. To encrypt an export, the Alpha must be configured with the `encryption-key-file`.

{{% notice "note" %}}
The `encryption-key-file` was used for `encryption-at-rest` and will now also be used for encrypted backups and exports.
{{% /notice %}}

### Shutdown Database

To shut down a Dgraph cluster, shut down all its Alpha and Zero nodes.
This can be done in different ways,
depending on how Dgraph was started (e.g. sending a `SIGTERM` to the processes, or using `systemctl stop service-name`
if you are using systemd).

A clean exit of a single Dgraph Alpha node can be initiated by running the following command on that node.
{{% notice "warning" %}}This won't work if called from outside the server where Dgraph is running.
You can specify a list or range of whitelisted IP addresses from which shutdown or other admin operations
can be initiated using the `--whitelist` flag on `dgraph alpha`.
{{% /notice %}}

```sh
$ curl localhost:8080/admin/shutdown
```

This stops the Alpha on which the command is executed and not the entire cluster.

### Delete database

Individual triples, patterns of triples and predicates can be deleted as described in the [query language docs](/query-language#delete).

To drop all data, you could send a `DropAll` request via the `/alter` endpoint.

Alternatively, you could:

* [Shutdown Dgraph]({{< relref "#shutdown-database" >}}) and wait for all writes to complete,
* Delete (maybe do an export first) the `p` and `w` directories, then
* Restart Dgraph.

### Upgrade Database

Doing periodic exports is always a good idea. This is particularly useful if you wish to upgrade Dgraph or reconfigure the sharding of a cluster. The following are the right steps to safely export and restart.

1. Start an [export]({{< relref "#export-database">}})
2. Ensure it is successful
3. [Shutdown Dgraph]({{< relref "#shutdown-database" >}}) and wait for all writes to complete
4. Start a new Dgraph cluster using new data directories (this can be done by passing empty directories to the options `-p` and `-w` for Alphas and `-w` for Zeros)
5. Reload the data via [bulk loader]({{< relref "#bulk-loader" >}})
6. Verify the correctness of the new Dgraph cluster. If all looks good, you can delete the old directories (the export serves as insurance)

These steps are necessary because Dgraph's underlying data format could have changed, and reloading the export avoids encoding incompatibilities.

Blue-green deployment is a common approach to minimize downtime during the upgrade process.
This approach involves switching your application to read-only mode. To make sure that no mutations are executed during the maintenance window, you can
do a rolling restart of all your Alphas using the option `--mutations disallow` when you restart them. This will ensure the cluster is in read-only mode.

At this point your application can still read from the old cluster, and you can perform steps 4 and 5 described above.
When the new cluster (that uses the upgraded version of Dgraph) is up and running, you can point your application to it, and shut down the old cluster.

{{% notice "note" %}}
If you are upgrading from v1.0, please make sure you follow the schema migration steps described in [this section](/howto/#schema-types-scalar-uid-and-list-uid).
{{% /notice %}}

### Post Installation

Now that Dgraph is up and running, to understand how to add and query data in Dgraph, follow the [Query Language Spec](/query-language). Also, have a look at [Frequently asked questions](/faq).

## Troubleshooting

Here are some problems that you may encounter and some solutions to try.

#### Running OOM (out of memory)

During bulk loading of data, Dgraph can consume more memory than usual, due to the high volume of writes. That's generally when you see the OOM crashes.

The recommended minimum RAM to run on desktops and laptops is 16GB. Dgraph can take up to 7-8 GB with the default setting `--lru_mb` set to 4096; so leaving the remaining 8GB for desktop applications should keep your machine humming along.

On EC2/GCE instances, the recommended minimum is 8GB. It's recommended to set `--lru_mb` to one-third of the RAM size.

You could also decrease the memory usage of Dgraph by setting `--badger.vlog=disk`.

#### Too many open files

If you see log error messages saying `too many open files`, you should increase the per-process file descriptors limit.

During normal operations, Dgraph must be able to open many files. Your operating system may by default set an open file descriptor limit lower than what's needed for a database such as Dgraph.

On Linux and Mac, you can check the file descriptor limit with `ulimit -n -H` for the hard limit and `ulimit -n -S` for the soft limit. The soft limit should be set high enough for Dgraph to run properly. A soft limit of 65535 is a good lower bound for a production setup. You can adjust the limit as needed.

## See Also

* [Product Roadmap to v1.0](https://github.com/dgraph-io/dgraph/issues/1)