+++
date = "2017-03-20T22:25:17+11:00"
title = "Deploy"
+++

This page covers running Dgraph in various deployment modes, in a distributed fashion, which involves running multiple Dgraph instances over multiple servers in a cluster.

{{% notice "tip" %}}
For a single server setup, recommended for new users, please see the [Get Started](/get-started) page.
{{% /notice %}}

## Install Dgraph
#### Docker

```sh
docker pull dgraph/dgraph:latest

# You can test that it worked, by running:
docker run -it dgraph/dgraph:latest dgraph
```

#### Automatic download

Running
```sh
curl https://get.dgraph.io -sSf | bash

# Test that it worked, by running:
dgraph
```
would install the `dgraph` binary into your system.

#### Manual download [optional]

If you don't want to follow the automatic installation method, you could manually download the appropriate tar for your platform from **[Dgraph releases](https://github.com/dgraph-io/dgraph/releases)**. After downloading the tar for your platform from GitHub, extract the binary to `/usr/local/bin` like so.

```sh
# For Linux
$ sudo tar -C /usr/local/bin -xzf dgraph-linux-amd64-VERSION.tar.gz

# For Mac
$ sudo tar -C /usr/local/bin -xzf dgraph-darwin-amd64-VERSION.tar.gz

# Test that it worked, by running:
dgraph
```

#### Building from Source

{{% notice "note" %}}
You can build the Ratel UI from source separately following its build
[instructions](https://github.com/dgraph-io/ratel/blob/master/INSTRUCTIONS.md).
Ratel UI is distributed via Dgraph releases using any of the download methods
listed above.
{{% /notice %}}

Make sure you have [Go](https://golang.org/dl/) v1.11+ installed.

You'll need the following dependencies to install Dgraph using `make`:
```bash
sudo apt-get update
sudo apt-get install gcc make
```

After installing Go, run
```sh
# This should install the dgraph binary in your $GOPATH/bin.

git clone https://github.com/dgraph-io/dgraph.git
cd ./dgraph
make install
```

If you get errors related to `grpc` while building, your `go-grpc` version
might be outdated. We don't vendor in `go-grpc` (because it causes issues
while using the Go client). Update your `go-grpc` by running:
```sh
go get -u -v google.golang.org/grpc
```

#### Config

The full set of Dgraph's configuration options (along with brief descriptions)
can be viewed by invoking Dgraph with the `--help` flag. For example, to see
the options available for `dgraph alpha`, run `dgraph alpha --help`.

The options can be configured in multiple ways (from highest precedence to
lowest precedence):

- Using command line flags (as described in the help output).

- Using environment variables.

- Using a configuration file.

If no configuration for an option is used, then the default value as described
in the `--help` output applies.

Multiple configuration methods can be used all at the same time. For example, a core
set of options could be set in a config file, and instance-specific options
could be set using environment variables or flags.

The environment variable names mirror the flag names as seen in the `--help`
output.
They are the concatenation of `DGRAPH`, the subcommand invoked
(`ALPHA`, `ZERO`, `LIVE`, or `BULK`), and then the name of the flag (in
uppercase). For example, instead of using `dgraph alpha --lru_mb=8096`, you
could use `DGRAPH_ALPHA_LRU_MB=8096 dgraph alpha`.

Configuration file formats supported are JSON, TOML, YAML, HCL, and Java
properties (detected via file extension). The file extensions are .json, .toml,
.yml or .yaml, .hcl, and .properties for each format.

A configuration file can be specified using the `--config` flag, or an
environment variable. For example, `dgraph zero --config my_config.json` or
`DGRAPH_ZERO_CONFIG=my_config.json dgraph zero`.

The config file structure is just simple key/value pairs (mirroring the flag
names).

Example JSON config file (config.json):

```json
{
  "my": "localhost:7080",
  "zero": "localhost:5080",
  "lru_mb": 4096,
  "postings": "/path/to/p",
  "wal": "/path/to/w"
}
```

Example TOML config file (config.toml):

```toml
my = "localhost:7080"
zero = "localhost:5080"
lru_mb = 4096
postings = "/path/to/p"
wal = "/path/to/w"
```


Example YAML config file (config.yml):

```yaml
my: "localhost:7080"
zero: "localhost:5080"
lru_mb: 4096
postings: "/path/to/p"
wal: "/path/to/w"
```

Example HCL config file (config.hcl):

```hcl
my = "localhost:7080"
zero = "localhost:5080"
lru_mb = 4096
postings = "/path/to/p"
wal = "/path/to/w"
```

Example Java properties config file (config.properties):
```text
my=localhost:7080
zero=localhost:5080
lru_mb=4096
postings=/path/to/p
wal=/path/to/w
```

## Cluster Setup

### Understanding Dgraph cluster

Dgraph is a truly distributed graph database, not a master-slave replication of
a universal dataset. It shards by predicate and replicates predicates across the
cluster. Queries can be run on any node, and joins are handled over the
distributed data. A query is resolved locally for predicates the node stores,
and via distributed joins for predicates stored on other nodes.

To run a Dgraph cluster effectively, it's important to understand how
sharding, replication and rebalancing work.

**Sharding**

Dgraph colocates data per predicate (*P*, in RDF terminology), thus the
smallest unit of data is one predicate. To shard the graph, one or many
predicates are assigned to a group. Each Alpha node in the cluster serves a
single group. Dgraph Zero assigns a group to each Alpha node.

**Shard rebalancing**

Dgraph Zero tries to rebalance the cluster based on the disk usage in each
group. If Zero detects an imbalance, it would try to move a predicate along with
its indices to a group that has minimum disk usage. This can make the predicate
temporarily read-only. Queries for the predicate will still be serviced, but any
mutations for the predicate will be rejected and should be retried after the
move is finished.

Zero would continuously try to keep the amount of data on each server even,
typically running this check every 10 minutes. Thus, each additional
Dgraph Alpha instance would allow Zero to further split the predicates from
groups and move them to the new node.
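
You can watch this in practice through Zero's `/state` endpoint (described under
[More about Dgraph Zero]({{< relref "#more-about-dgraph-zero" >}})), which reports the
groups, their member Alphas, and the predicates (tablets) each group currently serves.
For example, assuming Zero is reachable on its default HTTP port:

```sh
# Dump the current cluster state, including per-group tablet assignments.
curl localhost:6080/state
```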

**Consistent Replication**

If the `--replicas` flag is set to something greater than one, Zero would assign the
same group to multiple nodes. These nodes would then form a Raft group, aka a
quorum. Every write would be consistently replicated to the quorum. To achieve
consensus, it's important that the size of the quorum be an odd number. Therefore, we
recommend setting `--replicas` to 1, 3 or 5 (not 2 or 4). This allows 0, 1, or 2
nodes serving the same group to be down, respectively, without affecting the
overall health of that group.

## Ports Usage

Dgraph cluster nodes use different ports to communicate over gRPC and HTTP. Choose these ports carefully based on your topology and deployment mode, as each port needs its own access rules or firewall settings.

### Types of ports

- **gRPC-internal:** Port used between the cluster nodes for internal communication and message exchange.
- **gRPC-external:** Port used by Dgraph clients, Dgraph Live Loader, and Dgraph Bulk Loader to access APIs over gRPC.
- **HTTP-external:** Port used by clients to access APIs over HTTP and for other monitoring and administrative tasks.

### Ports used by different nodes

Dgraph Node Type | gRPC-internal | gRPC-external | HTTP-external
-----------------|---------------|---------------|---------------
zero             | --Not Used--  | 5080          | 6080
alpha            | 7080          | 9080          | 8080
ratel            | --Not Used--  | --Not Used--  | 8000

Users have to modify security rules or open firewalls, depending on their underlying network, to allow communication between cluster nodes and between servers and clients. During development, a general rule could be to leave the *-external (gRPC/HTTP) ports wide open to the public and to open the gRPC-internal port only within the cluster nodes.

**Ratel UI** accesses Dgraph Alpha on the HTTP-external port (default localhost:8080) and can be configured to talk to a remote Dgraph cluster. This way you can run Ratel on your local machine and point it to a remote cluster. But if you are deploying Ratel along with the Dgraph cluster, then you may have to expose 8000 to the public.

**Port Offset** To make it easier to set up a cluster, Dgraph uses default ports for its nodes and lets the user provide an offset (through the command option `--port_offset`) to define the actual ports used by a node. An offset can also be used when starting multiple Zero nodes in an HA setup.

For example, when a user runs a Dgraph Alpha with `--port_offset 2`, the Alpha node binds to 7082 (gRPC-internal), 8082 (HTTP-external) and 9082 (gRPC-external) respectively.

**Ratel UI** by default listens on port 8000. You can use the `--port` flag to configure it to listen on any other port.

{{% notice "tip" %}}
**For Dgraph v1.0.2 (or older)**

Zero's default ports are 7080 and 8080. When following instructions for the different setup guides below, override the Zero ports using `--port_offset` to match the current default ports.

```sh
# Run Zero with ports 5080 and 6080
dgraph zero --idx=1 --port_offset -2000
# Run Zero with ports 5081 and 6081
dgraph zero --idx=2 --port_offset -1999
```
Likewise, Ratel's default port is 8081, so override it using `--port` to the current default port.

```sh
dgraph-ratel --port 8000
```
{{% /notice %}}

### HA Cluster Setup

In a high-availability setup, we need to run 3 or 5 replicas for Zero, and similarly, 3 or 5 replicas for Alpha.
{{% notice "note" %}}
If the number of replicas is 2K + 1, up to **K servers** can be down without any impact on reads or writes.

Avoid keeping the number of replicas at 2K (an even number). If K servers go down, this would block reads and writes, due to lack of consensus.
{{% /notice %}}

**Dgraph Zero**
Run three Zero instances, assigning a unique ID (integer) to each via the `--idx` flag, and
passing the address of any healthy Zero instance via the `--peer` flag.

To run three replicas for the Alphas, set `--replicas=3`. Every time a new
Dgraph Alpha is added, Zero would check the existing groups and assign the Alpha to
one that doesn't yet have three replicas.

**Dgraph Alpha**
Run as many Dgraph Alphas as you want. You can manually set the `--idx` flag, or you
can leave that flag empty, and Zero would auto-assign an ID to the Alpha. This
ID would get persisted in the write-ahead log, so be careful not to delete it.

The new Alphas will automatically detect each other by communicating with
Dgraph Zero and establish connections to each other.

Typically, Zero would first attempt to replicate a group, by assigning a new
Dgraph Alpha to serve the same group as another. Once the group has
been replicated as per the `--replicas` flag, Zero would create a new group.

Over time, the data would be evenly split across all the groups. So, it's
important to ensure that the number of Dgraph Alphas is a multiple of the
replication setting. For example, if you set `--replicas=3` in Zero, then run three
Dgraph Alphas for no sharding but 3x replication. Run six Dgraph Alphas for
sharding the data into two groups, with 3x replication.

## Single Host Setup

### Run directly on the host

**Run dgraph zero**

```sh
dgraph zero --my=IPADDR:5080
```
The `--my` flag is the address that Dgraph Alphas would dial to talk to
Zero. So, the port `5080` and the IP address must be visible to all the Dgraph Alphas.

For all other flags, run `dgraph zero --help`.

**Run dgraph alpha**

```sh
dgraph alpha --lru_mb=<typically one-third the RAM> --my=IPADDR:7080 --zero=localhost:5080
dgraph alpha --lru_mb=<typically one-third the RAM> --my=IPADDR:7081 --zero=localhost:5080 -o=1
```

Notice the use of `-o` for the second Alpha to add an offset to the default ports. Zero automatically assigns a unique ID to each Alpha, which is persisted in the write-ahead log (wal) directory; users can specify the index using the `--idx` option. Dgraph Alphas use two directories to persist data and
wal logs, and these directories must be different for each Alpha if they are running on the same host. You can use `-p` and `-w` to change the location of the data and WAL directories. For all other flags, run
`dgraph alpha --help`.

**Run dgraph UI**

```sh
dgraph-ratel
```

### Run using Docker

A Dgraph cluster can be set up running as containers on a single host. First, you'd want to figure out the host IP address. You can typically do that via

```sh
ip addr  # On Arch Linux
ifconfig # On Ubuntu/Mac
```
We'll refer to the host IP address via `HOSTIPADDR`.
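
If you just want a quick way to see the address to substitute for `HOSTIPADDR` in the
commands below, one option on a typical Linux host (adjust for your own setup) is:

```sh
# Print this machine's IP addresses; pick the appropriate one as HOSTIPADDR.
hostname -I
```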

**Run dgraph zero**

```sh
mkdir ~/zero # Or any other directory where data should be stored.

docker run -it -p 5080:5080 -p 6080:6080 -v ~/zero:/dgraph dgraph/dgraph:latest dgraph zero --my=HOSTIPADDR:5080
```

**Run dgraph alpha**
```sh
mkdir ~/server1 # Or any other directory where data should be stored.

docker run -it -p 7080:7080 -p 8080:8080 -p 9080:9080 -v ~/server1:/dgraph dgraph/dgraph:latest dgraph alpha --lru_mb=<typically one-third the RAM> --zero=HOSTIPADDR:5080 --my=HOSTIPADDR:7080

mkdir ~/server2 # Or any other directory where data should be stored.

docker run -it -p 7081:7081 -p 8081:8081 -p 9081:9081 -v ~/server2:/dgraph dgraph/dgraph:latest dgraph alpha --lru_mb=<typically one-third the RAM> --zero=HOSTIPADDR:5080 --my=HOSTIPADDR:7081 -o=1
```
Notice the use of `-o` for server2 to offset its default ports.

**Run dgraph UI**
```sh
docker run -it -p 8000:8000 dgraph/dgraph:latest dgraph-ratel
```

### Run using Docker Compose (On single AWS instance)

We will use [Docker Machine](https://docs.docker.com/machine/overview/). It is a tool that lets you install Docker Engine on virtual machines and easily deploy applications.

* [Install Docker Machine](https://docs.docker.com/machine/install-machine/) on your machine.

{{% notice "note" %}}These instructions are for running Dgraph Alpha without TLS config.
For instructions on running with TLS, refer to the [TLS instructions](#tls-configuration).{{% /notice %}}

Here we'll go through an example of deploying Dgraph Zero, Alpha and Ratel on an AWS instance.

* Once you have Docker Machine installed by following the [instructions](https://docs.docker.com/machine/install-machine/), provisioning an instance on AWS is just one step away. You'll have to [configure your AWS credentials](http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html) for programmatic access to the Amazon API.

* Create a new docker machine.

```sh
docker-machine create --driver amazonec2 aws01
```

Your output should look like

```sh
Running pre-create checks...
Creating machine...
(aws01) Launching instance...
...
...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env aws01
```

The command would provision a `t2-micro` instance with a security group called `docker-machine`
(allowing inbound access on 2376 and 22). You can either edit the security group to allow inbound access to `5080`, `8080`, `9080` (default ports for Dgraph Zero & Alpha) or you can provide your own security
group which allows inbound access on ports 22, 2376 (required by Docker Machine), 5080, 8080 and 9080. Remember port *5080* is only required if you are running Dgraph Live Loader or Dgraph Bulk Loader from outside.

[Here](https://docs.docker.com/machine/drivers/aws/#options) is a full list of options for the `amazonec2` driver, which allows you to choose the instance type, security group, AMI, among many other things.

{{% notice "tip" %}}Docker Machine supports [other drivers](https://docs.docker.com/machine/drivers/gce/) like GCE, Azure, etc.{{% /notice %}}

* Install and run Dgraph using docker-compose

Docker Compose is a tool for running multi-container Docker applications. You can follow the
instructions [here](https://docs.docker.com/compose/install/) to install it.
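
Once installed, you can confirm that `docker-compose` is available by checking its version:

```sh
docker-compose --version
```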

Copy the file below in a directory on your machine and name it `docker-compose.yml`.

```sh
version: "3.2"
services:
  zero:
    image: dgraph/dgraph:latest
    volumes:
      - /data:/dgraph
    ports:
      - 5080:5080
      - 6080:6080
    restart: on-failure
    command: dgraph zero --my=zero:5080
  server:
    image: dgraph/dgraph:latest
    volumes:
      - /data:/dgraph
    ports:
      - 8080:8080
      - 9080:9080
    restart: on-failure
    command: dgraph alpha --my=server:7080 --lru_mb=2048 --zero=zero:5080
  ratel:
    image: dgraph/dgraph:latest
    ports:
      - 8000:8000
    command: dgraph-ratel
```

{{% notice "note" %}}The config mounts `/data` (you could mount something else) on the instance to `/dgraph` within the
container for persistence.{{% /notice %}}

* Connect to the Docker Engine running on the machine.

Running `docker-machine env aws01` tells us to run the command below to configure
our shell.
```
eval $(docker-machine env aws01)
```
This configures our Docker client to talk to the Docker Engine running on the AWS machine.

Finally, run the command below to start Zero, Alpha and Ratel.
```
docker-compose up -d
```
This would start 3 Docker containers running Dgraph Zero, Alpha and Ratel on the same machine. Docker would restart the containers in case there is any error.
You can look at the logs using `docker-compose logs`.

## Multi Host Setup

### Using Docker Swarm

#### Cluster Setup Using Docker Swarm

{{% notice "note" %}}These instructions are for running Dgraph Alpha without TLS config.
For instructions on running with TLS, refer to the [TLS instructions](#tls-configuration).{{% /notice %}}

Here we'll go through an example of deploying 3 Dgraph Alpha nodes and 1 Zero on three different AWS instances using Docker Swarm with a replication factor of 3.

* Make sure you have Docker Machine installed by following the [instructions](https://docs.docker.com/machine/install-machine/).

```sh
docker-machine --version
```

* Create 3 instances on AWS and [install Docker Engine](https://docs.docker.com/engine/installation/) on them. This can be done manually or by using `docker-machine`.
You'll have to [configure your AWS credentials](http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html) to create the instances using Docker Machine.

Assuming that you have your AWS credentials set up, you can use the commands below to start 3 AWS
`t2-micro` instances with Docker Engine installed on them.

```sh
docker-machine create --driver amazonec2 aws01
docker-machine create --driver amazonec2 aws02
docker-machine create --driver amazonec2 aws03
```

Your output should look like

```sh
Running pre-create checks...
Creating machine...
(aws01) Launching instance...
...
...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env aws01
```

The commands would provision `t2-micro` instances with a security group called `docker-machine`
(allowing inbound access on 2376 and 22).

You would need to edit the `docker-machine` security group to open inbound traffic on the following ports.

1. Allow all inbound traffic on all ports with the Source being the `docker-machine`
   security group, so that Docker-related communication can happen easily.

2. Also open inbound TCP traffic on the following ports required by Dgraph:
   `5080`, `6080`, `8000`, `808[0-2]`, `908[0-2]`. Remember port *5080* is only
   required if you are running Dgraph Live Loader or Dgraph Bulk Loader from
   outside. You need to open `7080` to enable Alpha-to-Alpha communication in
   case you have not opened all ports in #1.

If you are on AWS, below is the security group (**docker-machine**) after
necessary changes.

[Here](https://docs.docker.com/machine/drivers/aws/#options) is a full list of options for the `amazonec2` driver, which allows you to choose the
instance type, security group, AMI, among many other things.

{{% notice "tip" %}}Docker Machine supports [other drivers](https://docs.docker.com/machine/drivers/gce/) like GCE, Azure, etc.{{% /notice %}}

Running `docker-machine ls` shows all the AWS EC2 instances that we started.
```sh
➜  ~ docker-machine ls
NAME    ACTIVE   DRIVER      STATE     URL                        SWARM   DOCKER        ERRORS
aws01   -        amazonec2   Running   tcp://34.200.239.30:2376           v17.11.0-ce
aws02   -        amazonec2   Running   tcp://54.236.58.120:2376           v17.11.0-ce
aws03   -        amazonec2   Running   tcp://34.201.22.2:2376             v17.11.0-ce
```

* Start the Swarm

Docker Swarm has manager and worker nodes. Swarm can be started and updated on manager nodes. We
will set up `aws01` as the swarm manager. You can first run the following commands to initialize the
swarm.

We are going to use the internal IP address given by AWS. Run the following command to get the
internal IP for `aws01`. Let's assume `172.31.64.18` is the internal IP in this case.
```
docker-machine ssh aws01 ifconfig eth0
```

Now that we have the internal IP, let's initiate the Swarm.

```sh
# This configures our Docker client to talk to the Docker engine running on the aws01 host.
eval $(docker-machine env aws01)
docker swarm init --advertise-addr 172.31.64.18
```

Output:
```
Swarm initialized: current node (w9mpjhuju7nyewmg8043ypctf) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join \
    --token SWMTKN-1-1y7lba98i5jv9oscf10sscbvkmttccdqtkxg478g3qahy8dqvg-5r5cbsntc1aamsw3s4h3thvgk \
    172.31.64.18:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
```

Now we will make the other nodes join the swarm.

```sh
eval $(docker-machine env aws02)
docker swarm join \
    --token SWMTKN-1-1y7lba98i5jv9oscf10sscbvkmttccdqtkxg478g3qahy8dqvg-5r5cbsntc1aamsw3s4h3thvgk \
    172.31.64.18:2377
```

Output:
```
This node joined a swarm as a worker.
```

Similarly, for aws03:
```sh
eval $(docker-machine env aws03)
docker swarm join \
    --token SWMTKN-1-1y7lba98i5jv9oscf10sscbvkmttccdqtkxg478g3qahy8dqvg-5r5cbsntc1aamsw3s4h3thvgk \
    172.31.64.18:2377
```
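
If you need the worker join command again later (for example, to add more nodes), you can
print it from the manager. This is standard Docker Swarm usage rather than anything
Dgraph-specific:

```sh
# Run against the manager (aws01) to print the full `docker swarm join` command for workers.
eval $(docker-machine env aws01)
docker swarm join-token worker
```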

On the Swarm manager `aws01`, verify that your swarm is running.
```sh
docker node ls
```

Output:
```sh
ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS
ghzapjsto20c6d6l3n0m91zev     aws02               Ready               Active
rb39d5lgv66it1yi4rto0gn6a     aws03               Ready               Active
waqdyimp8llvca9i09k4202x5 *   aws01               Ready               Active              Leader
```

* Start the Dgraph cluster

Copy the following file onto your host machine and name it `docker-compose.yml`

```sh
version: "3"
networks:
  dgraph:
services:
  zero:
    image: dgraph/dgraph:latest
    volumes:
      - data-volume:/dgraph
    ports:
      - 5080:5080
      - 6080:6080
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == aws01
    command: dgraph zero --my=zero:5080 --replicas 3
  alpha_1:
    image: dgraph/dgraph:latest
    hostname: "alpha_1"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8080:8080
      - 9080:9080
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == aws01
    command: dgraph alpha --my=alpha_1:7080 --lru_mb=2048 --zero=zero:5080
  alpha_2:
    image: dgraph/dgraph:latest
    hostname: "alpha_2"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8081:8081
      - 9081:9081
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == aws02
    command: dgraph alpha --my=alpha_2:7081 --lru_mb=2048 --zero=zero:5080 -o 1
  alpha_3:
    image: dgraph/dgraph:latest
    hostname: "alpha_3"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8082:8082
      - 9082:9082
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == aws03
    command: dgraph alpha --my=alpha_3:7082 --lru_mb=2048 --zero=zero:5080 -o 2
  ratel:
    image: dgraph/dgraph:latest
    hostname: "ratel"
    ports:
      - 8000:8000
    networks:
      - dgraph
    command: dgraph-ratel
volumes:
  data-volume:
```
Run the following command on the Swarm leader to deploy the Dgraph cluster.

```sh
eval $(docker-machine env aws01)
docker stack deploy -c docker-compose.yml dgraph
```

This should run three Dgraph Alpha services (one on each VM because of the
constraint we have), one Dgraph Zero service on aws01, and one Dgraph Ratel service.

These placement constraints (as seen in the compose file) are important so that,
if any containers are restarted, Swarm places the respective Dgraph Alpha
or Zero containers on the same hosts to re-use the volumes. Also, if you are
running fewer than three hosts, make sure you use either different volumes or
run Dgraph Alpha with `-p p1 -w w1` options.

{{% notice "note" %}}

1. This setup would create and use a local volume called `dgraph_data-volume` on
   the instances. If you plan to replace instances, you should use remote
   storage like
   [cloudstore](https://docs.docker.com/docker-for-aws/persistent-data-volumes)
   instead of local disk.
{{% /notice %}}
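
Once the stack is up, you can also follow the logs of an individual service from the
Swarm manager, for example:

```sh
# Tail the Zero service's logs; use dgraph_alpha_1, dgraph_ratel, etc. for the others.
docker service logs -f dgraph_zero
```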

You can verify that all services were created successfully by running:

```sh
docker service ls
```

Output:
```
ID                  NAME                MODE                REPLICAS            IMAGE                   PORTS
vp5bpwzwawoe        dgraph_ratel        replicated          1/1                 dgraph/dgraph:latest    *:8000->8000/tcp
69oge03y0koz        dgraph_alpha_2      replicated          1/1                 dgraph/dgraph:latest    *:8081->8081/tcp,*:9081->9081/tcp
kq5yks92mnk6        dgraph_alpha_3      replicated          1/1                 dgraph/dgraph:latest    *:8082->8082/tcp,*:9082->9082/tcp
uild5cqp44dz        dgraph_zero         replicated          1/1                 dgraph/dgraph:latest    *:5080->5080/tcp,*:6080->6080/tcp
v9jlw00iz2gg        dgraph_alpha_1      replicated          1/1                 dgraph/dgraph:latest    *:8080->8080/tcp,*:9080->9080/tcp
```

To stop the cluster, run

```
docker stack rm dgraph
```

### HA Cluster setup using Docker Swarm

Here is a sample swarm config for running 6 Dgraph Alpha nodes and 3 Zero nodes on 6 different
EC2 instances. The setup should be similar to [Cluster setup using Docker Swarm]({{< relref "#cluster-setup-using-docker-swarm" >}}) apart from a couple of differences. This setup would ensure replication with sharding of data. The file assumes that there are six hosts available as docker-machines. Also, if you are running on fewer than six hosts, make sure you use either different volumes or run Dgraph Alpha with `-p p1 -w w1` options.

You would need to edit the `docker-machine` security group to open inbound traffic on the following ports.

1. Allow all inbound traffic on all ports with the Source being the `docker-machine` security group, so that
Docker-related communication can happen easily.

2. Also open inbound TCP traffic on the following ports required by Dgraph: `5080`, `8000`, `808[0-5]`, `908[0-5]`. Remember port *5080* is only required if you are running Dgraph Live Loader or Dgraph Bulk Loader from outside. You need to open `7080` to enable Alpha-to-Alpha communication in case you have not opened all ports in #1.

If you are on AWS, below is the security group (**docker-machine**) after necessary changes.

Copy the following file onto your host machine and name it `docker-compose.yml`

```sh
version: "3"
networks:
  dgraph:
services:
  zero_1:
    image: dgraph/dgraph:latest
    volumes:
      - data-volume:/dgraph
    ports:
      - 5080:5080
      - 6080:6080
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == aws01
    command: dgraph zero --my=zero_1:5080 --replicas 3 --idx 1
  zero_2:
    image: dgraph/dgraph:latest
    volumes:
      - data-volume:/dgraph
    ports:
      - 5081:5081
      - 6081:6081
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == aws02
    command: dgraph zero -o 1 --my=zero_2:5081 --replicas 3 --peer zero_1:5080 --idx 2
  zero_3:
    image: dgraph/dgraph:latest
    volumes:
      - data-volume:/dgraph
    ports:
      - 5082:5082
      - 6082:6082
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == aws03
    command: dgraph zero -o 2 --my=zero_3:5082 --replicas 3 --peer zero_1:5080 --idx 3
  alpha_1:
    image: dgraph/dgraph:latest
    hostname: "alpha_1"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8080:8080
      - 9080:9080
    networks:
      - dgraph
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.hostname == aws01
    command: dgraph alpha --my=alpha_1:7080 --lru_mb=2048 --zero=zero_1:5080
  alpha_2:
    image: dgraph/dgraph:latest
    hostname: "alpha_2"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8081:8081
      - 9081:9081
    networks:
      - dgraph
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.hostname == aws02
    command: dgraph alpha --my=alpha_2:7081 --lru_mb=2048 --zero=zero_1:5080 -o 1
  alpha_3:
    image: dgraph/dgraph:latest
    hostname: "alpha_3"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8082:8082
      - 9082:9082
    networks:
      - dgraph
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.hostname == aws03
    command: dgraph alpha --my=alpha_3:7082 --lru_mb=2048 --zero=zero_1:5080 -o 2
  alpha_4:
    image: dgraph/dgraph:latest
    hostname: "alpha_4"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8083:8083
      - 9083:9083
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == aws04
    command: dgraph alpha --my=alpha_4:7083 --lru_mb=2048 --zero=zero_1:5080 -o 3
  alpha_5:
    image: dgraph/dgraph:latest
    hostname: "alpha_5"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8084:8084
      - 9084:9084
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == aws05
    command: dgraph alpha --my=alpha_5:7084 --lru_mb=2048 --zero=zero_1:5080 -o 4
  alpha_6:
    image: dgraph/dgraph:latest
    hostname: "alpha_6"
    volumes:
      - data-volume:/dgraph
    ports:
      - 8085:8085
      - 9085:9085
    networks:
      - dgraph
    deploy:
      placement:
        constraints:
          - node.hostname == aws06
    command: dgraph alpha --my=alpha_6:7085 --lru_mb=2048 --zero=zero_1:5080 -o 5
  ratel:
    image: dgraph/dgraph:latest
    hostname: "ratel"
    ports:
      - 8000:8000
    networks:
      - dgraph
    command: dgraph-ratel
volumes:
  data-volume:
```
{{% notice "note" %}}
1. This setup assumes that you are using 6 hosts, but if you are running fewer than 6 hosts then you have to either use different volumes between Dgraph Alphas or use `-p` & `-w` to configure data directories.
2. This setup would create and use a local volume called `dgraph_data-volume` on the instances. If you plan to replace instances, you should use remote storage like [cloudstore](https://docs.docker.com/docker-for-aws/persistent-data-volumes) instead of local disk.
{{% /notice %}}
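
As with the previous Swarm setup, deploy this stack from the Swarm manager, for example:

```sh
eval $(docker-machine env aws01)
docker stack deploy -c docker-compose.yml dgraph
```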

## Using Kubernetes (v1.8.4)

{{% notice "note" %}}These instructions are for running Dgraph Alpha without TLS config.
For instructions on running with TLS, refer to the [TLS instructions](#tls-configuration).{{% /notice %}}

* Install [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/), which is used to deploy
  and manage applications on Kubernetes.
* Get a Kubernetes cluster up and running on a cloud provider of your choice. You can use [kops](https://github.com/kubernetes/kops/blob/master/docs/aws.md) to set it up on AWS. Kops does auto-scaling by default on AWS and creates the volumes and instances for you.

Verify that you have your cluster up and running using `kubectl get nodes`. If you used `kops` with
the default options, you should have a master and two worker nodes ready.

```sh
➜  kubernetes git:(master) ✗ kubectl get nodes
NAME                                          STATUS    ROLES     AGE       VERSION
ip-172-20-42-118.us-west-2.compute.internal   Ready     node      1h        v1.8.4
ip-172-20-61-179.us-west-2.compute.internal   Ready     master    2h        v1.8.4
ip-172-20-61-73.us-west-2.compute.internal    Ready     node      2h        v1.8.4
```

### Single Server

Once your Kubernetes cluster is up, you can use [dgraph-single.yaml](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-single/dgraph-single.yaml) to start a Zero and Alpha.

* From your machine, run the following command to start a StatefulSet that
  creates a Pod with Zero and Alpha running in it.

```sh
kubectl create -f https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-single/dgraph-single.yaml
```

Output:
```
service "dgraph-public" created
statefulset "dgraph" created
```

* Confirm that the pod was created successfully.

```sh
kubectl get pods
```

Output:
```
NAME       READY     STATUS    RESTARTS   AGE
dgraph-0   3/3       Running   0          1m
```

{{% notice "tip" %}}You can check the logs for the containers in the pod using `kubectl logs -f dgraph-0 <container_name>`. For example, try `kubectl logs -f dgraph-0 alpha` for server logs.{{% /notice %}}

* Test the setup

Port forward from your local machine to the pod:

```sh
kubectl port-forward dgraph-0 8080
kubectl port-forward dgraph-0 8000
```

Go to `http://localhost:8000` and verify Dgraph is working as expected.

{{% notice "note" %}} You can also access the service on its External IP address.{{% /notice %}}


* Stop the cluster

Delete all the resources:

```sh
kubectl delete pods,statefulsets,services,persistentvolumeclaims,persistentvolumes -l app=dgraph
```

Then stop the cluster. If you used `kops`, you can run the following command.

```sh
kops delete cluster ${NAME} --yes
```

### HA Cluster Setup Using Kubernetes

This setup allows you to run 3 Dgraph Alphas and 3 Dgraph Zeros. We start Zero with the
`--replicas 3` flag, so all data would be replicated on 3 Alphas and form 1 Alpha group.

{{% notice "note" %}} Ideally, you should have at least three worker nodes as part of your Kubernetes
cluster so that each Dgraph Alpha runs on a separate node.{{% /notice %}}

* Check the nodes that are part of the Kubernetes cluster.

```sh
kubectl get nodes
```

Output:
```sh
NAME                                          STATUS    ROLES     AGE       VERSION
ip-172-20-34-90.us-west-2.compute.internal    Ready     master    6m        v1.8.4
ip-172-20-51-1.us-west-2.compute.internal     Ready     node      4m        v1.8.4
ip-172-20-59-116.us-west-2.compute.internal   Ready     node      4m        v1.8.4
ip-172-20-61-88.us-west-2.compute.internal    Ready     node      5m        v1.8.4
```

Once your Kubernetes cluster is up, you can use [dgraph-ha.yaml](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml) to start the cluster.

* From your machine, run the following command to start the cluster.

```sh
kubectl create -f https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml
```

Output:
```sh
service "dgraph-zero-public" created
service "dgraph-alpha-public" created
service "dgraph-alpha-0-http-public" created
service "dgraph-ratel-public" created
service "dgraph-zero" created
service "dgraph-alpha" created
statefulset "dgraph-zero" created
statefulset "dgraph-alpha" created
deployment "dgraph-ratel" created
```

* Confirm that the pods were created successfully.

```sh
kubectl get pods
```

Output:
```sh
NAME                    READY     STATUS    RESTARTS   AGE
dgraph-ratel-<pod-id>   1/1       Running   0          9s
dgraph-alpha-0          1/1       Running   0          2m
dgraph-alpha-1          1/1       Running   0          2m
dgraph-alpha-2          1/1       Running   0          2m
dgraph-zero-0           1/1       Running   0          2m
dgraph-zero-1           1/1       Running   0          2m
dgraph-zero-2           1/1       Running   0          2m
```

{{% notice "tip" %}}You can check the logs for the containers in the pod using `kubectl logs -f dgraph-alpha-0` and `kubectl logs -f dgraph-zero-0`.{{% /notice %}}

* Test the setup

Port forward from your local machine to the pod:

```sh
kubectl port-forward dgraph-alpha-0 8080
kubectl port-forward dgraph-ratel-<pod-id> 8000
```

Go to `http://localhost:8000` and verify Dgraph is working as expected.

{{% notice "note" %}} You can also access the service on its External IP address.{{% /notice %}}


* Stop the cluster

Delete all the resources:

```sh
kubectl delete pods,statefulsets,services,persistentvolumeclaims,persistentvolumes -l app=dgraph-zero
kubectl delete pods,statefulsets,services,persistentvolumeclaims,persistentvolumes -l app=dgraph-alpha
kubectl delete pods,replicasets,services,persistentvolumeclaims,persistentvolumes -l app=dgraph-ratel
```

Then stop the cluster. If you used `kops`, you can run the following command.

```sh
kops delete cluster ${NAME} --yes
```

### Kubernetes Storage

The Kubernetes configurations in the previous sections were configured to run
Dgraph with any storage type (`storage-class: anything`). On common cloud
environments like AWS, GCP, and Azure, the default storage type is slow disks
such as hard disks or low-IOPS SSDs. We highly recommend using faster disks for
ideal performance when running Dgraph.

#### Local storage

The AWS storage-optimized i-class instances provide locally attached NVMe-based
SSD storage, which provides consistent, very high IOPS. The Dgraph team uses
i3.large instances on AWS to test Dgraph.

You can create a Kubernetes `StorageClass` object to provision a specific type
of storage volume, which you can then attach to your Dgraph pods. You can set up
your cluster with local SSDs by using [Local Persistent
Volumes](https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/).
This Kubernetes feature is in beta at the time of this writing (Kubernetes
v1.13.1). You can first set up an EC2 instance with locally attached storage.
Once it is formatted and mounted properly, you can create a StorageClass to
access it:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: <your-local-storage-class-name>
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```

Currently, Kubernetes does not allow automatic provisioning of local storage. So
a PersistentVolume with a specific mount path should be created:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <your-local-pv-name>
spec:
  capacity:
    storage: 475Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: <your-local-storage-class-name>
  local:
    path: /data
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - <node-name>
```

Then, in the StatefulSet configuration you can claim this local storage in
`.spec.volumeClaimTemplates`:

```yaml
kind: StatefulSet
...
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: <your-local-storage-class-name>
      resources:
        requests:
          storage: 500Gi
```

You can repeat these steps for each instance that's configured with local
node storage.

#### Non-local persistent disks

EBS volumes on AWS and PDs on GCP are persistent disks that can be configured
with Dgraph. The disk performance is much lower than locally attached storage,
but can be sufficient for workloads such as testing environments.

When using EBS volumes on AWS, we recommend using Provisioned IOPS SSD EBS
volumes (the io1 disk type), which provide consistent IOPS. The available IOPS
for AWS EBS volumes is based on the total disk size. With Kubernetes, you can
request io1 disks to be provisioned with 50 IOPS/GB using the `iopsPerGB`
parameter in this config:

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: <your-storage-class-name>
provisioner: kubernetes.io/aws-ebs
parameters:
  type: io1
  iopsPerGB: "50"
  fsType: ext4
```

Example: Requesting a disk size of 250Gi with this storage class would provide
12.5K IOPS.

### Removing a Dgraph Pod

In the event that you need to completely remove a pod (e.g., its disk got
corrupted and data cannot be recovered), you can use the `/removeNode` API to
remove the node from the cluster.
With a Kubernetes StatefulSet, you'll need to
remove the node in this order:

1. Call `/removeNode` to remove the Dgraph instance from the cluster (see [More
   about Dgraph Zero]({{< relref "#more-about-dgraph-zero" >}})). The removed
   instance will immediately stop running. Any further attempts to join the
   cluster will fail for that instance since it has been removed.
2. Remove the PersistentVolumeClaim associated with the pod to delete its data.
   This prepares the pod to join with a clean state.
3. Restart the pod. This will create a new PersistentVolumeClaim to create new
   data directories.

When an Alpha pod restarts in a replicated cluster, it will join as a new member
of the cluster, be assigned a group and an unused index from Zero, and receive
the latest snapshot from the Alpha leader of the group.

When a Zero pod restarts, it must join the existing group with an unused index
ID. The index ID is set with the `--idx` flag. This may require the StatefulSet
configuration to be updated.

### Kubernetes and Bulk Loader

You may want to initialize a new cluster with an existing data set such as data
from the [Dgraph Bulk Loader]({{< relref "#bulk-loader" >}}). You can use [Init
Containers](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
to copy the data to the pod volume before the Alpha process runs.

See the `initContainers` configuration in
[dgraph-ha.yaml](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml)
to learn more.

## More about Dgraph Alpha

On its HTTP port, a Dgraph Alpha exposes a number of admin endpoints.

* `/health` returns HTTP status code 200 if the worker is running, HTTP 503 otherwise.
* `/admin/shutdown` initiates a proper [shutdown]({{< relref "#shutdown">}}) of the Alpha.
* `/admin/export` initiates a data [export]({{< relref "#export">}}).

By default, the Alpha listens on `localhost` for admin actions (the loopback address, only accessible from the same machine). The `--bindall=true` option binds to `0.0.0.0` and thus allows external connections.

{{% notice "tip" %}}Set max file descriptors to a high value like 10000 if you are going to load a lot of data.{{% /notice %}}

## More about Dgraph Zero

Dgraph Zero controls the Dgraph cluster. It automatically moves data between
different Dgraph Alpha instances based on the size of the data served by each Alpha instance.

It is mandatory to run at least one `dgraph zero` node before running any `dgraph alpha`.
Options for `dgraph zero` can be seen by running `dgraph zero --help`.

* Zero stores information about the cluster.
* `--replicas` is the option that controls the replication factor, i.e. the number of replicas per data shard, including the original shard (see the example below).
* When a new Alpha joins the cluster, it is assigned a group based on the replication factor. If the replication factor is 1, then each Alpha node will serve a different group. If the replication factor is 2 and you launch 4 Alphas, then the first two Alphas would serve group 1 and the next two would serve group 2.
* Zero also monitors the space occupied by predicates in each group and moves them around to rebalance the cluster.
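
For example, the Zero for a cluster with 3x replication might be started like this (the
address is illustrative; see the HA setups above for complete commands):

```sh
dgraph zero --my=IPADDR:5080 --replicas 3
```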

Like Alpha, Zero also exposes HTTP on 6080 (+ any `--port_offset`). You can query it
(via **GET** requests) to see useful information, like the following:

* `/state` Information about the nodes that are part of the cluster. Also contains information about
  the size of predicates and the groups they belong to.
* `/assign?what=uids&num=100` This would allocate `num` uids and return a JSON map
containing `startId` and `endId`, both inclusive. This ID range can be safely assigned
externally to new nodes during data ingestion.
* `/assign?what=timestamps&num=100` This would request timestamps from Zero.
  This is useful to fast-forward the Zero state when starting from a postings
  directory which already has commits higher than Zero's leased timestamp.
* `/removeNode?id=3&group=2` If a replica goes down and can't be recovered, you
  can remove it and add a new node to the quorum. This endpoint can be used to
  remove a dead Zero or Dgraph Alpha node. To remove dead Zero nodes, pass
  `group=0` and the id of the Zero node.

{{% notice "note" %}}
Before using the API, ensure that the node is down and that it won't ever come back up.

You should not reuse the `idx` of a node that was removed earlier.
{{% /notice %}}

* `/moveTablet?tablet=name&group=2` This endpoint can be used to move a tablet to a group. Zero
  already does shard rebalancing every 8 minutes; this endpoint can be used to force-move a tablet.


These are the **POST** endpoints available:

* `/enterpriseLicense` Use this endpoint to apply an enterprise license to the cluster by supplying it
  as part of the body.


## TLS configuration

{{% notice "note" %}}
This section refers to the `dgraph cert` command which was introduced in v1.0.9. For previous releases, see the previous [TLS configuration documentation](https://docs.dgraph.io/v1.0.7/deploy/#tls-configuration).
{{% /notice %}}


Connections between client and server can be secured with TLS. Password protected private keys are **not supported**.

{{% notice "tip" %}}If you're generating encrypted private keys with `openssl`, be sure to specify the encryption algorithm explicitly (like `-aes256`). This will force `openssl` to include the `DEK-Info` header in the private key, which is required by Dgraph to decrypt the key. When default encryption is used, `openssl` doesn't write that header and the key can't be decrypted.{{% /notice %}}

### Self-signed certificates

The `dgraph cert` program creates and manages self-signed certificates using a generated Dgraph Root CA. The _cert_ command simplifies certificate management for you.

```sh
# To see the available flags.
$ dgraph cert --help

# Create Dgraph Root CA, used to sign all other certificates.
$ dgraph cert

# Create node certificate (needed for Dgraph Live Loader using TLS)
$ dgraph cert -n live

# Create client certificate
$ dgraph cert -c dgraphuser

# Combine all in one command
$ dgraph cert -n live -c dgraphuser

# List all your certificates and keys
$ dgraph cert ls
```

### File naming conventions

To enable TLS, you must specify the directory path where certificates and keys can be found. The default location where the _cert_ command stores certificates (and keys) is `tls` under the Dgraph working directory, where the data files are found. The default dir path can be overridden using the `--dir` option.

```sh
$ dgraph cert --dir ~/mycerts
```

The following file naming conventions are used by Dgraph for proper TLS setup.

| File name | Description | Use |
|-----------|-------------|-------|
| ca.crt | Dgraph Root CA certificate | Verify all certificates |
| ca.key | Dgraph CA private key | Validate CA certificate |
| node.crt | Dgraph node certificate | Shared by all nodes for accepting TLS connections |
| node.key | Dgraph node private key | Validate node certificate |
| client._name_.crt | Dgraph client certificate | Authenticate a client _name_ |
| client._name_.key | Dgraph client private key | Validate _name_ client certificate |

The Root CA certificate is used for verifying node and client certificates; if it is changed, you must regenerate all certificates.

For client authentication, each client must have its own certificate and key. These are then used to connect to the Dgraph node(s).

The node certificate `node.crt` can support multiple node names using multiple host names and/or IP addresses. Just separate the names with commas when generating the certificate.

```sh
$ dgraph cert -n localhost,104.25.165.23,dgraph.io,2400:cb00:2048:1::6819:a417
```

{{% notice "tip" %}}You must delete the old node cert and key before you can generate a new pair.{{% /notice %}}

{{% notice "note" %}}When using host names for node certificates, including _localhost_, your clients must connect to the matching host name -- such as _localhost_, not 127.0.0.1. If you need to use IP addresses, then add them to the node certificate.{{% /notice %}}

### Certificate inspection

The command `dgraph cert ls` lists all certificates and keys in the `--dir` directory (default 'tls'), along with details to inspect and validate cert/key pairs.

Example of command output:

```sh
-rw-r--r-- ca.crt - Dgraph Root CA certificate
    Issuer: Dgraph Labs, Inc.
       S/N: 043c4d8fdd347f06
Expiration: 02 Apr 29 16:56 UTC
SHA-256 Digest: 4A2B0F0F 716BF5B6 C603E01A 6229D681 0B2AFDC5 CADF5A0D 17D59299 116119E5

-r-------- ca.key - Dgraph Root CA key
SHA-256 Digest: 4A2B0F0F 716BF5B6 C603E01A 6229D681 0B2AFDC5 CADF5A0D 17D59299 116119E5

-rw-r--r-- client.admin.crt - Dgraph client certificate: admin
    Issuer: Dgraph Labs, Inc.
 CA Verify: PASSED
       S/N: 297e4cb4f97c71f9
Expiration: 03 Apr 24 17:29 UTC
SHA-256 Digest: D23EFB61 DE03C735 EB07B318 DB70D471 D3FE8556 B15D084C 62675857 788DF26C

-rw------- client.admin.key - Dgraph Client key
SHA-256 Digest: D23EFB61 DE03C735 EB07B318 DB70D471 D3FE8556 B15D084C 62675857 788DF26C

-rw-r--r-- node.crt - Dgraph Node certificate
    Issuer: Dgraph Labs, Inc.
 CA Verify: PASSED
       S/N: 795ff0e0146fdb2d
Expiration: 03 Apr 24 17:00 UTC
     Hosts: 104.25.165.23, 2400:cb00:2048:1::6819:a417, localhost, dgraph.io
SHA-256 Digest: 7E243ED5 3286AE71 B9B4E26C 5B2293DA D3E7F336 1B1AFFA7 885E8767 B1A84D28

-rw------- node.key - Dgraph Node key
SHA-256 Digest: 7E243ED5 3286AE71 B9B4E26C 5B2293DA D3E7F336 1B1AFFA7 885E8767 B1A84D28
```

Important points:

* The cert/key pairs should always have matching SHA-256 digests. Otherwise, the cert(s) must be
  regenerated. If the Root CA pair differ, all cert/key pairs must be regenerated; the flag `--force`
  can help.
* All certificates must pass Dgraph CA verification.
* All key files should have the least access permissions, especially the `ca.key`, but be readable.
* Key files won't be overwritten if they have limited access, even with `--force`.
* Node certificates are only valid for the hosts listed.
* Client certificates are only valid for the named client/user.

### TLS options

The following configuration options are available for Alpha:

* `--tls_dir string` - TLS dir path; this enables TLS connections (usually 'tls').
* `--tls_use_system_ca` - Include System CA with Dgraph Root CA.
* `--tls_client_auth string` - TLS client authentication used to validate client connections. See [Client authentication](#client-authentication) for details.

```sh
# Default use for enabling TLS server (after generating certificates)
$ dgraph alpha --tls_dir tls
```

Dgraph Live Loader can be configured with the following options:

* `--tls_dir string` - TLS dir path; this enables TLS connections (usually 'tls').
* `--tls_use_system_ca` - Include System CA with Dgraph Root CA.
* `--tls_server_name string` - Server name, used for validating the server's TLS host name.

```sh
# First, create a client certificate for live loader. This will create 'tls/client.live.crt'
$ dgraph cert -c live

# Now, connect to server using TLS
$ dgraph live --tls_dir tls -s 21million.schema -f 21million.rdf.gz
```

### Client authentication

The server option `--tls_client_auth` accepts different values that change the security policy of client certificate verification.

| Value | Description |
|-------|-------------|
| REQUEST | Server accepts any certificate, invalid and unverified (least secure) |
| REQUIREANY | Server expects any certificate, valid and unverified |
| VERIFYIFGIVEN | Client certificate is verified if provided (default) |
| REQUIREANDVERIFY | Always require a valid certificate (most secure) |

{{% notice "note" %}}REQUIREANDVERIFY is the most secure but also the most difficult to configure for remote clients. When using this value, the value of `--tls_server_name` is matched against the certificate SANs values and the connection host.{{% /notice %}}

## Cluster Checklist

In setting up a cluster, be sure to check the following.

* Is at least one Dgraph Zero node running?
* Is each Dgraph Alpha instance in the cluster set up correctly?
* Will each Dgraph Alpha instance be accessible to all peers on 7080 (+ any port offset)?
* Does each instance have a unique ID on startup?
* Has `--bindall=true` been set for networked communication?

## Fast Data Loading

There are two different tools that can be used for fast data loading:

- `dgraph live` runs the Dgraph Live Loader
- `dgraph bulk` runs the Dgraph Bulk Loader

{{% notice "note" %}} Both tools only accept [RDF N-Quad/Triple
data](https://www.w3.org/TR/n-quads/) or JSON in plain or gzipped format. Data
in other formats must be converted.{{% /notice %}}

### Live Loader

Dgraph Live Loader (run with `dgraph live`) is a small helper program which reads RDF N-Quads from a gzipped file, batches them up, creates mutations (using the Go client) and sends them to Dgraph.

Dgraph Live Loader correctly handles assigning unique IDs to blank nodes across multiple files, and can optionally persist them to disk to save memory, in case the loader is re-run.

{{% notice "note" %}} Dgraph Live Loader can optionally write the xid->uid mapping to a directory specified using the `-x` flag, which can be reused in a later run, provided the live loader completed successfully in the previous run.{{% /notice %}}

```sh
$ dgraph live --help # To see the available flags.

# Read RDFs or JSON from the passed file, and send them to Dgraph on localhost:9080.
$ dgraph live -f <path-to-gzipped-RDF-or-JSON-file>

# Read multiple RDFs or JSON from the passed path, and send them to Dgraph on localhost:9080.
$ dgraph live -f <./path-to-gzipped-RDF-or-JSON-files>

# Read multiple files strictly by name.
$ dgraph live -f <file1.rdf, file2.rdf>

# Use compressed gRPC connections to and from Dgraph.
$ dgraph live -C -f <path-to-gzipped-RDF-or-JSON-file>

# Read RDFs and a schema file and send to Dgraph running at the given address.
$ dgraph live -f <path-to-gzipped-RDF-or-JSON-file> -s <path-to-schema-file> -a <dgraph-alpha-address:grpc_port> -z <dgraph-zero-address:grpc_port>
```

#### Other Live Loader options

`--new_uids` (default: false): Assign new UIDs instead of using the existing UIDs in data files. This is useful to avoid overriding the data in a DB already in operation.

`-f, --files`: Location of *.rdf(.gz) or *.json(.gz) file(s) to load. It can load multiple files in a given path. If the path is a directory, then all files ending in .rdf, .rdf.gz, .json, and .json.gz will be loaded.

`--format`: Specify the file format (rdf or json) instead of getting it from filenames. This is useful if you need to define a strict format manually.

`-b, --batch` (default: 1000): Number of N-Quads to send as part of a mutation.

`-c, --conc` (default: 10): Number of concurrent requests to make to Dgraph. Do not confuse this with `-C`.

`-C, --use_compression` (default: false): Enable compression for connections to and from the Alpha server.

`-a, --alpha` (default: `localhost:9080`): Dgraph Alpha gRPC server address to connect to for live loading. This can be a comma-separated list of Alpha addresses in the same cluster to distribute the load, e.g., `"alpha:grpc_port,alpha2:grpc_port,alpha3:grpc_port"`.

### Bulk Loader

{{% notice "note" %}}
It's crucial to tune the bulk loader's flags to get good performance. See the
section below for details.
{{% /notice %}}

Dgraph Bulk Loader serves a similar purpose to the Dgraph Live Loader, but can only be used to load data into a new cluster. It cannot be run on an existing Dgraph cluster. Dgraph Bulk Loader is **considerably faster** than the Dgraph Live Loader and is the recommended way to perform the initial import of large datasets into Dgraph.

Only Dgraph Zeros should be running during bulk loading; the Dgraph Alphas will be started later.

{{% notice "warning" %}}
Don't use the bulk loader once the Dgraph cluster is up and running. Use it to import your existing data to a new cluster.
{{% /notice %}}

You can [read some technical details](https://blog.dgraph.io/post/bulkloader/) about the bulk loader on the blog.
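
As a rough sketch of the expected sequence (the data and schema file names are placeholders, and the ports are the defaults):

```sh
# 1. Start a Dgraph Zero only; do not start any Alphas yet.
dgraph zero --my=localhost:5080

# 2. In another terminal, run the bulk load against that Zero
#    (the flags are discussed below).
dgraph bulk -f data.rdf.gz -s data.schema --zero=localhost:5080

# 3. Start the Alphas only after the bulk load finishes and its output
#    has been copied into place, as described later in this section.
```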

See [Fast Data Loading]({{< relref "#fast-data-loading" >}}) for more info about the expected N-Quads format.

**Reduce shards**: Before running the bulk load, you need to decide how many Alpha groups will be running when the cluster starts. The number of Alpha groups is the same as the number of reduce shards you set with the `--reduce_shards` flag. For example, if your cluster will run 3 Alphas with 3 replicas per group, then there is 1 group and `--reduce_shards` should be set to 1. If your cluster will run 6 Alphas with 3 replicas per group, then there are 2 groups and `--reduce_shards` should be set to 2.

**Map shards**: The `--map_shards` option must be set to at least the value of `--reduce_shards`. A higher number helps the bulk loader evenly distribute predicates between the reduce shards.

```sh
$ dgraph bulk -f goldendata.rdf.gz -s goldendata.schema --map_shards=4 --reduce_shards=2 --http localhost:8000 --zero=localhost:5080
```
```
{
    "DataFiles": "goldendata.rdf.gz",
    "DataFormat": "",
    "SchemaFile": "goldendata.schema",
    "DgraphsDir": "out",
    "TmpDir": "tmp",
    "NumGoroutines": 4,
    "MapBufSize": 67108864,
    "ExpandEdges": true,
    "SkipMapPhase": false,
    "CleanupTmp": true,
    "NumShufflers": 1,
    "Version": false,
    "StoreXids": false,
    "ZeroAddr": "localhost:5080",
    "HttpAddr": "localhost:8000",
    "IgnoreErrors": false,
    "MapShards": 4,
    "ReduceShards": 2
}
The bulk loader needs to open many files at once. This number depends on the size of the data set loaded, the map file output size, and the level of indexing. 100,000 is adequate for most data set sizes. See `man ulimit` for details of how to change the limit.
Current max open files limit: 1024
MAP 01s rdf_count:176.0 rdf_speed:174.4/sec edge_count:564.0 edge_speed:558.8/sec
MAP 02s rdf_count:399.0 rdf_speed:198.5/sec edge_count:1.291k edge_speed:642.4/sec
MAP 03s rdf_count:666.0 rdf_speed:221.3/sec edge_count:2.164k edge_speed:718.9/sec
MAP 04s rdf_count:952.0 rdf_speed:237.4/sec edge_count:3.014k edge_speed:751.5/sec
MAP 05s rdf_count:1.327k rdf_speed:264.8/sec edge_count:4.243k edge_speed:846.7/sec
MAP 06s rdf_count:1.774k rdf_speed:295.1/sec edge_count:5.720k edge_speed:951.5/sec
MAP 07s rdf_count:2.375k rdf_speed:338.7/sec edge_count:7.607k edge_speed:1.085k/sec
MAP 08s rdf_count:3.697k rdf_speed:461.4/sec edge_count:11.89k edge_speed:1.484k/sec
MAP 09s rdf_count:71.98k rdf_speed:7.987k/sec edge_count:225.4k edge_speed:25.01k/sec
MAP 10s rdf_count:354.8k rdf_speed:35.44k/sec edge_count:1.132M edge_speed:113.1k/sec
MAP 11s rdf_count:610.5k rdf_speed:55.39k/sec edge_count:1.985M edge_speed:180.1k/sec
MAP 12s rdf_count:883.9k rdf_speed:73.52k/sec edge_count:2.907M edge_speed:241.8k/sec
MAP 13s rdf_count:1.108M rdf_speed:85.10k/sec edge_count:3.653M edge_speed:280.5k/sec
MAP 14s rdf_count:1.121M rdf_speed:79.93k/sec edge_count:3.695M edge_speed:263.5k/sec
MAP 15s rdf_count:1.121M rdf_speed:74.61k/sec edge_count:3.695M edge_speed:246.0k/sec
REDUCE 16s [1.69%] edge_count:62.61k edge_speed:62.61k/sec plist_count:29.98k plist_speed:29.98k/sec
REDUCE 17s [18.43%] edge_count:681.2k edge_speed:651.7k/sec plist_count:328.1k plist_speed:313.9k/sec
REDUCE 18s [33.28%] edge_count:1.230M edge_speed:601.1k/sec plist_count:678.9k plist_speed:331.8k/sec
REDUCE 19s [45.70%] edge_count:1.689M edge_speed:554.4k/sec plist_count:905.9k plist_speed:297.4k/sec
REDUCE 20s [60.94%] edge_count:2.252M edge_speed:556.5k/sec plist_count:1.278M plist_speed:315.9k/sec
REDUCE 21s [93.21%] edge_count:3.444M edge_speed:681.5k/sec plist_count:1.555M plist_speed:307.7k/sec
REDUCE 22s [100.00%] edge_count:3.695M edge_speed:610.4k/sec plist_count:1.778M plist_speed:293.8k/sec
REDUCE 22s [100.00%] edge_count:3.695M edge_speed:584.4k/sec plist_count:1.778M plist_speed:281.3k/sec
Total: 22s
```

The output will be generated in the `out` directory by default. Here's the bulk load output from the example above:

```sh
$ tree ./out
```
```
./out
├── 0
│   └── p
│       ├── 000000.vlog
│       ├── 000002.sst
│       └── MANIFEST
└── 1
    └── p
        ├── 000000.vlog
        ├── 000002.sst
        └── MANIFEST

4 directories, 6 files
```

Because `--reduce_shards` was set to 2, there are two sets of p directories: one in the `./out/0` directory and another in the `./out/1` directory.

Once the output is created, it can be copied to all the servers that will run Dgraph Alphas. Each Dgraph Alpha must have its own copy of its group's p directory output. Each replica of the first group should have its own copy of `./out/0/p`, each replica of the second group should have its own copy of `./out/1/p`, and so on.

```sh
$ dgraph bulk --help # To see the available flags.

# Read RDFs or JSON from the passed file.
$ dgraph bulk -f <path-to-gzipped-RDF-or-JSON-file> ...

# Read multiple RDFs or JSON from the passed path.
$ dgraph bulk -f <./path-to-gzipped-RDF-or-JSON-files> ...

# Read multiple files strictly by name.
$ dgraph bulk -f <file1.rdf, file2.rdf> ...
```
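
For example, here is a rough sketch of distributing the two shards above and starting the Alphas. The host names (`alpha1`–`alpha6`, `zero-host`) and the `/data/dgraph` paths are hypothetical; adapt the flags to your own deployment.

```sh
# Hypothetical layout: alpha1-alpha3 replicate group 1, alpha4-alpha6 replicate group 2.
for host in alpha1 alpha2 alpha3; do rsync -az ./out/0/p/ $host:/data/dgraph/p/; done
for host in alpha4 alpha5 alpha6; do rsync -az ./out/1/p/ $host:/data/dgraph/p/; done

# On each host, start an Alpha pointing at the copied p directory.
dgraph alpha --postings /data/dgraph/p --wal /data/dgraph/w --zero zero-host:5080 --lru_mb 4096
```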

#### Other Bulk Loader options

`--new_uids` (default: false): Assign new UIDs instead of using the existing UIDs in data files. This is useful to avoid overriding the data in a DB already in operation.

`-f, --files`: Location of *.rdf(.gz) or *.json(.gz) file(s) to load. It can load multiple files in a given path. If the path is a directory, then all files ending in .rdf, .rdf.gz, .json, and .json.gz will be loaded.

`--format`: Specify the file format (rdf or json) instead of getting it from filenames. This is useful if you need to define a strict format manually.

#### Tuning & monitoring

##### Performance Tuning

{{% notice "tip" %}}
We highly recommend [disabling swap
space](https://askubuntu.com/questions/214805/how-do-i-disable-swap) when
running Bulk Loader. It is better to fix the parameters to decrease memory
usage than to have swapping grind the loader to a halt.
{{% /notice %}}

Flags can be used to control the behaviour and performance characteristics of the bulk loader. You can see the full list by running `dgraph bulk --help`. In particular, **the flags should be tuned so that the bulk loader doesn't use more memory than is available as RAM**. If it starts swapping, it will become incredibly slow.

**In the map phase**, tweaking the following flags can reduce memory usage:

- The `--num_go_routines` flag controls the number of worker threads. Lowering it reduces memory consumption.

- The `--mapoutput_mb` flag controls the size of the map output files. Lowering it reduces memory consumption.

For bigger datasets and machines with many cores, gzip decoding can be a bottleneck during the map phase. Performance improvements can be obtained by first splitting the RDFs up into many `.rdf.gz` files (e.g. 256MB each). This has a negligible impact on memory usage.

**The reduce phase** is less memory-heavy than the map phase, although it can still use a lot. Some flags may be increased to improve performance, *but only if you have large amounts of RAM*:

- The `--reduce_shards` flag controls the number of resultant Dgraph Alpha instances. Increasing this increases memory consumption, but in exchange allows for higher CPU utilization.

- The `--map_shards` flag controls the number of separate map output shards. Increasing this increases memory consumption but balances the resultant Dgraph Alpha instances more evenly.

- The `--shufflers` flag controls the level of parallelism in the shuffle/reduce stage. Increasing this increases memory consumption.

## Monitoring

Dgraph exposes metrics via the `/debug/vars` endpoint in JSON format and the `/debug/prometheus_metrics` endpoint in Prometheus's text-based format. Dgraph doesn't store the metrics and only exposes the value of the metrics at that instant. You can either poll these endpoints to get the data into your monitoring systems or install **[Prometheus](https://prometheus.io/docs/introduction/install/)**. Replace the targets in the config file below with the IP addresses of your Dgraph instances, and run Prometheus using the command `prometheus -config.file my_config.yaml`.

```yaml
scrape_configs:
  - job_name: "dgraph"
    metrics_path: "/debug/prometheus_metrics"
    scrape_interval: "2s"
    static_configs:
    - targets:
      - 172.31.9.133:6080 # For Dgraph Zero, 6080 is the HTTP endpoint exposing metrics.
      - 172.31.15.230:8080
      - 172.31.0.170:8080
      - 172.31.8.118:8080
```

{{% notice "note" %}}
Raw data exported by Prometheus is available via the `/debug/prometheus_metrics` endpoint on Dgraph Alphas.
{{% /notice %}}

Install **[Grafana](http://docs.grafana.org/installation/)** to plot the metrics. Grafana runs at port 3000 with the default settings. Create a Prometheus datasource by following these **[steps](https://prometheus.io/docs/visualization/grafana/#creating-a-prometheus-data-source)**. Import **[grafana_dashboard.json](https://github.com/dgraph-io/benchmarks/blob/master/scripts/grafana_dashboard.json)** by following this **[link](http://docs.grafana.org/reference/export_import/#importing-a-dashboard)**.

## Metrics

Dgraph metrics follow the [metric and label conventions for
Prometheus](https://prometheus.io/docs/practices/naming/).

### Disk Metrics

The disk metrics let you track the disk activity of the Dgraph process. Dgraph does not interact directly with the filesystem. Instead it relies on [Badger](https://github.com/dgraph-io/badger) to read from and write to disk.

Metrics | Description
------- | -----------
`badger_disk_reads_total` | Total count of disk reads in Badger.
`badger_disk_writes_total` | Total count of disk writes in Badger.
`badger_gets_total` | Total count of calls to Badger's `get`.
`badger_memtable_gets_total` | Total count of memtable accesses to Badger's `get`.
`badger_puts_total` | Total count of calls to Badger's `put`.
`badger_read_bytes` | Total bytes read from Badger.
`badger_written_bytes` | Total bytes written to Badger.

### Memory Metrics

The memory metrics let you track the memory usage of the Dgraph process. The idle and inuse metrics give you a better sense of the active memory usage of the Dgraph process. The process memory metric shows the memory usage as measured by the operating system.

By looking at all three metrics you can see how much memory a Dgraph process is holding from the operating system and how much is actively in use.

Metrics | Description
------- | -----------
`dgraph_memory_idle_bytes` | Estimated amount of memory that is being held idle and could be reclaimed by the OS.
`dgraph_memory_inuse_bytes` | Total memory usage in bytes (sum of heap usage and stack usage).
`dgraph_memory_proc_bytes` | Total memory usage in bytes of the Dgraph process. On Linux/macOS, this metric is equivalent to resident set size. On Windows, this metric is equivalent to [Go's runtime.ReadMemStats](https://golang.org/pkg/runtime/#ReadMemStats).

### Activity Metrics

The activity metrics let you track the mutations, queries, and proposals of a Dgraph instance.

Metrics | Description
------- | -----------
`dgraph_goroutines_total` | Total number of Goroutines currently running in Dgraph.
`dgraph_active_mutations_total` | Total number of mutations currently running.
`dgraph_pending_proposals_total` | Total pending Raft proposals.
`dgraph_pending_queries_total` | Total number of queries in progress.
`dgraph_num_queries_total` | Total number of queries run in Dgraph.
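
To spot-check any of the metrics above without a full Prometheus setup, you can poll the endpoints directly; a minimal sketch, assuming an Alpha listening on the default HTTP port 8080:

```sh
# Instantaneous values in JSON (expvar format).
curl -s localhost:8080/debug/vars

# The same data in Prometheus text format, filtered to a single metric.
curl -s localhost:8080/debug/prometheus_metrics | grep dgraph_pending_queries_total
```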

### Health Metrics

The health metrics let you check the availability of a Dgraph Alpha instance.

Metrics | Description
------- | -----------
`dgraph_alpha_health_status` | **Only applicable to Dgraph Alpha**. Value is 1 when the Alpha is ready to accept requests; otherwise 0.

### Go Metrics

Go's built-in metrics may also be useful for measuring memory usage and garbage collection time.

Metrics | Description
------- | -----------
`go_memstats_gc_cpu_fraction` | The fraction of this program's available CPU time used by the GC since the program started.
`go_memstats_heap_idle_bytes` | Number of heap bytes waiting to be used.
`go_memstats_heap_inuse_bytes` | Number of heap bytes that are in use.

## Tracing

Dgraph is integrated with [OpenCensus](https://opencensus.io/zpages/) to collect distributed traces from the Dgraph cluster.

Trace data is always collected within Dgraph. You can adjust the trace sampling rate for Dgraph queries with the `--trace` option for Dgraph Alphas. By default, `--trace` is set to 1 to trace 100% of queries.

### Examining Traces with zPages

The most basic way to view traces is with the integrated trace pages.

OpenCensus's [zPages](https://opencensus.io/zpages/) are accessible via the Zero or Alpha HTTP port at `/z/tracez`.

### Examining Traces with Jaeger

Jaeger collects distributed traces and provides a UI to view and query traces across different services. This provides the necessary observability to figure out what is happening in the system.

Dgraph can be configured to send traces directly to a Jaeger collector with the `--jaeger.collector` flag. For example, if the Jaeger collector is running on `http://localhost:14268`, then pass the flag to the Dgraph Zero and Dgraph Alpha instances as `--jaeger.collector=http://localhost:14268`.

See [Jaeger's Getting Started docs](https://www.jaegertracing.io/docs/getting-started/) to get up and running with Jaeger.

## Dgraph Administration

Each Dgraph Alpha exposes administrative operations over HTTP to export data and to perform a clean shutdown.

### Whitelist Admin Operations

By default, admin operations can only be initiated from the machine on which the Dgraph Alpha runs. You can use the `--whitelist` option to specify whitelisted IP addresses and ranges for hosts from which admin operations can be initiated.

```sh
dgraph alpha --whitelist 172.17.0.0:172.20.0.0,192.168.1.1 --lru_mb <one-third RAM> ...
```
This would allow admin operations from hosts with IPs between `172.17.0.0` and `172.20.0.0`, along with the server that has the IP address `192.168.1.1`.

### Restrict Mutation Operations

By default, you can perform mutation operations for any predicate. If the predicate in a mutation doesn't exist in the schema, the predicate gets added to the schema with an appropriate [Dgraph Type](https://docs.dgraph.io/master/query-language/#schema-types).

You can use `--mutations disallow` to disable all mutations; the option is set to `allow` by default.

```sh
dgraph alpha --mutations disallow
```

Enforce a strict schema by setting `--mutations strict`. This mode allows mutations only on predicates already in the schema. Before performing a mutation on a predicate that doesn't exist in the schema, you need to perform an alter operation with that predicate and its schema type.

```sh
dgraph alpha --mutations strict
```
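
For example, a minimal sketch of preparing the schema under strict mode before mutating; the predicate name and the index tokenizer are illustrative, and the Alpha is assumed to be on the default HTTP port 8080:

```sh
# Add the predicate to the schema via an alter operation. Until this is done,
# mutations that set <name> are rejected while --mutations strict is in effect.
curl -s localhost:8080/alter -d 'name: string @index(term) .'
```

Once the alter operation succeeds, mutations on `name` are accepted as usual.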

### Secure Alter Operations

Clients can use alter operations to apply schema updates and drop particular or all predicates from the database. By default, all clients are allowed to perform alter operations. You can configure Dgraph to only allow alter operations when the client provides a specific token. This can be used to prevent clients from making unintended or accidental schema updates or predicate drops.

You can specify the auth token with the `--auth_token` option for each Dgraph Alpha in the cluster. Clients must include the same auth token to make alter requests.

```sh
$ dgraph alpha --lru_mb=2048 --auth_token=<authtokenstring>
```

```sh
$ curl -s localhost:8080/alter -d '{ "drop_all": true }'
# Permission denied. No token provided.
```

```sh
$ curl -s -H 'X-Dgraph-AuthToken: <wrongsecret>' localhost:8080/alter -d '{ "drop_all": true }'
# Permission denied. Incorrect token.
```

```sh
$ curl -H 'X-Dgraph-AuthToken: <authtokenstring>' localhost:8080/alter -d '{ "drop_all": true }'
# Success. Token matches.
```

{{% notice "note" %}}
To fully secure alter operations in the cluster, the auth token must be set for every Alpha.
{{% /notice %}}


### Export Database

An export of all nodes is started by locally accessing the export endpoint of any Alpha in the cluster.

```sh
$ curl localhost:8080/admin/export
```
{{% notice "warning" %}}By default, this won't work if called from outside the server where the Dgraph Alpha is running. You can specify a list or range of whitelisted IP addresses from which export or other admin operations can be initiated using the `--whitelist` flag on `dgraph alpha`.
{{% /notice %}}

This also works from a browser, provided the HTTP GET is run from the same server where the Dgraph Alpha instance is running.

This triggers an export for all Alpha groups of the cluster. The data is exported from the following Dgraph instances:

1. For the Alpha instance that receives the GET request, the group's export data is stored with this Alpha.
2. For every other group, its export data is stored with the Alpha leader of that group.

It is up to the user to retrieve the right export files from the Alphas in the cluster. Dgraph does not copy all files to the Alpha that initiated the export. The user must also ensure that there is sufficient space on disk to store the export.

Each Alpha leader for a group writes output as a gzipped file to the export directory specified via the `--export` flag (defaults to a directory called `"export"`). If any of the groups fail, the entire export process is considered failed and an error is returned.

The data is exported in RDF format by default. A different output format may be specified with the `format` URL parameter. For example:

```sh
$ curl 'localhost:8080/admin/export?format=json'
```

Currently, "rdf" and "json" are the only formats supported.
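
Since the export files end up spread across the group leaders, here is a rough sketch of collecting them onto one machine; the host names (`alpha1`, `alpha4`) and the export paths are hypothetical and depend on where each Alpha runs and on its `--export` setting.

```sh
# alpha1 received the export request (and stores group 1's export in this example);
# alpha4 is assumed to be the leader of group 2.
backup_dir=backups/$(date +%Y%m%d)
mkdir -p "$backup_dir"
rsync -az alpha1:/path/to/dgraph/export/ "$backup_dir/group1/"
rsync -az alpha4:/path/to/dgraph/export/ "$backup_dir/group2/"
```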

### Shutdown Database

A clean exit of a single Dgraph node is initiated by running the following command on that node.
{{% notice "warning" %}}This won't work if called from outside the server where Dgraph is running.
{{% /notice %}}

```sh
$ curl localhost:8080/admin/shutdown
```

This stops the Alpha on which the command is executed, not the entire cluster.

### Delete database

Individual triples, patterns of triples and predicates can be deleted as described in the [query language docs](/query-language#delete).

To drop all data, you could send a `DropAll` request via the `/alter` endpoint.

Alternatively, you could:

* [stop Dgraph]({{< relref "#shutdown-database" >}}) and wait for all writes to complete,
* delete (perhaps after doing an export first) the `p` and `w` directories, then
* restart Dgraph.

### Upgrade Database

Doing periodic exports is always a good idea. This is particularly useful if you wish to upgrade Dgraph or reconfigure the sharding of a cluster. The following are the right steps to safely export and restart.

- Start an [export]({{< relref "#export">}})
- Ensure it's successful
- Bring down the cluster
- Run Dgraph using new data directories.
- Reload the data via the [bulk loader]({{< relref "#bulk-loader" >}}).
- If all looks good, you can delete the old directories (the export serves as insurance).

These steps are necessary because Dgraph's underlying data format could have changed, and reloading the export avoids encoding incompatibilities.

### Post Installation

Now that Dgraph is up and running, to understand how to add and query data in Dgraph, follow the [Query Language Spec](/query-language). Also, have a look at the [Frequently asked questions](/faq).

## Troubleshooting

Here are some problems that you may encounter and some solutions to try.

#### Running OOM (out of memory)

During bulk loading of data, Dgraph can consume more memory than usual, due to a high volume of writes. That's generally when you see the OOM crashes.

The recommended minimum RAM to run on desktops and laptops is 16GB. Dgraph can take up to 7-8 GB with the default setting of `--lru_mb` set to 4096; so having the remaining 8GB for desktop applications should keep your machine humming along.

On EC2/GCE instances, the recommended minimum is 8GB. It's recommended to set `--lru_mb` to one-third of the RAM size.

You could also decrease the memory usage of Dgraph by setting `--badger.vlog=disk`.

#### Too many open files

If you see log error messages saying `too many open files`, you should increase the per-process file descriptor limit.

During normal operations, Dgraph must be able to open many files. Your operating system may set a default open file descriptor limit lower than what's needed for a database such as Dgraph.

On Linux and Mac, you can check the file descriptor limit with `ulimit -n -H` for the hard limit and `ulimit -n -S` for the soft limit. The soft limit should be set high enough for Dgraph to run properly. A soft limit of 65535 is a good lower bound for a production setup. You can adjust the limit as needed.

## See Also

* [Product Roadmap to v1.0](https://github.com/dgraph-io/dgraph/issues/1)