github.com/dgraph-io/dgraph@v1.2.8/wiki/content/deploy/index.md

     1  +++
     2  date = "2017-03-20T22:25:17+11:00"
     3  title = "Deploy"
     4  +++
     5  
This page covers running Dgraph in various deployment modes, in a distributed fashion. This involves
running multiple instances of Dgraph over multiple servers in a cluster.
     8  
     9  {{% notice "tip" %}}
For a single server setup, recommended for new users, please see the [Get Started](/get-started) page.
    11  {{% /notice %}}
    12  
    13  ## Install Dgraph
    14  #### Docker
    15  
    16  ```sh
    17  docker pull dgraph/dgraph:latest
    18  
    19  # You can test that it worked fine, by running:
    20  docker run -it dgraph/dgraph:latest dgraph
    21  ```
    22  
    23  #### Automatic download
    24  
    25  Running
    26  ```sh
    27  curl https://get.dgraph.io -sSf | bash
    28  
    29  # Test that it worked fine, by running:
    30  dgraph
    31  ```
    32  would install the `dgraph` binary into your system.
    33  
    34  #### Manual download [optional]
    35  
If you don't want to follow the automatic installation method, you can manually download the appropriate tar for your platform from **[Dgraph releases](https://github.com/dgraph-io/dgraph/releases)**. After downloading the tar for your platform from GitHub, extract the binary to `/usr/local/bin` like so.
    37  
    38  ```sh
    39  # For Linux
    40  $ sudo tar -C /usr/local/bin -xzf dgraph-linux-amd64-VERSION.tar.gz
    41  
    42  # For Mac
    43  $ sudo tar -C /usr/local/bin -xzf dgraph-darwin-amd64-VERSION.tar.gz
    44  
    45  # Test that it worked fine, by running:
    46  dgraph
    47  ```
    48  
    49  #### Building from Source
    50  
    51  {{% notice "note" %}}
You can build the Ratel UI from source separately following its build
    53  [instructions](https://github.com/dgraph-io/ratel/blob/master/INSTRUCTIONS.md).
    54  Ratel UI is distributed via Dgraph releases using any of the download methods
    55  listed above.
    56  {{% /notice %}}
    57  
    58  Make sure you have [Go](https://golang.org/dl/) v1.11+ installed.
    59  
    60  You'll need the following dependencies to install Dgraph using `make`:
    61  ```bash
    62  sudo apt-get update
    63  sudo apt-get install gcc make
    64  ```
    65  
    66  After installing Go, run
    67  ```sh
    68  # This should install dgraph binary in your $GOPATH/bin.
    69  
    70  git clone https://github.com/dgraph-io/dgraph.git
    71  cd ./dgraph
    72  make install
    73  ```
    74  
If you get errors related to `grpc` while building, your
`go-grpc` version might be outdated. We don't vendor in `go-grpc` (because it
causes issues while using the Go client). Update your `go-grpc` by running:
    78  ```sh
    79  go get -u -v google.golang.org/grpc
    80  ```
    81  
    82  #### Config
    83  
    84  The full set of dgraph's configuration options (along with brief descriptions)
    85  can be viewed by invoking dgraph with the `--help` flag. For example, to see
    86  the options available for `dgraph alpha`, run `dgraph alpha --help`.
    87  
    88  The options can be configured in multiple ways (from highest precedence to
    89  lowest precedence):
    90  
    91  - Using command line flags (as described in the help output).
    92  
    93  - Using environment variables.
    94  
    95  - Using a configuration file.
    96  
    97  If no configuration for an option is used, then the default value as described
    98  in the `--help` output applies.
    99  
Multiple configuration methods can be used at the same time. E.g. a core
set of options could be set in a config file, and instance-specific options
could be set using environment vars or flags.
   103  
   104  The environment variable names mirror the flag names as seen in the `--help`
   105  output. They are the concatenation of `DGRAPH`, the subcommand invoked
   106  (`ALPHA`, `ZERO`, `LIVE`, or `BULK`), and then the name of the flag (in
   107  uppercase). For example, instead of using `dgraph alpha --lru_mb=8096`, you
   108  could use `DGRAPH_ALPHA_LRU_MB=8096 dgraph alpha`.
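
For example, following the naming rule above, these two invocations set the same options, once with flags and once with the equivalent environment variables:

```sh
dgraph alpha --lru_mb=8096 --zero=localhost:5080
DGRAPH_ALPHA_LRU_MB=8096 DGRAPH_ALPHA_ZERO=localhost:5080 dgraph alpha
```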
   109  
   110  Configuration file formats supported are JSON, TOML, YAML, HCL, and Java
   111  properties (detected via file extension). The file extensions are .json, .toml,
   112  .yml or .yaml, .hcl, and .properties for each format.
   113  
   114  A configuration file can be specified using the `--config` flag, or an
   115  environment variable. E.g. `dgraph zero --config my_config.json` or
   116  `DGRAPH_ZERO_CONFIG=my_config.json dgraph zero`.
   117  
   118  The config file structure is just simple key/value pairs (mirroring the flag
   119  names).
   120  
   121  Example JSON config file (config.json):
   122  
   123  ```json
   124  {
   125    "my": "localhost:7080",
   126    "zero": "localhost:5080",
   127    "lru_mb": 4096,
   128    "postings": "/path/to/p",
   129    "wal": "/path/to/w"
   130  }
   131  ```
   132  
   133  Example TOML config file (config.toml):
   134  
   135  ```toml
   136  my = "localhost:7080"
   137  zero = "localhost:5080"
   138  lru_mb = 4096
   139  postings = "/path/to/p"
   140  wal = "/path/to/w"
   141  ```
   142  
   143  
   144  Example YAML config file (config.yml):
   145  
   146  ```yaml
   147  my: "localhost:7080"
   148  zero: "localhost:5080"
   149  lru_mb: 4096
   150  postings: "/path/to/p"
   151  wal: "/path/to/w"
   152  ```
   153  
   154  Example HCL config file (config.hcl):
   155  
   156  ```hcl
   157  my = "localhost:7080"
   158  zero = "localhost:5080"
   159  lru_mb = 4096
   160  postings = "/path/to/p"
   161  wal = "/path/to/w"
   162  ```
   163  
   164  Example Java properties config file (config.properties):
   165  ```text
   166  my=localhost:7080
   167  zero=localhost:5080
   168  lru_mb=4096
   169  postings=/path/to/p
   170  wal=/path/to/w
   171  ```
   172  
   173  ## Cluster Setup
   174  
   175  ### Understanding Dgraph cluster
   176  
Dgraph is a truly distributed graph database - not a master-slave replication of
a universal dataset. It shards by predicate and replicates predicates across the
cluster. Queries can be run on any node, and joins are handled over the
distributed data. A query is resolved locally for predicates the node stores,
and via distributed joins for predicates stored on other nodes.
   182  
To run a Dgraph cluster effectively, it's important to understand how
sharding, replication and rebalancing work.
   185  
   186  **Sharding**
   187  
Dgraph colocates data per predicate (*P*, in RDF terminology), thus the
   189  smallest unit of data is one predicate. To shard the graph, one or many
   190  predicates are assigned to a group. Each Alpha node in the cluster serves a
   191  single group. Dgraph Zero assigns a group to each Alpha node.
   192  
   193  **Shard rebalancing**
   194  
   195  Dgraph Zero tries to rebalance the cluster based on the disk usage in each
   196  group. If Zero detects an imbalance, it would try to move a predicate along with
   197  its indices to a group that has minimum disk usage. This can make the predicate
   198  temporarily read-only. Queries for the predicate will still be serviced, but any
   199  mutations for the predicate will be rejected and should be retried after the
   200  move is finished.
   201  
Zero continuously tries to keep the amount of data on each server even,
typically running this check every 10 minutes. Thus, each additional
Dgraph Alpha instance allows Zero to further split the predicates from
groups and move them to the new node.
   206  
   207  **Consistent Replication**
   208  
If the `--replicas` flag is set to something greater than one, Zero would assign the
same group to multiple nodes. These nodes would then form a Raft group aka
quorum. Every write would be consistently replicated to the quorum. To achieve
consensus, it's important that the size of the quorum be an odd number. Therefore, we
recommend setting `--replicas` to 1, 3 or 5 (not 2 or 4). This allows 0, 1, or 2
nodes serving the same group to be down, respectively, without affecting the
overall health of that group.
   216  
   217  ## Ports Usage
   218  
Dgraph cluster nodes use different ports to communicate over gRPC and HTTP. Users should choose these ports carefully based on their topology and deployment mode, as each port needs different access security rules or firewall settings.
   220  
   221  ### Types of ports
   222  
   223  - **gRPC-internal:** Port that is used between the cluster nodes for internal communication and message exchange.
- **gRPC-external:** Port that is used by Dgraph clients, Dgraph Live Loader, and Dgraph Bulk Loader to access APIs over gRPC.
- **HTTP-external:** Port that is used by clients to access APIs over HTTP and for other monitoring & administrative tasks.
   226  
   227  ### Ports used by different nodes
   228  
   229   Dgraph Node Type | gRPC-internal  | gRPC-external | HTTP-external
   230  ------------------|----------------|---------------|---------------
   231         zero       |  --Not Used--  |     5080      |     6080
   232         alpha      |      7080      |     9080      |     8080
   233         ratel      |  --Not Used--  | --Not Used--  |     8000
   234  
Users have to modify security rules or open firewall ports depending on their underlying network to allow communication between cluster nodes, and between a server and a client. During development, a general rule could be to open the *-external (gRPC/HTTP) ports to the public and keep gRPC-internal open only within the cluster nodes.
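
As a purely illustrative sketch (assuming Ubuntu hosts with `ufw`; adapt the rules to your own firewall or cloud security groups, and treat `10.0.0.0/24` as a placeholder for your cluster subnet), the default Alpha ports could be opened like this:

```sh
# Open external ports to clients, keep the gRPC-internal port restricted to the cluster subnet
sudo ufw allow 8080/tcp                                      # HTTP-external
sudo ufw allow 9080/tcp                                      # gRPC-external
sudo ufw allow from 10.0.0.0/24 to any port 7080 proto tcp   # gRPC-internal (cluster only)
```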
   236  
**Ratel UI** accesses Dgraph Alpha on the HTTP-external port (default localhost:8080) and can be configured to talk to a remote Dgraph cluster. This way you can run Ratel on your local machine and point it to a remote cluster. But if you are deploying Ratel along with the Dgraph cluster, then you may have to expose port 8000 to the public.
   238  
**Port Offset** To make it easier to set up a cluster, Dgraph defaults the ports used by Dgraph nodes and lets users provide an offset (through the command option `--port_offset`) to define the actual ports used by the node. The offset can also be used when starting multiple Zero nodes in an HA setup.
   240  
For example, when a user runs a Dgraph Alpha with `--port_offset 2`, the Alpha node binds to 7082 (gRPC-internal), 8082 (HTTP-external) and 9082 (gRPC-external).
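
For instance, a second Alpha could be started on the same host with an offset of 2 (the Zero address and `--lru_mb` value below are placeholders):

```sh
# Binds to 7082 (gRPC-internal), 8082 (HTTP-external) and 9082 (gRPC-external)
dgraph alpha --port_offset 2 --lru_mb=2048 --zero=localhost:5080
```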
   242  
**Ratel UI** by default listens on port 8000. You can use the `-port` flag to configure it to listen on any other port.
   244  
   245  {{% notice "tip" %}}
   246  **For Dgraph v1.0.2 (or older)**
   247  
   248  Zero's default ports are 7080 and 8080. When following instructions for the different setup guides below, override the Zero ports using `--port_offset` to match the current default ports.
   249  
   250  ```sh
   251  # Run Zero with ports 5080 and 6080
   252  dgraph zero --idx=1 --port_offset -2000
   253  # Run Zero with ports 5081 and 6081
   254  dgraph zero --idx=2 --port_offset -1999
   255  ```
   256  Likewise, Ratel's default port is 8081, so override it using `--port` to the current default port.
   257  
   258  ```sh
dgraph-ratel --port 8000
   260  ```
   261  {{% /notice %}}
   262  
   263  ### HA Cluster Setup
   264  
   265  In a high-availability setup, we need to run 3 or 5 replicas for Zero, and similarly, 3 or 5 replicas for Alpha.
   266  {{% notice "note" %}}
If the number of replicas is 2K + 1, up to **K servers** can be down without any impact on reads or writes.
   268  
Avoid setting the number of replicas to 2K (an even number). If K servers go down, this blocks reads and writes due to lack of consensus.
   270  {{% /notice %}}
   271  
   272  **Dgraph Zero**
Run three Zero instances, assigning a unique integer ID to each via the `--idx` flag, and
passing the address of any healthy Zero instance via the `--peer` flag.
   275  
To run three replicas for the Alphas, set `--replicas=3`. Every time a new
Dgraph Alpha is added, Zero would check the existing groups and assign it to
one that doesn't have three replicas.
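
A minimal sketch of three Zero replicas on three separate hosts (the hostnames `zero1`, `zero2`, `zero3` are placeholders):

```sh
# On host zero1
dgraph zero --my=zero1:5080 --idx=1 --replicas=3
# On host zero2, pointing at the first Zero
dgraph zero --my=zero2:5080 --idx=2 --replicas=3 --peer=zero1:5080
# On host zero3, pointing at the first Zero
dgraph zero --my=zero3:5080 --idx=3 --replicas=3 --peer=zero1:5080
```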
   279  
   280  **Dgraph Alpha**
Run as many Dgraph Alphas as you want. You can manually set the `--idx` flag, or you
can leave that flag empty, and Zero would auto-assign an ID to the Alpha. This
ID would get persisted in the write-ahead log, so be careful not to delete it.
   284  
The new Alphas will automatically detect each other by communicating with
Dgraph Zero and establish connections to each other. You can provide a list of
Zero addresses to Alpha using the `--zero` flag. Alpha will try to connect to
one of the Zeros, starting from the first Zero address in the list. For example:
`--zero=zero1,zero2,zero3` where zero1 is the `host:port` of a Zero instance.
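
For example, an Alpha that knows about all three Zeros could be started like this (hostnames are placeholders):

```sh
dgraph alpha --my=alpha1:7080 --lru_mb=2048 --zero=zero1:5080,zero2:5080,zero3:5080
```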
   290  
Typically, Zero would first attempt to replicate a group, by assigning a new
Dgraph Alpha to run the same group as assigned to another. Once the group has
been replicated as per the `--replicas` flag, Zero would create a new group.

Over time, the data would be evenly split across all the groups. So, it's
important to ensure that the number of Dgraph Alphas is a multiple of the
replication setting. For example, if you set `--replicas=3` in Zero, then run three
Dgraph Alphas for no sharding, but 3x replication. Run six Dgraph Alphas for
sharding the data into two groups, with 3x replication.
   300  
   301  ## Single Host Setup
   302  
   303  ### Run directly on the host
   304  
   305  **Run dgraph zero**
   306  
   307  ```sh
   308  dgraph zero --my=IPADDR:5080
   309  ```
The `--my` flag is the connection that Dgraph Alphas would dial to talk to
Zero. So, the port `5080` and the IP address must be visible to all the Dgraph Alphas.

For all other flags, run `dgraph zero --help`.
   314  
   315  **Run dgraph alpha**
   316  
   317  ```sh
   318  dgraph alpha --lru_mb=<typically one-third the RAM> --my=IPADDR:7080 --zero=localhost:5080
   319  dgraph alpha --lru_mb=<typically one-third the RAM> --my=IPADDR:7081 --zero=localhost:5080 -o=1
   320  ```
   321  
Notice the use of `-o` for the second Alpha to add an offset to the default ports used. Zero automatically assigns a unique ID to each Alpha, which is persisted in the write-ahead log (wal) directory; users can also specify the index using the `--idx` option. Dgraph Alphas use two directories to persist data and
wal logs, and these directories must be different for each Alpha if they are running on the same host. You can use `-p` and `-w` to change the location of the data and WAL directories. For all other flags, run `dgraph alpha --help`.
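
For instance, the second Alpha above could be given its own data and WAL directories like this:

```sh
dgraph alpha --lru_mb=<typically one-third the RAM> --my=IPADDR:7081 --zero=localhost:5080 -o=1 -p p2 -w w2
```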
   326  
   327  **Run dgraph UI**
   328  
   329  ```sh
   330  dgraph-ratel
   331  ```
   332  
   333  ### Run using Docker
   334  
A Dgraph cluster can be set up running as containers on a single host. First, you'd want to figure out the host IP address. You can typically do that via
   336  
   337  ```sh
   338  ip addr  # On Arch Linux
   339  ifconfig # On Ubuntu/Mac
   340  ```
   341  We'll refer to the host IP address via `HOSTIPADDR`.
   342  
   343  **Run dgraph zero**
   344  
   345  ```sh
   346  mkdir ~/zero # Or any other directory where data should be stored.
   347  
   348  docker run -it -p 5080:5080 -p 6080:6080 -v ~/zero:/dgraph dgraph/dgraph:latest dgraph zero --my=HOSTIPADDR:5080
   349  ```
   350  
   351  **Run dgraph alpha**
   352  ```sh
   353  mkdir ~/server1 # Or any other directory where data should be stored.
   354  
   355  docker run -it -p 7080:7080 -p 8080:8080 -p 9080:9080 -v ~/server1:/dgraph dgraph/dgraph:latest dgraph alpha --lru_mb=<typically one-third the RAM> --zero=HOSTIPADDR:5080 --my=HOSTIPADDR:7080
   356  
   357  mkdir ~/server2 # Or any other directory where data should be stored.
   358  
   359  docker run -it -p 7081:7081 -p 8081:8081 -p 9081:9081 -v ~/server2:/dgraph dgraph/dgraph:latest dgraph alpha --lru_mb=<typically one-third the RAM> --zero=HOSTIPADDR:5080 --my=HOSTIPADDR:7081  -o=1
   360  ```
Notice the use of `-o` for server2 to override the default ports.
   362  
   363  **Run dgraph UI**
   364  ```sh
   365  docker run -it -p 8000:8000 dgraph/dgraph:latest dgraph-ratel
   366  ```
   367  
   368  ### Run using Docker Compose (On single AWS instance)
   369  
   370  We will use [Docker Machine](https://docs.docker.com/machine/overview/). It is a tool that lets you install Docker Engine on virtual machines and easily deploy applications.
   371  
   372  * [Install Docker Machine](https://docs.docker.com/machine/install-machine/) on your machine.
   373  
   374  {{% notice "note" %}}These instructions are for running Dgraph Alpha without TLS config.
For instructions on running with TLS, refer to the [TLS instructions](#tls-configuration).{{% /notice %}}
   376  
   377  Here we'll go through an example of deploying Dgraph Zero, Alpha and Ratel on an AWS instance.
   378  
* Make sure you have Docker Machine installed by following the [instructions](https://docs.docker.com/machine/install-machine/); provisioning an instance on AWS is then just one step away. You'll have to [configure your AWS credentials](http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html) for programmatic access to the Amazon API.
   380  
   381  * Create a new docker machine.
   382  
   383  ```sh
   384  docker-machine create --driver amazonec2 aws01
   385  ```
   386  
   387  Your output should look like
   388  
   389  ```sh
   390  Running pre-create checks...
   391  Creating machine...
   392  (aws01) Launching instance...
   393  ...
   394  ...
   395  Docker is up and running!
   396  To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env aws01
   397  ```
   398  
The command would provision a `t2.micro` instance with a security group called `docker-machine`
(allowing inbound access on 2376 and 22). You can either edit the security group to allow inbound access to `5080`, `8080`, `9080` (default ports for Dgraph Zero & Alpha) or you can provide your own security
group which allows inbound access on ports 22, 2376 (required by Docker Machine), 5080, 8080 and 9080. Remember port *5080* is only required if you are running Dgraph Live Loader or Dgraph Bulk Loader from outside.
   402  
[Here](https://docs.docker.com/machine/drivers/aws/#options) is a list of full options for the `amazonec2` driver which allows you to choose the instance type, security group, AMI among many other things.
   404  
   405  {{% notice "tip" %}}Docker machine supports [other drivers](https://docs.docker.com/machine/drivers/gce/) like GCE, Azure etc.{{% /notice %}}
   406  
   407  * Install and run Dgraph using docker-compose
   408  
   409  Docker Compose is a tool for running multi-container Docker applications. You can follow the
   410  instructions [here](https://docs.docker.com/compose/install/) to install it.
   411  
Copy the file below to a directory on your machine and name it `docker-compose.yml`.
   413  
```yaml
   415  version: "3.2"
   416  services:
   417    zero:
   418      image: dgraph/dgraph:latest
   419      volumes:
   420        - /data:/dgraph
   421      ports:
   422        - 5080:5080
   423        - 6080:6080
   424      restart: on-failure
   425      command: dgraph zero --my=zero:5080
   426    server:
   427      image: dgraph/dgraph:latest
   428      volumes:
   429        - /data:/dgraph
   430      ports:
   431        - 8080:8080
   432        - 9080:9080
   433      restart: on-failure
   434      command: dgraph alpha --my=server:7080 --lru_mb=2048 --zero=zero:5080
   435    ratel:
   436      image: dgraph/dgraph:latest
   437      ports:
   438        - 8000:8000
   439      command: dgraph-ratel
   440  ```
   441  
{{% notice "note" %}}The config mounts `/data` (you could mount something else) on the instance to `/dgraph` within the
   443  container for persistence.{{% /notice %}}
   444  
   445  * Connect to the Docker Engine running on the machine.
   446  
   447  Running `docker-machine env aws01` tells us to run the command below to configure
   448  our shell.
   449  ```
   450  eval $(docker-machine env aws01)
   451  ```
   452  This configures our Docker client to talk to the Docker engine running on the AWS Machine.
   453  
Finally, run the command below to start Zero, Alpha and Ratel.
   455  ```
   456  docker-compose up -d
   457  ```
   458  This would start 3 Docker containers running Dgraph Zero, Alpha and Ratel on the same machine. Docker would restart the containers in case there is any error.
   459  You can look at the logs using `docker-compose logs`.
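For example, to follow the logs of just the Alpha service defined in the compose file above:

```sh
docker-compose logs -f server
```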
   460  
   461  ## Multi Host Setup
   462  
   463  ### Using Docker Swarm
   464  
   465  #### Cluster Setup Using Docker Swarm
   466  
   467  {{% notice "note" %}}These instructions are for running Dgraph Alpha without TLS config.
For instructions on running with TLS, refer to the [TLS instructions](#tls-configuration).{{% /notice %}}
   469  
   470  Here we'll go through an example of deploying 3 Dgraph Alpha nodes and 1 Zero on three different AWS instances using Docker Swarm with a replication factor of 3.
   471  
   472  * Make sure you have Docker Machine installed by following [instructions](https://docs.docker.com/machine/install-machine/).
   473  
   474  ```sh
   475  docker-machine --version
   476  ```
   477  
   478  * Create 3 instances on AWS and [install Docker Engine](https://docs.docker.com/engine/installation/) on them. This can be done manually or by using `docker-machine`.
   479  You'll have to [configure your AWS credentials](http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html) to create the instances using Docker Machine.
   480  
Assuming that you have your AWS credentials set up, you can use the commands below to start 3 AWS
`t2.micro` instances with Docker Engine installed on them.
   483  
   484  ```sh
   485  docker-machine create --driver amazonec2 aws01
   486  docker-machine create --driver amazonec2 aws02
   487  docker-machine create --driver amazonec2 aws03
   488  ```
   489  
   490  Your output should look like
   491  
   492  ```sh
   493  Running pre-create checks...
   494  Creating machine...
   495  (aws01) Launching instance...
   496  ...
   497  ...
   498  Docker is up and running!
   499  To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: docker-machine env aws01
   500  ```
   501  
The command would provision a `t2.micro` instance with a security group called `docker-machine`
   503  (allowing inbound access on 2376 and 22).
   504  
   505  You would need to edit the `docker-machine` security group to open inbound traffic on the following ports.
   506  
1. Allow all inbound traffic on all ports with the Source being the `docker-machine`
   security group so that Docker related communication can happen easily.
   509  
   510  2. Also open inbound TCP traffic on the following ports required by Dgraph:
   511     `5080`, `6080`, `8000`, `808[0-2]`, `908[0-2]`. Remember port *5080* is only
   512     required if you are running Dgraph Live Loader or Dgraph Bulk Loader from
   513     outside. You need to open `7080` to enable Alpha-to-Alpha communication in
   514     case you have not opened all ports in #1.
   515  
   516  If you are on AWS, below is the security group (**docker-machine**) after
   517  necessary changes.
   518  
   519  {{% load-img "/images/aws.png" "AWS Security Group" %}}
   520  
[Here](https://docs.docker.com/machine/drivers/aws/#options) is a list of full options for the `amazonec2` driver which allows you to choose the
instance type, security group, AMI among many other
things.
   524  
   525  {{% notice "tip" %}}Docker machine supports [other drivers](https://docs.docker.com/machine/drivers/gce/) like GCE, Azure etc.{{% /notice %}}
   526  
Running `docker-machine ls` shows all the AWS EC2 instances that we started.
   528  ```sh
   529  ➜  ~ docker-machine ls
   530  NAME    ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER        ERRORS
   531  aws01   -        amazonec2    Running   tcp://34.200.239.30:2376            v17.11.0-ce
   532  aws02   -        amazonec2    Running   tcp://54.236.58.120:2376            v17.11.0-ce
   533  aws03   -        amazonec2    Running   tcp://34.201.22.2:2376              v17.11.0-ce
   534  ```
   535  
   536  * Start the Swarm
   537  
Docker Swarm has manager and worker nodes. Swarm can be started and updated on manager nodes. We
   will set up `aws01` as the swarm manager. You can first run the following commands to initialize the
   swarm.
   541  
   542  We are going to use the internal IP address given by AWS. Run the following command to get the
internal IP for `aws01`. Let's assume `172.31.64.18` is the internal IP in this case.
   544  ```
   545  docker-machine ssh aws01 ifconfig eth0
   546  ```
   547  
   548  Now that we have the internal IP, let's initiate the Swarm.
   549  
   550  ```sh
   551  # This configures our Docker client to talk to the Docker engine running on the aws01 host.
   552  eval $(docker-machine env aws01)
   553  docker swarm init --advertise-addr 172.31.64.18
   554  ```
   555  
   556  Output:
   557  ```
   558  Swarm initialized: current node (w9mpjhuju7nyewmg8043ypctf) is now a manager.
   559  
   560  To add a worker to this swarm, run the following command:
   561  
   562      docker swarm join \
   563      --token SWMTKN-1-1y7lba98i5jv9oscf10sscbvkmttccdqtkxg478g3qahy8dqvg-5r5cbsntc1aamsw3s4h3thvgk \
   564      172.31.64.18:2377
   565  
   566  To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
   567  ```
   568  
   569  Now we will make other nodes join the swarm.
   570  
   571  ```sh
   572  eval $(docker-machine env aws02)
   573  docker swarm join \
   574      --token SWMTKN-1-1y7lba98i5jv9oscf10sscbvkmttccdqtkxg478g3qahy8dqvg-5r5cbsntc1aamsw3s4h3thvgk \
   575      172.31.64.18:2377
   576  ```
   577  
   578  Output:
   579  ```
   580  This node joined a swarm as a worker.
   581  ```
   582  
Similarly, for `aws03`:
   584  ```sh
   585  eval $(docker-machine env aws03)
   586  docker swarm join \
   587      --token SWMTKN-1-1y7lba98i5jv9oscf10sscbvkmttccdqtkxg478g3qahy8dqvg-5r5cbsntc1aamsw3s4h3thvgk \
   588      172.31.64.18:2377
   589  ```
   590  
   591  On the Swarm manager `aws01`, verify that your swarm is running.
   592  ```sh
   593  docker node ls
   594  ```
   595  
   596  Output:
   597  ```sh
   598  ID                            HOSTNAME            STATUS              AVAILABILITY        MANAGER STATUS
   599  ghzapjsto20c6d6l3n0m91zev     aws02               Ready               Active
   600  rb39d5lgv66it1yi4rto0gn6a     aws03               Ready               Active
   601  waqdyimp8llvca9i09k4202x5 *   aws01               Ready               Active              Leader
   602  ```
   603  
   604  * Start the Dgraph cluster
   605  
Copy the following file to your host machine and name it `docker-compose.yml`.
   607  
```yaml
   609  version: "3"
   610  networks:
   611    dgraph:
   612  services:
   613    zero:
   614      image: dgraph/dgraph:latest
   615      volumes:
   616        - data-volume:/dgraph
   617      ports:
   618        - 5080:5080
   619        - 6080:6080
   620      networks:
   621        - dgraph
   622      deploy:
   623        placement:
   624          constraints:
   625            - node.hostname == aws01
   626      command: dgraph zero --my=zero:5080 --replicas 3
   627    alpha1:
   628      image: dgraph/dgraph:latest
   629      hostname: "alpha1"
   630      volumes:
   631        - data-volume:/dgraph
   632      ports:
   633        - 8080:8080
   634        - 9080:9080
   635      networks:
   636        - dgraph
   637      deploy:
   638        placement:
   639          constraints:
   640            - node.hostname == aws01
   641      command: dgraph alpha --my=alpha1:7080 --lru_mb=2048 --zero=zero:5080
   642    alpha2:
   643      image: dgraph/dgraph:latest
   644      hostname: "alpha2"
   645      volumes:
   646        - data-volume:/dgraph
   647      ports:
   648        - 8081:8081
   649        - 9081:9081
   650      networks:
   651        - dgraph
   652      deploy:
   653        placement:
   654          constraints:
   655            - node.hostname == aws02
   656      command: dgraph alpha --my=alpha2:7081 --lru_mb=2048 --zero=zero:5080 -o 1
   657    alpha3:
   658      image: dgraph/dgraph:latest
   659      hostname: "alpha3"
   660      volumes:
   661        - data-volume:/dgraph
   662      ports:
   663        - 8082:8082
   664        - 9082:9082
   665      networks:
   666        - dgraph
   667      deploy:
   668        placement:
   669          constraints:
   670            - node.hostname == aws03
   671      command: dgraph alpha --my=alpha3:7082 --lru_mb=2048 --zero=zero:5080 -o 2
   672    ratel:
   673      image: dgraph/dgraph:latest
   674      hostname: "ratel"
   675      ports:
   676        - 8000:8000
   677      networks:
   678        - dgraph
   679      command: dgraph-ratel
   680  volumes:
   681    data-volume:
   682  ```
   683  Run the following command on the Swarm leader to deploy the Dgraph Cluster.
   684  
   685  ```sh
   686  eval $(docker-machine env aws01)
   687  docker stack deploy -c docker-compose.yml dgraph
   688  ```
   689  
   690  This should run three Dgraph Alpha services (one on each VM because of the
   691  constraint we have), one Dgraph Zero service on aws01 and one Dgraph Ratel.
   692  
   693  These placement constraints (as seen in the compose file) are important so that
   694  in case of restarting any containers, swarm places the respective Dgraph Alpha
   695  or Zero containers on the same hosts to re-use the volumes. Also, if you are
   696  running fewer than three hosts, make sure you use either different volumes or
   697  run Dgraph Alpha with `-p p1 -w w1` options.
   698  
   699  {{% notice "note" %}}
   700  
   701  1. This setup would create and use a local volume called `dgraph_data-volume` on
   702     the instances. If you plan to replace instances, you should use remote
   703     storage like
   704     [cloudstore](https://docs.docker.com/docker-for-aws/persistent-data-volumes)
   705     instead of local disk. {{% /notice %}}
   706  
   707  You can verify that all services were created successfully by running:
   708  
   709  ```sh
   710  docker service ls
   711  ```
   712  
   713  Output:
   714  ```
   715  ID                  NAME                MODE                REPLICAS            IMAGE                PORTS
   716  vp5bpwzwawoe        dgraph_ratel        replicated          1/1                 dgraph/dgraph:latest   *:8000->8000/tcp
   717  69oge03y0koz        dgraph_alpha2      replicated          1/1                 dgraph/dgraph:latest   *:8081->8081/tcp,*:9081->9081/tcp
   718  kq5yks92mnk6        dgraph_alpha3      replicated          1/1                 dgraph/dgraph:latest   *:8082->8082/tcp,*:9082->9082/tcp
   719  uild5cqp44dz        dgraph_zero         replicated          1/1                 dgraph/dgraph:latest   *:5080->5080/tcp,*:6080->6080/tcp
   720  v9jlw00iz2gg        dgraph_alpha1      replicated          1/1                 dgraph/dgraph:latest   *:8080->8080/tcp,*:9080->9080/tcp
   721  ```
   722  
   723  To stop the cluster run
   724  
   725  ```
   726  docker stack rm dgraph
   727  ```
   728  
   729  ### HA Cluster setup using Docker Swarm
   730  
Here is a sample swarm config for running 6 Dgraph Alpha nodes and 3 Zero nodes on 6 different
EC2 instances. The setup should be similar to [Cluster setup using Docker Swarm]({{< relref "#cluster-setup-using-docker-swarm" >}}) apart from a couple of differences. This setup would ensure replication with sharding of data. The file assumes that there are six hosts available as docker-machines. Also, if you are running on fewer than six hosts, make sure you use either different volumes or run Dgraph Alpha with `-p p1 -w w1` options.
   733  
   734  You would need to edit the `docker-machine` security group to open inbound traffic on the following ports.
   735  
1. Allow all inbound traffic on all ports with the Source being the `docker-machine` security group so that
   Docker related communication can happen easily.
   738  
   739  2. Also open inbound TCP traffic on the following ports required by Dgraph: `5080`, `8000`, `808[0-5]`, `908[0-5]`. Remember port *5080* is only required if you are running Dgraph Live Loader or Dgraph Bulk Loader from outside. You need to open `7080` to enable Alpha-to-Alpha communication in case you have not opened all ports in #1.
   740  
   741  If you are on AWS, below is the security group (**docker-machine**) after necessary changes.
   742  
   743  {{% load-img "/images/aws.png" "AWS Security Group" %}}
   744  
Copy the following file to your host machine and name it `docker-compose.yml`.
   746  
```yaml
   748  version: "3"
   749  networks:
   750    dgraph:
   751  services:
   752    zero1:
   753      image: dgraph/dgraph:latest
   754      volumes:
   755        - data-volume:/dgraph
   756      ports:
   757        - 5080:5080
   758        - 6080:6080
   759      networks:
   760        - dgraph
   761      deploy:
   762        placement:
   763          constraints:
   764            - node.hostname == aws01
   765      command: dgraph zero --my=zero1:5080 --replicas 3 --idx 1
   766    zero2:
   767      image: dgraph/dgraph:latest
   768      volumes:
   769        - data-volume:/dgraph
   770      ports:
   771        - 5081:5081
   772        - 6081:6081
   773      networks:
   774        - dgraph
   775      deploy:
   776        placement:
   777          constraints:
   778            - node.hostname == aws02
   779      command: dgraph zero -o 1 --my=zero2:5081 --replicas 3 --peer zero1:5080 --idx 2
   780    zero_3:
   781      image: dgraph/dgraph:latest
   782      volumes:
   783        - data-volume:/dgraph
   784      ports:
   785        - 5082:5082
   786        - 6082:6082
   787      networks:
   788        - dgraph
   789      deploy:
   790        placement:
   791          constraints:
   792            - node.hostname == aws03
   793      command: dgraph zero -o 2 --my=zero_3:5082 --replicas 3 --peer zero1:5080 --idx 3
   794    alpha1:
   795      image: dgraph/dgraph:latest
   796      hostname: "alpha1"
   797      volumes:
   798        - data-volume:/dgraph
   799      ports:
   800        - 8080:8080
   801        - 9080:9080
   802      networks:
   803        - dgraph
   804      deploy:
   805        replicas: 1
   806        placement:
   807          constraints:
   808            - node.hostname == aws01
   809      command: dgraph alpha --my=alpha1:7080 --lru_mb=2048 --zero=zero1:5080
   810    alpha2:
   811      image: dgraph/dgraph:latest
   812      hostname: "alpha2"
   813      volumes:
   814        - data-volume:/dgraph
   815      ports:
   816        - 8081:8081
   817        - 9081:9081
   818      networks:
   819        - dgraph
   820      deploy:
   821        replicas: 1
   822        placement:
   823          constraints:
   824            - node.hostname == aws02
   825      command: dgraph alpha --my=alpha2:7081 --lru_mb=2048 --zero=zero1:5080 -o 1
   826    alpha3:
   827      image: dgraph/dgraph:latest
   828      hostname: "alpha3"
   829      volumes:
   830        - data-volume:/dgraph
   831      ports:
   832        - 8082:8082
   833        - 9082:9082
   834      networks:
   835        - dgraph
   836      deploy:
   837        replicas: 1
   838        placement:
   839          constraints:
   840            - node.hostname == aws03
   841      command: dgraph alpha --my=alpha3:7082 --lru_mb=2048 --zero=zero1:5080 -o 2
   842    alpha_4:
   843      image: dgraph/dgraph:latest
   844      hostname: "alpha_4"
   845      volumes:
   846        - data-volume:/dgraph
   847      ports:
   848        - 8083:8083
   849        - 9083:9083
   850      networks:
   851        - dgraph
   852      deploy:
   853        placement:
   854          constraints:
   855            - node.hostname == aws04
   856      command: dgraph alpha --my=alpha_4:7083 --lru_mb=2048 --zero=zero1:5080 -o 3
   857    alpha_5:
   858      image: dgraph/dgraph:latest
   859      hostname: "alpha_5"
   860      volumes:
   861        - data-volume:/dgraph
   862      ports:
   863        - 8084:8084
   864        - 9084:9084
   865      networks:
   866        - dgraph
   867      deploy:
   868        placement:
   869          constraints:
   870            - node.hostname == aws05
   871      command: dgraph alpha --my=alpha_5:7084 --lru_mb=2048 --zero=zero1:5080 -o 4
   872    alpha_6:
   873      image: dgraph/dgraph:latest
   874      hostname: "alpha_6"
   875      volumes:
   876        - data-volume:/dgraph
   877      ports:
   878        - 8085:8085
   879        - 9085:9085
   880      networks:
   881        - dgraph
   882      deploy:
   883        placement:
   884          constraints:
   885            - node.hostname == aws06
   886      command: dgraph alpha --my=alpha_6:7085 --lru_mb=2048 --zero=zero1:5080 -o 5
   887    ratel:
   888      image: dgraph/dgraph:latest
   889      hostname: "ratel"
   890      ports:
   891        - 8000:8000
   892      networks:
   893        - dgraph
   894      command: dgraph-ratel
   895  volumes:
   896    data-volume:
   897  ```
   898  {{% notice "note" %}}
   899  1. This setup assumes that you are using 6 hosts, but if you are running fewer than 6 hosts then you have to either use different volumes between Dgraph alphas or use `-p` & `-w` to configure data directories.
   900  2. This setup would create and use a local volume called `dgraph_data-volume` on the instances. If you plan to replace instances, you should use remote storage like [cloudstore](https://docs.docker.com/docker-for-aws/persistent-data-volumes) instead of local disk. {{% /notice %}}
   901  
   902  ## Using Kubernetes
   903  
   904  The following section covers running Dgraph with Kubernetes.  We have tested Dgraph with Kubernetes 1.14 to 1.15 on [GKE](https://cloud.google.com/kubernetes-engine) and [EKS](https://aws.amazon.com/eks/).
   905  
   906  {{% notice "note" %}}These instructions are for running Dgraph Alpha without TLS configuration.
For instructions on running with TLS, refer to the [TLS instructions](#tls-configuration).{{% /notice %}}
   908  
* Install [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) which is used to deploy
  and manage applications on Kubernetes.
   911  * Get the Kubernetes cluster up and running on a cloud provider of your choice.
   912    * For Amazon [EKS](https://aws.amazon.com/eks/), you can use [eksctl](https://eksctl.io/) to quickly provision a new cluster. If you are new to this, Amazon has an article [Getting started with eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html).
   913    * For Google Cloud [GKE](https://cloud.google.com/kubernetes-engine), you can use [Google Cloud SDK](https://cloud.google.com/sdk/install) and the `gcloud container clusters create` command to quickly provision a new cluster.
   914  
   915  Verify that you have your cluster up and running using `kubectl get nodes`. If you used `eksctl` or `gcloud container clusters create` with the default options, you should have 2-3 worker nodes ready.
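
For reference, provisioning and verifying a small test cluster might look like the following sketch (the cluster name, zone, and node count are placeholders):

```sh
# Amazon EKS via eksctl
eksctl create cluster --name dgraph-test --nodes 3

# or Google Cloud GKE via the gcloud SDK
gcloud container clusters create dgraph-test --num-nodes 3 --zone us-central1-a

# Verify that the worker nodes are ready
kubectl get nodes
```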
   916  
   917  On Amazon [EKS](https://aws.amazon.com/eks/), you would see something like this:
   918  
   919  ```sh
   920  ➜  kubernetes git:(master) ✗ kubectl get nodes
   921  NAME                                          STATUS   ROLES    AGE   VERSION
   922  <aws-ip-hostname>.<region>.compute.internal   Ready    <none>   1m   v1.15.11-eks-af3caf
   923  <aws-ip-hostname>.<region>.compute.internal   Ready    <none>   1m   v1.15.11-eks-af3caf
   924  ```
   925  
   926  On Google Cloud [GKE](https://cloud.google.com/kubernetes-engine), you would see something like this:
   927  
   928  ```sh
   929  ➜  kubernetes git:(master) ✗ kubectl get nodes
   930  NAME                                       STATUS   ROLES    AGE   VERSION
   931  gke-<cluster-name>-default-pool-<gce-id>   Ready    <none>   41s   v1.14.10-gke.36
   932  gke-<cluster-name>-default-pool-<gce-id>   Ready    <none>   40s   v1.14.10-gke.36
   933  gke-<cluster-name>-default-pool-<gce-id>   Ready    <none>   41s   v1.14.10-gke.36
   934  ```
   935  
   936  ### Single Server
   937  
Once your Kubernetes cluster is up, you can use [dgraph-single.yaml](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-single/dgraph-single.yaml) to start Zero, Alpha, and Ratel UI services.
   939  
   940  #### Deploy Single Server
   941  
   942  From your machine, run the following command to start a StatefulSet that creates a single Pod with Zero, Alpha, and Ratel UI running in it.
   943  
   944  ```sh
   945  kubectl create --filename https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-single/dgraph-single.yaml
   946  ```
   947  
   948  Output:
   949  ```
   950  service/dgraph-public created
   951  statefulset.apps/dgraph created
   952  ```
   953  
   954  #### Verify Single Server
   955  
   956  Confirm that the pod was created successfully.
   957  
   958  ```sh
   959  kubectl get pods
   960  ```
   961  
   962  Output:
   963  ```
   964  NAME       READY     STATUS    RESTARTS   AGE
   965  dgraph-0   3/3       Running   0          1m
   966  ```
   967  
   968  {{% notice "tip" %}}
   969  You can check the logs for the containers in the pod using
   970  `kubectl logs --follow dgraph-0 <container_name>`. For example, try
   971  `kubectl logs --follow dgraph-0 alpha` for server logs.
   972  {{% /notice %}}
   973  
   974  #### Test Single Server Setup
   975  
   976  Port forward from your local machine to the pod
   977  
   978  ```sh
   979  kubectl port-forward pod/dgraph-0 8080:8080
   980  kubectl port-forward pod/dgraph-0 8000:8000
   981  ```
   982  
   983  Go to `http://localhost:8000` and verify Dgraph is working as expected.
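
You can also check the Alpha health endpoint over the forwarded port:

```sh
curl localhost:8080/health
```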
   984  
   985  #### Remove Single Server Resources
   986  
   987  Delete all the resources
   988  
   989  ```sh
   990  kubectl delete --filename https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-single/dgraph-single.yaml
   991  kubectl delete persistentvolumeclaims --selector app=dgraph
   992  ```
   993  
   994  ### HA Cluster Setup Using Kubernetes
   995  
This setup allows you to run 3 Dgraph Alphas and 3 Dgraph Zeros. We start Zero with the `--replicas
3` flag, so all data would be replicated on 3 Alphas and form 1 Alpha group.
   998  
   999  {{% notice "note" %}} Ideally you should have at least three worker nodes as part of your Kubernetes
  1000  cluster so that each Dgraph Alpha runs on a separate worker node.{{% /notice %}}
  1001  
  1002  #### Validate Kubernetes Cluster for HA
  1003  
  1004  Check the nodes that are part of the Kubernetes cluster.
  1005  
  1006  ```sh
  1007  kubectl get nodes
  1008  ```
  1009  
  1010  Output for Amazon [EKS](https://aws.amazon.com/eks/):
  1011  
  1012  ```sh
  1013  NAME                                          STATUS   ROLES    AGE   VERSION
  1014  <aws-ip-hostname>.<region>.compute.internal   Ready    <none>   1m   v1.15.11-eks-af3caf
  1015  <aws-ip-hostname>.<region>.compute.internal   Ready    <none>   1m   v1.15.11-eks-af3caf
  1016  <aws-ip-hostname>.<region>.compute.internal   Ready    <none>   1m   v1.15.11-eks-af3caf
  1017  ```
  1018  
Output for Google Cloud [GKE](https://cloud.google.com/kubernetes-engine):
  1020  
  1021  ```sh
  1022  NAME                                       STATUS   ROLES    AGE   VERSION
  1023  gke-<cluster-name>-default-pool-<gce-id>   Ready    <none>   41s   v1.14.10-gke.36
  1024  gke-<cluster-name>-default-pool-<gce-id>   Ready    <none>   40s   v1.14.10-gke.36
  1025  gke-<cluster-name>-default-pool-<gce-id>   Ready    <none>   41s   v1.14.10-gke.36
  1026  ```
  1027  
  1028  Once your Kubernetes cluster is up, you can use [dgraph-ha.yaml](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml) to start the cluster.
  1029  
  1030  #### Deploy Dgraph HA Cluster
  1031  
  1032  From your machine, run the following command to start the cluster.
  1033  
  1034  ```sh
  1035  kubectl create --filename https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml
  1036  ```
  1037  
  1038  Output:
  1039  ```sh
  1040  service/dgraph-zero-public created
  1041  service/dgraph-alpha-public created
  1042  service/dgraph-ratel-public created
  1043  service/dgraph-zero created
  1044  service/dgraph-alpha created
  1045  statefulset.apps/dgraph-zero created
  1046  statefulset.apps/dgraph-alpha created
  1047  deployment.apps/dgraph-ratel created
  1048  ```
  1049  
  1050  #### Verify Dgraph HA Cluster
  1051  
  1052  Confirm that the pods were created successfully.
  1053  
  1054  It may take a few minutes for the pods to come up.
  1055  
  1056  ```sh
  1057  kubectl get pods
  1058  ```
  1059  
  1060  Output:
  1061  ```sh
  1062  NAME                  READY   STATUS    RESTARTS   AGE
  1063  dgraph-alpha-0        1/1     Running   0          6m24s
  1064  dgraph-alpha-1        1/1     Running   0          5m42s
  1065  dgraph-alpha-2        1/1     Running   0          5m2s
  1066  dgraph-ratel-<pod-id> 1/1     Running   0          6m23s
  1067  dgraph-zero-0         1/1     Running   0          6m24s
  1068  dgraph-zero-1         1/1     Running   0          5m41s
  1069  dgraph-zero-2         1/1     Running   0          5m6s
  1070  ```
  1071  
  1072  
  1073  {{% notice "tip" %}}You can check the logs for the containers in the pod using `kubectl logs --follow dgraph-alpha-0` and `kubectl logs --follow dgraph-zero-0`.{{% /notice %}}
  1074  
  1075  #### Test Dgraph HA Cluster Setup
  1076  
  1077  Port forward from your local machine to the pod
  1078  
  1079  ```sh
  1080  kubectl port-forward service/dgraph-alpha-public 8080:8080
  1081  kubectl port-forward service/dgraph-ratel-public 8000:8000
  1082  ```
  1083  
  1084  Go to `http://localhost:8000` and verify Dgraph is working as expected.
  1085  
  1086  {{% notice "note" %}} You can also access the service on its External IP address.{{% /notice %}}
  1087  
  1088  #### Delete Dgraph HA Cluster Resources
  1089  
  1090  Delete all the resources
  1091  
  1092  ```sh
  1093  kubectl delete --filename https://raw.githubusercontent.com/dgraph-io/dgraph/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml
  1094  kubectl delete persistentvolumeclaims --selector app=dgraph-zero
  1095  kubectl delete persistentvolumeclaims --selector app=dgraph-alpha
  1096  ```
  1097  
  1098  ### Using Helm Chart
  1099  
  1100  Once your Kubernetes cluster is up, you can make use of the Helm chart present
  1101  [in our official helm repository here](https://github.com/dgraph-io/charts/) to bring
  1102  up a Dgraph cluster.
  1103  
  1104  {{% notice "note" %}}The instructions below are for Helm versions >= 3.x.{{% /notice %}}
  1105  
  1106  #### Installing the Chart
  1107  
  1108  To add the Dgraph helm repository:
  1109  
  1110  ```sh
  1111  helm repo add dgraph https://charts.dgraph.io
  1112  ```
  1113  
  1114  To install the chart with the release name `my-release`:
  1115  
  1116  ```sh
  1117  helm install my-release dgraph/dgraph
  1118  ```
  1119  
The above command will install the latest available dgraph docker image. In order to install an older version, set the image tag explicitly, for example:
  1121  
  1122  ```sh
helm install my-release dgraph/dgraph --set image.tag="v1.2.8"
  1124  ```
  1125  
  1126  By default zero and alpha services are exposed only within the kubernetes cluster as
  1127  kubernetes service type `ClusterIP`. In order to expose the alpha service publicly
  1128  you can use kubernetes service type `LoadBalancer`:
  1129  
  1130  ```sh
  1131  helm install my-release dgraph/dgraph --set alpha.service.type="LoadBalancer"
  1132  ```
  1133  
  1134  Similarly, you can expose alpha and ratel service to the internet as follows:
  1135  
  1136  ```sh
  1137  helm install my-release dgraph/dgraph --set alpha.service.type="LoadBalancer" --set ratel.service.type="LoadBalancer"
  1138  ```
  1139  
  1140  #### Upgrading the Chart
  1141  
You can update your cluster configuration by updating the configuration of the
Helm chart. Dgraph is a stateful database, so the configuration needs to be
upgraded carefully in order to move your cluster to the desired configuration.

In general, you can use [`helm upgrade`][helm-upgrade] to update the
configuration values of the cluster. Depending on your change, you may need to
upgrade the configuration in multiple steps, as described below.
  1150  
  1151  [helm-upgrade]: https://helm.sh/docs/helm/helm_upgrade/
  1152  
  1153  **Upgrade to HA cluster setup**
  1154  
  1155  To upgrade to an [HA cluster setup]({{< relref "#ha-cluster-setup" >}}), ensure
  1156  that the shard replication setting is more than 1. When `zero.shardReplicaCount`
  1157  is not set to an HA configuration (3 or 5), follow the steps below:
  1158  
  1159  1. Set the shard replica flag on the Zero node group. For example: `zero.shardReplicaCount=3`.
  1160  2. Next, run the Helm upgrade command to restart the Zero node group:
  1161     ```sh
  1162     helm upgrade my-release dgraph/dgraph [options]
  1163     ```
  1164  3. Now set the Alpha replica count flag. For example: `alpha.replicaCount=3`.
  1165  4. Finally, run the Helm upgrade command again:
  1166     ```sh
  1167     helm upgrade my-release dgraph/dgraph [options]
  1168     ```
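
Concretely, assuming the release name `my-release` and passing the values above via `--set`, the two-step upgrade might look like this:

```sh
# Steps 1-2: raise the shard replica setting on the Zero node group
helm upgrade my-release dgraph/dgraph --set zero.shardReplicaCount=3
# Steps 3-4: then raise the Alpha replica count as well
helm upgrade my-release dgraph/dgraph --set zero.shardReplicaCount=3 --set alpha.replicaCount=3
```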
  1169  
  1170  
  1171  #### Deleting the Chart
  1172  
  1173  Delete the Helm deployment as normal
  1174  
  1175  ```sh
  1176  helm delete my-release
  1177  ```
  1178  Deletion of the StatefulSet doesn't cascade to deleting associated PVCs. To delete them:
  1179  
  1180  ```sh
  1181  kubectl delete pvc -l release=my-release,chart=dgraph
  1182  ```
  1183  
  1184  #### Configuration
  1185  
  1186  The following table lists the configurable parameters of the dgraph chart and their default values.
  1187  
  1188  |              Parameter               |                             Description                             |                       Default                       |
  1189  | ------------------------------------ | ------------------------------------------------------------------- | --------------------------------------------------- |
  1190  | `image.registry`                     | Container registry name                                             | `docker.io`                                         |
  1191  | `image.repository`                   | Container image name                                                | `dgraph/dgraph`                                     |
  1192  | `image.tag`                          | Container image tag                                                 | `latest`                                            |
  1193  | `image.pullPolicy`                   | Container pull policy                                               | `Always`                                            |
  1194  | `zero.name`                          | Zero component name                                                 | `zero`                                              |
  1195  | `zero.updateStrategy`                | Strategy for upgrading zero nodes                                   | `RollingUpdate`                                     |
  1196  | `zero.monitorLabel`                  | Monitor label for zero, used by prometheus.                         | `zero-dgraph-io`                                    |
  1197  | `zero.rollingUpdatePartition`        | Partition update strategy                                           | `nil`                                               |
  1198  | `zero.podManagementPolicy`           | Pod management policy for zero nodes                                | `OrderedReady`                                      |
  1199  | `zero.replicaCount`                  | Number of zero nodes                                                | `3`                                                 |
  1200  | `zero.shardReplicaCount`             | Max number of replicas per data shard                               | `5`                                                 |
  1201  | `zero.terminationGracePeriodSeconds` | Zero server pod termination grace period                            | `60`                                                |
  1202  | `zero.antiAffinity`                  | Zero anti-affinity policy                                           | `soft`                                              |
  1203  | `zero.podAntiAffinitytopologyKey`    | Anti affinity topology key for zero nodes                           | `kubernetes.io/hostname`                            |
  1204  | `zero.nodeAffinity`                  | Zero node affinity policy                                           | `{}`                                                |
  1205  | `zero.service.type`                  | Zero node service type                                              | `ClusterIP`                                         |
  1206  | `zero.securityContext.enabled`       | Security context for zero nodes enabled                             | `false`                                             |
  1207  | `zero.securityContext.fsGroup`       | Group id of the zero container                                      | `1001`                                              |
  1208  | `zero.securityContext.runAsUser`     | User ID for the zero container                                      | `1001`                                              |
  1209  | `zero.persistence.enabled`           | Enable persistence for zero using PVC                               | `true`                                              |
  1210  | `zero.persistence.storageClass`      | PVC Storage Class for zero volume                                   | `nil`                                               |
  1211  | `zero.persistence.accessModes`       | PVC Access Mode for zero volume                                     | `ReadWriteOnce`                                     |
  1212  | `zero.persistence.size`              | PVC Storage Request for zero volume                                 | `8Gi`                                               |
  1213  | `zero.nodeSelector`                  | Node labels for zero pod assignment                                 | `{}`                                                |
  1214  | `zero.tolerations`                   | Zero tolerations                                                    | `[]`                                                |
  1215  | `zero.resources`                     | Zero node resources requests & limits                               | `{}`                                                |
  1216  | `zero.livenessProbe`                 | Zero liveness probes                                                | `See values.yaml for defaults`                      |
  1217  | `zero.readinessProbe`                | Zero readiness probes                                               | `See values.yaml for defaults`                      |
  1218  | `alpha.name`                         | Alpha component name                                                | `alpha`                                             |
  1219  | `alpha.updateStrategy`               | Strategy for upgrading alpha nodes                                  | `RollingUpdate`                                     |
  1220  | `alpha.monitorLabel`                 | Monitor label for alpha, used by prometheus.                        | `alpha-dgraph-io`                                   |
  1221  | `alpha.rollingUpdatePartition`       | Partition update strategy                                           | `nil`                                               |
  1222  | `alpha.podManagementPolicy`          | Pod management policy for alpha nodes                               | `OrderedReady`                                      |
  1223  | `alpha.replicaCount`                 | Number of alpha nodes                                               | `3`                                                 |
  1224  | `alpha.terminationGracePeriodSeconds`| Alpha server pod termination grace period                           | `60`                                                |
  1225  | `alpha.antiAffinity`                 | Alpha anti-affinity policy                                          | `soft`                                              |
| `alpha.podAntiAffinitytopologyKey`   | Anti-affinity topology key for alpha nodes                          | `kubernetes.io/hostname`                            |
  1227  | `alpha.nodeAffinity`                 | Alpha node affinity policy                                          | `{}`                                                |
  1228  | `alpha.service.type`                 | Alpha node service type                                             | `ClusterIP`                                         |
  1229  | `alpha.securityContext.enabled`      | Security context for alpha nodes enabled                            | `false`                                             |
  1230  | `alpha.securityContext.fsGroup`      | Group id of the alpha container                                     | `1001`                                              |
  1231  | `alpha.securityContext.runAsUser`    | User ID for the alpha container                                     | `1001`                                              |
  1232  | `alpha.persistence.enabled`          | Enable persistence for alpha using PVC                              | `true`                                              |
  1233  | `alpha.persistence.storageClass`     | PVC Storage Class for alpha volume                                  | `nil`                                               |
  1234  | `alpha.persistence.accessModes`      | PVC Access Mode for alpha volume                                    | `ReadWriteOnce`                                     |
  1235  | `alpha.persistence.size`             | PVC Storage Request for alpha volume                                | `8Gi`                                               |
  1236  | `alpha.nodeSelector`                 | Node labels for alpha pod assignment                                | `{}`                                                |
  1237  | `alpha.tolerations`                  | Alpha tolerations                                                   | `[]`                                                |
  1238  | `alpha.resources`                    | Alpha node resources requests & limits                              | `{}`                                                |
  1239  | `alpha.livenessProbe`                | Alpha liveness probes                                               | `See values.yaml for defaults`                      |
  1240  | `alpha.readinessProbe`               | Alpha readiness probes                                              | `See values.yaml for defaults`                      |
  1241  | `ratel.name`                         | Ratel component name                                                | `ratel`                                             |
  1242  | `ratel.replicaCount`                 | Number of ratel nodes                                               | `1`                                                 |
  1243  | `ratel.service.type`                 | Ratel service type                                                  | `ClusterIP`                                         |
  1244  | `ratel.securityContext.enabled`      | Security context for ratel nodes enabled                            | `false`                                             |
  1245  | `ratel.securityContext.fsGroup`      | Group id of the ratel container                                     | `1001`                                              |
  1246  | `ratel.securityContext.runAsUser`    | User ID for the ratel container                                     | `1001`                                              |
  1247  | `ratel.livenessProbe`                | Ratel liveness probes                                               | `See values.yaml for defaults`                      |
  1248  | `ratel.readinessProbe`               | Ratel readiness probes                                              | `See values.yaml for defaults`                      |
  1249  
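These chart values can be overridden at install time. As a minimal sketch (the release name `my-release` and the chart reference `dgraph/dgraph` are placeholders for your own setup, and Helm 2 takes the release name via `--name` instead):

```sh
# Override individual chart values on the command line (Helm 3 syntax).
helm install my-release dgraph/dgraph \
  --set zero.replicaCount=3 \
  --set alpha.replicaCount=3 \
  --set alpha.persistence.size=100Gi

# Or keep the overrides in a custom values file and pass it to Helm.
helm install my-release dgraph/dgraph -f my-values.yaml
```
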
  1250  ### Monitoring in Kubernetes
  1251  
Dgraph exposes Prometheus metrics to monitor the state of the various components in the cluster, including Dgraph Alpha and Zero.

Follow the steps below to set up Prometheus monitoring for your cluster:
  1255  
  1256  * Install Prometheus operator:
  1257  
  1258  ```sh
  1259  kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/release-0.34/bundle.yaml
  1260  ```
  1261  
  1262  * Ensure that the instance of `prometheus-operator` has started before continuing.
  1263  
  1264  ```sh
  1265  $ kubectl get deployments prometheus-operator
  1266  NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
  1267  prometheus-operator   1         1         1            1           3m
  1268  ```
  1269  
* Apply the Prometheus manifest available [here](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/monitoring/prometheus/prometheus.yaml).
  1271  
  1272  ```sh
  1273  $ kubectl apply -f prometheus.yaml
  1274  
  1275  serviceaccount/prometheus-dgraph-io created
  1276  clusterrole.rbac.authorization.k8s.io/prometheus-dgraph-io created
  1277  clusterrolebinding.rbac.authorization.k8s.io/prometheus-dgraph-io created
  1278  servicemonitor.monitoring.coreos.com/alpha.dgraph-io created
  1279  servicemonitor.monitoring.coreos.com/zero-dgraph-io created
  1280  prometheus.monitoring.coreos.com/dgraph-io created
  1281  ```
  1282  
To view the Prometheus UI locally, run:
  1284  
  1285  ```sh
  1286  kubectl port-forward prometheus-dgraph-io-0 9090:9090
  1287  ```
  1288  
  1289  The UI is accessible at port 9090. Open http://localhost:9090 in your browser to play around.
  1290  
To register alerts from the Dgraph cluster with your Prometheus deployment, follow the steps below:

* Create a Kubernetes secret containing the Alertmanager configuration. Edit the configuration file available [here](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/monitoring/prometheus/alertmanager-config.yaml)
with the required receiver configuration, including the Slack webhook credential, and create the secret.
  1295  
You can find more information about Alertmanager configuration [here](https://prometheus.io/docs/alerting/configuration/).
  1297  
  1298  ```sh
  1299  $ kubectl create secret generic alertmanager-alertmanager-dgraph-io --from-file=alertmanager.yaml=alertmanager-config.yaml
  1300  
  1301  $ kubectl get secrets
  1302  NAME                                            TYPE                 DATA   AGE
  1303  alertmanager-alertmanager-dgraph-io             Opaque               1      87m
  1304  ```
  1305  
* Apply the [alertmanager](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/monitoring/prometheus/alertmanager.yaml) manifest along with the [alert-rules](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/monitoring/prometheus/alert-rules.yaml) manifest
to use the default alert configuration. You can also add custom rules based on the metrics exposed by the Dgraph cluster, similar to the [alert-rules](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/monitoring/prometheus/alert-rules.yaml)
manifest.
  1309  
  1310  ```sh
  1311  $ kubectl apply -f alertmanager.yaml
  1312  alertmanager.monitoring.coreos.com/alertmanager-dgraph-io created
  1313  service/alertmanager-dgraph-io created
  1314  
  1315  $ kubectl apply -f alert-rules.yaml
  1316  prometheusrule.monitoring.coreos.com/prometheus-rules-dgraph-io created
  1317  ```
  1318  
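To verify the Alertmanager setup, you can port-forward its pod and open the UI locally. The pod name and label below follow the Prometheus operator's usual naming conventions for the `alertmanager-dgraph-io` resource and are assumptions; confirm them with `kubectl get pods` first:

```sh
# List the pods created for the Alertmanager resource.
kubectl get pods -l alertmanager=alertmanager-dgraph-io

# Forward the Alertmanager UI (default port 9093) to your machine.
kubectl port-forward alertmanager-alertmanager-dgraph-io-0 9093:9093
```

The Alertmanager UI should then be available at http://localhost:9093.
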
  1319  ### Kubernetes Storage
  1320  
The Kubernetes configurations in the previous sections were configured to run
Dgraph with any storage type (`storage-class: anything`). On common cloud
environments like AWS, GCP, and Azure, the default storage type is a slow disk
such as a hard disk or a low-IOPS SSD. We highly recommend using faster disks
for ideal performance when running Dgraph.
  1326  
  1327  #### Local storage
  1328  
The AWS storage-optimized i-class instances provide locally attached NVMe-based
SSD storage, which provides consistently high IOPS. The Dgraph team uses
i3.large instances on AWS to test Dgraph.
  1332  
You can create a Kubernetes `StorageClass` object to provision a specific type
of storage volume, which you can then attach to your Dgraph pods. You can set up
your cluster with local SSDs by using [Local Persistent
Volumes](https://kubernetes.io/blog/2018/04/13/local-persistent-volumes-beta/).
This Kubernetes feature is in beta at the time of this writing (Kubernetes
v1.13.1). First set up an EC2 instance with locally attached storage. Once it is
formatted and mounted properly, you can create a StorageClass to access it:
  1341  
  1342  ```yaml
  1343  apiVersion: storage.k8s.io/v1
  1344  kind: StorageClass
  1345  metadata:
  1346    name: <your-local-storage-class-name>
  1347  provisioner: kubernetes.io/no-provisioner
  1348  volumeBindingMode: WaitForFirstConsumer
  1349  ```
  1350  
  1351  Currently, Kubernetes does not allow automatic provisioning of local storage. So
  1352  a PersistentVolume with a specific mount path should be created:
  1353  
  1354  ```yaml
  1355  apiVersion: v1
  1356  kind: PersistentVolume
  1357  metadata:
  1358    name: <your-local-pv-name>
  1359  spec:
  1360    capacity:
  1361      storage: 475Gi
  1362    volumeMode: Filesystem
  1363    accessModes:
  1364    - ReadWriteOnce
  1365    persistentVolumeReclaimPolicy: Delete
  1366    storageClassName: <your-local-storage-class-name>
  1367    local:
  1368      path: /data
  1369    nodeAffinity:
  1370      required:
  1371        nodeSelectorTerms:
  1372        - matchExpressions:
  1373          - key: kubernetes.io/hostname
  1374            operator: In
  1375            values:
  1376            - <node-name>
  1377  ```
  1378  
Then, in the StatefulSet configuration, you can claim this local storage in
`.spec.volumeClaimTemplates`:

```yaml
  1383  kind: StatefulSet
  1384  ...
  1385   volumeClaimTemplates:
  1386    - metadata:
  1387        name: datadir
  1388      spec:
  1389        accessModes:
  1390        - ReadWriteOnce
  1391        storageClassName: <your-local-storage-class-name>
  1392        resources:
  1393          requests:
  1394            storage: 500Gi
  1395  ```
  1396  
  1397  You can repeat these steps for each instance that's configured with local
  1398  node storage.
  1399  
  1400  #### Non-local persistent disks
  1401  
EBS volumes on AWS and PDs on GCP are persistent disks that can be configured
with Dgraph. Their performance is much lower than locally attached storage, but
they can be sufficient for workloads such as testing environments.
  1405  
When using EBS volumes on AWS, we recommend using Provisioned IOPS SSD EBS
volumes (the io1 disk type), which provide consistent IOPS. The available IOPS
for an AWS EBS volume is based on the total disk size. With Kubernetes, you can
request that io1 disks be provisioned with 50 IOPS/GB using the `iopsPerGB`
parameter in the following StorageClass:
  1411  
```yaml
  1413  kind: StorageClass
  1414  apiVersion: storage.k8s.io/v1
  1415  metadata:
  1416    name: <your-storage-class-name>
  1417  provisioner: kubernetes.io/aws-ebs
  1418  parameters:
  1419    type: io1
  1420    iopsPerGB: "50"
  1421    fsType: ext4
  1422  ```
  1423  
  1424  Example: Requesting a disk size of 250Gi with this storage class would provide
  1425  12.5K IOPS.
  1426  
  1427  ### Removing a Dgraph Pod
  1428  
  1429  In the event that you need to completely remove a pod (e.g., its disk got
  1430  corrupted and data cannot be recovered), you can use the `/removeNode` API to
  1431  remove the node from the cluster. With a Kubernetes StatefulSet, you'll need to
remove the node in this order (a command sketch follows these steps):
  1433  
  1434  1. Call `/removeNode` to remove the Dgraph instance from the cluster (see [More
  1435     about Dgraph Zero]({{< relref "#more-about-dgraph-zero" >}})). The removed
  1436     instance will immediately stop running. Any further attempts to join the
  1437     cluster will fail for that instance since it has been removed.
  1438  2. Remove the PersistentVolumeClaim associated with the pod to delete its data.
  1439     This prepares the pod to join with a clean state.
  1440  3. Restart the pod. This will create a new PersistentVolumeClaim to create new
  1441     data directories.
  1442  
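As a command sketch of these steps for an Alpha pod in a StatefulSet named `dgraph-alpha` (the pod, PVC, and Zero address below are assumptions based on common chart defaults; substitute your own names and the correct `id`/`group` values):

```sh
# 1. Remove the instance from the cluster through Zero's /removeNode endpoint.
curl "http://localhost:6080/removeNode?id=3&group=1"

# 2. Delete the PersistentVolumeClaim holding the pod's data.
kubectl delete pvc datadir-dgraph-alpha-2

# 3. Delete the pod; the StatefulSet recreates it with a fresh PVC and a clean state.
kubectl delete pod dgraph-alpha-2
```
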
  1443  When an Alpha pod restarts in a replicated cluster, it will join as a new member
  1444  of the cluster, be assigned a group and an unused index from Zero, and receive
  1445  the latest snapshot from the Alpha leader of the group.
  1446  
  1447  When a Zero pod restarts, it must join the existing group with an unused index
  1448  ID. The index ID is set with the `--idx` flag. This may require the StatefulSet
  1449  configuration to be updated.
  1450  
  1451  ### Kubernetes and Bulk Loader
  1452  
  1453  You may want to initialize a new cluster with an existing data set such as data
  1454  from the [Dgraph Bulk Loader]({{< relref "#bulk-loader" >}}). You can use [Init
  1455  Containers](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
  1456  to copy the data to the pod volume before the Alpha process runs.
  1457  
  1458  See the `initContainers` configuration in
  1459  [dgraph-ha.yaml](https://github.com/dgraph-io/dgraph/blob/master/contrib/config/kubernetes/dgraph-ha/dgraph-ha.yaml)
  1460  to learn more.
  1461  
  1462  ## More about Dgraph Alpha
  1463  
  1464  On its HTTP port, a Dgraph Alpha exposes a number of admin endpoints.
  1465  
  1466  * `/health` returns HTTP status code 200 if the worker is running, HTTP 503 otherwise.
  1467  * `/admin/shutdown` initiates a proper [shutdown]({{< relref "#shutdown">}}) of the Alpha.
  1468  * `/admin/export` initiates a data [export]({{< relref "#export">}}).
  1469  
By default, the Alpha listens on `localhost` for admin actions (the loopback address is only accessible from the same machine). The `--bindall=true` option binds to `0.0.0.0` and thus allows external connections.
  1471  
  1472  {{% notice "tip" %}}Set max file descriptors to a high value like 10000 if you are going to load a lot of data.{{% /notice %}}
  1473  
  1474  ### More about /health endpoint
  1475  
The `/health` endpoint of Dgraph Alpha returns HTTP status 200 with a JSON body containing basic information about the running worker.

Here’s an example of the JSON returned from the `/health` endpoint:
  1479  
  1480  ```json
  1481  {
  1482    "version": "v1.1.1",
  1483    "instance": "alpha",
  1484    "uptime": 75011100974
  1485  }
  1486  ```
  1487  
  1488  - `version`: Version of Dgraph running the Alpha server.
  1489  - `instance`: Name of the instance. Always set to `alpha`.
- `uptime`: Time in nanoseconds for which the Alpha server has been up and running.
  1491  
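For example, you can query this endpoint with `curl` against a locally running Alpha (the address below assumes the default HTTP port 8080):

```sh
# Check the health of a local Alpha.
curl http://localhost:8080/health
```
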
  1492  ## More about Dgraph Zero
  1493  
  1494  Dgraph Zero controls the Dgraph cluster. It automatically moves data between
  1495  different Dgraph Alpha instances based on the size of the data served by each Alpha instance.
  1496  
  1497  It is mandatory to run at least one `dgraph zero` node before running any `dgraph alpha`.
The options available for `dgraph zero` can be seen by running `dgraph zero --help`.
  1499  
  1500  * Zero stores information about the cluster.
* `--replicas` is the option that controls the replication factor, i.e. the number of replicas per data shard, including the original shard (see the example after this list).
* When a new Alpha joins the cluster, it is assigned a group based on the replication factor. If the replication factor is 1, then each Alpha node serves a different group. If the replication factor is 2 and you launch 4 Alphas, then the first two Alphas serve group 1 and the next two serve group 2.
  1503  * Zero also monitors the space occupied by predicates in each group and moves them around to rebalance the cluster.
  1504  
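As a minimal sketch of setting the replication factor when starting Zero (the `--my` address and the index are illustrative):

```sh
# Start a Zero with a replication factor of 3 (3 replicas per data shard).
dgraph zero --my=zero1:5080 --idx 1 --replicas 3
```
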
  1505  Like Alpha, Zero also exposes HTTP on 6080 (+ any `--port_offset`). You can query (**GET** request) it
  1506  to see useful information, like the following:
  1507  
  1508  * `/state` Information about the nodes that are part of the cluster. Also contains information about
  1509  size of predicates and groups they belong to.
* `/assign?what=uids&num=100` This would allocate `num` UIDs and return a JSON map
containing `startId` and `endId`, both inclusive. This ID range can be safely assigned
externally to new nodes during data ingestion.
* `/assign?what=timestamps&num=100` This would request timestamps from Zero.
This is useful to fast-forward Zero's state when starting from a postings
directory which already has commits higher than Zero's leased timestamp.
  1516  * `/removeNode?id=3&group=2` If a replica goes down and can't be recovered, you
  1517  can remove it and add a new node to the quorum. This endpoint can be used to
  1518  remove a dead Zero or Dgraph Alpha node. To remove dead Zero nodes, pass
  1519  `group=0` and the id of the Zero node.
  1520  
  1521  {{% notice "note" %}}
Before using this API, ensure that the node is down and that it will never come back up again.
  1523  
  1524  You should not use the same `idx` of a node that was removed earlier.
  1525  {{% /notice %}}
  1526  
* `/moveTablet?tablet=name&group=2` This endpoint can be used to move a tablet to a group. Zero
already rebalances shards every 8 minutes; this endpoint can be used to force-move a tablet.
  1529  
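For example, assuming a Zero reachable on its default HTTP port 6080, these endpoints can be queried with `curl` (the tablet and group values are illustrative):

```sh
# Cluster membership, predicate sizes, and group information.
curl http://localhost:6080/state

# Lease 100 UIDs for external assignment.
curl "http://localhost:6080/assign?what=uids&num=100"

# Force-move the "name" tablet to group 2.
curl "http://localhost:6080/moveTablet?tablet=name&group=2"
```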
  1530  
  1531  These are the **POST** endpoints available:
  1532  
* `/enterpriseLicense` Use this endpoint to apply an enterprise license to the cluster by supplying it
as part of the request body.
  1535  
  1536  ### More about /state endpoint
  1537  
  1538  The `/state` endpoint of Dgraph Zero returns a JSON document of the current group membership info:
  1539  
  1540  - Instances which are part of the cluster.
- Number of instances in the Zero group and in each Alpha group.
  1542  - Current leader of each group.
  1543  - Predicates that belong to a group.
  1544  - Estimated size in bytes of each predicate.
  1545  - Enterprise license information.
  1546  - Max Leased transaction ID.
  1547  - Max Leased UID.
  1548  - CID (Cluster ID).
  1549  
Here’s an example of the JSON returned from the `/state` endpoint for a 6-node Dgraph cluster with three replicas:
  1551  
  1552  ```json
  1553  {
  1554    "counter": "15",
  1555    "groups": {
  1556      "1": {
  1557        "members": {
  1558          "1": {
  1559            "id": "1",
  1560            "groupId": 1,
  1561            "addr": "alpha1:7080",
  1562            "leader": true,
  1563            "lastUpdate": "1576112366"
  1564          },
  1565          "2": {
  1566            "id": "2",
  1567            "groupId": 1,
  1568            "addr": "alpha2:7080"
  1569          },
  1570          "3": {
  1571            "id": "3",
  1572            "groupId": 1,
  1573            "addr": "alpha3:7080"
  1574          }
  1575        },
  1576        "tablets": {
  1577          "counter.val": {
  1578            "groupId": 1,
  1579            "predicate": "counter.val"
  1580          },
  1581          "dgraph.type": {
  1582            "groupId": 1,
  1583            "predicate": "dgraph.type"
  1584          }
  1585        },
  1586        "checksum": "1021598189643258447"
  1587      }
  1588    },
  1589    "zeros": {
  1590      "1": {
  1591        "id": "1",
  1592        "addr": "zero1:5080",
  1593        "leader": true
  1594      },
  1595      "2": {
  1596        "id": "2",
  1597        "addr": "zero2:5080"
  1598      },
  1599      "3": {
  1600        "id": "3",
  1601        "addr": "zero3:5080"
  1602      }
  1603    },
  1604    "maxLeaseId": "10000",
  1605    "maxTxnTs": "10000",
  1606    "cid": "3602537a-ee49-43cb-9792-c766eea683dc",
  1607    "license": {
  1608      "maxNodes": "18446744073709551615",
  1609      "expiryTs": "1578704367",
  1610      "enabled": true
  1611    }
  1612  }
  1613  ```
  1614  
  1615  Here’s the information the above JSON document provides:
  1616  
  1617  - Group 0
  1618    - members
  1619      - zero1:5080, id: 1, leader
  1620      - zero2:5080, id: 2
  1621      - zero3:5080, id: 3
  1622  - Group 1
  1623      - members
  1624          - alpha1:7080, id: 1, leader
  1625          - alpha2:7080, id: 2
  1626          - alpha3:7080, id: 3
  1627      - predicates
  1628          - dgraph.type
  1629          - counter.val
  1630  - Enterprise license
  1631      - Enabled
  1632      - maxNodes: unlimited
  1633      - License expires on Friday, January 10, 2020 4:59:27 PM GMT-08:00 (converted from epoch timestamp)
  1634  - Other data:
  1635      - maxTxnTs
  1636          - The current max lease of transaction timestamps used to hand out start timestamps
  1637            and commit timestamps.
        - This increments in batches of 10,000 IDs. Once the max lease is reached, another
          10,000 IDs are leased. In the event that the Zero leader is lost, the new
          leader starts a brand-new lease from maxTxnTs+1. Any transaction IDs lost
          in-between will never be used.
        - An admin can use the Zero endpoint HTTP GET `/assign?what=timestamps&num=1000` to
          increase the current transaction timestamp (in this case, by 1000). This is mainly
          useful in special-case scenarios, e.g., moving an existing p directory to a fresh
          cluster in order to be able to query the latest data in the DB.
  1646      - maxLeaseId
  1647          - The current max lease of UIDs used for blank node UID assignment.
  1648          - This increments in batches of 10,000 IDs. Once the max lease is reached, another
  1649            10,000 IDs are leased. In the event that the Zero leader is lost, the new leader
  1650            starts a brand new lease from maxLeaseId+1. Any UIDs lost in-between will never
  1651            be used for blank-node UID assignment.
        - An admin can use the Zero endpoint HTTP GET `/assign?what=uids&num=1000` to
          reserve a range of UIDs (in this case, 1000) to use externally. Zero will never
          use these UIDs for blank node UID assignment, so the user can use the range
          to assign UIDs manually to their own data sets.
  1656      - CID
  1657          - This is a unique UUID representing the *cluster-ID* for this cluster. It is generated
  1658            during the initial DB startup and is retained across restarts.
  1659      - Group checksum
  1660          - This is the checksum verification of the data per Alpha group. This is used internally
  1661            to verify group memberships in the event of a tablet move.
  1662  
  1663  {{% notice "note" %}}
  1664  "tablet", "predicate", and "edge" are synonymous terms today. The future plan to
  1665  improve data scalability is to shard a predicate into separate tablets that could
  1666  be assigned to different groups.
  1667  {{% /notice %}}
  1668  
  1669  ## TLS configuration
  1670  
  1671  {{% notice "note" %}}
  1672  This section refers to the `dgraph cert` command which was introduced in v1.0.9. For previous releases, see the previous [TLS configuration documentation](https://docs.dgraph.io/v1.0.7/deploy/#tls-configuration).
  1673  {{% /notice %}}
  1674  
  1675  
  1676  Connections between client and server can be secured with TLS. Password protected private keys are **not supported**.
  1677  
  1678  {{% notice "tip" %}}If you're generating encrypted private keys with `openssl`, be sure to specify encryption algorithm explicitly (like `-aes256`). This will force `openssl` to include `DEK-Info` header in private key, which is required to decrypt the key by Dgraph. When default encryption is used, `openssl` doesn't write that header and key can't be decrypted.{{% /notice %}}
  1679  
  1680  ### Dgraph Certificate Management Tool
  1681  
  1682  The `dgraph cert` program creates and manages CA-signed certificates and private keys using a generated Dgraph Root CA. The `dgraph cert` command simplifies certificate management for you.
  1683  
  1684  ```sh
  1685  # To see the available flags.
  1686  $ dgraph cert --help
  1687  
  1688  # Create Dgraph Root CA, used to sign all other certificates.
  1689  $ dgraph cert
  1690  
  1691  # Create node certificate and private key
  1692  $ dgraph cert -n localhost
  1693  
  1694  # Create client certificate and private key for mTLS (mutual TLS)
  1695  $ dgraph cert -c dgraphuser
  1696  
  1697  # Combine all in one command
  1698  $ dgraph cert -n localhost -c dgraphuser
  1699  
  1700  # List all your certificates and keys
  1701  $ dgraph cert ls
  1702  ```
  1703  
  1704  #### File naming conventions
  1705  
To enable TLS you must specify the directory path to find certificates and keys. The default location where the _cert_ command stores certificates (and keys) is `tls` under the Dgraph working directory, where the data files are found. The default directory path can be overridden using the `--dir` option.
  1707  
  1708  ```sh
  1709  $ dgraph cert --dir ~/mycerts
  1710  ```
  1711  
  1712  The following file naming conventions are used by Dgraph for proper TLS setup.
  1713  
  1714  | File name | Description | Use |
  1715  |-----------|-------------|-------|
  1716  | ca.crt | Dgraph Root CA certificate | Verify all certificates |
  1717  | ca.key | Dgraph CA private key | Validate CA certificate |
  1718  | node.crt | Dgraph node certificate | Shared by all nodes for accepting TLS connections |
  1719  | node.key | Dgraph node private key | Validate node certificate |
  1720  | client._name_.crt | Dgraph client certificate | Authenticate a client _name_ |
  1721  | client._name_.key | Dgraph client private key | Validate _name_ client certificate |
  1722  
The Root CA certificate is used for verifying node and client certificates; if it changes, you must regenerate all certificates.
  1724  
  1725  For client authentication, each client must have their own certificate and key. These are then used to connect to the Dgraph node(s).
  1726  
The node certificate `node.crt` can support multiple node names using multiple host names and/or IP addresses. Just separate the names with commas when generating the certificate.
  1728  
  1729  ```sh
  1730  $ dgraph cert -n localhost,104.25.165.23,dgraph.io,2400:cb00:2048:1::6819:a417
  1731  ```
  1732  
  1733  {{% notice "tip" %}}You must delete the old node cert and key before you can generate a new pair.{{% /notice %}}
  1734  
  1735  {{% notice "note" %}}When using host names for node certificates, including _localhost_, your clients must connect to the matching host name -- such as _localhost_ not 127.0.0.1. If you need to use IP addresses, then add them to the node certificate.{{% /notice %}}
  1736  
  1737  #### Certificate inspection
  1738  
  1739  The command `dgraph cert ls` lists all certificates and keys in the `--dir` directory (default 'tls'), along with details to inspect and validate cert/key pairs.
  1740  
  1741  Example of command output:
  1742  
  1743  ```sh
  1744  -rw-r--r-- ca.crt - Dgraph Root CA certificate
  1745          Issuer: Dgraph Labs, Inc.
  1746             S/N: 043c4d8fdd347f06
  1747      Expiration: 02 Apr 29 16:56 UTC
  1748  SHA-256 Digest: 4A2B0F0F 716BF5B6 C603E01A 6229D681 0B2AFDC5 CADF5A0D 17D59299 116119E5
  1749  
  1750  -r-------- ca.key - Dgraph Root CA key
  1751  SHA-256 Digest: 4A2B0F0F 716BF5B6 C603E01A 6229D681 0B2AFDC5 CADF5A0D 17D59299 116119E5
  1752  
  1753  -rw-r--r-- client.admin.crt - Dgraph client certificate: admin
  1754          Issuer: Dgraph Labs, Inc.
  1755       CA Verify: PASSED
  1756             S/N: 297e4cb4f97c71f9
  1757      Expiration: 03 Apr 24 17:29 UTC
  1758  SHA-256 Digest: D23EFB61 DE03C735 EB07B318 DB70D471 D3FE8556 B15D084C 62675857 788DF26C
  1759  
  1760  -rw------- client.admin.key - Dgraph Client key
  1761  SHA-256 Digest: D23EFB61 DE03C735 EB07B318 DB70D471 D3FE8556 B15D084C 62675857 788DF26C
  1762  
  1763  -rw-r--r-- node.crt - Dgraph Node certificate
  1764          Issuer: Dgraph Labs, Inc.
  1765       CA Verify: PASSED
  1766             S/N: 795ff0e0146fdb2d
  1767      Expiration: 03 Apr 24 17:00 UTC
  1768           Hosts: 104.25.165.23, 2400:cb00:2048:1::6819:a417, localhost, dgraph.io
  1769  SHA-256 Digest: 7E243ED5 3286AE71 B9B4E26C 5B2293DA D3E7F336 1B1AFFA7 885E8767 B1A84D28
  1770  
  1771  -rw------- node.key - Dgraph Node key
  1772  SHA-256 Digest: 7E243ED5 3286AE71 B9B4E26C 5B2293DA D3E7F336 1B1AFFA7 885E8767 B1A84D28
  1773  ```
  1774  
  1775  Important points:
  1776  
  1777  * The cert/key pairs should always have matching SHA-256 digests. Otherwise, the cert(s) must be
  regenerated. If the Root CA pair differs, all certs and keys must be regenerated; the flag `--force`
  1779    can help.
  1780  * All certificates must pass Dgraph CA verification.
* All key files should have the least access permissions possible, especially `ca.key`, while still being readable.
  1782  * Key files won't be overwritten if they have limited access, even with `--force`.
  1783  * Node certificates are only valid for the hosts listed.
  1784  * Client certificates are only valid for the named client/user.
  1785  
  1786  ### TLS Options
  1787  
  1788  The following configuration options are available for Alpha:
  1789  
  1790  * `--tls_dir string` - TLS dir path; this enables TLS connections (usually 'tls').
  1791  * `--tls_use_system_ca` - Include System CA with Dgraph Root CA.
  1792  * `--tls_client_auth string` - TLS client authentication used to validate client connection. See [Client Authentication Options](#client-authentication-options) for details.
  1793  
  1794  Dgraph Live Loader can be configured with the following options:
  1795  
  1796  * `--tls_cacert string` - Dgraph Root CA, such as `./tls/ca.crt`
  1797  * `--tls_use_system_ca` - Include System CA with Dgraph Root CA.
  1798  * `--tls_cert` - User cert file provided by the client to Alpha
  1799  * `--tls_key` - User private key file provided by the client to Alpha
  1800  * `--tls_server_name string` - Server name, used for validating the server's TLS host name.
  1801  
  1802  
  1803  #### Using TLS without Client Authentication
  1804  
  1805  For TLS without client authentication, you can configure certificates and run Alpha server using the following:
  1806  
  1807  ```sh
  1808  # First, create rootca and node certificates and private keys
  1809  $ dgraph cert -n localhost
  1810  # Default use for enabling TLS server (after generating certificates and private keys)
  1811  $ dgraph alpha --tls_dir tls
  1812  ```
  1813  
  1814  You can then run Dgraph live loader using the following:
  1815  
  1816  ```sh
  1817  # Now, connect to server using TLS
  1818  $ dgraph live --tls_cacert ./tls/ca.crt --tls_server_name "localhost" -s 21million.schema -f 21million.rdf.gz
  1819  ```
  1820  
  1821  #### Using TLS with Client Authentication
  1822  
  1823  If you do require Client Authentication (Mutual TLS), you can configure certificates and run Alpha server using the following:
  1824  
  1825  ```sh
  1826  # First, create a rootca, node, and client certificates and private keys
  1827  $ dgraph cert -n localhost -c dgraphuser
  1828  # Default use for enabling TLS server with client authentication (after generating certificates and private keys)
  1829  $ dgraph alpha --tls_dir tls --tls_client_auth="REQUIREANDVERIFY"
  1830  ```
  1831  
  1832  You can then run Dgraph live loader using the following:
  1833  
  1834  ```sh
  1835  # Now, connect to server using mTLS (mutual TLS)
  1836  $ dgraph live \
  1837     --tls_cacert ./tls/ca.crt \
  1838     --tls_cert ./tls/client.dgraphuser.crt \
  1839     --tls_key ./tls/client.dgraphuser.key \
  1840     --tls_server_name "localhost" \
  1841     -s 21million.schema \
  1842     -f 21million.rdf.gz
  1843  ```
  1844  
  1845  #### Client Authentication Options
  1846  
  1847  The server will always **request** Client Authentication.  There are four different values for the `--tls_client_auth` option that change the security policy of the client certificate.
  1848  
  1849  | Value              | Client Cert/Key | Client Certificate Verified |
  1850  |--------------------|-----------------|--------------------|
  1851  | `REQUEST`          | optional        | Client certificate is not VERIFIED if provided. (least secure) |
  1852  | `REQUIREANY`       | required        | Client certificate is never VERIFIED |
  1853  | `VERIFYIFGIVEN`    | optional        | Client certificate is VERIFIED if provided (default) |
  1854  | `REQUIREANDVERIFY` | required        | Client certificate is always VERIFIED (most secure) |
  1855  
  1856  {{% notice "note" %}}REQUIREANDVERIFY is the most secure but also the most difficult to configure for remote clients. When using this value, the value of `--tls_server_name` is matched against the certificate SANs values and the connection host.{{% /notice %}}
  1857  
  1858  ### Using Ratel UI with Client authentication
  1859  
  1860  Ratel UI (and any other JavaScript clients built on top of `dgraph-js-http`)
connect to Dgraph servers via HTTP. When TLS is enabled, servers expect
HTTPS requests only, so some adjustments need to be made.
  1863  
If the `--tls_client_auth` option is set to `REQUEST` or `VERIFYIFGIVEN` (the
default):

1. Change the connection URL from `http://` to `https://` (e.g. `https://127.0.0.1:8080`).
2. Install, or make trusted, the certificate of the Dgraph certificate authority `ca.crt`. Refer to the documentation of your OS / browser for instructions
(e.g. on Mac OS this means adding `ca.crt` to the Keychain and making it trusted
for `Secure Socket Layer`).
  1870  
  1871  For `REQUIREANY` and `REQUIREANDVERIFY` you need to follow the steps above and
  1872  also need to install client certificate on your OS / browser:
  1873  
1. Generate a client certificate: `dgraph cert -c MyLaptop`.
  1875  2. Convert it to a `.p12` file:
  1876  `openssl pkcs12 -export -out MyLaptopCert.p12 -in tls/client.MyLaptop.crt -inkey tls/client.MyLaptop.key`. Use any password you like for export.
  1877  3. Install the generated `MyLaptopCert.p12` file on the client system
  1878  (on Mac OS this means simply double-click the file in Finder).
  1879  4. Next time you use Ratel to connect to an alpha with Client authentication
  1880  enabled the browser will prompt you for a client certificate to use. Select the
  1881  certificate you've just installed in the step above and queries/mutations will
  1882  succeed.
  1883  
  1884  ### Using Curl with Client authentication
  1885  
  1886  When TLS is enabled, `curl` requests to Dgraph will need some specific options to work.  For instance (for an export request):
  1887  
  1888  ```
  1889  curl --silent --cacert ./tls/ca.crt https://localhost:8080/admin/export
  1890  ```
  1891  
  1892  If you are using `curl` with [Client Authentication](#client-authentication-options) set to `REQUIREANY` or `REQUIREANDVERIFY`, you will need to provide the client certificate and private key.  For instance (for an export request):
  1893  
```
curl --silent --cacert ./tls/ca.crt --cert ./tls/client.dgraphuser.crt --key ./tls/client.dgraphuser.key https://localhost:8080/admin/export
```
  1902  
  1903  Refer to the `curl` documentation for further information on its TLS options.
  1904  
  1905  ### Access Data Using a Client
  1906  
  1907  Some examples of connecting via a [Client](/clients) when TLS is in use can be found below:
  1908  
  1909  - [dgraph4j](https://github.com/dgraph-io/dgraph4j#creating-a-secure-client-using-tls)
  1910  - [dgraph-js](https://github.com/dgraph-io/dgraph-js/tree/master/examples/tls)
  1911  - [dgo](https://github.com/dgraph-io/dgraph/blob/master/tlstest/acl/acl_over_tls_test.go)
  1912  - [pydgraph](https://github.com/dgraph-io/pydgraph/tree/master/examples/tls)
  1913  
  1914  ### Troubleshooting Ratel's Client authentication
  1915  
If you are getting errors in Ratel when the server's TLS is enabled, try opening
your Alpha URL as a webpage.
  1918  
  1919  Assuming you are running Dgraph on your local machine, opening
  1920  `https://localhost:8080/` in browser should produce a message `Dgraph browser is available for running separately using the dgraph-ratel binary`.
  1921  
In case you are getting a connection error, try not passing the
`--tls_client_auth` flag when starting an Alpha. If you are still getting an
error, check that your hostname is correct and the port is open; then make sure
that the "Dgraph Root CA" certificate is installed and trusted correctly.

After that, if things work without `--tls_client_auth` but stop working when
`REQUIREANY` or `REQUIREANDVERIFY` is set, make sure the `.p12` file is
installed correctly.
  1930  
  1931  ## Cluster Checklist
  1932  
In setting up a cluster, be sure to check the following.
  1934  
  1935  * Is at least one Dgraph Zero node running?
  1936  * Is each Dgraph Alpha instance in the cluster set up correctly?
  1937  * Will each Dgraph Alpha instance be accessible to all peers on 7080 (+ any port offset)?
  1938  * Does each instance have a unique ID on startup?
  1939  * Has `--bindall=true` been set for networked communication?
  1940  
  1941  ## Fast Data Loading
  1942  
  1943  There are two different tools that can be used for fast data loading:
  1944  
  1945  - `dgraph live` runs the Dgraph Live Loader
  1946  - `dgraph bulk` runs the Dgraph Bulk Loader
  1947  
  1948  {{% notice "note" %}} Both tools only accept [RDF N-Quad/Triple
  1949  data](https://www.w3.org/TR/n-quads/) or JSON in plain or gzipped format. Data
  1950  in other formats must be converted.{{% /notice %}}
  1951  
  1952  ### Live Loader
  1953  
Dgraph Live Loader (run with `dgraph live`) is a small helper program which reads RDF N-Quads from a gzipped file, batches them up, creates mutations (using the Go client) and sends them to Dgraph.
  1955  
  1956  Dgraph Live Loader correctly handles assigning unique IDs to blank nodes across multiple files, and can optionally persist them to disk to save memory, in case the loader was re-run.
  1957  
{{% notice "note" %}} Dgraph Live Loader can optionally write the xid->uid mapping to a directory specified using the `-x` flag, which can be reused
in a later run, provided the Live Loader completed successfully in the previous run.{{% /notice %}}
  1960  
  1961  ```sh
  1962  $ dgraph live --help # To see the available flags.
  1963  
  1964  # Read RDFs or JSON from the passed file, and send them to Dgraph on localhost:9080.
  1965  $ dgraph live -f <path-to-gzipped-RDF-or-JSON-file>
  1966  
  1967  # Read multiple RDFs or JSON from the passed path, and send them to Dgraph on localhost:9080.
  1968  $ dgraph live -f <./path-to-gzipped-RDF-or-JSON-files>
  1969  
  1970  # Read multiple files strictly by name.
  1971  $ dgraph live -f <file1.rdf, file2.rdf>
  1972  
  1973  # Use compressed gRPC connections to and from Dgraph.
  1974  $ dgraph live -C -f <path-to-gzipped-RDF-or-JSON-file>
  1975  
  1976  # Read RDFs and a schema file and send to Dgraph running at given address.
  1977  $ dgraph live -f <path-to-gzipped-RDf-or-JSON-file> -s <path-to-schema-file> -a <dgraph-alpha-address:grpc_port> -z <dgraph-zero-address:grpc_port>
  1978  ```
  1979  
  1980  #### Encrypted imports via Live Loader
  1981  
A new flag, `keyfile`, has been added to the Live Loader. This option is required to decrypt encrypted export data and schema files. Once the export files are decrypted, the Live Loader streams the data to a live Alpha instance.
  1983  
  1984  {{% notice "note" %}}
  1985  If the live Alpha instance has encryption turned on, the `p` directory will be encrypted. Otherwise, the `p` directory is unencrypted.
  1986  {{% /notice %}}
  1987  
  1988  #### Encrypted RDF/JSON file and schema via Live Loader
  1989  `dgraph live -f <path-to-encrypted-gzipped-RDF-or-JSON-file> -s <path-to-encrypted-schema> -keyfile <path-to-keyfile-to-decrypt-files>`
  1990  
  1991  #### Other Live Loader options
  1992  
  1993  `--new_uids` (default: false): Assign new UIDs instead of using the existing
  1994  UIDs in data files. This is useful to avoid overriding the data in a DB already
  1995  in operation.
  1996  
  1997  `-f, --files`: Location of *.rdf(.gz) or *.json(.gz) file(s) to load. It can
  1998  load multiple files in a given path. If the path is a directory, then all files
  1999  ending in .rdf, .rdf.gz, .json, and .json.gz will be loaded.
  2000  
  2001  `--format`: Specify file format (rdf or json) instead of getting it from
  2002  filenames. This is useful if you need to define a strict format manually.
  2003  
  2004  `-b, --batch` (default: 1000): Number of N-Quads to send as part of a mutation.
  2005  
  2006  `-c, --conc` (default: 10): Number of concurrent requests to make to Dgraph.
  2007  Do not confuse with `-C`.
  2008  
  2009  `-C, --use_compression` (default: false): Enable compression for connections to and from the
  2010  Alpha server.
  2011  
`-a, --alpha` (default: `localhost:9080`): Dgraph Alpha gRPC server address to connect to for live loading. This can be a comma-separated list of Alpha addresses in the same cluster to distribute the load, e.g., `"alpha:grpc_port,alpha2:grpc_port,alpha3:grpc_port"`.
  2013  
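Putting several of these options together, a sketch of a tuned live load might look like the following (file names and addresses are placeholders):

```sh
# Load a gzipped RDF file with larger batches, more concurrency, compressed
# gRPC connections, and the load spread across two Alphas in the same cluster.
dgraph live -f data.rdf.gz -s data.schema \
  --batch 5000 --conc 20 -C \
  --alpha alpha1:9080,alpha2:9080 \
  -z zero1:5080 \
  --new_uids
```
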
  2014  ### Bulk Loader
  2015  
  2016  {{% notice "note" %}}
  2017  It's crucial to tune the bulk loader's flags to get good performance. See the
  2018  section below for details.
  2019  {{% /notice %}}
  2020  
  2021  Dgraph Bulk Loader serves a similar purpose to the Dgraph Live Loader, but can
  2022  only be used to load data into a new cluster. It cannot be run on an existing
  2023  Dgraph cluster. Dgraph Bulk Loader is **considerably faster** than the Dgraph
  2024  Live Loader and is the recommended way to perform the initial import of large
  2025  datasets into Dgraph.
  2026  
During bulk loading, only Dgraph Zeros (one or more) should be running; Dgraph Alphas
will be started later.
  2029  
  2030  {{% notice "warning" %}}
  2031  Don't use bulk loader once the Dgraph cluster is up and running. Use it to import
  2032  your existing data to a new cluster.
  2033  {{% /notice %}}
  2034  
  2035  You can [read some technical details](https://blog.dgraph.io/post/bulkloader/)
  2036  about the bulk loader on the blog.
  2037  
  2038  See [Fast Data Loading]({{< relref "#fast-data-loading" >}}) for more info about
  2039  the expected N-Quads format.
  2040  
  2041  **Reduce shards**: Before running the bulk load, you need to decide how many
  2042  Alpha groups will be running when the cluster starts. The number of Alpha groups
  2043  will be the same number of reduce shards you set with the `--reduce_shards`
flag. For example, if your cluster will run 3 Alphas with 3 replicas per group,
  2045  then there is 1 group and `--reduce_shards` should be set to 1. If your cluster
  2046  will run 6 Alphas with 3 replicas per group, then there are 2 groups and
  2047  `--reduce_shards` should be set to 2.
  2048  
  2049  **Map shards**: The `--map_shards` option must be set to at least what's set for
  2050  `--reduce_shards`. A higher number helps the bulk loader evenly distribute
  2051  predicates between the reduce shards.
  2052  
  2053  ```sh
  2054  $ dgraph bulk -f goldendata.rdf.gz -s goldendata.schema --map_shards=4 --reduce_shards=2 --http localhost:8000 --zero=localhost:5080
  2055  ```
  2056  ```
  2057  {
  2058  	"DataFiles": "goldendata.rdf.gz",
  2059  	"DataFormat": "",
  2060  	"SchemaFile": "goldendata.schema",
  2061  	"DgraphsDir": "out",
  2062  	"TmpDir": "tmp",
  2063  	"NumGoroutines": 4,
  2064  	"MapBufSize": 67108864,
  2065  	"ExpandEdges": true,
  2066  	"SkipMapPhase": false,
  2067  	"CleanupTmp": true,
  2068  	"NumShufflers": 1,
  2069  	"Version": false,
  2070  	"StoreXids": false,
  2071  	"ZeroAddr": "localhost:5080",
  2072  	"HttpAddr": "localhost:8000",
  2073  	"IgnoreErrors": false,
  2074  	"MapShards": 4,
  2075  	"ReduceShards": 2
  2076  }
  2077  The bulk loader needs to open many files at once. This number depends on the size of the data set loaded, the map file output size, and the level of indexing. 100,000 is adequate for most data set sizes. See `man ulimit` for details of how to change the limit.
  2078  Current max open files limit: 1024
  2079  MAP 01s rdf_count:176.0 rdf_speed:174.4/sec edge_count:564.0 edge_speed:558.8/sec
  2080  MAP 02s rdf_count:399.0 rdf_speed:198.5/sec edge_count:1.291k edge_speed:642.4/sec
  2081  MAP 03s rdf_count:666.0 rdf_speed:221.3/sec edge_count:2.164k edge_speed:718.9/sec
  2082  MAP 04s rdf_count:952.0 rdf_speed:237.4/sec edge_count:3.014k edge_speed:751.5/sec
  2083  MAP 05s rdf_count:1.327k rdf_speed:264.8/sec edge_count:4.243k edge_speed:846.7/sec
  2084  MAP 06s rdf_count:1.774k rdf_speed:295.1/sec edge_count:5.720k edge_speed:951.5/sec
  2085  MAP 07s rdf_count:2.375k rdf_speed:338.7/sec edge_count:7.607k edge_speed:1.085k/sec
  2086  MAP 08s rdf_count:3.697k rdf_speed:461.4/sec edge_count:11.89k edge_speed:1.484k/sec
  2087  MAP 09s rdf_count:71.98k rdf_speed:7.987k/sec edge_count:225.4k edge_speed:25.01k/sec
  2088  MAP 10s rdf_count:354.8k rdf_speed:35.44k/sec edge_count:1.132M edge_speed:113.1k/sec
  2089  MAP 11s rdf_count:610.5k rdf_speed:55.39k/sec edge_count:1.985M edge_speed:180.1k/sec
  2090  MAP 12s rdf_count:883.9k rdf_speed:73.52k/sec edge_count:2.907M edge_speed:241.8k/sec
  2091  MAP 13s rdf_count:1.108M rdf_speed:85.10k/sec edge_count:3.653M edge_speed:280.5k/sec
  2092  MAP 14s rdf_count:1.121M rdf_speed:79.93k/sec edge_count:3.695M edge_speed:263.5k/sec
  2093  MAP 15s rdf_count:1.121M rdf_speed:74.61k/sec edge_count:3.695M edge_speed:246.0k/sec
  2094  REDUCE 16s [1.69%] edge_count:62.61k edge_speed:62.61k/sec plist_count:29.98k plist_speed:29.98k/sec
  2095  REDUCE 17s [18.43%] edge_count:681.2k edge_speed:651.7k/sec plist_count:328.1k plist_speed:313.9k/sec
  2096  REDUCE 18s [33.28%] edge_count:1.230M edge_speed:601.1k/sec plist_count:678.9k plist_speed:331.8k/sec
  2097  REDUCE 19s [45.70%] edge_count:1.689M edge_speed:554.4k/sec plist_count:905.9k plist_speed:297.4k/sec
  2098  REDUCE 20s [60.94%] edge_count:2.252M edge_speed:556.5k/sec plist_count:1.278M plist_speed:315.9k/sec
  2099  REDUCE 21s [93.21%] edge_count:3.444M edge_speed:681.5k/sec plist_count:1.555M plist_speed:307.7k/sec
  2100  REDUCE 22s [100.00%] edge_count:3.695M edge_speed:610.4k/sec plist_count:1.778M plist_speed:293.8k/sec
  2101  REDUCE 22s [100.00%] edge_count:3.695M edge_speed:584.4k/sec plist_count:1.778M plist_speed:281.3k/sec
  2102  Total: 22s
  2103  ```
  2104  
  2105  The output will be generated in the `out` directory by default. Here's the bulk
  2106  load output from the example above:
  2107  
  2108  ```sh
  2109  $ tree ./out
  2110  ```
  2111  ```
  2112  ./out
  2113  ├── 0
  2114  │   └── p
  2115  │       ├── 000000.vlog
  2116  │       ├── 000002.sst
  2117  │       └── MANIFEST
  2118  └── 1
  2119      └── p
  2120          ├── 000000.vlog
  2121          ├── 000002.sst
  2122          └── MANIFEST
  2123  
  2124  4 directories, 6 files
  2125  ```
  2126  
Because `--reduce_shards` was set to 2, there are two sets of p directories: one
in the `./out/0` directory and another in the `./out/1` directory.

Once the output is created, it can be copied to all the servers that will run
  2131  Dgraph Alphas. Each Dgraph Alpha must have its own copy of the group's p
  2132  directory output. Each replica of the first group should have its own copy of
  2133  `./out/0/p`, each replica of the second group should have its own copy of
  2134  `./out/1/p`, and so on.
  2135  
  2136  ```sh
  2137  $ dgraph bulk --help # To see the available flags.
  2138  
  2139  # Read RDFs or JSON from the passed file.
  2140  $ dgraph bulk -f <path-to-gzipped-RDF-or-JSON-file> ...
  2141  
  2142  # Read multiple RDFs or JSON from the passed path.
  2143  $ dgraph bulk -f <./path-to-gzipped-RDF-or-JSON-files> ...
  2144  
  2145  # Read multiple files strictly by name.
  2146  $ dgraph bulk -f <file1.rdf, file2.rdf> ...
  2147  
  2148  ```
  2149  
  2150  #### Encryption at rest with Bulk Loader
  2151  
  2152  Even before the Dgraph cluster starts, we can load data using Bulk Loader with the encryption feature turned on. Later we can point the generated `p` directory to a new Alpha server.
  2153  
  2154  Here's an example to run Bulk Loader with a key used to write encrypted data:
  2155  
  2156  ```bash
  2157  dgraph bulk --encryption_key_file ./enc_key_file -f data.json.gz -s data.schema --map_shards=1 --reduce_shards=1 --http localhost:8000 --zero=localhost:5080
  2158  ```
  2159  
  2160  #### Encrypting imports via Bulk Loader
  2161  
The Bulk Loader’s `encryption_key_file` option was previously used to encrypt the output `p` directory. This same option is also used to decrypt encrypted export data and schema files.
  2163  
  2164  Another option, `--encrypted`, indicates whether the input `rdf`/`json` data and schema files are encrypted or not. With this switch, we support the use case of migrating data from unencrypted exports to encrypted import.
  2165  
  2166  So, with the above two options we have 4 cases:
  2167  
  2168  1. `--encrypted=true` and no `encryption_key_file`.
  2169  
  2170  Error: If the input is encrypted, a key file must be provided.
  2171  
2. `--encrypted=true` and `encryption_key_file`=`path to key`.
  2173  
  2174  Input is encrypted and output `p` dir is encrypted as well.
  2175  
  2176  3. `--encrypted=false` and no `encryption_key_file`.
  2177  
  2178  Input is not encrypted and the output `p` dir is also not encrypted.   
  2179  
  2180  4. `--encrypted=false` and `encryption_key_file`=`path to key`.
  2181  
  2182  Input is not encrypted but the output is encrypted. (This is the migration use case mentioned above).
  2183  
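As command sketches for cases 2 and 4 above (file names and the Zero address are placeholders):

```sh
# Case 2: encrypted input files, encrypted output p directory.
dgraph bulk --encrypted=true --encryption_key_file ./enc_key_file \
  -f export.rdf.gz -s export.schema --zero=localhost:5080

# Case 4: unencrypted input files, encrypted output (the migration use case).
dgraph bulk --encrypted=false --encryption_key_file ./enc_key_file \
  -f data.rdf.gz -s data.schema --zero=localhost:5080
```
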
  2184  #### Other Bulk Loader options
  2185  
  2186  `--new_uids` (default: false): Assign new UIDs instead of using the existing
  2187  UIDs in data files. This is useful to avoid overriding the data in a DB already
  2188  in operation.
  2189  
  2190  `-f, --files`: Location of *.rdf(.gz) or *.json(.gz) file(s) to load. It can
  2191  load multiple files in a given path. If the path is a directory, then all files
  2192  ending in .rdf, .rdf.gz, .json, and .json.gz will be loaded.
  2193  
  2194  `--format`: Specify file format (rdf or json) instead of getting it from
  2195  filenames. This is useful if you need to define a strict format manually.
  2196  
  2197  #### Tuning & monitoring
  2198  
  2199  ##### Performance Tuning
  2200  
  2201  {{% notice "tip" %}}
  2202  We highly recommend [disabling swap
  2203  space](https://askubuntu.com/questions/214805/how-do-i-disable-swap) when
running Bulk Loader. It is better to fix the parameters to decrease memory
usage than to have swapping grind the loader to a halt.
  2206  {{% /notice %}}
  2207  
  2208  Flags can be used to control the behaviour and performance characteristics of
  2209  the bulk loader. You can see the full list by running `dgraph bulk --help`. In
  2210  particular, **the flags should be tuned so that the bulk loader doesn't use more
  2211  memory than is available as RAM**. If it starts swapping, it will become
  2212  incredibly slow.
  2213  
  2214  **In the map phase**, tweaking the following flags can reduce memory usage:
  2215  
  2216  - The `--num_go_routines` flag controls the number of worker threads. Lowering reduces memory
  2217    consumption.
  2218  
  2219  - The `--mapoutput_mb` flag controls the size of the map output files. Lowering
  2220    reduces memory consumption.
  2221  
  2222  For bigger datasets and machines with many cores, gzip decoding can be a
  2223  bottleneck during the map phase. Performance improvements can be obtained by
  2224  first splitting the RDFs up into many `.rdf.gz` files (e.g. 256MB each). This
  2225  has a negligible impact on memory usage.
  2226  
**The reduce phase** is less memory-heavy than the map phase, although it can still
use a lot. Some flags may be increased to improve performance, *but only if
you have large amounts of RAM*:
  2230  
  2231  - The `--reduce_shards` flag controls the number of resultant Dgraph alpha instances.
  2232    Increasing this increases memory consumption, but in exchange allows for
  2233  higher CPU utilization.
  2234  
  2235  - The `--map_shards` flag controls the number of separate map output shards.
  2236    Increasing this increases memory consumption but balances the resultant
  2237  Dgraph alpha instances more evenly.
  2238  
- The `--shufflers` flag controls the level of parallelism in the shuffle/reduce
  stage. Increasing this increases memory consumption.
  2241  
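For instance, a memory-conscious run on a machine with limited RAM might lower the map-phase knobs while keeping the shard counts from the earlier example (the values shown are illustrative, not recommendations):

```sh
# Use fewer workers and smaller map output files to reduce map-phase memory usage.
dgraph bulk -f goldendata.rdf.gz -s goldendata.schema \
  --num_go_routines 2 --mapoutput_mb 32 \
  --map_shards=4 --reduce_shards=2 \
  --zero=localhost:5080
```
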
  2242  ## Monitoring
Dgraph exposes metrics via the `/debug/vars` endpoint in JSON format and the `/debug/prometheus_metrics` endpoint in Prometheus's text-based format. Dgraph doesn't store the metrics and only exposes their value at that instant. You can either poll these endpoints to get the data into your monitoring systems or install **[Prometheus](https://prometheus.io/docs/introduction/install/)**. Replace the targets in the config file below with the IP addresses of your Dgraph instances and run Prometheus using the command `prometheus -config.file my_config.yaml`.
```yaml
  2245  scrape_configs:
  2246    - job_name: "dgraph"
  2247      metrics_path: "/debug/prometheus_metrics"
  2248      scrape_interval: "2s"
  2249      static_configs:
  2250      - targets:
  2251        - 172.31.9.133:6080 #For Dgraph zero, 6080 is the http endpoint exposing metrics.
  2252        - 172.31.15.230:8080
  2253        - 172.31.0.170:8080
  2254        - 172.31.8.118:8080
  2255  ```
  2256  
  2257  {{% notice "note" %}}
  2258  Raw data exported by Prometheus is available via `/debug/prometheus_metrics` endpoint on Dgraph alphas.
  2259  {{% /notice %}}
  2260  
  2261  Install **[Grafana](http://docs.grafana.org/installation/)** to plot the metrics. Grafana runs at port 3000 in default settings. Create a prometheus datasource by following these **[steps](https://prometheus.io/docs/visualization/grafana/#creating-a-prometheus-data-source)**. Import **[grafana_dashboard.json](https://github.com/dgraph-io/benchmarks/blob/master/scripts/grafana_dashboard.json)** by following this **[link](http://docs.grafana.org/reference/export_import/#importing-a-dashboard)**.
  2262  
  2263  ## Metrics
  2264  
  2265  Dgraph metrics follow the [metric and label conventions for
  2266  Prometheus](https://prometheus.io/docs/practices/naming/).
  2267  
  2268  ### Disk Metrics
  2269  
  2270  The disk metrics let you track the disk activity of the Dgraph process. Dgraph does not interact
  2271  directly with the filesystem. Instead it relies on [Badger](https://github.com/dgraph-io/badger) to
  2272  read from and write to disk.
  2273  
 Metrics                             | Description
 -------                             | -----------
  2276   `badger_v2_disk_reads_total`        | Total count of disk reads in Badger.
  2277   `badger_v2_disk_writes_total`       | Total count of disk writes in Badger.
  2278   `badger_v2_gets_total`              | Total count of calls to Badger's `get`.
  2279   `badger_v2_memtable_gets_total`     | Total count of memtable accesses to Badger's `get`.
  2280   `badger_v2_puts_total`              | Total count of calls to Badger's `put`.
  2281   `badger_v2_read_bytes`              | Total bytes read from Badger.
  2282   `badger_v2_written_bytes`           | Total bytes written to Badger.
  2283  
  2284  ### Memory Metrics
  2285  
The memory metrics let you track the memory usage of the Dgraph process. The idle and inuse metrics
give you a better sense of the active memory usage of the Dgraph process. The process memory metric
shows the memory usage as measured by the operating system.
  2289  
  2290  By looking at all three metrics you can see how much memory a Dgraph process is holding from the
  2291  operating system and how much is actively in use.
  2292  
  2293   Metrics                          | Description
  2294   -------                          | -----------
  2295   `dgraph_memory_idle_bytes`       | Estimated amount of memory that is being held idle that could be reclaimed by the OS.
  2296   `dgraph_memory_inuse_bytes`      | Total memory usage in bytes (sum of heap usage and stack usage).
  2297   `dgraph_memory_proc_bytes`       | Total memory usage in bytes of the Dgraph process. On Linux/macOS, this metric is equivalent to resident set size. On Windows, this metric is equivalent to [Go's runtime.ReadMemStats](https://golang.org/pkg/runtime/#ReadMemStats).
  2298  
  2299  ### Activity Metrics
  2300  
The activity metrics let you track the mutations, queries, and proposals of a Dgraph instance.
  2302  
  2303   Metrics                          | Description
  2304   -------                          | -----------
  2305   `dgraph_goroutines_total`        | Total number of Goroutines currently running in Dgraph.
  2306   `dgraph_active_mutations_total`  | Total number of mutations currently running.
  2307   `dgraph_pending_proposals_total` | Total pending Raft proposals.
  2308   `dgraph_pending_queries_total`   | Total number of queries in progress.
  2309   `dgraph_num_queries_total`       | Total number of queries run in Dgraph.
  2310  
  2311  ### Health Metrics
  2312  
The health metrics let you check the availability of a Dgraph Alpha instance.
  2314  
  2315   Metrics                          | Description
  2316   -------                          | -----------
  2317   `dgraph_alpha_health_status`     | **Only applicable to Dgraph Alpha**. Value is 1 when the Alpha is ready to accept requests; otherwise 0.
  2318  
  2319  ### Go Metrics
  2320  
Go's built-in metrics may also be useful for measuring memory usage and garbage collection time.
  2322  
  2323   Metrics                        | Description
  2324   -------                        | -----------
  2325   `go_memstats_gc_cpu_fraction`  | The fraction of this program's available CPU time used by the GC since the program started.
  2326   `go_memstats_heap_idle_bytes`  | Number of heap bytes waiting to be used.
  2327   `go_memstats_heap_inuse_bytes` | Number of heap bytes that are in use.
  2328  
  2329  ## Tracing
  2330  
  2331  Dgraph is integrated with [OpenCensus](https://opencensus.io/zpages/) to collect distributed traces from the Dgraph cluster.
  2332  
  2333  Trace data is always collected within Dgraph. You can adjust the trace sampling rate for Dgraph queries with the `--trace` option for Dgraph Alphas. By default, `--trace` is set to 1 to trace 100% of queries.
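
For example, to sample roughly a third of queries instead of all of them (the sampling value shown is illustrative):

```sh
# Sample ~33% of queries instead of 100%
dgraph alpha --trace 0.33 --lru_mb=<one-third RAM> ...
```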
  2334  
  2335  ### Examining Traces with zPages
  2336  
  2337  The most basic way to view traces is with the integrated trace pages.
  2338  
  2339  OpenCensus's [zPages](https://opencensus.io/zpages/) are accessible via the Zero or Alpha HTTP port at `/z/tracez`.
  2340  
  2341  ### Examining Traces with Jaeger
  2342  
  2343  Jaeger collects distributed traces and provides a UI to view and query traces across different services. This provides the necessary observability to figure out what is happening in the system.
  2344  
  2345  Dgraph can be configured to send traces directly to a Jaeger collector with the `--jaeger.collector` flag. For example, if the Jaeger collector is running on `http://localhost:14268`, then pass the flag to the Dgraph Zero and Dgraph Alpha instances as `--jaeger.collector=http://localhost:14268`.
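
For example, assuming a Jaeger collector listening locally on its default HTTP port, each node could be started as follows (the addresses and `--lru_mb` value are illustrative):

```sh
# Send traces from both Zero and Alpha to a local Jaeger collector
dgraph zero  --jaeger.collector=http://localhost:14268
dgraph alpha --jaeger.collector=http://localhost:14268 --lru_mb=2048
```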
  2346  
  2347  See [Jaeger's Getting Started docs](https://www.jaegertracing.io/docs/getting-started/) to get up and running with Jaeger.
  2348  
  2349  #### Setting up multiple Dgraph clusters with Jaeger
  2350  
  2351  Jaeger allows you to examine traces from multiple Dgraph clusters. To do this, use the `--collector.tags` on a Jaeger collector to set custom trace tags. For example, run one collector with `--collector.tags env=qa` and then another collector with `--collector.tags env=dev`. In Dgraph, set the `--jaeger.collector` flag in the Dgraph QA cluster to the first collector and the flag in the Dgraph Dev cluster to the second collector.
  2352  You can run multiple Jaeger collector components for the same single Jaeger backend (e.g., many Jaeger collectors to a single Cassandra backend). This is still a single Jaeger installation but with different collectors customizing the tags per environment.
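
A minimal sketch of this setup, assuming two collector processes sharing one storage backend (storage flags and ports omitted for brevity, and the collector hostname is illustrative):

```sh
# Collector for the QA environment
jaeger-collector --collector.tags env=qa

# Collector for the Dev environment (run on a separate host or port)
jaeger-collector --collector.tags env=dev

# On the QA cluster, point Dgraph at the matching collector
dgraph alpha --jaeger.collector=http://qa-collector:14268 ...
```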
  2353  
  2354  Once you have this configured, you can filter by tags in the Jaeger UI. Filter traces by tags matching `env=dev`:
  2355  
  2356  {{% load-img "/images/jaeger-ui.png" "Jaeger UI" %}}
  2357  
  2358  Every trace has your custom tags set under the “Process” section of each span:
  2359  
  2360  {{% load-img "/images/jaeger-server-query.png" "Jaeger Query" %}}
  2361  
  2362  Filter traces by tags matching `env=qa`:
  2363  
  2364  {{% load-img "/images/jaeger-json.png" "Jaeger JSON" %}}
  2365  
  2366  {{% load-img "/images/jaeger-server-query-2.png" "Jaeger Query Result" %}}
  2367  
  2368  For more information, check out [Jaeger's Deployment Guide](https://www.jaegertracing.io/docs/deployment/).
  2369  
  2370  ## Dgraph Administration
  2371  
  2372  Each Dgraph Alpha exposes administrative operations over HTTP to export data and to perform a clean shutdown.
  2373  
  2374  ### Whitelist Admin Operations
  2375  
  2376  By default, admin operations can only be initiated from the machine on which the Dgraph Alpha runs.
  2377  You can use the `--whitelist` option to specify whitelisted IP addresses and ranges for hosts from which admin operations can be initiated.
  2378  
  2379  ```sh
  2380  dgraph alpha --whitelist 172.17.0.0:172.20.0.0,192.168.1.1 --lru_mb <one-third RAM> ...
  2381  ```
This would allow admin operations from hosts with IP addresses between `172.17.0.0` and `172.20.0.0` along with
the server whose IP address is `192.168.1.1`.
  2384  
  2385  ### Restrict Mutation Operations
  2386  
  2387  By default, you can perform mutation operations for any predicate.
  2388  If the predicate in mutation doesn't exist in the schema,
  2389  the predicate gets added to the schema with an appropriate
  2390  [Dgraph Type](https://docs.dgraph.io/master/query-language/#schema-types).
  2391  
The `--mutations` option is set to `allow` by default. You can use `--mutations disallow`
to disable all mutations.
  2394  
  2395  ```sh
  2396  dgraph alpha --mutations disallow
  2397  ```
  2398  
  2399  Enforce a strict schema by setting `--mutations strict`.
  2400  This mode allows mutations only on predicates already in the schema.
  2401  Before performing a mutation on a predicate that doesn't exist in the schema,
  2402  you need to perform an alter operation with that predicate and its schema type.
  2403  
  2404  ```sh
  2405  dgraph alpha --mutations strict
  2406  ```
  2407  
  2408  ### Secure Alter Operations
  2409  
  2410  Clients can use alter operations to apply schema updates and drop particular or all predicates from the database.
  2411  By default, all clients are allowed to perform alter operations.
  2412  You can configure Dgraph to only allow alter operations when the client provides a specific token.
  2413  This can be used to prevent clients from making unintended or accidental schema updates or predicate drops.
  2414  
  2415  You can specify the auth token with the `--auth_token` option for each Dgraph Alpha in the cluster.
  2416  Clients must include the same auth token to make alter requests.
  2417  
  2418  ```sh
  2419  $ dgraph alpha --lru_mb=2048 --auth_token=<authtokenstring>
  2420  ```
  2421  
  2422  ```sh
  2423  $ curl -s localhost:8080/alter -d '{ "drop_all": true }'
  2424  # Permission denied. No token provided.
  2425  ```
  2426  
  2427  ```sh
$ curl -s -H 'X-Dgraph-AuthToken: <wrongsecret>' localhost:8080/alter -d '{ "drop_all": true }'
  2429  # Permission denied. Incorrect token.
  2430  ```
  2431  
  2432  ```sh
$ curl -H 'X-Dgraph-AuthToken: <authtokenstring>' localhost:8080/alter -d '{ "drop_all": true }'
  2434  # Success. Token matches.
  2435  ```
  2436  
  2437  {{% notice "note" %}}
  2438  To fully secure alter operations in the cluster, the auth token must be set for every Alpha.
  2439  {{% /notice %}}
  2440  
  2441  
  2442  ### Export Database
  2443  
  2444  An export of all nodes is started by locally accessing the export endpoint of any Alpha in the cluster.
  2445  
  2446  ```sh
  2447  $ curl localhost:8080/admin/export
  2448  ```
  2449  {{% notice "warning" %}}By default, this won't work if called from outside the server where the Dgraph Alpha is running.
  2450  You can specify a list or range of whitelisted IP addresses from which export or other admin operations
  2451  can be initiated using the `--whitelist` flag on `dgraph alpha`.
  2452  {{% /notice %}}
  2453  
This also works from a browser, provided the HTTP GET is run from the same server where the Dgraph Alpha instance is running.
  2455  
  2456  This triggers an export for all Alpha groups of the cluster. The data is exported from the following Dgraph instances:
  2457  
  2458  1. For the Alpha instance that receives the GET request, the group's export data is stored with this Alpha.
  2459  2. For every other group, its group's export data is stored with the Alpha leader of that group.
  2460  
  2461  It is up to the user to retrieve the right export files from the Alphas in the
  2462  cluster. Dgraph does not copy all files to the Alpha that initiated the export.
  2463  The user must also ensure that there is sufficient space on disk to store the
  2464  export.
  2465  
  2466  Each Alpha leader for a group writes output as a gzipped file to the export
  2467  directory specified via the `--export` flag (defaults to a directory called `"export"`). If any of the groups fail, the
  2468  entire export process is considered failed and an error is returned.
  2469  
  2470  The data is exported in RDF format by default. A different output format may be specified with the
  2471  `format` URL parameter. For example:
  2472  
  2473  ```sh
  2474  $ curl 'localhost:8080/admin/export?format=json'
  2475  ```
  2476  
  2477  Currently, "rdf" and "json" are the only formats supported.
  2478  
  2479  #### Encrypting Exports
  2480  
  2481  Export is available wherever an Alpha is running. To encrypt an export, the Alpha must be configured with the `encryption-key-file`.
  2482  
  2483  {{% notice "note" %}}
  2484  The `encryption-key-file` was used for `encryption-at-rest` and will now also be used for encrypted backups and exports.
  2485  {{% /notice %}}
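
For example, an Alpha might be started with an encryption key file as follows. The flag spelling and key path are shown as an assumption; check `dgraph alpha --help` for the exact option on your version:

```sh
# Alpha configured with encryption at rest; exports from this Alpha are encrypted.
# Flag name and key path are illustrative.
dgraph alpha --encryption_key_file ./enc_key_file --lru_mb=2048 ...
```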
  2486  
  2487  ### Shutdown Database
  2488  
To shut down a Dgraph cluster, shut down all of its Alpha and Zero nodes. This can be done in different ways,
depending on how Dgraph was started (e.g. sending a `SIGTERM` to the processes, or using `systemctl stop service-name`
if you are using systemd).
  2492  
  2493  A clean exit of a single Dgraph Alpha node can be initiated by running the following command on that node.
  2494  {{% notice "warning" %}}This won't work if called from outside the server where Dgraph is running.
  2495  You can specify a list or range of whitelisted IP addresses from which shutdown or other admin operations
  2496  can be initiated using the `--whitelist` flag on `dgraph alpha`.
  2497  {{% /notice %}}
  2498  
  2499  ```sh
  2500  $ curl localhost:8080/admin/shutdown
  2501  ```
  2502  
  2503  This stops the Alpha on which the command is executed and not the entire cluster.
  2504  
  2505  ### Delete database
  2506  
Individual triples, patterns of triples and predicates can be deleted as described in the [query language docs](/query-language#delete).
  2508  
To drop all data, you could send a `DropAll` request via the `/alter` endpoint.
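
For example, using the default Alpha HTTP port:

```sh
$ curl localhost:8080/alter -d '{ "drop_all": true }'
```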
  2510  
  2511  Alternatively, you could:
  2512  
  2513  * [Shutdown Dgraph]({{< relref "#shutdown-database" >}}) and wait for all writes to complete,
  2514  * Delete (maybe do an export first) the `p` and `w` directories, then
  2515  * Restart Dgraph.
  2516  
  2517  ### Upgrade Database
  2518  
  2519  Doing periodic exports is always a good idea. This is particularly useful if you wish to upgrade Dgraph or reconfigure the sharding of a cluster. The following are the right steps to safely export and restart.
  2520  
  2521  1. Start an [export]({{< relref "#export-database">}})
  2522  2. Ensure it is successful
  2523  3. [Shutdown Dgraph]({{< relref "#shutdown-database" >}}) and wait for all writes to complete
  2524  4. Start a new Dgraph cluster using new data directories (this can be done by passing empty directories to the options `-p` and `-w` for Alphas and `-w` for Zeros)
  2525  5. Reload the data via [bulk loader]({{< relref "#bulk-loader" >}})
  2526  6. Verify the correctness of the new Dgraph cluster. If all looks good, you can delete the old directories (export serves as an insurance)
  2527  
  2528  These steps are necessary because Dgraph's underlying data format could have changed, and reloading the export avoids encoding incompatibilities.
  2529  
Blue-green deployment is a common approach to minimize downtime during the upgrade process.
This approach involves switching your application to read-only mode. To make sure that no mutations are executed during the maintenance window,
do a rolling restart of all your Alphas with the option `--mutations disallow`. This puts the cluster in read-only mode.
  2533  
At this point your application can still read from the old cluster, and you can perform steps 4 and 5 described above.
  2535  When the new cluster (that uses the upgraded version of Dgraph) is up and running, you can point your application to it, and shutdown the old cluster.
  2536  
  2537  {{% notice "note" %}}
  2538  If you are upgrading from v1.0, please make sure you follow the schema migration steps described in [this section](/howto/#schema-types-scalar-uid-and-list-uid).
  2539  {{% /notice %}}
  2540  
  2541  ### Post Installation
  2542  
Now that Dgraph is up and running, follow the [Query Language Spec](/query-language) to learn how to add data to and query Dgraph. Also, have a look at [Frequently asked questions](/faq).
  2544  
  2545  ## Troubleshooting
  2546  
  2547  Here are some problems that you may encounter and some solutions to try.
  2548  
  2549  #### Running OOM (out of memory)
  2550  
During bulk loading of data, Dgraph can consume more memory than usual, due to a high volume of writes. That's generally when you see the OOM crashes.
  2552  
The recommended minimum RAM to run on desktops and laptops is 16GB. Dgraph can take up to 7-8 GB with the default setting `--lru_mb` set to 4096; leaving the remaining 8GB for desktop applications should keep your machine humming along.
  2554  
  2555  On EC2/GCE instances, the recommended minimum is 8GB. It's recommended to set `--lru_mb` to one-third of RAM size.
  2556  
  2557  You could also decrease memory usage of Dgraph by setting `--badger.vlog=disk`.
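
For example, on a machine with 8GB of RAM, the settings might look like this (the values are illustrative):

```sh
# Roughly one-third of 8GB for the LRU cache, and the value log kept on disk
dgraph alpha --lru_mb=2700 --badger.vlog=disk ...
```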
  2558  
  2559  #### Too many open files
  2560  
If you see log error messages saying `too many open files`, you should increase the per-process file descriptor limit.
  2562  
During normal operations, Dgraph must be able to open many files. Your operating system may set by default an open file descriptor limit lower than what's needed for a database such as Dgraph.
  2564  
  2565  On Linux and Mac, you can check the file descriptor limit with `ulimit -n -H` for the hard limit and `ulimit -n -S` for the soft limit. The soft limit should be set high enough for Dgraph to run properly. A soft limit of 65535 is a good lower bound for a production setup. You can adjust the limit as needed.
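
For example, on Linux or macOS you could check the limits and raise the soft limit for the current shell session as shown below; making the change permanent depends on your distribution or service manager:

```sh
# Check the current limits
ulimit -n -H   # hard limit
ulimit -n -S   # soft limit

# Raise the soft limit for this shell session (cannot exceed the hard limit)
ulimit -n 65535
```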
  2566  
  2567  ## See Also
  2568  
  2569  * [Product Roadmap to v1.0](https://github.com/dgraph-io/dgraph/issues/1)