+++
date = "2017-03-20T19:35:35+11:00"
title = "How To Guides"
+++

## Retrieving Debug Information

Each Dgraph data node exposes profiling information over the `/debug/pprof` endpoint and metrics over the `/debug/vars` endpoint. Each Dgraph data node has its own profiling and metrics information. Below is a list of debugging information exposed by Dgraph and the corresponding commands to retrieve it.

### Metrics Information

If you are collecting these metrics from outside the Dgraph instance, you need to pass the `--expose_trace=true` flag; otherwise, these metrics can only be collected by connecting to the instance over localhost.

```
curl http://<IP>:<HTTP_PORT>/debug/vars
```

Metrics can also be retrieved in the Prometheus format at `/debug/prometheus_metrics`. See the [Metrics]({{< relref "deploy/index.md#metrics" >}}) section for the full list of metrics.

### Profiling Information

Profiling information is available via the `go tool pprof` profiling tool built into Go. The ["Profiling Go programs"](https://blog.golang.org/profiling-go-programs) Go blog post will help you get started with using pprof. Each Dgraph Zero and Dgraph Alpha exposes a debug endpoint at `/debug/pprof/<profile>` via the HTTP port.

```
go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/heap
#Fetching profile from ...
#Saved Profile in ...
```

The output of the command shows the location where the profile is saved.

In the interactive pprof shell, you can use commands like `top` to get a listing of the top functions in the profile, `web` to get a visual graph of the profile opened in a web browser, or `list` to display a code listing with profiling information overlaid.

#### CPU Profile

```
go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/profile
```

#### Memory Profile

```
go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/heap
```

#### Block Profile

Dgraph by default doesn't collect the block profile. Dgraph must be started with `--profile_mode=block` and `--block_rate=<N>` with N > 1.

```
go tool pprof http://<IP>:<HTTP_PORT>/debug/pprof/block
```

#### Goroutine stack

The HTTP page `/debug/pprof/` is available at the HTTP port of a Dgraph Zero or Dgraph Alpha. From this page a link to the "full goroutine stack dump" is available (e.g., on a Dgraph Alpha this page would be at `http://localhost:8080/debug/pprof/goroutine?debug=2`). Looking at the full goroutine stack can be useful to understand goroutine usage at that moment.
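
If you monitor these endpoints from a script, note that `/debug/vars` serves standard Go expvar JSON, so any JSON decoder can consume it. Below is a minimal sketch in Go that dumps every exposed variable; the Alpha address is an assumption, and the authoritative metric list is in the Metrics section linked above.

```go
// Minimal sketch: dump every variable exposed at /debug/vars.
// Assumes a Dgraph Alpha reachable at localhost:8080 (adjust as needed);
// the endpoint serves standard Go expvar-style JSON.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	resp, err := http.Get("http://localhost:8080/debug/vars")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var vars map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&vars); err != nil {
		log.Fatal(err)
	}
	for name, value := range vars {
		fmt.Printf("%s = %v\n", name, value)
	}
}
```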

## Using the Debug Tool

{{% notice "note" %}}
To debug a running Dgraph cluster, first copy the postings ("p") directory to
another location. If the Dgraph cluster is not running, then you can use the
same postings directory with the debug tool.
{{% /notice %}}

The `dgraph debug` tool can be used to inspect Dgraph's posting list structure.
You can use the debug tool to inspect the data, schema, and indices of your
Dgraph cluster.

Some scenarios where the debug tool is useful:

- Verify that mutations committed to Dgraph have been persisted to disk.
- Verify that indices are created.
- Inspect the history of a posting list.

### Example Usage

Debug the p directory.

```sh
$ dgraph debug --postings ./p
```

Debug the p directory without opening it in read-only mode. This is typically necessary when the database was not closed properly.

```sh
$ dgraph debug --postings ./p --readonly=false
```

Debug the p directory, only outputting the keys for the predicate `name`.

```sh
$ dgraph debug --postings ./p --readonly=false --pred=name
```

Debug the p directory, looking up a particular key:

```sh
$ dgraph debug --postings ./p --lookup 00000b6465736372697074696f6e020866617374
```

Debug the p directory, inspecting the history of a particular key:

```sh
$ dgraph debug --postings ./p --lookup 00000b6465736372697074696f6e020866617374 --history
```

### Debug Tool Output

Let's go over an example with a Dgraph cluster that has the following schema (with a term index and a full-text index) and two separately committed mutations:

```sh
$ curl localhost:8080/alter -d '
  name: string @index(term) .
  url: string .
  description: string @index(fulltext) .
'
```

```sh
$ curl -H "Content-Type: application/rdf" localhost:8080/mutate?commitNow=true -d '{
  set {
    _:dgraph <name> "Dgraph" .
    _:dgraph <url> "https://github.com/dgraph-io/dgraph" .
    _:dgraph <description> "Fast, Transactional, Distributed Graph Database." .
  }
}'
```

```sh
$ curl -H "Content-Type: application/rdf" localhost:8080/mutate?commitNow=true -d '{
  set {
    _:badger <name> "Badger" .
    _:badger <url> "https://github.com/dgraph-io/badger" .
    _:badger <description> "Embeddable, persistent and fast key-value (KV) database written in pure Go." .
  }
}'
```

After stopping Dgraph, you can run the debug tool to inspect the postings directory:

{{% notice "note" %}}
The debug output can be very large. Typically you would redirect the debug tool's output to a file first for easier analysis.
{{% /notice %}}

```sh
$ dgraph debug --postings ./p
```

```text
Opening DB: ./p
Min commit: 1. Max commit: 5, w.r.t 18446744073709551615
prefix =
{d} {v.ok} attr: url uid: 1 key: 00000375726c000000000000000001 item: [71, b0100] ts: 3
{d} {v.ok} attr: url uid: 2 key: 00000375726c000000000000000002 item: [71, b0100] ts: 5
{d} {v.ok} attr: name uid: 1 key: 0000046e616d65000000000000000001 item: [43, b0100] ts: 3
{d} {v.ok} attr: name uid: 2 key: 0000046e616d65000000000000000002 item: [43, b0100] ts: 5
{i} {v.ok} attr: name term: [1] badger key: 0000046e616d650201626164676572 item: [30, b0100] ts: 5
{i} {v.ok} attr: name term: [1] dgraph key: 0000046e616d650201646772617068 item: [30, b0100] ts: 3
{d} {v.ok} attr: _predicate_ uid: 1 key: 00000b5f7072656469636174655f000000000000000001 item: [104, b0100] ts: 3
{d} {v.ok} attr: _predicate_ uid: 2 key: 00000b5f7072656469636174655f000000000000000002 item: [104, b0100] ts: 5
{d} {v.ok} attr: description uid: 1 key: 00000b6465736372697074696f6e000000000000000001 item: [92, b0100] ts: 3
{d} {v.ok} attr: description uid: 2 key: 00000b6465736372697074696f6e000000000000000002 item: [119, b0100] ts: 5
{i} {v.ok} attr: description term: [8] databas key: 00000b6465736372697074696f6e020864617461626173 item: [38, b0100] ts: 5
{i} {v.ok} attr: description term: [8] distribut key: 00000b6465736372697074696f6e0208646973747269627574 item: [40, b0100] ts: 3
{i} {v.ok} attr: description term: [8] embedd key: 00000b6465736372697074696f6e0208656d62656464 item: [37, b0100] ts: 5
{i} {v.ok} attr: description term: [8] fast key: 00000b6465736372697074696f6e020866617374 item: [35, b0100] ts: 5
{i} {v.ok} attr: description term: [8] go key: 00000b6465736372697074696f6e0208676f item: [33, b0100] ts: 5
{i} {v.ok} attr: description term: [8] graph key: 00000b6465736372697074696f6e02086772617068 item: [36, b0100] ts: 3
{i} {v.ok} attr: description term: [8] kei key: 00000b6465736372697074696f6e02086b6569 item: [34, b0100] ts: 5
{i} {v.ok} attr: description term: [8] kv key: 00000b6465736372697074696f6e02086b76 item: [33, b0100] ts: 5
{i} {v.ok} attr: description term: [8] persist key: 00000b6465736372697074696f6e020870657273697374 item: [38, b0100] ts: 5
{i} {v.ok} attr: description term: [8] pure key: 00000b6465736372697074696f6e020870757265 item: [35, b0100] ts: 5
{i} {v.ok} attr: description term: [8] transact key: 00000b6465736372697074696f6e02087472616e73616374 item: [39, b0100] ts: 3
{i} {v.ok} attr: description term: [8] valu key: 00000b6465736372697074696f6e020876616c75 item: [35, b0100] ts: 5
{i} {v.ok} attr: description term: [8] written key: 00000b6465736372697074696f6e02087772697474656e item: [38, b0100] ts: 5
{s} {v.ok} attr: url key: 01000375726c item: [13, b0001] ts: 1
{s} {v.ok} attr: name key: 0100046e616d65 item: [23, b0001] ts: 1
{s} {v.ok} attr: _predicate_ key: 01000b5f7072656469636174655f item: [31, b0001] ts: 1
{s} {v.ok} attr: description key: 01000b6465736372697074696f6e item: [41, b0001] ts: 1
{s} {v.ok} attr: dgraph.type key: 01000b6467726170682e74797065 item: [40, b0001] ts: 1
Found 28 keys
```

Each line in the debug output starts with a prefix indicating the type of the key:

- `{d}`: Data key
- `{i}`: Index key
- `{c}`: Count key
- `{r}`: Reverse key
- `{s}`: Schema key

In the debug output above, we see data keys, index keys, and schema keys.

Each index key has a corresponding index type. For example, in `attr: name term: [1] dgraph` the `[1]` shows that this is the term index ([0x1][tok_term]); in `attr: description term: [8] fast`, the `[8]` shows that this is the full-text index ([0x8][tok_fulltext]). These IDs match the index IDs in [tok.go][tok].

[tok_term]: https://github.com/dgraph-io/dgraph/blob/ce82aaafba3d9e57cf5ea1aeb9b637193441e1e2/tok/tok.go#L39
[tok_fulltext]: https://github.com/dgraph-io/dgraph/blob/ce82aaafba3d9e57cf5ea1aeb9b637193441e1e2/tok/tok.go#L48
[tok]: https://github.com/dgraph-io/dgraph/blob/ce82aaafba3d9e57cf5ea1aeb9b637193441e1e2/tok/tok.go#L37-L53
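
If you want to take such a key apart yourself, the hex strings above follow a recognizable layout. The sketch below decodes the three kinds of keys shown in this output; the layout is inferred from the output itself (a type byte, a 2-byte big-endian predicate length, the predicate name, then a kind marker) and is internal to Dgraph, so treat it as illustrative only.

```go
// A sketch that takes apart the debug keys shown above. The layout is
// inferred from the output itself; it is internal to Dgraph and may change
// between versions.
package main

import (
	"encoding/binary"
	"encoding/hex"
	"fmt"
	"log"
)

func decode(hexKey string) {
	k, err := hex.DecodeString(hexKey)
	if err != nil || len(k) < 3 {
		log.Fatalf("bad key %q: %v", hexKey, err)
	}
	n := int(binary.BigEndian.Uint16(k[1:3])) // predicate name length
	attr := string(k[3 : 3+n])
	rest := k[3+n:]
	switch {
	case k[0] == 0x01: // {s} schema key
		fmt.Printf("{s} attr: %s\n", attr)
	case len(rest) == 9 && rest[0] == 0x00: // {d} data key: marker + 8-byte UID
		fmt.Printf("{d} attr: %s uid: %#x\n", attr, binary.BigEndian.Uint64(rest[1:]))
	case len(rest) > 1 && rest[0] == 0x02: // {i} index key: marker + tokenizer ID + term
		fmt.Printf("{i} attr: %s term: [%d] %s\n", attr, rest[1], rest[2:])
	default:
		fmt.Printf("unrecognized key layout, attr: %s\n", attr)
	}
}

func main() {
	decode("00000375726c000000000000000001")           // {d} url, uid 0x1
	decode("00000b6465736372697074696f6e020866617374") // {i} description, term "fast"
	decode("01000375726c")                             // {s} url
}
```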

### Key Lookup

Every key can be inspected further by passing the specific key to the `--lookup` flag.

```sh
$ dgraph debug --postings ./p --lookup 00000b6465736372697074696f6e020866617374
```

```text
Opening DB: ./p
Min commit: 1. Max commit: 5, w.r.t 18446744073709551615
Key: 00000b6465736372697074696f6e020866617374 Length: 2
Uid: 1 Op: 1
Uid: 2 Op: 1
```

For data keys, a lookup shows its type and value. Below, we see that the key for `attr: url uid: 1` is a string value.

```sh
$ dgraph debug --postings ./p --lookup 00000375726c000000000000000001
```

```text
Opening DB: ./p
Min commit: 1. Max commit: 5, w.r.t 18446744073709551615
Key: 00000375726c000000000000000001 Length: 1
Uid: 18446744073709551615 Op: 1 Type: STRING. String Value: "https://github.com/dgraph-io/dgraph"
```

For index keys, a lookup shows the UIDs that are part of this index. Below, we see that the `fast` index for the `<description>` predicate has UIDs 0x1 and 0x2.

```sh
$ dgraph debug --postings ./p --lookup 00000b6465736372697074696f6e020866617374
```

```text
Opening DB: ./p
Min commit: 1. Max commit: 5, w.r.t 18446744073709551615
Key: 00000b6465736372697074696f6e020866617374 Length: 2
Uid: 1 Op: 1
Uid: 2 Op: 1
```

### Key History

You can also look up the history of values for a key using the `--history` option.

```sh
$ dgraph debug --postings ./p --lookup 00000b6465736372697074696f6e020866617374 --history
```

```text
Opening DB: ./p
Min commit: 1. Max commit: 5, w.r.t 18446744073709551615
==> key: 00000b6465736372697074696f6e020866617374. PK: &{byteType:2 Attr:description Uid:0 Term:fast Count:0 bytePrefix:0}
ts: 5 {item}{delta}
Uid: 2 Op: 1

ts: 3 {item}{delta}
Uid: 1 Op: 1
```

Above, we see that UID 0x1 was committed to this index at ts 3, and UID 0x2 was committed to this index at ts 5.

The debug output also shows UserMeta information:

- `{complete}`: Complete posting list
- `{uid}`: UID posting list
- `{delta}`: Delta posting list
- `{empty}`: Empty posting list
- `{item}`: Item posting list
- `{deleted}`: Delete marker

## Using the Increment Tool

The `dgraph increment` tool increments a counter value transactionally. The
increment tool can be used as a health check that an Alpha is able to service
transactions for both queries and mutations.

### Example Usage

Increment the default predicate (`counter.val`) once. If the predicate doesn't yet
exist, it is created, and the counter starts at 0. A sketch of what the tool does
internally follows these examples.

```sh
$ dgraph increment
```

Increment the counter predicate against the Alpha running at address `--alpha` (default: `localhost:9080`):

```sh
$ dgraph increment --alpha=192.168.1.10:9080
```

Increment the counter predicate specified by `--pred` (default: `counter.val`):

```sh
$ dgraph increment --pred=counter.val.healthcheck
```

Run a read-only query for the counter predicate without running a mutation to increment it:

```sh
$ dgraph increment --ro
```

Run a best-effort query for the counter predicate without running a mutation to increment it:

```sh
$ dgraph increment --be
```

Run the increment tool 1000 times, waiting 1 second between each increment:

```sh
$ dgraph increment --num=1000 --wait=1s
```
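
Conceptually, each increment is an ordinary read-modify-write transaction. Below is a rough sketch of one increment with the Go client; dgo v2, the Alpha address, and an `int` schema for `counter.val` are all assumptions here, and the real tool layers flags and retries on top of this.

```go
// A rough sketch of one transactional increment using the Go client
// (github.com/dgraph-io/dgo/v2 is assumed). Error handling is trimmed to
// the essentials.
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	ctx := context.Background()
	txn := dg.NewTxn()
	defer txn.Discard(ctx)

	// Read the current counter value inside the transaction.
	resp, err := txn.Query(ctx, `{ q(func: has(counter.val)) { uid counter.val } }`)
	if err != nil {
		log.Fatal(err)
	}
	var data struct {
		Q []struct {
			UID string `json:"uid"`
			Val int    `json:"counter.val"`
		} `json:"q"`
	}
	if err := json.Unmarshal(resp.GetJson(), &data); err != nil {
		log.Fatal(err)
	}

	subject, val := "_:counter", 0 // no counter yet: create a new node
	if len(data.Q) > 0 {
		subject, val = "<"+data.Q[0].UID+">", data.Q[0].Val
	}

	// Write back val+1 and commit. If a concurrent increment commits first,
	// the commit below aborts and the whole operation must be retried.
	nq := fmt.Sprintf(`%s <counter.val> "%d" .`, subject, val+1)
	if _, err := txn.Mutate(ctx, &api.Mutation{SetNquads: []byte(nq)}); err != nil {
		log.Fatal(err)
	}
	if err := txn.Commit(ctx); err != nil {
		log.Fatal(err)
	}
	fmt.Println("Counter VAL:", val+1)
}
```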

### Increment Tool Output

```sh
# Run increment a few times
$ dgraph increment
0410 10:31:16.379 Counter VAL: 1 [ Ts: 1 ]
$ dgraph increment
0410 10:34:53.017 Counter VAL: 2 [ Ts: 3 ]
$ dgraph increment
0410 10:34:53.648 Counter VAL: 3 [ Ts: 5 ]

# Run read-only queries to read the counter a few times
$ dgraph increment --ro
0410 10:34:57.35 Counter VAL: 3 [ Ts: 7 ]
$ dgraph increment --ro
0410 10:34:57.886 Counter VAL: 3 [ Ts: 7 ]
$ dgraph increment --ro
0410 10:34:58.129 Counter VAL: 3 [ Ts: 7 ]

# Run best-effort queries to read the counter a few times
$ dgraph increment --be
0410 10:34:59.867 Counter VAL: 3 [ Ts: 7 ]
$ dgraph increment --be
0410 10:35:01.322 Counter VAL: 3 [ Ts: 7 ]
$ dgraph increment --be
0410 10:35:02.674 Counter VAL: 3 [ Ts: 7 ]

# Run a read-only query to read the counter 5 times
$ dgraph increment --ro --num=5
0410 10:35:18.812 Counter VAL: 3 [ Ts: 7 ]
0410 10:35:18.813 Counter VAL: 3 [ Ts: 7 ]
0410 10:35:18.815 Counter VAL: 3 [ Ts: 7 ]
0410 10:35:18.817 Counter VAL: 3 [ Ts: 7 ]
0410 10:35:18.818 Counter VAL: 3 [ Ts: 7 ]

# Increment the counter 5 times
$ dgraph increment --num=5
0410 10:35:24.028 Counter VAL: 4 [ Ts: 8 ]
0410 10:35:24.061 Counter VAL: 5 [ Ts: 10 ]
0410 10:35:24.104 Counter VAL: 6 [ Ts: 12 ]
0410 10:35:24.145 Counter VAL: 7 [ Ts: 14 ]
0410 10:35:24.178 Counter VAL: 8 [ Ts: 16 ]

# Increment the counter 5 times, once every second.
$ dgraph increment --num=5 --wait=1s
0410 10:35:26.95 Counter VAL: 9 [ Ts: 18 ]
0410 10:35:27.975 Counter VAL: 10 [ Ts: 20 ]
0410 10:35:28.999 Counter VAL: 11 [ Ts: 22 ]
0410 10:35:30.028 Counter VAL: 12 [ Ts: 24 ]
0410 10:35:31.054 Counter VAL: 13 [ Ts: 26 ]

# If the Alpha is too busy or unhealthy, the tool will time out and retry.
$ dgraph increment
0410 10:36:50.857 While trying to process counter: Query error: rpc error: code = DeadlineExceeded desc = context deadline exceeded. Retrying...
```

## Giving Nodes a Type

It's often useful to give the nodes in a graph *types* (also commonly referred
to as *labels* or *kinds*). You can do so using the [type system]({{< relref "query-language/index.md#type-system" >}}).

## Loading CSV Data

[Dgraph mutations]({{< relref "mutations/index.md" >}}) are accepted in RDF
N-Quad and JSON formats. To load CSV-formatted data into Dgraph, first convert
the dataset into one of the accepted formats and then load the resulting dataset
into Dgraph. This section demonstrates converting CSV into JSON. There are
many tools available to convert CSV to JSON. For example, you can use
[`d3-dsv`](https://github.com/d3/d3-dsv)'s `csv2json` tool as shown below:

```csv
Name,URL
Dgraph,https://github.com/dgraph-io/dgraph
Badger,https://github.com/dgraph-io/badger
```

```sh
$ csv2json names.csv --out names.json
$ cat names.json | jq '.'
[
  {
    "Name": "Dgraph",
    "URL": "https://github.com/dgraph-io/dgraph"
  },
  {
    "Name": "Badger",
    "URL": "https://github.com/dgraph-io/badger"
  }
]
```

This JSON can be loaded into Dgraph via the programmatic clients. This follows
the [JSON Mutation Format]({{< relref "mutations#json-mutation-format" >}}).
Note that each JSON object in the list above will be assigned a unique UID since
the `uid` field is omitted.

[The Ratel UI (and HTTP clients) expect JSON data to be stored within the `"set"`
key]({{< relref "mutations/index.md#json-syntax-using-raw-http-or-ratel-ui"
>}}). You can use `jq` to transform the JSON into the correct format:

```sh
$ cat names.json | jq '{ set: . }'
```
```json
{
  "set": [
    {
      "Name": "Dgraph",
      "URL": "https://github.com/dgraph-io/dgraph"
    },
    {
      "Name": "Badger",
      "URL": "https://github.com/dgraph-io/badger"
    }
  ]
}
```
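
Returning to the programmatic route mentioned above, here is a minimal sketch that loads `names.json` with the Go client (dgo v2 is assumed). Note that the gRPC clients take the JSON array directly; the `"set"` wrapper is only needed for Ratel and raw HTTP.

```go
// A sketch of loading the names.json file produced above through the Go
// client (dgo v2 assumed; any client that follows the JSON Mutation Format
// works). Each object gets a fresh UID since no "uid" field is set.
package main

import (
	"context"
	"io/ioutil"
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	data, err := ioutil.ReadFile("names.json") // the JSON array from csv2json
	if err != nil {
		log.Fatal(err)
	}
	mu := &api.Mutation{SetJson: data, CommitNow: true}
	if _, err := dg.NewTxn().Mutate(context.Background(), mu); err != nil {
		log.Fatal(err)
	}
}
```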

Let's say you have CSV data in a file named connects.csv that connects nodes
together. Here, the `connects` field should be of the `uid` type.

```csv
uid,connects
_:a,_:b
_:a,_:c
_:c,_:d
_:d,_:a
```

{{% notice "note" %}}
To reuse existing integer IDs from a CSV file as UIDs in Dgraph, use Dgraph Zero's [assign endpoint]({{< relref "deploy/index.md#more-about-dgraph-zero" >}}) before data loading to allocate a range of UIDs that can be safely assigned.
{{% /notice %}}

To get the correct JSON format, you can convert the CSV into JSON and use `jq`
to transform it into the correct format where the `connects` edge is a node uid:

```sh
$ csv2json connects.csv | jq '[ .[] | { uid: .uid, connects: { uid: .connects } } ]'
```

```json
[
  {
    "uid": "_:a",
    "connects": {
      "uid": "_:b"
    }
  },
  {
    "uid": "_:a",
    "connects": {
      "uid": "_:c"
    }
  },
  {
    "uid": "_:c",
    "connects": {
      "uid": "_:d"
    }
  },
  {
    "uid": "_:d",
    "connects": {
      "uid": "_:a"
    }
  }
]
```

You can modify the `jq` transformation to output the mutation format accepted by
Ratel UI and HTTP clients:

```sh
$ csv2json connects.csv | jq '{ set: [ .[] | {uid: .uid, connects: { uid: .connects } } ] }'
```
```json
{
  "set": [
    {
      "uid": "_:a",
      "connects": {
        "uid": "_:b"
      }
    },
    {
      "uid": "_:a",
      "connects": {
        "uid": "_:c"
      }
    },
    {
      "uid": "_:c",
      "connects": {
        "uid": "_:d"
      }
    },
    {
      "uid": "_:d",
      "connects": {
        "uid": "_:a"
      }
    }
  ]
}
```

## A Simple Login System

{{% notice "note" %}}
This example is based on part of the [transactions in
v0.9](https://blog.dgraph.io/post/v0.9/) blogpost. Error checking has been
omitted for brevity.
{{% /notice %}}

The schema is assumed to be:
```
# The @upsert directive is important to detect conflicts.
email: string @index(exact) @upsert . # @index(hash) would also work
pass: password .
```

```
// Create a new transaction. The deferred call to Discard
// ensures that server-side resources are cleaned up.
txn := client.NewTxn()
defer txn.Discard(ctx)

// Create and execute a query that looks up an email and checks if the password
// matches.
q := fmt.Sprintf(`
    {
        login_attempt(func: eq(email, %q)) {
            checkpwd(pass, %q)
        }
    }
`, email, pass)
resp, err := txn.Query(ctx, q)

// Unmarshal the response into a struct. It will be empty if the email couldn't
// be found. Otherwise it will contain a bool to indicate if the password matched.
var login struct {
    Account []struct {
        Pass []struct {
            CheckPwd bool `json:"checkpwd"`
        } `json:"pass"`
    } `json:"login_attempt"`
}
err = json.Unmarshal(resp.GetJson(), &login)

// Now perform the upsert logic.
if len(login.Account) == 0 {
    fmt.Println("Account doesn't exist! Creating new account.")
    mu := &protos.Mutation{
        SetJson: []byte(fmt.Sprintf(`{ "email": %q, "pass": %q }`, email, pass)),
    }
    _, err = txn.Mutate(ctx, mu)
    // Commit the mutation, making it visible outside of the transaction.
    err = txn.Commit(ctx)
} else if login.Account[0].Pass[0].CheckPwd {
    fmt.Println("Login successful!")
} else {
    fmt.Println("Wrong email or password.")
}
```

## Upserts

Upsert-style operations are operations where:

1. A node is searched for, and then
2. Depending on whether it is found or not, either:
   - some of its attributes are updated, or
   - a new node is created with those attributes.

The upsert has to be an atomic operation such that either a new node is
created, or an existing node is modified. It's not allowed that two concurrent
upserts both create a new node.

There are many examples where upserts are useful. Most examples involve the
creation of a 1 to 1 mapping between two different entities. E.g. associating
email addresses with user accounts.

Upserts are common in both traditional RDBMSs and newer NoSQL databases.
Dgraph is no exception.

### Upsert Procedure

In Dgraph, upsert-style behaviour can be implemented by users on top of
transactions. The steps are as follows (a sketch using the Go client follows
the list):

1. Create a new transaction.

2. Query for the node. This will usually be as simple as `{ q(func: eq(email,
"bob@example.com")) { uid }}`. If a `uid` result is returned, then that's the
`uid` for the existing node. If no results are returned, then the user account
doesn't exist.

3. In the case where the user account doesn't exist, then a new node has to be
created. This is done in the usual way by making a mutation (inside the
transaction), e.g. the RDF `_:newAccount <email> "bob@example.com" .`. The
`uid` assigned can be accessed by looking up the blank node name `newAccount`
in the `Assigned` object returned from the mutation.

4. Now that you have the `uid` of the account (either new or existing), you can
modify the account (using additional mutations) or perform queries on it in
whichever way you wish.
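
Below is a sketch of these four steps with the Go client; dgo v2 is assumed, where the assigned UIDs come back in the `Uids` map of the mutation response rather than a separate `Assigned` object.

```go
// A sketch of the upsert procedure above using the Go client (dgo v2 is
// assumed). It returns the uid of the account node, creating it if needed.
// The conflict-retry loop is discussed in the Conflicts section below.
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

func upsertEmail(ctx context.Context, dg *dgo.Dgraph, email string) (string, error) {
	// Step 1: create a new transaction.
	txn := dg.NewTxn()
	defer txn.Discard(ctx)

	// Step 2: query for the node.
	q := fmt.Sprintf(`{ q(func: eq(email, %q)) { uid } }`, email)
	resp, err := txn.Query(ctx, q)
	if err != nil {
		return "", err
	}
	var r struct {
		Q []struct {
			UID string `json:"uid"`
		} `json:"q"`
	}
	if err := json.Unmarshal(resp.GetJson(), &r); err != nil {
		return "", err
	}

	var uid string
	if len(r.Q) > 0 {
		uid = r.Q[0].UID // the account already exists
	} else {
		// Step 3: not found, so create it inside the same transaction.
		mu := &api.Mutation{
			SetNquads: []byte(fmt.Sprintf("_:newAccount <email> %q .", email)),
		}
		assigned, err := txn.Mutate(ctx, mu)
		if err != nil {
			return "", err
		}
		uid = assigned.Uids["newAccount"] // look up the blank node name
	}

	// Step 4: further mutations or queries against uid would go here.
	return uid, txn.Commit(ctx)
}

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))
	uid, err := upsertEmail(context.Background(), dg, "bob@example.com")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("account uid:", uid)
}
```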

### Upsert Block

You can also use the upsert block to achieve the upsert procedure in a single
mutation. The request will contain both the query and the mutation as explained
[here]({{< relref "mutations/index.md#upsert-block" >}}).
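
From the Go client, the upsert block maps to a single request carrying both the query and the mutation (in dgo v2 this is `Txn.Do` with an `api.Request`). A minimal sketch, reusing the email example:

```go
// A minimal sketch of an upsert block from Go, assuming dgo v2 against
// Dgraph v1.1. The query binds v to the matching uid (if any) and the
// mutation reuses it via uid(v), so the lookup and the write happen in a
// single atomic request.
package main

import (
	"context"
	"log"

	"github.com/dgraph-io/dgo/v2"
	"github.com/dgraph-io/dgo/v2/protos/api"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	req := &api.Request{
		Query: `query { q(func: eq(email, "bob@example.com")) { v as uid } }`,
		Mutations: []*api.Mutation{{
			SetNquads: []byte(`uid(v) <email> "bob@example.com" .`),
		}},
		CommitNow: true,
	}
	if _, err := dg.NewTxn().Do(context.Background(), req); err != nil {
		log.Fatal(err)
	}
}
```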

### Conflicts

Upsert operations are intended to be run concurrently, as per the needs of the
application. As such, it's possible that two concurrently running operations
could try to add the same node at the same time. For example, both try to add a
user with the same email address. If they do, then one of the transactions will
fail with an error indicating that the transaction was aborted.

If this happens, the transaction is rolled back and it's up to the user's
application logic to retry the whole operation. The transaction has to be
retried in its entirety, all the way from creating a new transaction.
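
A retry wrapper around the earlier upsert sketch could look like the following (same package as that sketch; dgo v2 surfaces an aborted commit as `dgo.ErrAborted`):

```go
// One way to structure the retry, as a sketch: the whole upsert (including
// the fresh transaction) is re-run when the commit aborts. upsertEmail is
// the helper from the Upsert Procedure sketch above.
func upsertWithRetry(ctx context.Context, dg *dgo.Dgraph, email string, maxAttempts int) (string, error) {
	var err error
	for i := 0; i < maxAttempts; i++ {
		var uid string
		if uid, err = upsertEmail(ctx, dg, email); err == nil {
			return uid, nil // success
		}
		if err != dgo.ErrAborted {
			return "", err // a real failure, not a conflicting upsert
		}
		// Aborted: a concurrent upsert touched the same index key. Retry
		// from the very beginning, with a brand new transaction.
	}
	return "", err
}
```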

The choice of index placed on the predicate is important for performance.
**Hash is almost always the best choice of index for equality checking.**

{{% notice "note" %}}
It's the _index_ that typically causes upsert conflicts to occur. The index is
stored as many key/value pairs, where each key is a combination of the
predicate name and some function of the predicate value (e.g. its hash for the
hash index). If two transactions modify the same key concurrently, then one
will fail.
{{% /notice %}}

## Run Jepsen tests

1. Clone the jepsen repo at [https://github.com/jepsen-io/jepsen](https://github.com/jepsen-io/jepsen).

```sh
git clone git@github.com:jepsen-io/jepsen.git
```

2. Run the following command to set up the instances from the repo.

```sh
cd docker && ./up.sh
```

This should start five Jepsen nodes in Docker containers.

3. Now connect to the `jepsen-control` container and run the tests.

{{% notice "note" %}}
You can use the [transfer](https://github.com/dgraph-io/dgraph/blob/master/contrib/nightly/transfer.sh) script to build the Dgraph binary and upload the tarball to https://transfer.sh, which gives you a URL that can then be used in the Jepsen tests (using the `--package-url` flag).
{{% /notice %}}

```sh
docker exec -it jepsen-control bash
```

```sh
root@control:/jepsen# cd dgraph
root@control:/jepsen/dgraph# lein run test -w upsert

# Specify a --package-url

root@control:/jepsen/dgraph# lein run test --force-download --package-url https://github.com/dgraph-io/dgraph/releases/download/nightly/dgraph-linux-amd64.tar.gz -w upsert
```

## Migrate to Dgraph v1.1

### Schema types: scalar `uid` and list `[uid]`

The semantics of predicates of type `uid` have changed in Dgraph 1.1. Whereas before all `uid` predicates implied a one-to-many relationship, now either a one-to-one relationship or a one-to-many relationship can be expressed.

```
friend: [uid] .
best_friend: uid .
```

In the above, the predicate `friend` allows a one-to-many relationship (i.e., a person can have more than one friend) and the predicate `best_friend` can be at most a one-to-one relationship.

This syntactic meaning is consistent with the other types, e.g., `string` indicating a single-value string and `[string]` representing many strings. This change makes the `uid` type work similarly to other types.

To migrate existing schemas from Dgraph v1.0 to Dgraph v1.1, update the schema file from an export so all predicates of type `uid` are changed to `[uid]`. Then use the updated schema when loading data into Dgraph v1.1. For example, the following schema:

```text
name: string .
friend: uid .
```

becomes

```text
name: string .
friend: [uid] .
```

### Type system

The new [type system]({{< relref "query-language/index.md#type-system" >}}) introduced in Dgraph 1.1 should not affect migrating data from a previous version. However, a couple of features in the query language will not work as they did before: `expand()` and `_predicate_`.

The reason is that the internal predicate that associated each node with its predicates (called `_predicate_`) has been removed. Instead, the type system is used to get the predicates that belong to a node.

#### `expand()`

Expand queries will not work until the type system has been properly set up. For example, the following query will return an empty result in Dgraph 1.1 if the node 0xff has no type information.

```text
{
  me(func: uid(0xff)) {
    expand(_all_)
  }
}
```

To make it work again, add a type definition via the alter endpoint. Let's assume the node in the previous example represents a person. Then, the basic Person type could be defined as follows:

```text
type Person {
  name: string
  age: int
}
```

After that, the node is associated with the type by adding the following RDF triple to Dgraph (using a mutation):

```text
<0xff> <dgraph.type> "Person" .
```

After that, the results of the query in both Dgraph v1.0 and Dgraph v1.1 should be the same.

#### `_predicate_`

The other consequence of removing `_predicate_` is that it cannot be referenced explicitly in queries. In Dgraph 1.0, the following query returns the predicates of the node 0xff.

```text
{
  me(func: uid(0xff)) {
    _predicate_ # NOT available in Dgraph v1.1
  }
}
```

**There's no exact equivalent of this behavior in Dgraph 1.1**, but the information can be retrieved by first querying for the types associated with the node:

```text
{
  me(func: uid(0xff)) {
    dgraph.type
  }
}
```

and then retrieving the definition of each type in the results using a schema query:

```text
schema(type: Person) {}
```

### Live Loader and Bulk Loader command-line flags

#### File input flags

In Dgraph v1.1, both the Dgraph Live Loader and Dgraph Bulk Loader tools support loading data in either RDF format or JSON format. To simplify the command-line interface for these tools, the `-r`/`--rdfs` flag has been removed in favor of `-f`/`--files`. The new flag accepts file or directory paths for either data format. By default, the tools will infer the file type based on the file suffix, e.g., `.rdf` and `.rdf.gz` or `.json` and `.json.gz` for RDF data or JSON data, respectively. To ignore the filenames and set the format explicitly, the `--format` flag can be set to `rdf` or `json`.

Before (in Dgraph v1.0):

```sh
dgraph live -r data.rdf.gz
```

Now (in Dgraph v1.1):

```sh
dgraph live -f data.rdf.gz
```

#### Dgraph Alpha address flag

For Dgraph Live Loader, the flag to specify the Dgraph Alpha address (default: `127.0.0.1:9080`) has changed from `-d`/`--dgraph` to `-a`/`--alpha`.

Before (in Dgraph v1.0):

```sh
dgraph live -d 127.0.0.1:9080
```

Now (in Dgraph v1.1):

```sh
dgraph live -a 127.0.0.1:9080
```

### HTTP API

For HTTP API users (e.g., curl, Postman), the custom Dgraph headers have been removed in favor of standard HTTP headers and query parameters.

#### Queries

There are two accepted `Content-Type` headers for queries over HTTP: `application/graphql+-` or `application/json`.

A `Content-Type` must be set to run a query.

Before (in Dgraph v1.0):

```sh
curl localhost:8080/query -d '{
  q(func: eq(name, "Dgraph")) {
    name
  }
}'
```

Now (in Dgraph v1.1):

```sh
curl -H 'Content-Type: application/graphql+-' localhost:8080/query -d '{
  q(func: eq(name, "Dgraph")) {
    name
  }
}'
```

For queries using [GraphQL Variables]({{< relref "query-language/index.md#graphql-variables" >}}), the query must be sent via the `application/json` content type, with the query and variables sent in a JSON payload:

Before (in Dgraph v1.0):

```sh
curl -H 'X-Dgraph-Vars: {"$name": "Alice"}' localhost:8080/query -d 'query qWithVars($name: string) {
  q(func: eq(name, $name)) {
    name
  }
}'
```

Now (in Dgraph v1.1):

```sh
curl -H 'Content-Type: application/json' localhost:8080/query -d '{
  "query": "query qWithVars($name: string) { q(func: eq(name, $name)) { name } }",
  "variables": {"$name": "Alice"}
}'
```
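
The same JSON query can of course be sent from a program instead of curl. A small sketch using only Go's standard library (the Alpha address is an assumption):

```go
// A small sketch of the v1.1 JSON query request using only Go's standard
// library. Assumes an Alpha at localhost:8080.
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	payload, err := json.Marshal(map[string]interface{}{
		"query":     `query qWithVars($name: string) { q(func: eq(name, $name)) { name } }`,
		"variables": map[string]string{"$name": "Alice"},
	})
	if err != nil {
		log.Fatal(err)
	}
	resp, err := http.Post("http://localhost:8080/query",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body))
}
```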

#### Mutations

There are two accepted `Content-Type` headers for mutations over HTTP: `Content-Type: application/rdf` or `Content-Type: application/json`.

A `Content-Type` must be set to run a mutation.

These `Content-Type` headers supersede the Dgraph v1.0.x custom header `X-Dgraph-MutationType` to set the mutation type as RDF or JSON.

To commit the mutation immediately, use the query parameter `commitNow=true`. This replaces the custom header `X-Dgraph-CommitNow: true` from Dgraph v1.0.x.

Before (in Dgraph v1.0):

```sh
curl -H 'X-Dgraph-CommitNow: true' localhost:8080/mutate -d '{
  set {
    _:n <name> "Alice" .
  }
}'
```

Now (in Dgraph v1.1):

```sh
curl -H 'Content-Type: application/rdf' localhost:8080/mutate?commitNow=true -d '{
  set {
    _:n <name> "Alice" .
  }
}'
```

For JSON mutations, set the `Content-Type` header to `application/json`.

Before (in Dgraph v1.0):

```sh
curl -H 'X-Dgraph-MutationType: json' -H 'X-Dgraph-CommitNow: true' localhost:8080/mutate -d '{
  "set": [
    {
      "name": "Alice"
    }
  ]
}'
```

Now (in Dgraph v1.1):

```sh
curl -H 'Content-Type: application/json' localhost:8080/mutate?commitNow=true -d '{
  "set": [
    {
      "name": "Alice"
    }
  ]
}'
```
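
Likewise, the JSON mutation above can be sent from a program, with `commitNow=true` passed as a query parameter. A small sketch using only Go's standard library (the Alpha address is an assumption):

```go
// The equivalent v1.1 JSON mutation from Go, with commitNow passed as a
// query parameter. Assumes an Alpha at localhost:8080.
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
)

func main() {
	payload := []byte(`{ "set": [ { "name": "Alice" } ] }`)
	resp, err := http.Post("http://localhost:8080/mutate?commitNow=true",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(body))
}
```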