+++
title = "Query Language"
+++

Dgraph's GraphQL+- is based on Facebook's [GraphQL](https://facebook.github.io/graphql/). GraphQL wasn't developed for graph databases, but its graph-like query syntax, schema validation and subgraph-shaped response make it a great language choice. We've modified the language to better support graph operations, adding and removing features to get the best fit for graph databases. We're calling this simplified, feature-rich language _GraphQL+-_.

GraphQL+- is a work in progress. We're adding more features and we might further simplify existing ones.

## Take a Tour - https://tour.dgraph.io

This document is the Dgraph query reference material. It is not a tutorial. It's designed as a reference for users who already know how to write queries in GraphQL+- but need to check syntax, or indices, or functions, etc.

{{% notice "note" %}}If you are new to Dgraph and want to learn how to use Dgraph and GraphQL+-, take the tour - https://tour.dgraph.io{{% /notice %}}


### Running examples

The examples in this reference use a database of 21 million triples about movies and actors. The example queries run and return results. The queries are executed by an instance of Dgraph running at https://play.dgraph.io/. To run the queries locally or experiment a bit more, see the [Getting Started]({{< relref "get-started/index.md" >}}) guide, which also shows how to load the datasets used in the examples here.

## GraphQL+- Fundamentals

A GraphQL+- query finds nodes based on search criteria, matches patterns in a graph and returns a graph as a result.

A query is composed of nested blocks, starting with a query root. The root finds the initial set of nodes against which the following graph matching and filtering is applied.
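
The root-then-nested-blocks idea can be pictured with a toy evaluator. The Python sketch below is not how Dgraph executes queries; the in-memory `graph`, its UIDs, and the helpers `run_block` and `query` are all hypothetical names invented for illustration. It only shows the shape of the process: a root selects starting nodes, and each nested block follows an edge from the nodes found so far.

```python
# Toy graph for illustration only: uid -> {predicate: value or list of uids}.
graph = {
    1: {"name": "Blade Runner", "starring": [2]},
    2: {"performance.actor": [3]},
    3: {"name": "Harrison Ford"},
}


def run_block(uids, block):
    """Return the requested predicates for each uid, recursing into nested blocks."""
    results = []
    for uid in uids:
        node = graph[uid]
        out = {}
        for predicate, nested in block.items():
            if predicate not in node:
                continue  # a node without the predicate contributes nothing for it
            if nested is None:
                out[predicate] = node[predicate]  # value edge: return the value
            else:
                out[predicate] = run_block(node[predicate], nested)  # follow the edge
        results.append(out)
    return results


def query(root_predicate, root_value, block):
    """The root finds the initial node set; the block is then applied to it."""
    roots = [uid for uid, node in graph.items() if node.get(root_predicate) == root_value]
    return run_block(roots, block)


result = query(
    "name",
    "Blade Runner",
    {"name": None, "starring": {"performance.actor": {"name": None}}},
)
print(result)
```

The nested dictionary passed to `query` mirrors the nesting of `{ }` blocks in a GraphQL+- query, and the result has the same nested shape as the block that produced it.
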

{{% notice "note" %}}See more about Queries in [Queries design concept]({{< relref "design-concepts/index.md#queries" >}}) {{% /notice %}}

### Returning Values

Each query has a name, specified at the query root, and the same name identifies the results.

If an edge is of a value type, the value can be returned by giving the edge name.

Query Example: In the example dataset, as well as edges that link movies to directors and actors, movies have a name, release date and identifiers for a number of well known movie databases. This query, with name `bladerunner`, and root matching a movie name, returns those values for the early 80's sci-fi classic "Blade Runner".

{{< runnable >}}
{
  bladerunner(func: eq(name@en, "Blade Runner")) {
    uid
    name@en
    initial_release_date
    netflix_id
  }
}
{{< /runnable >}}

The query first searches the graph, using indexes to make the search efficient, for all nodes with a `name` edge equaling "Blade Runner". For each node found, the query then returns the listed outgoing edges.

Every node has a unique 64-bit identifier. The `uid` edge in the query above returns that identifier. If the required node is already known, then the function `uid` finds the node.

Query Example: "Blade Runner" movie data found by UID.

{{< runnable >}}
{
  bladerunner(func: uid(0x2066e)) {
    uid
    name@en
    initial_release_date
    netflix_id
  }
}
{{< /runnable >}}

A query can match many nodes and return the values for each.

Query Example: All nodes that have either "Blade" or "Runner" in the name.

{{< runnable >}}
{
  bladerunner(func: anyofterms(name@en, "Blade Runner")) {
    uid
    name@en
    initial_release_date
    netflix_id
  }
}
{{< /runnable >}}

Multiple IDs can be specified in a list to the `uid` function.

Query Example:
{{< runnable >}}
{
  movies(func: uid(0x25280, 0x707f9)) {
    uid
    name@en
    initial_release_date
    netflix_id
  }
}
{{< /runnable >}}


{{% notice "note" %}} If your predicate has special characters, then you should wrap it with angle
brackets while asking for it in the query. E.g. `<first:name>`{{% /notice %}}

### Expanding Graph Edges

A query expands edges from node to node by nesting query blocks with `{ }`.

Query Example: The actors and characters played in "Blade Runner". The query first finds the node with name "Blade Runner", then follows outgoing `starring` edges to nodes representing an actor's performance as a character. From there the `performance.actor` and `performance.character` edges are expanded to find the actor names and roles for every actor in the movie.
{{< runnable >}}
{
  brCharacters(func: eq(name@en, "Blade Runner")) {
    name@en
    initial_release_date
    starring {
      performance.actor {
        name@en  # actor name
      }
      performance.character {
        name@en  # character name
      }
    }
  }
}
{{< /runnable >}}


### Comments

Anything on a line following a `#` is a comment.

### Applying Filters

The query root finds an initial set of nodes and the query proceeds by returning values and following edges to further nodes; any node reached in the query is found by traversal after the search at root. The nodes found can be filtered by applying `@filter`, either after the root or at any edge.

Query Example: "Blade Runner" director Ridley Scott's movies released before the year 2000.
{{< runnable >}}
{
  scott(func: eq(name@en, "Ridley Scott")) {
    name@en
    initial_release_date
    director.film @filter(le(initial_release_date, "2000")) {
      name@en
      initial_release_date
    }
  }
}
{{< /runnable >}}

Query Example: Movies with either "Blade" or "Runner" in the title and released before the year 2000.

{{< runnable >}}
{
  bladerunner(func: anyofterms(name@en, "Blade Runner")) @filter(le(initial_release_date, "2000")) {
    uid
    name@en
    initial_release_date
    netflix_id
  }
}
{{< /runnable >}}

### Language Support

{{% notice "note" %}}A `@lang` directive must be specified in the schema to query or mutate
predicates with language tags.{{% /notice %}}

Dgraph supports UTF-8 strings.

In a query, for a string valued edge `edge`, the syntax
```
edge@lang1:...:langN
```
specifies the preference order for returned languages, with the following rules.

* At most one result will be returned (except in the case where the language list is set to `*`).
* The preference list is considered left to right: if a value in a given language is not found, the next language from the list is considered.
* If there are no values in any of the specified languages, no value is returned.
* A final `.` means that a value without a specified language is returned or, if there is no value without a language, a value in _some_ language is returned.
* Setting the language list value to `*` will return all the values for that predicate along with their language. Values without a language tag are also returned.

For example:

- `name` => Look for an untagged string; return nothing if no untagged value exists.
- `name@.` => Look for an untagged string, then any language.
- `name@en` => Look for `en` tagged string; return nothing if no `en` tagged string exists.
- `name@en:.` => Look for `en`, then untagged, then any language.
- `name@en:pl` => Look for `en`, then `pl`, otherwise nothing.
- `name@en:pl:.` => Look for `en`, then `pl`, then untagged, then any language.
- `name@*` => Look for all the values of this predicate and return them along with their language. For example, if there are two values with languages en and hi, this query will return two keys named "name@en" and "name@hi".


{{% notice "note" %}}In functions, language lists (including the `@*` notation) are not allowed. Untagged predicates, single language tags, and `.` notation work as described above.

---

In [full-text search functions]({{< relref "#full-text-search" >}}) (`alloftext`, `anyoftext`), when no language is specified (untagged or `@.`), the default (English) full-text tokenizer is used.{{% /notice %}}


Query Example: Some of Bollywood director and actor Farhan Akhtar's movies have a name stored in Russian as well as Hindi and English, others do not.

{{< runnable >}}
{
  q(func: allofterms(name@en, "Farhan Akhtar")) {
    name@hi
    name@en

    director.film {
      name@ru:hi:en
      name@en
      name@hi
      name@ru
    }
  }
}
{{< /runnable >}}




## Functions

{{% notice "note" %}}Functions can only be applied to [indexed]({{< relref "#indexing">}}) predicates.{{% /notice %}}

Functions allow filtering based on properties of nodes or variables. Functions can be applied in the query root or in filters.

For functions on string valued predicates, if no language preference is given, the function is applied to all languages and strings without a language tag; if a language preference is given, the function is applied only to strings of the given language.
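
As a small illustration of the rule in the paragraph above, the Python sketch below picks which stored strings a function would be tested against. It is only a model of the stated rule, not Dgraph code; the helper `strings_to_test`, the sample values, and the convention of using an empty tag `""` for an untagged string are assumptions made for this example.

```python
def strings_to_test(values, lang=None):
    """Pick the strings a function is applied to.

    `values` maps a language tag to a string; the empty tag "" stands for an
    untagged string (a convention chosen for this sketch only).
    """
    if lang is None:
        # No language preference: every language plus the untagged string.
        return sorted(values.values())
    # A language preference: only the string in that language, if any.
    return [values[lang]] if lang in values else []


names = {"": "untagged title", "en": "english title", "hi": "hindi title"}

print(strings_to_test(names))        # all three strings
print(strings_to_test(names, "en"))  # only the English string
print(strings_to_test(names, "ru"))  # nothing to test against
```
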


### Term matching


#### allofterms

Syntax Example: `allofterms(predicate, "space-separated term list")`

Schema Types: `string`

Index Required: `term`


Matches strings that have all specified terms in any order; case insensitive.

##### Usage at root

Query Example: All nodes that have `name` containing terms `indiana` and `jones`, returning the English name and genre in English.

{{< runnable >}}
{
  me(func: allofterms(name@en, "jones indiana")) {
    name@en
    genre {
      name@en
    }
  }
}
{{< /runnable >}}

##### Usage as Filter

Query Example: All Steven Spielberg films that contain the words `indiana` and `jones`. The `@filter(has(director.film))` removes nodes with name Steven Spielberg that aren't the director; the data also contains a character in a film called Steven Spielberg.

{{< runnable >}}
{
  me(func: eq(name@en, "Steven Spielberg")) @filter(has(director.film)) {
    name@en
    director.film @filter(allofterms(name@en, "jones indiana"))  {
      name@en
    }
  }
}
{{< /runnable >}}


#### anyofterms


Syntax Example: `anyofterms(predicate, "space-separated term list")`

Schema Types: `string`

Index Required: `term`


Matches strings that have any of the specified terms in any order; case insensitive.

##### Usage at root

Query Example: All nodes that have a `name` containing either `poison` or `peacock`. Many of the returned nodes are movies, but people like Joan Peacock also meet the search terms because without a [cascade directive]({{< relref "#cascade-directive">}}) the query doesn't require a genre.

{{< runnable >}}
{
  me(func:anyofterms(name@en, "poison peacock")) {
    name@en
    genre {
      name@en
    }
  }
}
{{< /runnable >}}


##### Usage as filter

Query Example: All Steven Spielberg movies that contain `war` or `spies`. The `@filter(has(director.film))` removes nodes with name Steven Spielberg that aren't the director; the data also contains a character in a film called Steven Spielberg.

{{< runnable >}}
{
  me(func: eq(name@en, "Steven Spielberg")) @filter(has(director.film)) {
    name@en
    director.film @filter(anyofterms(name@en, "war spies"))  {
      name@en
    }
  }
}
{{< /runnable >}}


### Regular Expressions


Syntax Examples: `regexp(predicate, /regular-expression/)` or case insensitive `regexp(predicate, /regular-expression/i)`

Schema Types: `string`

Index Required: `trigram`


Matches strings by regular expression. The regular expression language is that of [Go regular expressions](https://golang.org/pkg/regexp/syntax/).

Query Example: At root, match nodes with `Steven Sp` at the start of `name`, followed by any characters. For each such matched uid, match the films containing `ryan`. Note the difference from `allofterms`, which would match only the whole term `ryan`, whereas a regular expression search also matches within terms, such as `bryan`.

{{< runnable >}}
{
  directors(func: regexp(name@en, /^Steven Sp.*$/)) {
    name@en
    director.film @filter(regexp(name@en, /ryan/i)) {
      name@en
    }
  }
}
{{< /runnable >}}


#### Technical details

A trigram is a substring of three consecutive runes. For example, `Dgraph` has trigrams `Dgr`, `gra`, `rap`, `aph`.

To ensure efficiency of regular expression matching, Dgraph uses [trigram indexing](https://swtch.com/~rsc/regexp/regexp4.html).
That is, Dgraph converts the regular expression to a trigram query, uses the trigram index and trigram query to find possible matches, and applies the full regular expression search only to those candidates.

#### Writing Efficient Regular Expressions and Limitations

Keep the following in mind when designing regular expression queries.

- At least one trigram must be matched by the regular expression (patterns shorter than 3 runes are not supported). That is, Dgraph requires regular expressions that can be converted to a trigram query.
- The number of alternative trigrams matched by the regular expression should be as small as possible (`[a-zA-Z][a-zA-Z][0-9]` is not a good idea). Many possible matches means the full regular expression is checked against many strings; whereas, if the expression forces more trigrams to match, Dgraph can make better use of the index and check the full regular expression against a smaller set of possible matches.
- Thus, the regular expression should be as precise as possible. Matching longer strings means more required trigrams, which helps to effectively use the index.
- If repeat specifications (`*`, `+`, `?`, `{n,m}`) are used, the entire regular expression must not match the _empty_ string or _any_ string: for example, `*` may be used like `[Aa]bcd*` but not like `(abcd)*` or `(abcd)|((defg)*)`.
- Repeat specifications after bracket expressions (e.g. `[fgh]{7}`, `[0-9]+` or `[a-z]{3,5}`) are often considered as matching any string because they match too many trigrams.
- If the partial result (for a subset of trigrams) exceeds 1000000 uids during the index scan, the query is stopped to prohibit expensive queries.
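
The trigram approach described above can be sketched in a few lines of Python. This is an illustration of the general technique, not Dgraph's implementation: the helpers `trigrams`, `build_index` and `search` are hypothetical, the titles are lower-cased sample strings, and the literal substring is passed in by hand, whereas a real engine derives the trigram query from the regular expression itself.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return every substring of three consecutive characters, in order."""
    return [text[i:i + 3] for i in range(len(text) - 2)]


def build_index(values):
    """Map each trigram to the set of values that contain it."""
    index = defaultdict(set)
    for value in values:
        for tri in trigrams(value):
            index[tri].add(value)
    return index


def search(index, literal, pattern):
    """Shortlist values through the index, then run the full regex on them.

    `literal` is text every match must contain; supplying it manually is a
    simplification made for this sketch.
    """
    required = set(trigrams(literal))
    if not required:
        raise ValueError("need at least one trigram (three or more characters)")
    candidates = set.intersection(*(index.get(tri, set()) for tri in required))
    return sorted(v for v in candidates if re.search(pattern, v))


titles = ["saving private ryan", "bryan's song", "jurassic park"]
index = build_index(titles)

print(trigrams("Dgraph"))                  # ['Dgr', 'gra', 'rap', 'aph']
print(search(index, "ryan", r"ryan"))      # both titles containing "ryan"
print(search(index, "ryan", r"\bryan\b"))  # only the title with the whole word
```

The more trigrams a pattern requires, the smaller the candidate set becomes before the full regular expression runs, which is why precise patterns make better use of the index.
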


### Fuzzy matching


Syntax: `match(predicate, string, distance)`

Schema Types: `string`

Index Required: `trigram`

Matches predicate values by calculating the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance) to the string,
also known as _fuzzy matching_. The distance parameter must be greater than zero (0). Using a greater distance value can yield more but less accurate results.

Query Example: At root, fuzzy match nodes similar to `Stephen`, with a distance value of 8.

{{< runnable >}}
{
  directors(func: match(name@en, Stephen, 8)) {
    name@en
  }
}
{{< /runnable >}}

Same query with a Levenshtein distance of 3.

{{< runnable >}}
{
  directors(func: match(name@en, Stephen, 3)) {
    name@en
  }
}
{{< /runnable >}}


### Full-Text Search

Syntax Examples: `alloftext(predicate, "space-separated text")` and `anyoftext(predicate, "space-separated text")`

Schema Types: `string`

Index Required: `fulltext`


Apply full-text search with stemming and stop words to find strings matching all or any of the given text.

The following steps are applied during index generation and to process full-text search arguments:

1. Tokenization (according to Unicode word boundaries).
1. Conversion to lowercase.
1. Unicode-normalization (to [Normalization Form KC](http://unicode.org/reports/tr15/#Norm_Forms)).
1. Stemming using language-specific stemmer (if supported by language).
1. Stop words removal (if supported by language).

Dgraph uses [bleve](https://github.com/blevesearch/bleve) for its full-text search indexing. See also the bleve language specific [stop word lists](https://github.com/blevesearch/bleve/tree/master/analysis/lang).

The following table lists all supported languages, their corresponding country codes, and whether stemming and stop word filtering are supported.

|  Language  | Country Code | Stemming | Stop words |
| :--------: | :----------: | :------: | :--------: |
|   Arabic   |      ar      |    ✓     |     ✓      |
|  Armenian  |      hy      |          |     ✓      |
|   Basque   |      eu      |          |     ✓      |
| Bulgarian  |      bg      |          |     ✓      |
|  Catalan   |      ca      |          |     ✓      |
|  Chinese   |      zh      |    ✓     |     ✓      |
|   Czech    |      cs      |          |     ✓      |
|   Danish   |      da      |    ✓     |     ✓      |
|   Dutch    |      nl      |    ✓     |     ✓      |
|  English   |      en      |    ✓     |     ✓      |
|  Finnish   |      fi      |    ✓     |     ✓      |
|   French   |      fr      |    ✓     |     ✓      |
|   Gaelic   |      ga      |          |     ✓      |
|  Galician  |      gl      |          |     ✓      |
|   German   |      de      |    ✓     |     ✓      |
|   Greek    |      el      |          |     ✓      |
|   Hindi    |      hi      |    ✓     |     ✓      |
| Hungarian  |      hu      |    ✓     |     ✓      |
| Indonesian |      id      |          |     ✓      |
|  Italian   |      it      |    ✓     |     ✓      |
|  Japanese  |      ja      |    ✓     |     ✓      |
|   Korean   |      ko      |    ✓     |     ✓      |
| Norwegian  |      no      |    ✓     |     ✓      |
|  Persian   |      fa      |          |     ✓      |
| Portuguese |      pt      |    ✓     |     ✓      |
|  Romanian  |      ro      |    ✓     |     ✓      |
|  Russian   |      ru      |    ✓     |     ✓      |
|  Spanish   |      es      |    ✓     |     ✓      |
|  Swedish   |      sv      |    ✓     |     ✓      |
|  Turkish   |      tr      |    ✓     |     ✓      |


Query Example: All names that have `dog`, `dogs`, `bark`, `barks`, `barking`, etc. Stop word removal eliminates `the` and `which`.

{{< runnable >}}
{
  movie(func:alloftext(name@en, "the dog which barks")) {
    name@en
  }
}
{{< /runnable >}}


### Inequality

#### equal to

Syntax Examples:

* `eq(predicate, value)`
* `eq(val(varName), value)`
* `eq(predicate, val(varName))`
* `eq(count(predicate), value)`
* `eq(predicate, [val1, val2, ..., valN])`
* `eq(predicate, [$var1, "value", ..., $varN])`

Schema Types: `int`, `float`, `bool`, `string`, `dateTime`

Index Required: An index is required for the `eq(predicate, ...)` forms (see table below). For `count(predicate)` at the query root, the `@count` index is required. For variables the values have been calculated as part of the query, so no index is required.

| Type       | Index Options |
|:-----------|:--------------|
| `int`      | `int`         |
| `float`    | `float`       |
| `bool`     | `bool`        |
| `string`   | `exact`, `hash` |
| `dateTime` | `dateTime`    |

Tests whether a predicate or variable equals a value, or equals any value in a list of values.

The boolean constants are `true` and `false`, so with `eq` this becomes, for example, `eq(boolPred, true)`.

Query Example: Movies with exactly thirteen genres.

{{< runnable >}}
{
  me(func: eq(count(genre), 13)) {
    name@en
    genre {
      name@en
    }
  }
}
{{< /runnable >}}


Query Example: Directors called Steven who have directed 1, 2 or 3 movies.

{{< runnable >}}
{
  steve as var(func: allofterms(name@en, "Steven")) {
    films as count(director.film)
  }

  stevens(func: uid(steve)) @filter(eq(val(films), [1,2,3])) {
    name@en
    numFilms : val(films)
  }
}
{{< /runnable >}}


#### less than, less than or equal to, greater than and greater than or equal to

Syntax Examples: for inequality `IE`

* `IE(predicate, value)`
* `IE(val(varName), value)`
* `IE(predicate, val(varName))`
* `IE(count(predicate), value)`

With `IE` replaced by

* `le` less than or equal to
* `lt` less than
* `ge` greater than or equal to
* `gt` greater than

Schema Types: `int`, `float`, `string`, `dateTime`

Index required: An index is required for the `IE(predicate, ...)` forms (see table below). For `count(predicate)` at the query root, the `@count` index is required. For variables the values have been calculated as part of the query, so no index is required.

| Type       | Index Options |
|:-----------|:--------------|
| `int`      | `int`         |
| `float`    | `float`       |
| `string`   | `exact`       |
| `dateTime` | `dateTime`    |


Query Example: Ridley Scott movies released before 1980.

{{< runnable >}}
{
  me(func: eq(name@en, "Ridley Scott")) {
    name@en
    director.film @filter(lt(initial_release_date, "1980-01-01"))  {
      initial_release_date
      name@en
    }
  }
}
{{< /runnable >}}


Query Example: Directors with `Steven` in `name` who have directed more than `100` actors.

{{< runnable >}}
{
  ID as var(func: allofterms(name@en, "Steven")) {
    director.film {
      num_actors as count(starring)
    }
    total as sum(val(num_actors))
  }

  dirs(func: uid(ID)) @filter(gt(val(total), 100)) {
    name@en
    total_actors : val(total)
  }
}
{{< /runnable >}}



Query Example: A movie in each genre that has over 30000 movies. Because there is no order specified on genres, the order will be by UID. The [count index]({{< relref "#count-index">}}) records the number of edges out of nodes and makes such queries more efficient.

{{< runnable >}}
{
  genre(func: gt(count(~genre), 30000)){
    name@en
    ~genre (first:1) {
      name@en
    }
  }
}
{{< /runnable >}}

Query Example: Directors called Steven Spielberg and their movies which have an `initial_release_date` greater
than or equal to that of the movie Minority Report.

{{< runnable >}}
{
  var(func: eq(name@en,"Minority Report")) {
    d as initial_release_date
  }

  me(func: eq(name@en, "Steven Spielberg")) {
    name@en
    director.film @filter(ge(initial_release_date, val(d))) {
      initial_release_date
      name@en
    }
  }
}
{{< /runnable >}}


### uid

Syntax Examples:

* `q(func: uid(<uid>))`
* `predicate @filter(uid(<uid1>, ..., <uidn>))`
* `predicate @filter(uid(a))` for variable `a`
* `q(func: uid(a,b))` for variables `a` and `b`


Filters nodes at the current query level to only nodes in the given set of UIDs.

For query variable `a`, `uid(a)` represents the set of UIDs stored in `a`.
For value variable `b`, `uid(b)` represents the UIDs from the UID to value map. With two or more variables, `uid(a,b,...)` represents the union of all the variables.

`uid(<uid>)`, like an identity function, will return the requested UID even if the node does not have any edges.

Query Example: If the UID of a node is known, values for the node can be read directly. The films of Priyanka Chopra by known UID.

{{< runnable >}}
{
  films(func: uid(0x1daf5)) {
    name@hi
    actor.film {
      performance.film {
        name@hi
      }
    }
  }
}
{{< /runnable >}}



Query Example: The films of Taraji Henson by genre.
{{< runnable >}}
{
  var(func: allofterms(name@en, "Taraji Henson")) {
    actor.film {
      F as performance.film {
        G as genre
      }
    }
  }

  Taraji_films_by_genre(func: uid(G)) {
    genre_name : name@en
    films : ~genre @filter(uid(F)) {
      film_name : name@en
    }
  }
}
{{< /runnable >}}



Query Example: Taraji Henson films ordered by number of genres, with genres listed in order of how many films Taraji has made in each genre.
{{< runnable >}}
{
  var(func: allofterms(name@en, "Taraji Henson")) {
    actor.film {
      F as performance.film {
        G as count(genre)
        genre {
          C as count(~genre @filter(uid(F)))
        }
      }
    }
  }

  Taraji_films_by_genre_count(func: uid(G), orderdesc: val(G)) {
    film_name : name@en
    genres : genre (orderdesc: val(C)) {
      genre_name : name@en
    }
  }
}
{{< /runnable >}}


### uid_in


Syntax Examples:

* `q(func: ...) @filter(uid_in(predicate, <uid>))`
* `predicate1 @filter(uid_in(predicate2, <uid>))`

Schema Types: UID

Index Required: none

While the `uid` function filters nodes at the current level based on UID, function `uid_in` allows looking ahead along an edge to check that it leads to a particular UID.
This can often save an extra query block and avoids returning the edge.

`uid_in` cannot be used at root; it accepts one UID constant as its argument (not a variable).


Query Example: The collaborations of Marc Caro and Jean-Pierre Jeunet (UID 0x99706). If the UID of Jean-Pierre Jeunet is known, querying this way removes the need to have a block extracting his UID into a variable and the extra edge traversal and filter for `~director.film`.
{{< runnable >}}
{
  caro(func: eq(name@en, "Marc Caro")) {
    name@en
    director.film @filter(uid_in(~director.film, 0x99706)) {
      name@en
    }
  }
}
{{< /runnable >}}


### has

Syntax Examples: `has(predicate)`

Schema Types: all

Determines if a node has a particular predicate.

Query Example: First five directors and all their movies that have a release date recorded. Directors have directed at least one film, which is equivalent in semantics to `gt(count(director.film), 0)`.
{{< runnable >}}
{
  me(func: has(director.film), first: 5) {
    name@en
    director.film @filter(has(initial_release_date))  {
      initial_release_date
      name@en
    }
  }
}
{{< /runnable >}}

### Geolocation

{{% notice "note" %}} As of now we only support indexing Point, Polygon and MultiPolygon [geometry types](https://github.com/twpayne/go-geom#geometry-types). However, Dgraph can store other types of geolocation data. {{% /notice %}}

Note that for geo queries, any polygon with holes is replaced with the outer loop, ignoring holes. Also, as of version 0.7.7 polygon containment checks are approximate.

#### Mutations

To make use of the geo functions you would need an index on your predicate.
```
loc: geo @index(geo) .
```

Here is how you would add a `Point`.

```
{
  set {
    <_:0xeb1dde9c> <loc> "{'type':'Point','coordinates':[-122.4220186,37.772318]}"^^<geo:geojson> .
    <_:0xeb1dde9c> <name> "Hamon Tower" .
  }
}
```

Here is how you would associate a `Polygon` with a node. Adding a `MultiPolygon` is also similar.

```
{
  set {
    <_:0xf76c276b> <loc> "{'type':'Polygon','coordinates':[[[-122.409869,37.7785442],[-122.4097444,37.7786443],[-122.4097544,37.7786521],[-122.4096334,37.7787494],[-122.4096233,37.7787416],[-122.4094004,37.7789207],[-122.4095818,37.7790617],[-122.4097883,37.7792189],[-122.4102599,37.7788413],[-122.409869,37.7785442]],[[-122.4097357,37.7787848],[-122.4098499,37.778693],[-122.4099025,37.7787339],[-122.4097882,37.7788257],[-122.4097357,37.7787848]]]}"^^<geo:geojson> .
    <_:0xf76c276b> <name> "Best Western Americana Hotel" .
  }
}
```

The above examples have been picked from our [SF Tourism](https://github.com/dgraph-io/benchmarks/blob/master/data/sf.tourism.gz?raw=true) dataset.

#### Query

##### near

Syntax Example: `near(predicate, [long, lat], distance)`

Schema Types: `geo`

Index Required: `geo`

Matches all entities where the location given by `predicate` is within `distance` meters of geojson coordinate `[long, lat]`.

Query Example: Tourist destinations within 1000 meters (1 kilometer) of a point in Golden Gate Park in San Francisco.

{{< runnable >}}
{
  tourist(func: near(loc, [-122.469829, 37.771935], 1000) ) {
    name
  }
}
{{< /runnable >}}


##### within

Syntax Example: `within(predicate, [[[long1, lat1], ..., [longN, latN]]])`

Schema Types: `geo`

Index Required: `geo`

Matches all entities where the location given by `predicate` lies within the polygon specified by the geojson coordinate array.

Query Example: Tourist destinations within the specified area of Golden Gate Park, San Francisco.

{{< runnable >}}
{
  tourist(func: within(loc, [[[-122.47266769409178, 37.769018558337926 ], [ -122.47266769409178, 37.773699921075135 ], [ -122.4651575088501, 37.773699921075135 ], [ -122.4651575088501, 37.769018558337926 ], [ -122.47266769409178, 37.769018558337926]]] )) {
    name
  }
}
{{< /runnable >}}


##### contains

Syntax Examples: `contains(predicate, [long, lat])` or `contains(predicate, [[long1, lat1], ..., [longN, latN]])`

Schema Types: `geo`

Index Required: `geo`

Matches all entities where the polygon describing the location given by `predicate` contains the geojson coordinate `[long, lat]` or the given geojson polygon.

Query Example: All entities that contain a point in the flamingo enclosure of San Francisco Zoo.
{{< runnable >}}
{
  tourist(func: contains(loc, [ -122.50326097011566, 37.73353615592843 ] )) {
    name
  }
}
{{< /runnable >}}


##### intersects

Syntax Example: `intersects(predicate, [[[long1, lat1], ..., [longN, latN]]])`

Schema Types: `geo`

Index Required: `geo`

Matches all entities where the polygon describing the location given by `predicate` intersects the given geojson polygon.


{{< runnable >}}
{
  tourist(func: intersects(loc, [[[-122.503325343132, 37.73345766902749 ], [ -122.503325343132, 37.733903134117966 ], [ -122.50271648168564, 37.733903134117966 ], [ -122.50271648168564, 37.73345766902749 ], [ -122.503325343132, 37.73345766902749]]] )) {
    name
  }
}
{{< /runnable >}}



## Connecting Filters

Within `@filter` multiple functions can be used with boolean connectives.

### AND, OR and NOT

Connectives `AND`, `OR` and `NOT` join filters and can be built into arbitrarily complex filters, such as `(NOT A OR B) AND (C AND NOT (D OR E))`. Note that `NOT` binds more tightly than `AND`, which binds more tightly than `OR`.
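
To make the binding order concrete, here is a short Python sketch. Python's own `not`, `and` and `or` operators follow the same ordering (`not` tightest, then `and`, then `or`), so they can stand in for the connectives; the function names are made up for this example, and the sketch says nothing about how Dgraph parses filters internally.

```python
from itertools import product


def unparenthesised(a, b, c):
    # Relies on precedence alone: NOT binds first, then AND, then OR.
    return not a or b and c


def grouped_by_precedence(a, b, c):
    # The grouping that the precedence rule implies.
    return (not a) or (b and c)


def grouped_differently(a, b, c):
    # A different grouping, which needs explicit parentheses.
    return (not (a or b)) and c


# The first two agree on every input; the third does not.
agree = all(
    unparenthesised(a, b, c) == grouped_by_precedence(a, b, c)
    for a, b, c in product([False, True], repeat=3)
)
print(agree)
print(unparenthesised(False, True, False), grouped_differently(False, True, False))
```

When a filter is meant to group differently from the default, parentheses are needed, as in the `(NOT A OR B) AND (...)` example above.
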

Query Example: All Steven Spielberg movies that contain either both "indiana" and "jones" OR both "jurassic" and "park".

{{< runnable >}}
{
  me(func: eq(name@en, "Steven Spielberg")) @filter(has(director.film)) {
    name@en
    director.film @filter(allofterms(name@en, "jones indiana") OR allofterms(name@en, "jurassic park"))  {
      uid
      name@en
    }
  }
}
{{< /runnable >}}


## Alias

Syntax Examples:

* `aliasName : predicate`
* `aliasName : predicate { ... }`
* `aliasName : varName as ...`
* `aliasName : count(predicate)`
* `aliasName : max(val(varName))`

An alias provides an alternate name in results. Predicates, variables and aggregates can be aliased by prefixing with the alias name and `:`. Aliases do not have to be different from the original predicate name, but, within a block, an alias must be distinct from predicate names and other aliases returned in the same block. Aliases can be used to return the same predicate multiple times within a block.



Query Example: Directors with `name` matching term `Steven`, their UID, English name, average number of actors per movie, total number of films, and the name of each film in English and French.
{{< runnable >}}
{
  ID as var(func: allofterms(name@en, "Steven")) @filter(has(director.film)) {
    director.film {
      num_actors as count(starring)
    }
    average as avg(val(num_actors))
  }

  films(func: uid(ID)) {
    director_id : uid
    english_name : name@en
    average_actors : val(average)
    num_films : count(director.film)

    films : director.film {
      name : name@en
      english_name : name@en
      french_name : name@fr
    }
  }
}
{{< /runnable >}}


## Pagination

Pagination allows returning only a portion, rather than the whole, result set.
This can be useful for top-k style queries as well as to reduce the size of the result set for client side processing or to allow paged access to results.

Pagination is often used with [sorting]({{< relref "#sorting">}}).

{{% notice "note" %}}Without a sort order specified, the results are sorted by `uid`, which is assigned randomly. So the ordering, while deterministic, might not be what you expected.{{% /notice %}}

### First

Syntax Examples:

* `q(func: ..., first: N)`
* `predicate (first: N) { ... }`
* `predicate @filter(...) (first: N) { ... }`

For positive `N`, `first: N` retrieves the first `N` results, by sorted or UID order.

For negative `N`, `first: N` retrieves the last `N` results, by sorted or UID order. Currently, a negative `N` is only supported when no order is applied. To achieve the effect of a negative `N` with a sort, reverse the order of the sort and use a positive `N`.


Query Example: Last two films, by UID order, directed by Steven Spielberg and the first three genres of those movies, sorted alphabetically by English name.

{{< runnable >}}
{
  me(func: allofterms(name@en, "Steven Spielberg")) {
    director.film (first: -2) {
      name@en
      initial_release_date
      genre (orderasc: name@en) (first: 3) {
        name@en
      }
    }
  }
}
{{< /runnable >}}



Query Example: The three directors named Steven who have directed the most actors of all directors named Steven.

{{< runnable >}}
{
  ID as var(func: allofterms(name@en, "Steven")) @filter(has(director.film)) {
    director.film {
      stars as count(starring)
    }
    totalActors as sum(val(stars))
  }

  mostStars(func: uid(ID), orderdesc: val(totalActors), first: 3) {
    name@en
    stars : val(totalActors)

    director.film {
      name@en
    }
  }
}
{{< /runnable >}}

### Offset

Syntax Examples:

* `q(func: ..., offset: N)`
* `predicate (offset: N) { ... }`
* `predicate (first: M, offset: N) { ... }`
* `predicate @filter(...) (offset: N) { ... }`

With `offset: N` the first `N` results are not returned. Used in combination with `first`, `first: M, offset: N` skips over `N` results and returns the following `M`.

Query Example: Order Hark Tsui's films by English title, skip over the first 4 and return the following 6.

{{< runnable >}}
{
  me(func: allofterms(name@en, "Hark Tsui")) {
    name@zh
    name@en
    director.film (orderasc: name@en) (first:6, offset:4) {
      genre {
        name@en
      }
      name@zh
      name@en
      initial_release_date
    }
  }
}
{{< /runnable >}}

### After

Syntax Examples:

* `q(func: ..., after: UID)`
* `predicate (first: N, after: UID) { ... }`
* `predicate @filter(...) (first: N, after: UID) { ... }`

Another way to get results after skipping over some results is to use the default UID ordering and skip directly past a node specified by UID. For example, a first query could be of the form `predicate (after: 0x0, first: N)`, or just `predicate (first: N)`, with subsequent queries of the form `predicate(after: <uid of last entity in last result>, first: N)`.


Query Example: The first five of Baz Luhrmann's films, sorted by UID order.

{{< runnable >}}
{
  me(func: allofterms(name@en, "Baz Luhrmann")) {
    name@en
    director.film (first:5) {
      uid
      name@en
    }
  }
}
{{< /runnable >}}

The fifth movie is the Australian movie classic Strictly Ballroom. It has UID `0x99e44`. The results after Strictly Ballroom can now be obtained with `after`.

{{< runnable >}}
{
  me(func: allofterms(name@en, "Baz Luhrmann")) {
    name@en
    director.film (first:5, after: 0x99e44) {
      uid
      name@en
    }
  }
}
{{< /runnable >}}


## Count

Syntax Examples:

* `count(predicate)`
* `count(uid)`

The form `count(predicate)` counts how many `predicate` edges lead out of a node.

The form `count(uid)` counts the number of UIDs matched in the enclosing block.

Query Example: The number of films acted in by each actor with `Orlando` in their name.

{{< runnable >}}
{
  me(func: allofterms(name@en, "Orlando")) @filter(has(actor.film)) {
    name@en
    count(actor.film)
  }
}
{{< /runnable >}}

Count can be used at root and [aliased]({{< relref "#alias">}}).

Query Example: Count of directors who have directed more than five films. When used at the query root, the [count index]({{< relref "#count-index">}}) is required.

{{< runnable >}}
{
  directors(func: gt(count(director.film), 5)) {
    totalDirectors : count(uid)
  }
}
{{< /runnable >}}


Count can be assigned to a [value variable]({{< relref "#value-variables">}}).

Query Example: The actors of Ang Lee's "Eat Drink Man Woman" ordered by the number of movies acted in.

{{< runnable >}}
{
  var(func: allofterms(name@en, "eat drink man woman")) {
    starring {
      actors as performance.actor {
        totalRoles as count(actor.film)
      }
    }
  }

  edmw(func: uid(actors), orderdesc: val(totalRoles)) {
    name@en
    name@zh
    totalRoles : val(totalRoles)
  }
}
{{< /runnable >}}


## Sorting

Syntax Examples:

* `q(func: ..., orderasc: predicate)`
* `q(func: ..., orderdesc: val(varName))`
* `predicate (orderdesc: predicate) { ... }`
* `predicate @filter(...) (orderasc: N) { ... }`
* `q(func: ..., orderasc: predicate1, orderdesc: predicate2)`

Sortable Types: `int`, `float`, `string`, `dateTime`, `default`

Results can be sorted in ascending order (`orderasc`) or descending order (`orderdesc`) by a predicate or variable.

For sorting on predicates with [sortable indices]({{< relref "#sortable-indices">}}), Dgraph sorts both on the values and with the index in parallel, and returns whichever result is computed first.

Sorted queries retrieve up to 1000 results by default. This can be changed with [first]({{< relref "#first">}}).


Query Example: French director Jean-Pierre Jeunet's movies sorted by release date.

{{< runnable >}}
{
  me(func: allofterms(name@en, "Jean-Pierre Jeunet")) {
    name@fr
    director.film(orderasc: initial_release_date) {
      name@fr
      name@en
      initial_release_date
    }
  }
}
{{< /runnable >}}

Sorting can be performed at root and on value variables.

Query Example: All genres sorted alphabetically and the five movies in each genre with the most genres.

{{< runnable >}}
{
  genres as var(func: has(~genre)) {
    ~genre {
      numGenres as count(genre)
    }
  }

  genres(func: uid(genres), orderasc: name@en) {
    name@en
    ~genre (orderdesc: val(numGenres), first: 5) {
      name@en
      genres : val(numGenres)
    }
  }
}
{{< /runnable >}}

Sorting can also be performed by multiple predicates as shown below. If the values are equal for the
first predicate, then they are sorted by the second predicate and so on.

Query Example: Find all nodes which have type `Person`, sort them by their `first_name` and among those
that have the same `first_name` sort them by `last_name` in descending order.

```
{
  me(func: type("Person"), orderasc: first_name, orderdesc: last_name) {
    first_name
    last_name
  }
}
```

## Multiple Query Blocks

Inside a single query, multiple query blocks are allowed. The result is all blocks with corresponding block names.

Multiple query blocks are executed in parallel.

The blocks need not be related in any way.

Query Example: All of Angelina Jolie's films, with genres, and Peter Jackson's films since 2008.

{{< runnable >}}
{
  AngelinaInfo(func:allofterms(name@en, "angelina jolie")) {
    name@en
    actor.film {
      performance.film {
        genre {
          name@en
        }
      }
    }
  }

  DirectorInfo(func: eq(name@en, "Peter Jackson")) {
    name@en
    director.film @filter(ge(initial_release_date, "2008")) {
      Release_date: initial_release_date
      Name: name@en
    }
  }
}
{{< /runnable >}}


If queries contain some overlap in answers, the result sets are still independent.

Query Example: The movies Mackenzie Crook has acted in and the movies Jack Davenport has acted in.
The result sets overlap because both have acted in the Pirates of the Caribbean movies, but the results are independent and both contain the full answer sets.

{{< runnable >}}
{
  Mackenzie(func:allofterms(name@en, "Mackenzie Crook")) {
    name@en
    actor.film {
      performance.film {
        uid
        name@en
      }
      performance.character {
        name@en
      }
    }
  }

  Jack(func:allofterms(name@en, "Jack Davenport")) {
    name@en
    actor.film {
      performance.film {
        uid
        name@en
      }
      performance.character {
        name@en
      }
    }
  }
}
{{< /runnable >}}


### Var Blocks

Var blocks start with the keyword `var` and are not returned in the query results.

Query Example: Angelina Jolie's movies ordered by genre.

{{< runnable >}}
{
  var(func:allofterms(name@en, "angelina jolie")) {
    name@en
    actor.film {
      A AS performance.film {
        B AS genre
      }
    }
  }

  films(func: uid(B), orderasc: name@en) {
    name@en
    ~genre @filter(uid(A)) {
      name@en
    }
  }
}
{{< /runnable >}}


## Query Variables

Syntax Examples:

* `varName as q(func: ...) { ... }`
* `varName as var(func: ...) { ... }`
* `varName as predicate { ... }`
* `varName as predicate @filter(...) { ... }`

Types: `uid`

Nodes (UIDs) matched at one place in a query can be stored in a variable and used elsewhere. Query variables can be used in other query blocks or in a child node of the defining block.

Query variables do not affect the semantics of the query at the point of definition. Query variables are evaluated to all nodes matched by the defining block.

In general, query blocks are executed in parallel, but variables impose an evaluation order on some blocks. Cycles induced by variable dependence are not permitted.
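
One way to picture the ordering constraint is as a dependency graph between blocks: a block that uses a variable has to wait for the block that defines it. The sketch below models this with Python's standard `graphlib` module. The block names are hypothetical, and this is a mental model only, not a description of how Dgraph schedules blocks internally.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical query: block "films" uses variables defined in "varA" and "varB".
# Each key maps a block to the set of blocks whose variables it uses.
uses = {
    "films": {"varA", "varB"},
    "varA": set(),
    "varB": set(),
}

order = list(TopologicalSorter(uses).static_order())
# Defining blocks come before the block that consumes their variables.
films_runs_last = order.index("films") > max(order.index("varA"), order.index("varB"))

# If two blocks each need the other's variable, no valid order exists.
cyclic = {"x": {"y"}, "y": {"x"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    cycle_rejected = False
except CycleError:
    cycle_rejected = True

print(order, films_runs_last, cycle_rejected)
```

Blocks with no dependency between them (`varA` and `varB` here) are the ones that could run in parallel.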

If a variable is defined, it must be used elsewhere in the query.

A query variable is used by extracting the UIDs in it with `uid(var-name)`.

The syntax `func: uid(A,B)` or `@filter(uid(A,B))` means the union of UIDs for variables `A` and `B`.

Query Example: The movies of Angelina Jolie and Brad Pitt where both have acted in movies in the same genre. Note that `B` and `D` match all genres for all movies, not genres per movie.
{{< runnable >}}
{
  var(func:allofterms(name@en, "angelina jolie")) {
    actor.film {
      A AS performance.film {  # All films acted in by Angelina Jolie
        B As genre  # Genres of all the films acted in by Angelina Jolie
      }
    }
  }

  var(func:allofterms(name@en, "brad pitt")) {
    actor.film {
      C AS performance.film {  # All films acted in by Brad Pitt
        D as genre  # Genres of all the films acted in by Brad Pitt
      }
    }
  }

  films(func: uid(D)) @filter(uid(B)) {   # Genres from both Angelina and Brad
    name@en
    ~genre @filter(uid(A, C)) {  # Movies in either A or C.
      name@en
    }
  }
}
{{< /runnable >}}


## Value Variables

Syntax Examples:

* `varName as scalarPredicate`
* `varName as count(predicate)`
* `varName as avg(...)`
* `varName as math(...)`

Types: `int`, `float`, `string`, `dateTime`, `default`, `geo`, `bool`

Value variables store scalar values. Value variables are a map from the UIDs of the enclosing block to the corresponding values.

It therefore only makes sense to use the values from a value variable in a context that matches the same UIDs. If used in a block matching different UIDs, the value variable is undefined.

It is an error to define a value variable but not use it elsewhere in the query.
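
As a mental model, a value variable behaves like a dictionary keyed by UID. The sketch below uses made-up UIDs and counts to show why a lookup only makes sense in a block that matches the same UIDs. The `val` and `uids` helpers are plain Python functions written for illustration, not Dgraph functions.

```python
# Hypothetical value variable: actor UID -> number of roles.
roles = {"0x1": 12, "0x2": 3, "0x3": 7}

def val(variable, uid):
    # Value for one UID, or None when the variable holds nothing for it.
    return variable.get(uid)

def uids(variable):
    # The UIDs the variable is defined for.
    return sorted(variable)

same_block = [val(roles, u) for u in ["0x1", "0x2", "0x3"]]
other_block = [val(roles, u) for u in ["0x9", "0xa"]]  # different UIDs: no values
print(same_block, other_block, uids(roles))
```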

Value variables are used by extracting the values with `val(var-name)`, or by extracting the UIDs with `uid(var-name)`.

[Facet]({{< relref "#facets-edge-attributes">}}) values can be stored in value variables.

Query Example: The number of movie roles played by the actors of the 80's classic "The Princess Bride". Query variable `pbActors` matches the UIDs of all actors from the movie. Value variable `roles` is thus a map from actor UID to number of roles. Value variable `roles` can be used in the `totalRoles` query block because that query block also matches the `pbActors` UIDs, so the actor to number of roles map is available.

{{< runnable >}}
{
  var(func:allofterms(name@en, "The Princess Bride")) {
    starring {
      pbActors as performance.actor {
        roles as count(actor.film)
      }
    }
  }
  totalRoles(func: uid(pbActors), orderasc: val(roles)) {
    name@en
    numRoles : val(roles)
  }
}
{{< /runnable >}}


Value variables can be used in place of UID variables by extracting the UID list from the map.

Query Example: The same query as the previous example, but using value variable `roles` for matching UIDs in the `totalRoles` query block.

{{< runnable >}}
{
  var(func:allofterms(name@en, "The Princess Bride")) {
    starring {
      performance.actor {
        roles as count(actor.film)
      }
    }
  }
  totalRoles(func: uid(roles), orderasc: val(roles)) {
    name@en
    numRoles : val(roles)
  }
}
{{< /runnable >}}


### Variable Propagation

Like query variables, value variables can be used in other query blocks and in blocks nested within the defining block. When used in a block nested within the block that defines the variable, the value is computed as a sum of the variable for parent nodes along all paths to the point of use. This is called variable propagation.
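
The summation rule can be traced by hand on a small graph. The sketch below is an illustrative model with a hypothetical `friends` adjacency map, not Dgraph's implementation: each step pushes a node's current value along every outgoing edge and adds up what arrives at each child.

```python
from collections import defaultdict

# Hypothetical friend edges: node -> list of friends.
friends = {
    "root": ["a", "b"],
    "a": ["c", "d"],
    "b": ["d"],
}

def propagate(values, edges):
    # Each child receives the sum of the values of the parents that reach it.
    out = defaultdict(int)
    for parent, value in values.items():
        for child in edges.get(parent, []):
            out[child] += value
    return dict(out)

level0 = {"root": 1}                 # value defined at the root
level1 = propagate(level0, friends)  # friends
level2 = propagate(level1, friends)  # friends of friends
print(level1, level2)
```

In this toy graph `d` is reachable through both `a` and `b`, so it ends up with the value 2, while `c` has a single path and keeps the value 1.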

For example:
```
{
  q(func: uid(0x01)) {
    myscore as math(1)          # A
    friends {                   # B
      friends {                 # C
        ...myscore...
      }
    }
  }
}
```
At the line marked A, a value variable `myscore` is defined, mapping the node with UID `0x01` to the value 1. At B, the value for each friend is still 1, because there is only one path to each friend. Traversing the `friends` edge twice reaches the friends of friends. The variable `myscore` is propagated so that each friend of a friend receives the sum of its parents' values: if it is reachable from only one friend the value is still 1, if it is reachable from two friends the value is 2, and so on. That is, the value of `myscore` for each friend of a friend inside the block marked C is the number of paths to it.

**The value that a node receives for a propagated variable is the sum of the values of all its parent nodes.**

This propagation is useful, for example, in normalizing a sum across users, finding the number of paths between nodes and accumulating a sum through a graph.



Query Example: For each Harry Potter movie, the number of roles played by actor Warwick Davis.
{{< runnable >}}
{
  num_roles(func: eq(name@en, "Warwick Davis")) @cascade @normalize {

    paths as math(1)  # records number of paths to each character

    actor : name@en

    actor.film {
      performance.film @filter(allofterms(name@en, "Harry Potter")) {
        film_name : name@en
        characters : math(paths)  # how many paths (i.e. characters) reach this film
      }
    }
  }
}
{{< /runnable >}}


Query Example: Each actor who has been in a Peter Jackson movie and the fraction of Peter Jackson movies they have appeared in.
{{< runnable >}}
{
  movie_fraction(func:eq(name@en, "Peter Jackson")) @normalize {

    paths as math(1)
    total_films : num_films as count(director.film)
    director : name@en

    director.film {
      starring {
        performance.actor {
          fraction : math(paths / (num_films/paths))
          actor : name@en
        }
      }
    }
  }
}
{{< /runnable >}}

More examples can be found in two Dgraph blog posts about using variable propagation for recommendation engines ([post 1](https://open.dgraph.io/post/recommendation/), [post 2](https://open.dgraph.io/post/recommendation2/)).

## Aggregation

Syntax Example: `AG(val(varName))`

where `AG` is one of:

* `min` : select the minimum value in the value variable `varName`
* `max` : select the maximum value
* `sum` : sum all values in value variable `varName`
* `avg` : calculate the average of values in `varName`

Schema Types:

| Aggregation   | Schema Types |
|:-----------|:--------------|
| `min` / `max` | `int`, `float`, `string`, `dateTime`, `default` |
| `sum` / `avg` | `int`, `float` |

Aggregation can only be applied to [value variables]({{< relref "#value-variables">}}). An index is not required (the values have already been found and stored in the value variable mapping).

An aggregation is applied at the query block enclosing the variable definition. As opposed to query variables and value variables, which are global, aggregation is computed locally. For example:
```
A as predicateA {
  ...
  B as predicateB {
    x as ...some value...
  }
  min(val(x))
}
```
Here, `A` and `B` are the lists of all UIDs that match these blocks. Value variable `x` is a mapping from UIDs in `B` to values. The aggregation `min(val(x))`, however, is computed for each UID in `A`.
That is, for each UID in `A`, take the slice of `x` that corresponds to that UID's outgoing `predicateB` edges and compute the aggregation over those values.

Aggregations can themselves be assigned to value variables, making a UID to aggregation map.


### Min

#### Usage at Root

Query Example: Get the min initial release date for any Harry Potter movie.

The release date is assigned to a variable, then it is aggregated and fetched in an empty block.
{{< runnable >}}
{
  var(func: allofterms(name@en, "Harry Potter")) {
    d as initial_release_date
  }
  me() {
    min(val(d))
  }
}
{{< /runnable >}}

#### Usage at other levels

Query Example: Directors called Steven and the date of release of their first movie, in ascending order of first movie.

{{< runnable >}}
{
  stevens as var(func: allofterms(name@en, "steven")) {
    director.film {
      ird as initial_release_date
      # ird is a value variable mapping a film UID to its release date
    }
    minIRD as min(val(ird))
    # minIRD is a value variable mapping a director UID to their first release date
  }

  byIRD(func: uid(stevens), orderasc: val(minIRD)) {
    name@en
    firstRelease: val(minIRD)
  }
}
{{< /runnable >}}

### Max

#### Usage at Root

Query Example: Get the max initial release date for any Harry Potter movie.

The release date is assigned to a variable, then it is aggregated and fetched in an empty block.
{{< runnable >}}
{
  var(func: allofterms(name@en, "Harry Potter")) {
    d as initial_release_date
  }
  me() {
    max(val(d))
  }
}
{{< /runnable >}}

#### Usage at other levels

Query Example: Quentin Tarantino's movies and date of release of the most recent movie.

{{< runnable >}}
{
  director(func: allofterms(name@en, "Quentin Tarantino")) {
    director.film {
      name@en
      x as initial_release_date
    }
    max(val(x))
  }
}
{{< /runnable >}}

### Sum and Avg

#### Usage at Root

Query Example: Get the sum and average of the number of movies directed by people who have
Steven or Tom in their name.

{{< runnable >}}
{
  var(func: anyofterms(name@en, "Steven Tom")) {
    a as count(director.film)
  }

  me() {
    avg(val(a))
    sum(val(a))
  }
}
{{< /runnable >}}

#### Usage at other levels

Query Example: Steven Spielberg's movies, with the number of recorded genres per movie, and the total number of genres and average genres per movie.

{{< runnable >}}
{
  director(func: eq(name@en, "Steven Spielberg")) {
    name@en
    director.film {
      name@en
      numGenres : g as count(genre)
    }
    totalGenres : sum(val(g))
    genresPerMovie : avg(val(g))
  }
}
{{< /runnable >}}


### Aggregating Aggregates

Aggregations can be assigned to value variables, and so these variables can in turn be aggregated.

Query Example: For each actor in a Peter Jackson film, find the number of roles played in any movie. Sum these to find the total number of roles ever played by all actors in the movie. Then sum the lot to find the total number of roles ever played by actors who have appeared in Peter Jackson movies. Note that this demonstrates how to aggregate aggregates; the answer in this case isn't quite precise though, because actors that have appeared in multiple Peter Jackson movies are counted more than once.

{{< runnable >}}
{
  PJ as var(func:allofterms(name@en, "Peter Jackson")) {
    director.film {
      starring {  # starring an actor
        performance.actor {
          movies as count(actor.film)
          # number of roles for this actor
        }
        perf_total as sum(val(movies))
      }
      movie_total as sum(val(perf_total))
      # total roles for all actors in this movie
    }
    gt as sum(val(movie_total))
  }

  PJmovies(func: uid(PJ)) {
    name@en
    director.film (orderdesc: val(movie_total), first: 5) {
      name@en
      totalRoles : val(movie_total)
    }
    grandTotal : val(gt)
  }
}
{{< /runnable >}}


## Math on value variables

Value variables can be combined using mathematical functions. For example, this could be used to associate a score which is then used to order or perform other operations, such as might be used in building news feeds, simple recommendation systems, and so on.

Math statements must be enclosed within `math( <exp> )` and must be stored to a value variable.
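
Because value variables are maps keyed by UID, a math expression can be pictured as being evaluated once per UID, over the values each variable holds for that UID. The following is a rough Python model of an expression such as `score as math(p + q)`, with invented UIDs and counts; it assumes both variables are defined for the same UIDs and says nothing about how Dgraph evaluates the expression internally.

```python
# Hypothetical value variables defined in the same block (film UID -> count).
p = {"0x10": 5, "0x11": 2}
q = {"0x10": 1, "0x11": 4}

# Model of `score as math(p + q)`: one result per UID, stored as a new map.
score = {uid: p[uid] + q[uid] for uid in p}
print(score)
```

The result is itself a UID-to-value map, which is why it can be ordered on, filtered on, or aggregated like any other value variable.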

The supported operators are as follows:

| Operators                       | Types accepted                                 | What it does                                                   |
| :------------:                  | :--------------:                               | :------------------------:                                     |
| `+` `-` `*` `/` `%`             | `int`, `float`                                     | performs the corresponding operation                           |
| `min` `max`                     | All types except `geo`, `bool`  (binary functions) | selects the min/max value among the two                        |
| `<` `>` `<=` `>=` `==` `!=`     | All types except `geo`, `bool`                     | Returns true or false based on the values                      |
| `floor` `ceil` `ln` `exp` `sqrt` | `int`, `float` (unary function)                    | performs the corresponding operation                           |
| `since`                         | `dateTime`                                 | Returns the number of seconds in float from the time specified |
| `pow(a, b)`                     | `int`, `float`                                     | Returns `a to the power b`                                     |
| `logbase(a,b)`                  | `int`, `float`                                     | Returns `log(a)` to the base `b`                               |
| `cond(a, b, c)`                 | first operand must be a boolean                | selects `b` if `a` is true else `c`                            |


Query Example: Form a score for each of Steven Spielberg's movies as the sum of number of actors, number of genres and number of countries. List the top five such movies in order of decreasing score.

{{< runnable >}}
{
  var(func:allofterms(name@en, "steven spielberg")) {
    films as director.film {
      p as count(starring)
      q as count(genre)
      r as count(country)
      score as math(p + q + r)
    }
  }

  TopMovies(func: uid(films), orderdesc: val(score), first: 5){
    name@en
    val(score)
  }
}
{{< /runnable >}}

Value variables and aggregations of them can be used in filters.

Query Example: Calculate a score for each Steven Spielberg movie with a condition on release date to penalize movies that are more than 10 years old, filtering on the resulting score.

{{< runnable >}}
{
  var(func:allofterms(name@en, "steven spielberg")) {
    films as director.film {
      p as count(starring)
      q as count(genre)
      date as initial_release_date
      years as math(since(date)/(365*24*60*60))
      score as math(cond(years > 10, 0, ln(p)+q-ln(years)))
    }
  }

  TopMovies(func: uid(films), orderdesc: val(score)) @filter(gt(val(score), 2)){
    name@en
    val(score)
    val(date)
  }
}
{{< /runnable >}}


Values calculated with math operations are stored to value variables and so can be aggregated.

Query Example: Compute a score for each Steven Spielberg movie and then aggregate the score.

{{< runnable >}}
{
  steven as var(func:eq(name@en, "Steven Spielberg")) @filter(has(director.film)) {
    director.film {
      p as count(starring)
      q as count(genre)
      r as count(country)
      score as math(p + q + r)
    }
    directorScore as sum(val(score))
  }

  score(func: uid(steven)){
    name@en
    val(directorScore)
  }
}
{{< /runnable >}}


## GroupBy

Syntax Examples:

* `q(func: ...) @groupby(predicate) { min(...) }`
* `predicate @groupby(pred) { count(uid) }`


A `groupby` query aggregates query results given a set of properties on which to group elements. For example, a query containing the block `friend @groupby(age) { count(uid) }` finds all nodes reachable along the friend edge, partitions these into groups based on age, then counts how many nodes are in each group. The returned result is the grouped edges and the aggregations.

Inside a `groupby` block, only aggregations are allowed and `count` may only be applied to `uid`.
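
The partition-then-aggregate behaviour described for `friend @groupby(age) { count(uid) }` can be mimicked in a few lines of Python. The data below is invented and the code is only a model of the semantics, not of Dgraph's result format.

```python
from collections import Counter

# Hypothetical nodes reached along the friend edge: UID -> age.
friend_age = {"0x1": 30, "0x2": 25, "0x3": 30, "0x4": 41, "0x5": 25}

# Partition by age and count the UIDs in each group.
count_by_age = Counter(friend_age.values())
groups = [{"age": age, "count": n} for age, n in sorted(count_by_age.items())]
print(groups)
```

Only the grouping key and the aggregate survive; other attributes of the individual friends are not part of the grouped result.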

If the `groupby` is applied to a `uid` predicate, the resulting aggregations can be saved in a variable (mapping the grouped UIDs to aggregate values) and used elsewhere in the query to extract information other than the grouped or aggregated edges.

Query Example: For Steven Spielberg movies, count the number of movies in each genre and for each of those genres return the genre name and the count. The name can't be extracted in the `groupby` because it is not an aggregate, but `uid(a)` can be used to extract the UIDs from the UID to value map and thus organize the `byGenre` query by genre UID.


{{< runnable >}}
{
  var(func:allofterms(name@en, "steven spielberg")) {
    director.film @groupby(genre) {
      a as count(uid)
      # a is a genre UID to count value variable
    }
  }

  byGenre(func: uid(a), orderdesc: val(a)) {
    name@en
    total_movies : val(a)
  }
}
{{< /runnable >}}

Query Example: Actors from Tim Burton movies and how many roles they have played in Tim Burton movies.
{{< runnable >}}
{
  var(func:allofterms(name@en, "Tim Burton")) {
    director.film {
      starring @groupby(performance.actor) {
        a as count(uid)
        # a is an actor UID to count value variable
      }
    }
  }

  byActor(func: uid(a), orderdesc: val(a)) {
    name@en
    val(a)
  }
}
{{< /runnable >}}



## Expand Predicates

The `expand()` function can be used to expand the predicates out of a node. To
use `expand()`, the [type system]({{< relref "#type-system" >}}) is required.
Refer to the section on the type system to check how to set the types
on nodes. The rest of this section assumes familiarity with that section.

There are four ways to use the `expand` function.

* Predicates can be stored in a variable and passed to `expand()` to expand all
  the predicates in the variable.
* If `_all_` is passed as an argument to `expand()`, all the predicates for each
  node at that level are retrieved. More levels can be specified in a nested
  fashion under `expand()`.
* If `_forward_` is passed as an argument to `expand()`, all predicates for each
  node at that level (minus any reverse predicates) are retrieved.
* If `_reverse_` is passed as an argument to `expand()`, only the reverse
  predicates at each node in that level are retrieved.

The last three keywords require that the nodes have types. Dgraph will look
for all the types that have been assigned to a node,
query the types to check which attributes they have, and use those to compute
the list of predicates to expand.

For example, consider a node that has types `Animal` and `Pet`, which have
the following definitions:

```
type Animal {
    name: string
    species: uid
    dob: datetime
}

type Pet {
    owner: uid
    veterinarian: uid
}
```

When `expand(_all_)` is called on this node, Dgraph will first check which types
the node has (`Animal` and `Pet`). Then it will get the definitions of
`Animal` and `Pet` and build a list of predicates.
Finally, it will query the schema to check if any of those predicates have a
reverse node. If, for example, there's a reverse node in the `owner` predicate,
the final list of predicates to expand will be:

```
name
species
dob
owner
~owner
veterinarian
```

For `string` predicates, `expand` only returns values not tagged with a language
(see [language preference]({{< relref "#language-support" >}})). So it's often
required to add `name@fr` or `name@.` as well as expand to a query.

## Cascade Directive

With the `@cascade` directive, nodes that don't have all predicates specified in the query are removed.
This can be useful in cases where some filter was applied or if nodes might not have all listed predicates.


Query Example: Harry Potter movies, with each actor and characters played. With `@cascade`, any character not played by an actor called Warwick is removed, as is any Harry Potter movie without any actors called Warwick. Without `@cascade`, every character is returned, but only those played by actors called Warwick also have the actor name.
{{< runnable >}}
{
  HP(func: allofterms(name@en, "Harry Potter")) @cascade {
    name@en
    starring{
        performance.character {
          name@en
        }
        performance.actor @filter(allofterms(name@en, "Warwick")){
            name@en
         }
    }
  }
}
{{< /runnable >}}

## Normalize directive

With the `@normalize` directive, only aliased predicates are returned and the result is flattened to remove nesting.

Query Example: Film name, country and first two actors (by UID order) of every Steven Spielberg movie, without `initial_release_date` because no alias is given, and flattened by `@normalize`.
{{< runnable >}}
{
  director(func:allofterms(name@en, "steven spielberg")) @normalize {
    director: name@en
    director.film {
      film: name@en
      initial_release_date
      starring(first: 2) {
        performance.actor {
          actor: name@en
        }
        performance.character {
          character: name@en
        }
      }
      country {
        country: name@en
      }
    }
  }
}
{{< /runnable >}}


## Ignorereflex directive

The `@ignorereflex` directive forces the removal of child nodes that are reachable from themselves as a parent, through any path in the query result.

Query Example: All the co-actors of Rutger Hauer. Without `@ignorereflex`, the result would also include Rutger Hauer for every movie.

{{< runnable >}}
{
  coactors(func: eq(name@en, "Rutger Hauer")) @ignorereflex {
    actor.film {
      performance.film {
        starring {
          performance.actor {
            name@en
          }
        }
      }
    }
  }
}
{{< /runnable >}}

## Debug

For the purposes of debugging, you can attach a query parameter `debug=true` to a query. Attaching this parameter lets you retrieve the `uid` attribute for all the entities along with the `server_latency` and `start_ts` information under the `extensions` key of the response.

- `parsing_ns`: Latency in nanoseconds to parse the query.
- `processing_ns`: Latency in nanoseconds to process the query.
- `encoding_ns`: Latency in nanoseconds to encode the JSON response.
- `start_ts`: The logical start timestamp of the transaction.

Query with debug as a query parameter:
```sh
curl -H "Content-Type: application/graphql+-" "http://localhost:8080/query?debug=true" -XPOST -d $'{
  tbl(func: allofterms(name@en, "The Big Lebowski")) {
    name@en
  }
}' | python -m json.tool | less
```

Returns `uid` and `server_latency`:
```
{
  "data": {
    "tbl": [
      {
        "uid": "0x41434",
        "name@en": "The Big Lebowski"
      },
      {
        "uid": "0x145834",
        "name@en": "The Big Lebowski 2"
      },
      {
        "uid": "0x2c8a40",
        "name@en": "Jeffrey \"The Big\" Lebowski"
      },
      {
        "uid": "0x3454c4",
        "name@en": "The Big Lebowski"
      }
    ]
  },
  "extensions": {
    "server_latency": {
      "parsing_ns": 18559,
      "processing_ns": 802990982,
      "encoding_ns": 1177565
    },
    "txn": {
      "start_ts": 40010
    }
  }
}
```


## Schema

For each predicate, the schema specifies the target's type. If a predicate `p` has type `T`, then for all subject-predicate-object triples `s p o` the object `o` is of schema type `T`.

* On mutations, scalar types are checked and an error is thrown if the value cannot be converted to the schema type.

* On query, value results are returned according to the schema type of the predicate.

If a schema type isn't specified before a mutation adds triples for a predicate, then the type is inferred from the first mutation. This type is either:

* type `uid`, if the first mutation for the predicate has nodes for the subject and object, or

* derived from the [RDF type]({{< relref "#rdf-types" >}}), if the object is a literal and an RDF type is present in the first mutation, or

* `default` type, otherwise.


### Schema Types

Dgraph supports scalar types and the UID type.

#### Scalar Types

For all triples with a predicate of scalar types the object is a literal.

| Dgraph Type | Go type |
| ------------|:--------|
|  `default`  | string  |
|  `int`      | int64   |
|  `float`    | float64 |
|  `string`   | string  |
|  `bool`     | bool    |
|  `dateTime` | time.Time (RFC3339 format [Optional timezone] eg: 2006-01-02T15:04:05.999999999+10:00 or 2006-01-02T15:04:05.999999999)    |
|  `geo`      | [go-geom](https://github.com/twpayne/go-geom)    |
|  `password` | string (encrypted) |


{{% notice "note" %}}Dgraph supports date and time formats for the `dateTime` scalar type only if they
are RFC 3339 compatible, which is different from ISO 8601 (as defined in the RDF spec). You should
convert your values to RFC 3339 format before sending them to Dgraph.{{% /notice %}}

#### UID Type

The `uid` type denotes a node-node edge; internally each node is represented as a `uint64` id.

| Dgraph Type | Go type |
| ------------|:--------|
|  `uid`      | uint64  |


### Adding or Modifying Schema

Schema mutations add or modify schema.
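
As a rough illustration of how a schema mutation could be submitted from a script, the sketch below builds the same kind of request that the `curl localhost:8080/alter -XPOST` commands in this document send. The helper name `build_alter_request` and the default host are assumptions made for this example; the request is only constructed, not sent.

```python
import urllib.request


def build_alter_request(schema, host="http://localhost:8080"):
    """Build (but do not send) a POST request carrying schema text.

    `build_alter_request` and the default `host` are hypothetical names
    chosen for this sketch; only the /alter path and the POST method are
    taken from the curl examples in this document.
    """
    return urllib.request.Request(
        url=host + "/alter",
        data=schema.encode("utf-8"),
        method="POST",
    )


# Schema text in the same form used elsewhere in this reference.
schema = "name: string @index(exact, term) .\nrated: [uid] @reverse @count .\n"
req = build_alter_request(schema)

# Sending is left to the caller, for example:
#   urllib.request.urlopen(req)
# which requires a running Dgraph instance at `host`.
```

Keeping request construction separate from sending makes the schema payload easy to inspect before it reaches a live instance.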

Multiple scalar values can also be added for a `S P` by specifying the schema to be of
list type. `occupations` in the example below can store a list of strings for each `S P`.

An index is specified with `@index`, with arguments to specify the tokenizer. When specifying an
index for a predicate it is mandatory to specify the type of the index. For example:

```
name: string @index(exact, fulltext) @count .
multiname: string @lang .
age: int @index(int) .
friend: [uid] @count .
dob: dateTime .
location: geo @index(geo) .
occupations: [string] @index(term) .
```

If no data has been stored for the predicates, a schema mutation sets up an empty schema ready to receive triples.

If data is already stored before the mutation, existing values are not checked to conform to the new schema. On query, Dgraph tries to convert existing values to the new schema types, ignoring any that fail conversion.

If data exists and new indices are specified in a schema mutation, any index not in the updated list is dropped and a new index is created for every new tokenizer specified.

Reverse edges are also computed if specified by a schema mutation.


### Predicate name rules

Any alphanumeric combination of a predicate name is permitted.
Dgraph also supports [Internationalized Resource Identifiers](https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier) (IRIs).
You can read more in [Predicates i18n](#predicates-i18n).

#### Allowed special characters

Single special characters are not accepted, which includes the special characters from IRIs.
They have to be prefixed/suffixed with alphanumeric characters.

```
][&*()_-+=!#$%
```

*Note: You can use `@` as a suffix, but the suffix character is ignored.*

#### Forbidden special characters

The special characters below are not accepted.

```
^}|{`\~
```


### Predicates i18n

If your predicate is a URI or has language-specific characters, then enclose
it with angle brackets `<>` when executing the schema mutation.

{{% notice "note" %}}Dgraph supports [Internationalized Resource Identifiers](https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier) (IRIs) for predicate names and values.{{% /notice %}}

Schema syntax:
```
<职业>: string @index(exact) .
<年龄>: int @index(int) .
<地点>: geo @index(geo) .
<公司>: string .
```

This syntax allows for internationalized predicate names, but full-text indexing still defaults to English.
To use the right tokenizer for your language, you need to use the `@lang` directive and enter values using your
language tag.

Schema:
```
<公司>: string @index(fulltext) @lang .
```
Mutation:
```
{
  set {
    _:a <公司> "Dgraph Labs Inc"@en .
    _:b <公司> "夏新科技有限责任公司"@zh .
  }
}
```
Query:
```
{
  q(func: alloftext(<公司>@zh, "夏新科技有限责任公司")) {
    uid
    <公司>@.
  }
}
```


### Upsert directive

To use [upsert operations]({{< relref "howto/index.md#upserts">}}) on a
predicate, specify the `@upsert` directive in the schema. When committing
transactions involving predicates with the `@upsert` directive, Dgraph checks
index keys for conflicts, helping to enforce uniqueness constraints when running
concurrent upserts.

This is how you specify the upsert directive for a predicate.
```
email: string @index(exact) @upsert .
```

### RDF Types

Dgraph supports a number of [RDF types in mutations]({{< relref "mutations/index.md#language-and-rdf-types" >}}).

As well as implying a schema type for a [first mutation]({{< relref "#schema" >}}), an RDF type can override a schema type for storage.

If a predicate has a schema type and a mutation has an RDF type with a different underlying Dgraph type, the convertibility to schema type is checked, and an error is thrown if they are incompatible, but the value is stored in the RDF type's corresponding Dgraph type. Query results are always returned in schema type.

For example, if no schema is set for the `age` predicate, given the mutation
```
{
 set {
  _:a <age> "15"^^<xs:int> .
  _:b <age> "13" .
  _:c <age> "14"^^<xs:string> .
  _:d <age> "14.5"^^<xs:string> .
  _:e <age> "14.5" .
 }
}
```
Dgraph:

* sets the schema type to `int`, as implied by the first triple,
* converts `"13"` to `int` on storage,
* checks `"14"` can be converted to `int`, but stores as `string`,
* throws an error for the remaining two triples, because `"14.5"` can't be converted to `int`.

### Extended Types

The following types are also accepted.

#### Password type

A password for an entity is set by setting the schema for the attribute to be of type `password`. Passwords cannot be queried directly, only checked for a match using the `checkpwd` function.
The passwords are encrypted using [bcrypt](https://en.wikipedia.org/wiki/Bcrypt).

For example: to set a password, first set the schema, then the password:
```
pass: password .
```

```
{
  set {
    <0x123> <name> "Password Example" .
    <0x123> <pass> "ThePassword" .
  }
}
```

To check a password:
```
{
  check(func: uid(0x123)) {
    name
    checkpwd(pass, "ThePassword")
  }
}
```

Output:
```
{
  "data": {
    "check": [
      {
        "name": "Password Example",
        "checkpwd(pass)": true
      }
    ]
  }
}
```

You can also use an alias with the password type.

```
{
  check(func: uid(0x123)) {
    name
    secret: checkpwd(pass, "ThePassword")
  }
}
```

Output:
```
{
  "data": {
    "check": [
      {
        "name": "Password Example",
        "secret": true
      }
    ]
  }
}
```

### Indexing

{{% notice "note" %}}Filtering on a predicate by applying a [function]({{< relref "#functions" >}}) requires an index.{{% /notice %}}

When filtering by applying a function, Dgraph uses the index to make the search through a potentially large dataset efficient.

All scalar types can be indexed.

Types `int`, `float`, `bool` and `geo` have only a default index each: with tokenizers named `int`, `float`, `bool` and `geo`.

Types `string` and `dateTime` have a number of indices.

#### String Indices
The indices available for strings are as follows.

| Dgraph function            | Required index / tokenizer             | Notes |
| :-----------------------   | :------------                          | :---  |
| `eq`                       | `hash`, `exact`, `term`, or `fulltext` | The most performant index for `eq` is `hash`. Only use `term` or `fulltext` if you also require term or full-text search. If you're already using `term`, there is no need to use `hash` or `exact` as well. |
| `le`, `ge`, `lt`, `gt`     | `exact`                                | Allows faster sorting. |
| `allofterms`, `anyofterms` | `term`                                 | Allows searching by a term in a sentence. |
| `alloftext`, `anyoftext`   | `fulltext`                             | Matching with language specific stemming and stopwords. |
| `regexp`                   | `trigram`                              | Regular expression matching. Can also be used for equality checking. |

{{% notice "warning" %}}
Incorrect index choice can impose performance penalties and an increased
transaction conflict rate. Use only the minimum number of and simplest indexes
that your application needs.
{{% /notice %}}


#### DateTime Indices

The indices available for `dateTime` are as follows.

| Index name / Tokenizer | Part of date indexed               |
| :--------------------- | :--------------------------------- |
| `year`                 | index on year (default)            |
| `month`                | index on year and month            |
| `day`                  | index on year, month and day       |
| `hour`                 | index on year, month, day and hour |

The choices of `dateTime` index allow selecting the precision of the index. Applications, such as the movies examples in these docs, that require searching over dates but have relatively few nodes per year may prefer the `year` tokenizer; applications that are dependent on fine grained date searches, such as real-time sensor readings, may prefer the `hour` index.


All the `dateTime` indices are sortable.


#### Sortable Indices

Not all the indices establish a total order among the values that they index. Sortable indices allow inequality functions and sorting.

* Indexes `int` and `float` are sortable.
* `string` index `exact` is sortable.
* All `dateTime` indices are sortable.

For example, given an edge `name` of `string` type, to sort by `name` or perform inequality filtering on names, the `exact` index must have been specified. In which case a schema query would return at least the following tokenizers.

```
{
  "predicate": "name",
  "type": "string",
  "index": true,
  "tokenizer": [
    "exact"
  ]
}
```

#### Count index

For predicates with the `@count` directive, Dgraph indexes the number of edges out of each node. This enables fast queries of the form:
```
{
  q(func: gt(count(pred), threshold)) {
    ...
  }
}
```

### List Type

Predicates with scalar types can also store a list of values if specified in the schema. The scalar
type needs to be enclosed within `[]` to indicate that it's a list type. These lists are like an
unordered set.

```
occupations: [string] .
score: [int] .
```

* A set operation adds to the list of values. The order of the stored values is non-deterministic.
* A delete operation deletes the value from the list.
* Querying for these predicates would return the list in an array.
* Indexes can be applied on predicates which have a list type and you can use [Functions]({{<ref "#functions">}}) on them.
* Sorting is not allowed using these predicates.


### Reverse Edges

A graph edge is unidirectional. For node-node edges, sometimes modeling requires reverse edges. If only some subject-predicate-object triples have a reverse, these must be manually added. But if a predicate always has a reverse, Dgraph computes the reverse edges if `@reverse` is specified in the schema.

The reverse edge of `anEdge` is `~anEdge`.

For existing data, Dgraph computes all reverse edges. For data added after the schema mutation, Dgraph computes and stores the reverse edge for each added triple.

### Querying Schema

A schema query queries for the whole schema:

```
schema {}
```

{{% notice "note" %}} Unlike regular queries, the schema query is not surrounded
by curly braces.
Also, schema queries and regular queries cannot be combined.
{{% /notice %}}

You can query for particular schema fields in the query body.

```
schema {
  type
  index
  reverse
  tokenizer
  list
  count
  upsert
  lang
}
```

You can also query for particular predicates:

```
schema(pred: [name, friend]) {
  type
  index
  reverse
  tokenizer
  list
  count
  upsert
  lang
}
```

Types can also be queried. Below are some example queries.

```
schema(type: Movie) {}
schema(type: [Person, Animal]) {}
```

Note that type queries do not contain anything between the curly braces. The
output will be the entire definition of the requested types.

## Type System

Dgraph supports a type system that can be used to categorize nodes and query
them based on their type. The type system is also used during expand queries.

### Type definition

Types are defined using a GraphQL-like syntax. For example:

```
type Student {
  name: string
  dob: datetime
  home_address: string
  year: int
  friends: [uid]
}
```

Types are declared along with the schema using the Alter endpoint. In order to
properly support the above type, a predicate for each of the attributes
in the type is also needed, such as:

```
name: string @index(term) .
dob: datetime .
home_address: string .
year: int .
friends: [uid] .
```

If a `uid` predicate contains a reverse index, both the predicate and the
reverse predicate are part of any type definition which contains that predicate.
Expand queries will follow that convention.

Edges can be used in multiple types: for example, `name` might be used for both
a person and a pet.
Sometimes, however, it's required to use a different
predicate for each type to represent a similar concept. For example, if student
names and book names required different indexes, then the predicates must be
different.

```
type Student {
  student_name: string
}

type Textbook {
  textbook_name: string
}

student_name: string @index(exact) .
textbook_name: string @lang @index(fulltext) .
```

Types also support lists like `friends: [uid]` or `tags: [string]`.

Altering the schema for a type that already exists overwrites the existing
definition.

### Setting the type of a node

Scalar nodes cannot have types since they only have one attribute and its type
is the type of the node. UID nodes can have a type. The type is set by setting
the value of the `dgraph.type` predicate for that node. A node can have multiple
types. Here's an example of how to set the types of a node:

```
{
  set {
    _:a <name> "Garfield" .
    _:a <dgraph.type> "Pet" .
    _:a <dgraph.type> "Animal" .
  }
}
```

`dgraph.type` is a reserved predicate and cannot be removed or modified.

### Using types during queries

Types can be used as a top level function in the query language. For example:

```
{
  q(func: type(Animal)) {
    uid
    name
  }
}
```

This query will only return nodes whose type is set to `Animal`.

Types can also be used to filter results inside a query. For example:

```
{
  q(func: has(parent)) {
    uid
    parent @filter(type(Person)) {
      uid
      name
    }
  }
}
```

This query will return the nodes that have a `parent` predicate, and for each of
them only the `parent` nodes of type `Person`.

### Deleting a type

Type definitions can be deleted using the Alter endpoint.
All that is needed is
to send an operation object with the field `DropOp` (or `drop_op` depending on
the client) set to the enum value `TYPE` and the field `DropValue` (or `drop_value`)
set to the type that is meant to be deleted.

Below is an example deleting the type `Person` using the Go client:
```go
err := c.Alter(context.Background(), &api.Operation{
	DropOp:    api.Operation_TYPE,
	DropValue: "Person",
})
```

### Expand queries and types

Queries using [expand]({{< relref "#expand-predicates" >}}) (i.e.:
`expand(_all_)`, `expand(_reverse_)`, or `expand(_forward_)`) require that the
nodes to expand have types.

## Facets : Edge attributes

Dgraph supports facets --- **key value pairs on edges** --- as an extension to RDF triples. That is, facets add properties to edges, rather than to nodes.
For example, a `friend` edge between two nodes may have a boolean property of `close` friendship.
Facets can also be used as `weights` for edges.

Though you may find yourself leaning towards facets many times, they should not be misused. It wouldn't be correct modeling to give the `friend` edge a facet `date_of_birth`. That should be an edge for the friend. However, a facet like `start_of_friendship` might be appropriate. Facets are, however, not first-class citizens in Dgraph like predicates.

Facet keys are strings and values can be `string`, `bool`, `int`, `float` and `dateTime`.
For `int` and `float`, only 32-bit signed integers and 64-bit floats are accepted.

The following mutation is used throughout this section on facets. The mutation adds data for some people and, for example, records a `since` facet in `mobile` and `car` to record when Alice bought the car and started using the mobile number.

First we add some schema.
```sh
curl localhost:8080/alter -XPOST -d $'
    name: string @index(exact, term) .
    rated: [uid] @reverse @count .
' | python -m json.tool | less
```

```sh
curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" -XPOST -d $'
{
  set {

    # -- Facets on scalar predicates
    _:alice <name> "Alice" .
    _:alice <mobile> "040123456" (since=2006-01-02T15:04:05) .
    _:alice <car> "MA0123" (since=2006-02-02T13:01:09, first=true) .

    _:bob <name> "Bob" .
    _:bob <car> "MA0134" (since=2006-02-02T13:01:09) .

    _:charlie <name> "Charlie" .
    _:dave <name> "Dave" .


    # -- Facets on UID predicates
    _:alice <friend> _:bob (close=true, relative=false) .
    _:alice <friend> _:charlie (close=false, relative=true) .
    _:alice <friend> _:dave (close=true, relative=true) .


    # -- Facets for variable propagation
    _:movie1 <name> "Movie 1" .
    _:movie2 <name> "Movie 2" .
    _:movie3 <name> "Movie 3" .

    _:alice <rated> _:movie1 (rating=3) .
    _:alice <rated> _:movie2 (rating=2) .
    _:alice <rated> _:movie3 (rating=5) .

    _:bob <rated> _:movie1 (rating=5) .
    _:bob <rated> _:movie2 (rating=5) .
    _:bob <rated> _:movie3 (rating=5) .

    _:charlie <rated> _:movie1 (rating=2) .
    _:charlie <rated> _:movie2 (rating=5) .
    _:charlie <rated> _:movie3 (rating=1) .
  }
}' | python -m json.tool | less
```

### Facets on scalar predicates


Querying `name`, `mobile` and `car` of Alice gives the same result as without facets.

{{< runnable >}}
{
  data(func: eq(name, "Alice")) {
     name
     mobile
     car
  }
}
{{</ runnable >}}


The syntax `@facets(facet-name)` is used to query facet data. For Alice the `since` facet for `mobile` and `car` are queried as follows.

{{< runnable >}}
{
  data(func: eq(name, "Alice")) {
     name
     mobile @facets(since)
     car @facets(since)
  }
}
{{</ runnable >}}


Facets are returned at the same level as the corresponding edge and have keys like edge|facet.

All facets on an edge are queried with `@facets`.

{{< runnable >}}
{
  data(func: eq(name, "Alice")) {
     name
     mobile @facets
     car @facets
  }
}
{{</ runnable >}}

### Facets i18n

Facet keys and values can use language-specific characters directly when mutating. But facet keys need to be enclosed in angle brackets `<>` when querying. This is similar to predicates. See [Predicates i18n](#predicates-i18n) for more info.

{{% notice "note" %}}Dgraph supports [Internationalized Resource Identifiers](https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier) (IRIs) for facet keys when querying.{{% /notice %}}

Example:
```
{
  set {
    _:person1 <name> "Daniel" (वंश="स्पेनी", ancestry="Español") .
    _:person2 <name> "Raj" (वंश="हिंदी", ancestry="हिंदी") .
    _:person3 <name> "Zhang Wei" (वंश="चीनी", ancestry="中文") .
  }
}
```
Query, notice the `<>`'s:
```
{
  q(func: has(name)) {
    name @facets(<वंश>)
  }
}
```

### Alias with facets

An alias can be specified while requesting specific predicates. The syntax is similar to how you would request an
alias for other predicates. `orderasc` and `orderdesc` are not allowed as aliases as they have special
meaning. Apart from that, anything else can be set as an alias.

Here we set `car_since` and `close_friend` as aliases for the `since` and `close` facets respectively.
{{< runnable >}}
{
   data(func: eq(name, "Alice")) {
     name
     mobile
     car @facets(car_since: since)
     friend @facets(close_friend: close) {
       name
     }
   }
}
{{</ runnable >}}



### Facets on UID predicates

Facets on UID edges work similarly to facets on value edges.

For example, `friend` is an edge with facet `close`.
It was set to true for friendship between Alice and Bob
and false for friendship between Alice and Charlie.

A query for friends of Alice.

{{< runnable >}}
{
  data(func: eq(name, "Alice")) {
    name
    friend {
      name
    }
  }
}
{{</ runnable >}}

A query for friends and the facet `close` with `@facets(close)`.

{{< runnable >}}
{
   data(func: eq(name, "Alice")) {
     name
     friend @facets(close) {
       name
     }
   }
}
{{</ runnable >}}


For uid edges like `friend`, facets go to the corresponding child under the key edge|facet. In the above
example you can see that the `close` facet on the edge between Alice and Bob appears with the key `friend|close`
along with Bob's results.

{{< runnable >}}
{
  data(func: eq(name, "Alice")) {
    name
    friend @facets {
      name
      car @facets
    }
  }
}
{{</ runnable >}}

Bob has a `car` and it has a facet `since`, which, in the results, is part of the same object as Bob
under the key car|since.
Also, the `close` relationship between Bob and Alice is part of Bob's output object.
Charlie does not have a `car` edge and thus has only UID facets.

### Filtering on facets

Dgraph supports filtering edges based on facets.
Filtering works similarly to how it works on edges without facets and has the same available functions.
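
As a rough mental model of facet filtering (plain Python, not Dgraph's implementation), the sketch below treats each of Alice's `friend` edges as a target plus a dictionary of facets, using the values from the example mutation in this section. The names `friend_edges` and `filter_edges` are made up for this illustration.

```python
# Alice's `friend` edges with their facets, copied from the example mutation.
friend_edges = [
    {"target": "Bob", "facets": {"close": True, "relative": False}},
    {"target": "Charlie", "facets": {"close": False, "relative": True}},
    {"target": "Dave", "facets": {"close": True, "relative": True}},
]


def filter_edges(edges, predicate):
    """Return the targets of the edges whose facets satisfy `predicate`."""
    return [edge["target"] for edge in edges if predicate(edge["facets"])]


# Rough analogue of: friend @facets(eq(close, true))
close_friends = filter_edges(friend_edges, lambda f: f.get("close") is True)

# Rough analogue of: friend @facets(eq(close, true) AND eq(relative, true))
close_relatives = filter_edges(
    friend_edges,
    lambda f: f.get("close") is True and f.get("relative") is True,
)

print(close_friends)    # ['Bob', 'Dave']
print(close_relatives)  # ['Dave']
```

The point of the model is that the filter is evaluated per edge, against that edge's facets, and not against the properties of the node the edge points to.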


Find Alice's close friends
{{< runnable >}}
{
  data(func: eq(name, "Alice")) {
    friend @facets(eq(close, true)) {
      name
    }
  }
}
{{</ runnable >}}


To return facets as well as filter, add another `@facets(<facetname>)` to the query.

{{< runnable >}}
{
  data(func: eq(name, "Alice")) {
    friend @facets(eq(close, true)) @facets(relative) { # filter close friends and give relative status
      name
    }
  }
}
{{</ runnable >}}


Facet queries can be composed with `AND`, `OR` and `NOT`.

{{< runnable >}}
{
  data(func: eq(name, "Alice")) {
    friend @facets(eq(close, true) AND eq(relative, true)) @facets(relative) { # filter close friends in my relation
      name
    }
  }
}
{{</ runnable >}}


### Sorting using facets

Sorting is possible for a facet on a uid edge. Here we sort the movies rated by Alice, Bob and
Charlie by their `rating` which is a facet.

{{< runnable >}}
{
  me(func: anyofterms(name, "Alice Bob Charlie")) {
    name
    rated @facets(orderdesc: rating) {
      name
    }
  }
}
{{</ runnable >}}



### Assigning Facet values to a variable

Facets on UID edges can be stored in [value variables]({{< relref "#value-variables" >}}). The variable is a map from the edge target to the facet value.

Alice's friends reported by variables for `close` and `relative`.
{{< runnable >}}
{
  var(func: eq(name, "Alice")) {
    friend @facets(a as close, b as relative)
  }

  friend(func: uid(a)) {
    name
    val(a)
  }

  relative(func: uid(b)) {
    name
    val(b)
  }
}
{{</ runnable >}}


### Facets and Variable Propagation

Facet values of `int` and `float` can be assigned to variables and thus the [values propagate]({{< relref "#variable-propagation" >}}).


Alice, Bob and Charlie each rated every movie. A value variable on facet `rating` maps movies to ratings. A query that reaches a movie through multiple paths sums the ratings on each path. The following sums Alice, Bob and Charlie's ratings for the three movies.

{{< runnable >}}
{
  var(func: anyofterms(name, "Alice Bob Charlie")) {
    num_raters as math(1)
    rated @facets(r as rating) {
      total_rating as math(r) # sum of the 3 ratings
      average_rating as math(total_rating / num_raters)
    }
  }
  data(func: uid(total_rating)) {
    name
    val(total_rating)
    val(average_rating)
  }

}
{{</ runnable >}}



### Facets and Aggregation

Facet values assigned to value variables can be aggregated.

{{< runnable >}}
{
  data(func: eq(name, "Alice")) {
    name
    rated @facets(r as rating) {
      name
    }
    avg(val(r))
  }
}
{{</ runnable >}}


Note though that `r` is a map from movies to the sum of ratings on edges in the query reaching the movie. Hence, the following does not correctly calculate the average ratings for Alice and Bob individually --- it calculates 2 times the average of both Alice and Bob's ratings.

{{< runnable >}}

{
  data(func: anyofterms(name, "Alice Bob")) {
    name
    rated @facets(r as rating) {
      name
    }
    avg(val(r))
  }
}
{{</ runnable >}}

Calculating the average ratings of users requires a variable that maps users to the sum of their ratings.

{{< runnable >}}

{
  var(func: has(rated)) {
    num_rated as math(1)
    rated @facets(r as rating) {
      avg_rating as math(r / num_rated)
    }
  }

  data(func: uid(avg_rating)) {
    name
    val(avg_rating)
  }
}
{{</ runnable >}}

## K-Shortest Path Queries

The shortest path between a source (`from`) node and destination (`to`) node can be found using the keyword `shortest` for the query block name. It requires the source node UID, destination node UID and the predicates (at least one) that have to be considered for traversal. A `shortest` query block returns the shortest path under `_path_` in the query response. The path can also be stored in a variable which is used in other query blocks.

By default the shortest path is returned. With `numpaths: k`, the k-shortest paths are returned. With `depth: n`, the shortest paths up to `n` hops away are returned.

{{% notice "note" %}}
- If no predicates are specified in the `shortest` block, no path can be fetched as no edge is traversed.
- If you're seeing queries take a long time, you can set a [gRPC deadline](https://grpc.io/blog/deadlines) to stop the query after a certain amount of time.
{{% /notice %}}

For example:

```sh
curl localhost:8080/alter -XPOST -d $'
    name: string @index(exact) .
' | python -m json.tool | less
```

```sh
curl -H "Content-Type: application/rdf" "localhost:8080/mutate?commitNow=true" -XPOST -d $'
{
  set {
    _:a <friend> _:b (weight=0.1) .
    _:b <friend> _:c (weight=0.2) .
    _:c <friend> _:d (weight=0.3) .
    _:a <friend> _:d (weight=1) .
    _:a <name> "Alice" .
    _:b <name> "Bob" .
    _:c <name> "Tom" .
    _:d <name> "Mallory" .
  }
}' | python -m json.tool | less
```

The shortest path between Alice and Mallory (assuming UIDs 0x2 and 0x5 respectively) can be found with this query:

```sh
curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d $'{
 path as shortest(from: 0x2, to: 0x5) {
  friend
 }
 path(func: uid(path)) {
   name
 }
}' | python -m json.tool | less
```

Which returns the following results. (Note: without considering the `weight` facet, each edge's weight is considered as 1.)

```
{
  "data": {
    "path": [
      {
        "name": "Alice"
      },
      {
        "name": "Mallory"
      }
    ],
    "_path_": [
      {
        "uid": "0x2",
        "friend": [
          {
            "uid": "0x5"
          }
        ]
      }
    ]
  }
}
```

We can return more paths by specifying `numpaths`. Setting `numpaths: 2` returns the shortest two paths:

```sh
curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d $'{

 A as var(func: eq(name, "Alice"))
 M as var(func: eq(name, "Mallory"))

 path as shortest(from: uid(A), to: uid(M), numpaths: 2) {
  friend
 }
 path(func: uid(path)) {
   name
 }
}' | python -m json.tool | less
```

{{% notice "note" %}}In the query above, instead of using UID literals, we query both people using var blocks and the `uid()` function. You can also combine it with [GraphQL Variables]({{< relref "#graphql-variables" >}}).{{% /notice %}}

Edge weights are included by using facets on the edges as follows.

{{% notice "note" %}}Only one facet per predicate is allowed in the shortest query block.{{% /notice %}}

```sh
curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d $'{
  path as shortest(from: 0x2, to: 0x5) {
    friend @facets(weight)
  }

  path(func: uid(path)) {
    name
  }
}' | python -m json.tool | less
```

```
{
  "data": {
    "path": [
      {
        "name": "Alice"
      },
      {
        "name": "Bob"
      },
      {
        "name": "Tom"
      },
      {
        "name": "Mallory"
      }
    ],
    "_path_": [
      {
        "uid": "0x2",
        "friend": [
          {
            "uid": "0x3",
            "friend|weight": 0.1,
            "friend": [
              {
                "uid": "0x4",
                "friend|weight": 0.2,
                "friend": [
                  {
                    "uid": "0x5",
                    "friend|weight": 0.3
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  }
}
```

Constraints can be applied to the intermediate nodes as follows.

```sh
curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d $'{
  path as shortest(from: 0x2, to: 0x5) {
    friend @filter(not eq(name, "Bob")) @facets(weight)
    relative @facets(liking)
  }

  relationship(func: uid(path)) {
    name
  }
}' | python -m json.tool | less
```

The k-shortest path algorithm (used when `numpaths` > 1) also accepts the arguments `minweight` and `maxweight`, which take a float as their value. When they are passed, only paths within the weight range `[minweight, maxweight]` will be considered as valid paths. This can be used, for example, to query the shortest paths that traverse between 2 and 4 nodes.

```sh
curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d $'{
  path as shortest(from: 0x2, to: 0x5, numpaths: 2, minweight: 2, maxweight: 4) {
    friend
  }
  path(func: uid(path)) {
    name
  }
}' | python -m json.tool | less
```

Some points to keep in mind for shortest path queries:

- Weights must be non-negative. Dijkstra's algorithm is used to calculate the shortest paths.
- Only one facet per predicate in the shortest query block is allowed.
- Only one `shortest` path block is allowed per query. Only one `_path_` is returned in the result.
- For k-shortest paths (when `numpaths` > 1), the result of the shortest path query variable will only return a single path. All k paths are returned in `_path_`.

## Recurse Query

`Recurse` queries let you traverse a set of predicates (with filter, facets, etc.) until we reach all leaf nodes or we reach the maximum depth which is specified by the `depth` parameter.

To get 10 movies from a genre that has more than 30000 films and then get two actors for those movies we'd do something as follows:
{{< runnable >}}
{
  me(func: gt(count(~genre), 30000), first: 1) @recurse(depth: 5, loop: true) {
    name@en
    ~genre (first:10) @filter(gt(count(starring), 2))
    starring (first: 2)
    performance.actor
  }
}
{{< /runnable >}}
Some points to keep in mind while using recurse queries are:

- You can specify only one level of predicates after root. These would be traversed recursively. Both scalar and entity-nodes are treated similarly.
- Only one recurse block is advised per query.
- Be careful as the result size could explode quickly and an error would be returned if the result set gets too large. In such cases use more filters, limit results using pagination, or provide a depth parameter at root as shown in the example above.
- The `loop` parameter can be set to false, in which case paths which lead to a loop would be ignored
  while traversing.
- If not specified, the value of the `loop` parameter defaults to false.
- If the value of the `loop` parameter is false and depth is not specified, `depth` will default to `math.MaxUint64`, which means that the entire graph might be traversed until all the leaf nodes are reached.


## Fragments

The `fragment` keyword allows you to define new fragments that can be referenced in a query, as per the [GraphQL specification](https://facebook.github.io/graphql/#sec-Language.Fragments). If multiple parts of a query request the same set of fields, you can define a fragment once and refer to it multiple times instead. Fragments can be nested inside fragments, but no cycles are allowed. Here is one contrived example.

```sh
curl -H "Content-Type: application/graphql+-" localhost:8080/query -XPOST -d $'
query {
  debug(func: uid(1)) {
    name@en
    ...TestFrag
  }
}
fragment TestFrag {
  initial_release_date
  ...TestFragB
}
fragment TestFragB {
  country
}' | python -m json.tool | less
```

## GraphQL Variables

`Variables` can be defined and used in queries, which helps with query reuse and avoids costly string building in clients at runtime by passing a separate variable map. A variable starts with a `$` symbol.
For **HTTP requests** with GraphQL Variables, we must use the `Content-Type: application/json` header and pass data with a JSON object containing `query` and `variables`.

```sh
curl -H "Content-Type: application/json" localhost:8080/query -XPOST -d $'{
  "query": "query test($a: string) { test(func: eq(name, $a)) { \n uid \n name \n } }",
  "variables": { "$a": "Alice" }
}' | python -m json.tool | less
```

{{< runnable vars="{\"$a\": \"5\", \"$b\": \"10\", \"$name\": \"Steven Spielberg\"}" >}}
query test($a: int, $b: int, $name: string) {
  me(func: allofterms(name@en, $name)) {
    name@en
    director.film (first: $a, offset: $b) {
      name@en
      genre(first: $a) {
        name@en
      }
    }
  }
}
{{< /runnable >}}

* Variables can have default values. In the example below, `$a` has a default value of `2`. Since the value for `$a` isn't provided in the variable map, `$a` takes on the default value.
* Variables whose type is suffixed with a `!` can't have a default value but must have a value as part of the variables map.
* The value of the variable must be parsable to the given type; if not, an error is thrown.
* The variable types that are supported as of now are: `int`, `float`, `bool` and `string`.
* Any variable that is being used must be declared in the named query clause in the beginning.

{{< runnable vars="{\"$b\": \"10\", \"$name\": \"Steven Spielberg\"}" >}}
query test($a: int = 2, $b: int!, $name: string) {
  me(func: allofterms(name@en, $name)) {
    director.film (first: $a, offset: $b) {
      genre(first: $a) {
        name@en
      }
    }
  }
}
{{< /runnable >}}

You can also use arrays with GraphQL Variables.

{{< runnable vars="{\"$b\": \"10\", \"$aName\": \"Steven Spielberg\", \"$bName\": \"Quentin Tarantino\"}" >}}
query test($a: int = 2, $b: int!, $aName: string, $bName: string) {
  me(func: eq(name@en, [$aName, $bName])) {
    director.film (first: $a, offset: $b) {
      genre(first: $a) {
        name@en
      }
    }
  }
}
{{< /runnable >}}


{{% notice "note" %}}
If you want to input a list of uids as a GraphQL variable value, you can have the variable as string type and
have the value surrounded by square brackets like `["13", "14"]`.
{{% /notice %}}

## Indexing with Custom Tokenizers

Dgraph comes with a large toolkit of builtin indexes, but for niche
use cases they're sometimes not enough.

Dgraph allows you to implement custom tokenizers via a plugin system in order
to fill the gaps.

### Caveats

The plugin system uses Go's [`pkg/plugin`](https://golang.org/pkg/plugin/).
This brings some restrictions to how plugins can be used.

- Plugins must be written in Go.

- As of Go 1.9, `pkg/plugin` only works on Linux. Therefore, plugins will only
  work on Dgraph instances deployed in a Linux environment.

- The version of Go used to compile the plugin should be the same as the version
  of Go used to compile Dgraph itself. Dgraph always uses the latest version of
Go (and so should you!).

### Implementing a plugin

{{% notice "note" %}}
You should consider Go's [plugin](https://golang.org/pkg/plugin/) documentation
to be supplementary to the documentation provided here.
{{% /notice %}}

Plugins are implemented as their own main package. They must export a
particular symbol that allows Dgraph to hook into the custom logic the plugin
provides.

The plugin must export a symbol named `Tokenizer`.
The type of the symbol must
be `func() interface{}`. When the function is called the result returned should
be a value that implements the following interface:

```go
type PluginTokenizer interface {
	// Name is the name of the tokenizer. It should be unique among all
	// builtin tokenizers and other custom tokenizers. It identifies the
	// tokenizer when an index is set in the schema and when search/filter
	// is used in queries.
	Name() string

	// Identifier is a byte that uniquely identifies the tokenizer.
	// Bytes in the range 0x80 to 0xff (inclusive) are reserved for
	// custom tokenizers.
	Identifier() byte

	// Type is a string representing the type of data that is to be
	// tokenized. This must match the schema type of the predicate
	// being indexed. Allowable values are shown in the table below.
	Type() string

	// Tokens should implement the tokenization logic. The input is
	// the value to be tokenized, and will always have a concrete type
	// corresponding to Type(). The return value should be a list of
	// the tokens generated.
	Tokens(interface{}) ([]string, error)
}
```

The return value of `Type()` corresponds to the concrete input type of
`Tokens(interface{})` in the following way:

 `Type()` return value | `Tokens(interface{})` input type
-----------------------|----------------------------------
 `"int"`               | `int64`
 `"float"`             | `float64`
 `"string"`            | `string`
 `"bool"`              | `bool`
 `"datetime"`          | `time.Time`

### Building the plugin

The plugin has to be built using the `plugin` build mode so that an `.so` file
is produced instead of a regular executable.
For example:

```sh
go build -buildmode=plugin -o myplugin.so ~/go/src/myplugin/main.go
```

### Running Dgraph with plugins

When starting Dgraph, use the `--custom_tokenizers` flag to tell Dgraph which
tokenizers to load. It accepts a comma separated list of plugins. E.g.

```sh
dgraph ...other-args... --custom_tokenizers=plugin1.so,plugin2.so
```

{{% notice "note" %}}
Plugin validation is performed on startup. If a problem is detected, Dgraph
will refuse to initialise.
{{% /notice %}}

### Adding the index to the schema

To use a tokenization plugin, an index has to be created in the schema.

The syntax is the same as adding any built-in index. To add a custom index
using a tokenizer plugin named `foo` to a `string` predicate named
`my_predicate`, use the following in the schema:

```sh
my_predicate: string @index(foo) .
```

### Using the index in queries

There are two functions that can use custom indexes:

 Mode   | Behaviour
--------|-------
`anyof` | Returns nodes that match on *any* of the tokens generated
`allof` | Returns nodes that match on *all* of the tokens generated

The functions can be used either at the query root or in filters.

The behaviour here is analogous to `anyofterms`/`allofterms` and
`anyoftext`/`alloftext`.

### Examples

The following examples should make the process of writing a tokenization plugin
more concrete.

#### Unicode Characters

This example shows a type of tokenization that is similar to the term
tokenization of full-text search. Instead of being broken down into terms or
stem words, the text is broken down into its constituent unicode
codepoints (in Go terminology these are called *runes*).

{{% notice "note" %}}
This tokenizer would create a very large index that would be expensive to
manage and store. That's one of the reasons that text indexing usually occurs
at a higher level; stem words for full-text search or terms for term search.
{{% /notice %}}

The implementation of the plugin looks like this:

```go
package main

import "encoding/binary"

func Tokenizer() interface{} { return RuneTokenizer{} }

type RuneTokenizer struct{}

func (RuneTokenizer) Name() string     { return "rune" }
func (RuneTokenizer) Type() string     { return "string" }
func (RuneTokenizer) Identifier() byte { return 0xfd }

func (t RuneTokenizer) Tokens(value interface{}) ([]string, error) {
	var toks []string
	for _, r := range value.(string) {
		var buf [binary.MaxVarintLen32]byte
		n := binary.PutVarint(buf[:], int64(r))
		tok := string(buf[:n])
		toks = append(toks, tok)
	}
	return toks, nil
}
```

**Hints and tips:**

- Inside `Tokens`, you can assume that `value` will have the concrete type
  corresponding to that specified by `Type()`. It's safe to do a type
  assertion.

- Even though the return value is `[]string`, you can always store non-unicode
  data inside the string. See [this blogpost](https://blog.golang.org/strings)
  for some interesting background on how strings are implemented in Go and why they
  can be used to store non-textual data. By storing arbitrary data in the string,
  you can make the index more compact. In this case, varints are stored in the
  return values.

Setting up the indexing and adding data:
```
name: string @index(rune) .
```


```
{
  set{
    _:ad <name> "Adam" .
    _:aa <name> "Aaron" .
    _:am <name> "Amy" .
    _:ro <name> "Ronald" .
  }
}
```
Now queries can be performed.

The only person that has all of the runes `A` and `n` in their `name` is Aaron:
```
{
  q(func: allof(name, rune, "An")) {
    name
  }
}
=>
{
  "data": {
    "q": [
      { "name": "Aaron" }
    ]
  }
}
```
But there are multiple people who have both of the runes `A` and `m`:
```
{
  q(func: allof(name, rune, "Am")) {
    name
  }
}
=>
{
  "data": {
    "q": [
      { "name": "Amy" },
      { "name": "Adam" }
    ]
  }
}
```
Case is taken into account, so if you search for all names containing `"ron"`,
you would find `"Aaron"`, but not `"Ronald"`. But if you were to search for
`"no"`, you would match both `"Aaron"` and `"Ronald"`. The order of the runes in
the strings doesn't matter.

It's possible to search for people that have *any* of the supplied runes in
their names (rather than *all* of the supplied runes). To do this, use `anyof`
instead of `allof`:
```
{
  q(func: anyof(name, rune, "mr")) {
    name
  }
}
=>
{
  "data": {
    "q": [
      { "name": "Adam" },
      { "name": "Aaron" },
      { "name": "Amy" }
    ]
  }
}
```
`"Ronald"` doesn't contain `m` or `r`, so isn't found by the search.

{{% notice "note" %}}
Understanding what's going on under the hood can help you intuitively
understand how the `Tokens` method should be implemented.

When Dgraph sees new edges that are to be indexed by your tokenizer, it
will tokenize the value. The resultant tokens are used as keys for posting
lists. The edge subject is then added to the posting list for each token.

When a query root search occurs, the search value is tokenized. The result of
the search is all of the nodes in the union or intersection of the corresponding
posting lists (depending on whether `anyof` or `allof` was used).
{{% /notice %}}

#### CIDR Range

Tokenizers don't always have to be about splitting text up into its constituent
parts. This example indexes [IP addresses into their CIDR
ranges](https://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing). This
allows you to search for all IP addresses that fall into a particular CIDR
range.

The plugin code is more complicated than the rune example. The input is an IP
address stored as a string, e.g. `"100.55.22.11/32"`. The output is the set of CIDR
ranges that the IP address could possibly fall into. There could be up to 32
different outputs (`"100.55.22.11/32"` does indeed have 32 possible ranges, one
for each mask size).

```go
package main

import "net"

func Tokenizer() interface{} { return CIDRTokenizer{} }

type CIDRTokenizer struct{}

func (CIDRTokenizer) Name() string     { return "cidr" }
func (CIDRTokenizer) Type() string     { return "string" }
func (CIDRTokenizer) Identifier() byte { return 0xff }

func (t CIDRTokenizer) Tokens(value interface{}) ([]string, error) {
	_, ipnet, err := net.ParseCIDR(value.(string))
	if err != nil {
		return nil, err
	}
	ones, bits := ipnet.Mask.Size()
	var toks []string
	for i := ones; i >= 1; i-- {
		m := net.CIDRMask(i, bits)
		tok := net.IPNet{
			IP:   ipnet.IP.Mask(m),
			Mask: m,
		}
		toks = append(toks, tok.String())
	}
	return toks, nil
}
```
An example of using the tokenizer:

Setting up the indexing and adding data:
```
ip: string @index(cidr) .
```

```
{
  set{
    _:a <ip> "100.55.22.11/32" .
    _:b <ip> "100.33.81.19/32" .
    _:c <ip> "100.49.21.25/32" .
    _:d <ip> "101.0.0.5/32" .
    _:e <ip> "100.176.2.1/32" .
  }
}
```
```
{
  q(func: allof(ip, cidr, "100.48.0.0/12")) {
    ip
  }
}
=>
{
  "data": {
    "q": [
      { "ip": "100.55.22.11/32" },
      { "ip": "100.49.21.25/32" }
    ]
  }
}
```
The CIDR ranges of `100.55.22.11/32` and `100.49.21.25/32` are both
`100.48.0.0/12`. The other IP addresses in the database aren't included in the
search result, since they have different CIDR ranges for 12 bit masks
(`100.32.0.0/12`, `101.0.0.0/12`, `100.176.0.0/12` for `100.33.81.19/32`,
`101.0.0.5/32`, and `100.176.2.1/32` respectively).

Note that we're using `allof` instead of `anyof`. Only `allof` will work
correctly with this index. Remember that the tokenizer generates all possible
CIDR ranges for an IP address. If we were to use `anyof` then the search result
would include all IP addresses under the 1 bit mask (in this case, `0.0.0.0/1`,
which would match all IPs in this dataset).

#### Anagram

Tokenizers don't always have to return multiple tokens. If you just want to
index data into groups, have the tokenizer just return an identifying member of
that group.

In this example, we want to find groups of words that are
[anagrams](https://en.wikipedia.org/wiki/Anagram) of each
other.

A token that corresponds to a group of anagrams could just be the letters in the
anagram in sorted order, as implemented below:

```go
package main

import "sort"

func Tokenizer() interface{} { return AnagramTokenizer{} }

type AnagramTokenizer struct{}

func (AnagramTokenizer) Name() string     { return "anagram" }
func (AnagramTokenizer) Type() string     { return "string" }
func (AnagramTokenizer) Identifier() byte { return 0xfc }

func (t AnagramTokenizer) Tokens(value interface{}) ([]string, error) {
	b := []byte(value.(string))
	sort.Slice(b, func(i, j int) bool { return b[i] < b[j] })
	return []string{string(b)}, nil
}
```
In action:

Setting up the indexing and adding data:
```
word: string @index(anagram) .
```

```
{
  set{
    _:1 <word> "airmen" .
    _:2 <word> "marine" .
    _:3 <word> "beat" .
    _:4 <word> "beta" .
    _:5 <word> "race" .
    _:6 <word> "care" .
  }
}
```
```
{
  q(func: allof(word, anagram, "remain")) {
    word
  }
}
=>
{
  "data": {
    "q": [
      { "word": "airmen" },
      { "word": "marine" }
    ]
  }
}
```

Since only a single token is ever generated, it doesn't matter if `anyof` or
`allof` is used. The result will always be the same.

#### Integer prime factors

All of the custom tokenizers shown previously have worked with strings.
However, other data types can be used as well. This example is contrived, but
nonetheless shows some advanced usages of custom tokenizers.

The tokenizer creates a token for each prime factor in the input.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

func Tokenizer() interface{} { return FactorTokenizer{} }

type FactorTokenizer struct{}

func (FactorTokenizer) Name() string     { return "factor" }
func (FactorTokenizer) Type() string     { return "int" }
func (FactorTokenizer) Identifier() byte { return 0xfe }

func (FactorTokenizer) Tokens(value interface{}) ([]string, error) {
	x := value.(int64)
	if x <= 1 {
		return nil, fmt.Errorf("Cannot factor int <= 1: %d", x)
	}
	var toks []string
	for p := int64(2); x > 1; p++ {
		if x%p == 0 {
			toks = append(toks, encodeInt(p))
			for x%p == 0 {
				x /= p
			}
		}
	}
	return toks, nil

}

func encodeInt(x int64) string {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutVarint(buf[:], x)
	return string(buf[:n])
}
```
{{% notice "note" %}}
Notice that the return of `Type()` is `"int"`, corresponding to the concrete
type of the input to `Tokens` (which is `int64`).
{{% /notice %}}

This allows you to do things like search for all numbers that share prime
factors with a particular number.

In particular, we search for numbers that contain any of the prime factors of
15, i.e. any numbers that are divisible by either 3 or 5.

Setting up the indexing and adding data:
```
num: int @index(factor) .
```

```
{
  set{
    _:2 <num> "2"^^<xs:int> .
    _:3 <num> "3"^^<xs:int> .
    _:4 <num> "4"^^<xs:int> .
    _:5 <num> "5"^^<xs:int> .
    _:6 <num> "6"^^<xs:int> .
    _:7 <num> "7"^^<xs:int> .
    _:8 <num> "8"^^<xs:int> .
    _:9 <num> "9"^^<xs:int> .
    _:10 <num> "10"^^<xs:int> .
    _:11 <num> "11"^^<xs:int> .
    _:12 <num> "12"^^<xs:int> .
    _:13 <num> "13"^^<xs:int> .
    _:14 <num> "14"^^<xs:int> .
    _:15 <num> "15"^^<xs:int> .
    _:16 <num> "16"^^<xs:int> .
    _:17 <num> "17"^^<xs:int> .
    _:18 <num> "18"^^<xs:int> .
    _:19 <num> "19"^^<xs:int> .
    _:20 <num> "20"^^<xs:int> .
    _:21 <num> "21"^^<xs:int> .
    _:22 <num> "22"^^<xs:int> .
    _:23 <num> "23"^^<xs:int> .
    _:24 <num> "24"^^<xs:int> .
    _:25 <num> "25"^^<xs:int> .
    _:26 <num> "26"^^<xs:int> .
    _:27 <num> "27"^^<xs:int> .
    _:28 <num> "28"^^<xs:int> .
    _:29 <num> "29"^^<xs:int> .
    _:30 <num> "30"^^<xs:int> .
  }
}
```
```
{
  q(func: anyof(num, factor, 15)) {
    num
  }
}
=>
{
  "data": {
    "q": [
      { "num": 3 },
      { "num": 5 },
      { "num": 6 },
      { "num": 9 },
      { "num": 10 },
      { "num": 12 },
      { "num": 15 },
      { "num": 18 },
      { "num": 20 },
      { "num": 21 },
      { "num": 24 },
      { "num": 25 },
      { "num": 27 },
      { "num": 30 }
    ]
  }
}
```