+++
title = "Get Started with Dgraph - String Indices and Modeling Tweet Graph"
+++
     4  
     5  **Welcome to the fifth tutorial of getting started with Dgraph.**
     6  
     7  In the [previous tutorial]({{< relref "tutorial-4/index.md" >}}), we learned about using multi-language strings and operations on them using [language tags](https://www.w3schools.com/tags/ref_language_codes.asp).
     8  
     9  In this tutorial, we'll model tweets in Dgraph and, using it, we'll learn more about string indices in Dgraph.
    10  
We'll specifically learn about:

- Modeling tweets in Dgraph.
- Using string indices in Dgraph.
  - Querying twitter users using the `hash` index.
  - Comparing strings using the `exact` index.
  - Searching for tweets based on keywords using the `term` index.
    18  
Let's start by analyzing the anatomy of a real tweet and figuring out how to model it in Dgraph.
    20  
    21  The accompanying video of the tutorial will be out shortly, so stay tuned to [our YouTube channel](https://www.youtube.com/channel/UCghE41LR8nkKFlR3IFTRO4w).
    22  
    23  ## Modeling a tweet in Dgraph
    24  
    25  Here's a sample tweet.
    26  
    27  {{< tweet 1194740206177402880>}}
    28  
    29  Let's dissect the tweet above. Here are the components of the tweet:
    30  
    31  - **The Author**
    32  
    33    The author of the tweet is the user `@hackintoshrao`.
    34  
    35  - **The Body**
    36  
    37    This component is the content of the tweet.
    38  
  > Test tweet for the fifth episode of getting started series with @dgraphlabs.
  > Wait for the video of the fourth one by @francesc the coming Wednesday!
  > #GraphDB #GraphQL
    42  
    43  - **The Hashtags**
    44  
    45    Here are the hashtags in the tweet: `#GraphQL` and `#GraphDB`.
    46  
    47  - **The Mentions**
    48  
    49    A tweet can mention other twitter users.
    50  
    51    Here are the mentions in the tweet above: `@dgraphlabs` and `@francesc`.
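
As a rough illustration of this dissection, the hashtags and mentions can be pulled out of the body with two regular expressions. This is only a sketch: the helper name `extract_components` is made up for this example, and the patterns assume that handles and hashtags consist of word characters only.

```python
import re

# Body of the sample tweet, copied from the tutorial.
body = (
    "Test tweet for the fifth episode of getting started series with @dgraphlabs. "
    "Wait for the video of the fourth one by @francesc the coming Wednesday!\n"
    "#GraphDB #GraphQL"
)


def extract_components(text):
    """Return (hashtags, mentions) found in a tweet body, in order of appearance."""
    hashtags = re.findall(r"#(\w+)", text)  # words that follow a '#'
    mentions = re.findall(r"@(\w+)", text)  # words that follow an '@'
    return hashtags, mentions


hashtags, mentions = extract_components(body)
print(hashtags)  # ['GraphDB', 'GraphQL']
print(mentions)  # ['dgraphlabs', 'francesc']
```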
    52  
    53  Before we model tweets in Dgraph using these components, let's recap the design principles of a graph model:
    54  
> `Nodes` and `Edges` are the building blocks of a graph model.
> Be it a sale, a tweet, or user info, any concept or entity is represented as a node.
> If any two nodes are related, represent that by creating an edge between them.
    58  
With the above design principles in mind, let's go through the components of a tweet and see how we could fit them into Dgraph.
    60  
    61  **The Author**
    62  
    63  The Author of a tweet is a twitter user. We should use a node to represent this.
    64  
    65  **The Body**
    66  
    67  We should represent every tweet as a node.
    68  
    69  **The Hashtags**
    70  
    71  It is advantageous to represent a hashtag as a node of its own.
    72  It gives us better flexibility while querying.
    73  
Though you can search for hashtags in the body of a tweet, it's not efficient to do so.
Creating unique nodes to represent hashtags allows you to write performant queries like the following: _Hey Dgraph, give me all the tweets with hashtag #graphql_
    76  
    77  **The Mentions**
    78  
    79  A mention represents a twitter user, and we've already modeled a user as a node.
    80  Therefore, we represent a mention as an edge between a tweet and the users mentioned.
    81  
    82  ### The Relationships
    83  
We have three types of nodes: `User`, `Tweet`, and `Hashtag`.
    85  
    86  {{% load-img "/images/tutorials/5/a-nodes.jpg" "graph nodes" %}}
    87  
    88  Let's look at how these nodes might be related to each other and model their relationship as an edge between them.
    89  
    90  **The User and Tweet nodes**
    91  
There are two relationships between a `Tweet` node and a `User` node.

- Every tweet is authored by a user, and a user can author many tweets.

  Let's name the edge representing this relationship `authored`.
  An `authored` edge points from a `User` node to a `Tweet` node.

- A tweet can mention many users, and users can be mentioned in many tweets.

  Let's name the edge representing this relationship `mentioned`.
  A `mentioned` edge points from a `Tweet` node to a `User` node.
  These users are the ones who are mentioned in the tweet.
   106  
   107  {{% load-img "/images/tutorials/5/a-tweet-user.jpg" "graph nodes" %}}
   108  
   109  **The tweet and the hashtag nodes**
   110  
A tweet can have one or more hashtags.
Let's name the edge that represents this relationship `tagged_with`.

A `tagged_with` edge points from a `Tweet` node to a `Hashtag` node.
These hashtag nodes correspond to the hashtags in the tweet.
   117  
   118  {{% load-img "/images/tutorials/5/a-tagged.jpg" "graph nodes" %}}
   119  
   120  **The Author and hashtag nodes**
   121  
   122  There's no direct relationship between an author and a hashtag node.
   123  Hence, we don't need a direct edge between them.
   124  
Our graph model of a tweet is ready! Here it is.
   126  
   127  {{% load-img "/images/tutorials/5/a-graph-model.jpg" "tweet model" %}}
   128  
   129  Here is the graph of our sample tweet.
   130  
   131  {{% load-img "/images/tutorials/5/c-tweet-model.jpg" "tweet model" %}}
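
To make the model concrete, here is a minimal sketch that assembles the nested JSON for one user and one tweet from the components identified earlier. The function name `build_tweet_mutation` is hypothetical, and deriving the blank-node labels (`_:name`) by lowercasing handles and hashtags is an assumption of this example, not a Dgraph requirement.

```python
import json


def build_tweet_mutation(handle, name, body, hashtags, mentions):
    """Build a nested Dgraph JSON mutation for one user and one tweet."""
    tweet = {
        "tweet": body,
        # tagged_with edges point from the Tweet node to Hashtag nodes.
        "tagged_with": [{"uid": "_:" + tag.lower(), "hashtag": tag} for tag in hashtags],
        # mentioned edges point from the Tweet node to User nodes.
        "mentioned": [{"uid": "_:" + user.lower()} for user in mentions],
    }
    user = {
        "uid": "_:" + handle.lower(),
        "user_handle": handle,
        "user_name": name,
        # authored edges point from the User node to Tweet nodes.
        "authored": [tweet],
    }
    return {"set": [user]}


mutation = build_tweet_mutation(
    "hackintoshrao",
    "Karthic Rao",
    "Test tweet for the fifth episode of getting started series with @dgraphlabs. "
    "Wait for the video of the fourth one by @francesc the coming Wednesday!\n"
    "#GraphDB #GraphQL",
    ["GraphDB", "GraphQL"],
    ["dgraphlabs", "francesc"],
)
print(json.dumps(mutation, indent=2))
```

Reusing the same blank-node label (for example `_:graphdb`) within a single mutation makes every reference point to the same node, which is how several tweets can share one hashtag node.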
   132  
   133  Let's add a couple of tweets to the list.
   134  
   135  {{< tweet 1142124111650443273>}}
   136  
   137  {{< tweet 1192822660679577602>}}
   138  
We'll use these two tweets, together with the sample tweet from the beginning, as our dataset.
Open Ratel, go to the mutate tab, paste the mutation, and click Run.
   141  
```json
{
  "set": [
    {
      "user_handle": "hackintoshrao",
      "user_name": "Karthic Rao",
      "uid": "_:hackintoshrao",
      "authored": [
        {
          "tweet": "Test tweet for the fifth episode of getting started series with @dgraphlabs. Wait for the video of the fourth one by @francesc the coming Wednesday!\n#GraphDB #GraphQL",
          "tagged_with": [
            {
              "uid": "_:graphql",
              "hashtag": "GraphQL"
            },
            {
              "uid": "_:graphdb",
              "hashtag": "GraphDB"
            }
          ],
          "mentioned": [
            {
              "uid": "_:francesc"
            },
            {
              "uid": "_:dgraphlabs"
            }
          ]
        }
      ]
    },
    {
      "user_handle": "francesc",
      "user_name": "Francesc Campoy",
      "uid": "_:francesc",
      "authored": [
        {
          "tweet": "So many good talks at #graphqlconf, next year I'll make sure to be *at least* in the audience!\nAlso huge thanks to the live tweeting by @dgraphlabs for alleviating the FOMO😊\n#GraphDB ♥️ #GraphQL",
          "tagged_with": [
            {
              "uid": "_:graphql"
            },
            {
              "uid": "_:graphdb"
            },
            {
              "hashtag": "graphqlconf"
            }
          ],
          "mentioned": [
            {
              "uid": "_:dgraphlabs"
            }
          ]
        }
      ]
    },
    {
      "user_handle": "dgraphlabs",
      "user_name": "Dgraph Labs",
      "uid": "_:dgraphlabs",
      "authored": [
        {
          "tweet": "Let's Go and catch @francesc at @Gopherpalooza today, as he scans into Go source code by building its Graph in Dgraph!\nBe there, as he Goes through analyzing Go source code, using a Go program, that stores data in the GraphDB built in Go!\n#golang #GraphDB #Databases #Dgraph ",
          "tagged_with": [
            {
              "hashtag": "golang"
            },
            {
              "uid": "_:graphdb"
            },
            {
              "hashtag": "Databases"
            },
            {
              "hashtag": "Dgraph"
            }
          ],
          "mentioned": [
            {
              "uid": "_:francesc"
            },
            {
              "uid": "_:gopherpalooza"
            }
          ]
        }
      ]
    },
    {
      "uid": "_:gopherpalooza",
      "user_handle": "gopherpalooza",
      "user_name": "Gopherpalooza"
    }
  ]
}
```
   239  
_Note: If you're new to Dgraph and have yet to figure out how to run the database and use Ratel, we highly recommend reading the [first article of the series]({{< relref "tutorial-1/index.md" >}})._
   241  
   242  Here is the graph we built.
   243  
   244  {{% load-img "/images/tutorials/5/x-all-tweets.png" "tweet graph" %}}
   245  
   246  Our graph has:
   247  
- Four blue twitter user nodes.
   249  - The green nodes are the tweets.
   250  - The blue ones are the hashtags.
   251  
   252  Let's start our tweet exploration by querying for the twitter users in the database.
   253  
```graphql
{
  tweet_graph(func: has(user_handle)) {
    user_handle
  }
}
```
   261  
   262  {{% load-img "/images/tutorials/5/j-users.png" "tweet model" %}}
   263  
_Note: If the query syntax above doesn't look familiar to you, check out the [first tutorial]({{< relref "tutorial-1/index.md" >}})._
   265  
   266  We have four twitter users: `@hackintoshrao`, `@francesc`, `@dgraphlabs`, and `@gopherpalooza`.
   267  
   268  Now, let's find their tweets and hashtags too.
   269  
```graphql
{
  tweet_graph(func: has(user_handle)) {
    user_name
    authored {
      tweet
      tagged_with {
        hashtag
      }
    }
  }
}
```
   283  
   284  {{% load-img "/images/tutorials/5/y-author-tweet.png" "tweet model" %}}
   285  
   286  _Note: If the traversal query syntax in the above query is not familiar to you, [check out the third tutorial]({{< relref "tutorial-3/index.md" >}}) of the series._
   287  
   288  Before we start querying our graph, let's learn a bit about database indices using a simple analogy.
   289  
   290  ### What are indices?
   291  
   292  Indexing is a way to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed.
   293  
   294  Consider a "Book" of 600 pages, divided into 30 sections.
   295  Let's say each section has a different number of pages in it.
   296  
Now, without an index page, to find a particular section that starts with the letter "F", you have no option other than scanning through the entire book, that is, all 600 pages.

But an index page at the beginning makes it easier to access the intended information.
You just need to look over the index page and, after finding the matching entry, you can jump straight to that section, skipping the others.

But remember that the index page also takes disk space!
Use indices only when necessary.
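
The analogy can be sketched in a few lines. This is a toy model, not how Dgraph stores data: the book below is hypothetical, with 26 sections of 20 pages each (one per letter) rather than the 30 sections described above, and a plain dictionary stands in for the index page.

```python
import string

# Hypothetical book: one section per letter, each starting 20 pages after the previous one.
sections = [
    (letter + " section", 1 + i * 20)
    for i, letter in enumerate(string.ascii_uppercase)
]


def find_by_scan(sections, first_letter):
    """Walk the sections in order; return (start_page, sections_checked)."""
    for checked, (title, start_page) in enumerate(sections, start=1):
        if title.startswith(first_letter):
            return start_page, checked
    return None, len(sections)


# The "index page": built once, costs extra space, answers lookups directly.
index = {title[0]: start_page for title, start_page in sections}

print(find_by_scan(sections, "F"))  # (101, 6)
print(index["F"])                   # 101
```

The scan gets slower the further into the book the section sits, while the index lookup costs the same for every letter, at the price of building and storing the index.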
   304  
In our next section, let's learn some interesting queries on our twitter graph.
   306  
   307  ## String indices and querying
   308  
   309  ### Hash index
   310  
Let's compose a query which says: _Hey Dgraph, find me the tweets of the user whose twitter handle equals `hackintoshrao`._

Before we do so, we first need to add an index to the `user_handle` predicate.
There are five types of string indices: `hash`, `exact`, `term`, `fulltext`, and `trigram`.
   315  
   316  The type of string index to be used depends on the kind of queries you want to run on the string predicate.
   317  
   318  In this case, we want to search for a node based on the exact string value of a predicate.
   319  For a use case like this one, the `hash` index is recommended.
   320  
   321  Let's first add the `hash` index to the `user_handle` predicate.
   322  
   323  {{% load-img "/images/tutorials/5/k-hash.png" "tweet model" %}}
   324  
   325  Now, let's use the `eq` comparator to find all the tweets of `hackintoshrao`.
   326  
   327  Go to the query tab, type in the query, and click Run.
   328  
```graphql
{
  tweet_graph(func: eq(user_handle, "hackintoshrao")) {
    user_name
    authored {
      tweet
    }
  }
}
```
   339  
   340  {{% load-img "/images/tutorials/5/z-exact.png" "tweet model" %}}
   341  
   342  _Note: Refer to [the third tutorial]({{< relref "tutorial-3/index.md" >}}), if you want to know about comparator functions like `eq` in detail._
   343  
Let's extend the last query to also fetch the hashtags and the mentions.
   345  
```graphql
{
  tweet_graph(func: eq(user_handle, "hackintoshrao")) {
    user_name
    authored {
      tweet
      tagged_with {
        hashtag
      }
      mentioned {
        user_name
      }
    }
  }
}
```
   362  
   363  {{% load-img "/images/tutorials/5/l-hash-query.png" "tweet model" %}}
   364  
   365  _Note: If the traversal query syntax in the above query is not familiar to you, [check out the third tutorial]({{< relref "tutorial-3/index.md" >}}) of the series._
   366  
Did you know that string values in Dgraph can also be compared using comparators like greater-than or less-than?

In our next section, let's see how to run comparison functions other than `eq` (equal to) on string predicates.
   370  
   371  ### Exact Index
   372  
We discussed in the [third tutorial]({{< relref "tutorial-3/index.md" >}}) that there are five comparator functions in Dgraph.
   374  
   375  Here's a quick recap:
   376  
| Comparator function | Full form |
|---------------------|--------------------------|
| eq | equal to |
| lt | less than |
| le | less than or equal to |
| gt | greater than |
| ge | greater than or equal to |
   384  
   385  All five comparator functions can be applied to the string predicates.
   386  
   387  We have already used the `eq` operator.
The other four are useful for operations that depend on the alphabetical ordering of the strings.
   389  
   390  Let's learn about it with a simple example.
   391  
   392  Let's find the twitter accounts which come after `dgraphlabs` in alphabetically sorted order.
   393  
```graphql
{
  using_greater_than(func: gt(user_handle, "dgraphlabs")) {
    user_handle
  }
}
```
   401  
   402  {{% load-img "/images/tutorials/5/n-exact-error.png" "tweet model" %}}
   403  
   404  Oops, we have an error!
   405  
You can see from the error that the current `hash` index on the `user_handle` predicate doesn't support the `gt` function.

To do string comparison operations like the one above, you first need to set the `exact` index on the string predicate.

The `exact` index is the only string index that allows you to use the `ge`, `gt`, `le`, and `lt` comparators on string predicates.

Note that the `exact` index also allows you to use the `eq` (equal to) comparator.
But if you only want to use the `eq` comparator on string predicates, using the `exact` index would be overkill.
The `hash` index would be a better option, as it is, in general, much more space-efficient.
   415  
   416  Let's see the `exact` index in action.
   417  
   418  {{% load-img "/images/tutorials/5/o-exact-conflict.png" "set exact" %}}
   419  
   420  We again have an error!
   421  
   422  Though a string predicate can have more than one index, some of them are not compatible with each other.
   423  One such example is the combination of the `hash` and the `exact` indices.
   424  
   425  The `user_handle` predicate already has the `hash` index, so trying to set the `exact` index gives you an error.
   426  
   427  Let's uncheck the `hash` index for the `user_handle` predicate, select the `exact` index, and click update.
   428  
   429  {{% load-img "/images/tutorials/5/p-set-exact.png" "set exact" %}}
   430  
   431  Though Dgraph allows you to change the index type of a predicate, do it only if it's necessary.
   432  When the indices are changed, the data needs to be re-indexed, and this takes some computing, so it could take a bit of time.
   433  While the re-indexing operation is running, all mutations will be put on hold.
   434  
   435  Now, let's re-run the query.
   436  
   437  {{% load-img "/images/tutorials/5/q-exact-gt.png" "tweet model" %}}
   438  
   439  The result contains three twitter handles: `francesc`, `gopherpalooza`, and `hackintoshrao`.
   440  
   441  In the alphabetically sorted order, these twitter handles are greater than `dgraphlabs`.
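
Why can one index answer `gt` while the other cannot? The sketch below is a conceptual model only and says nothing about Dgraph's actual storage format. It assumes a hash-style index keeps just a hash of each value, which is enough for equality checks, while an exact-style index keeps the values themselves in sorted order, which also allows range lookups. The function names `eq` and `gt` mirror the comparators but are ordinary Python helpers.

```python
import bisect

# The four twitter handles in our dataset.
handles = ["hackintoshrao", "francesc", "dgraphlabs", "gopherpalooza"]

# Hash-style index: hash of the value -> values with that hash.
# Hashes carry no ordering information, so only equality can be answered.
hash_index = {}
for handle in handles:
    hash_index.setdefault(hash(handle), []).append(handle)


def eq(value):
    """Equality lookup through the hash-style index."""
    return [h for h in hash_index.get(hash(value), []) if h == value]


# Exact-style index: the full values, kept in sorted order.
sorted_index = sorted(handles)


def gt(value):
    """Everything that sorts strictly after `value`."""
    position = bisect.bisect_right(sorted_index, value)
    return sorted_index[position:]


print(eq("hackintoshrao"))  # ['hackintoshrao']
print(gt("dgraphlabs"))     # ['francesc', 'gopherpalooza', 'hackintoshrao']
```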
   442  
Some tweets appeal to us more than others.
For instance, I love `Graphs` and `Go`.
Hence, I would surely enjoy tweets that are related to these topics.
A keyword-based search is a useful way to find relevant information.

Can we search for tweets based on one or more keywords related to our interests?
   449  
   450  Yes, we can! Let's do that in our next section.
   451  
   452  ### The Term index
   453  
   454  The `term` index lets you search string predicates based on one or more keywords.
   455  These keywords are called terms.
   456  
   457  To be able to search tweets with specific keywords or terms, we need to first set the `term` index on the tweets.
   458  
   459  Adding the `term` index is similar to adding any other string index.
   460  
   461  {{% load-img "/images/tutorials/5/r-term-set.png" "term set" %}}
   462  
   463  Dgraph provides two built-in functions specifically to search for terms: `allofterms` and `anyofterms`.
   464  
Apart from these two functions, the `term` index only supports the `eq` comparator.
This means any other query function (like `lt`, `gt`, and so on) fails when run on string predicates that have only the `term` index.
   467  
   468  We'll soon take a look at the table containing the string indices and their supporting query functions.
   469  But first, let's learn how to use `anyofterms` and `allofterms` query functions.
   470  Let's write a query to find all tweets with terms or keywords `Go` or `Graph` in them.
   471  
Go to the query tab, paste the query, and click Run.
   473  
```graphql
{
  find_tweets(func: anyofterms(tweet, "Go Graph")) {
    tweet
  }
}
```
   481  
   482  Here's the matched tweet from the query response:
   483  
```json
{
  "tweet": "Let's Go and catch @francesc at @Gopherpalooza today, as he scans into Go source code by building its Graph in Dgraph!\nBe there, as he Goes through analyzing Go source code, using a Go program, that stores data in the GraphDB built in Go!\n#golang #GraphDB #Databases #Dgraph "
}
```
   489  
   490  {{% load-img "/images/tutorials/5/s-go-graph.png" "go graph set" %}}
   491  
   492  _Note: Check out [the first tutorial]({{< relref "tutorial-1/index.md" >}}) if the query syntax, in general, is not familiar to you_
   493  
The `anyofterms` function returns tweets that contain either of the keywords `Go` or `Graph`.

In this case, we've searched for only two terms (`Go` and `Graph`), but you can pass any number of terms to match.
   497  
   498  The result has one of the three tweets in the database.
   499  The other two tweets don't make it to the result since they don't have either of the terms `Go` or `Graph`.
   500  
   501  It's also important to notice that the term search functions (`anyofterms` and `allofterms`) are insensitive to case and special characters.
   502  
   503  This means, if you search for the term `GraphQL`, the query returns a positive match for all of the following terms found in the tweets: `graphql`, `graphQL`, `#graphql`, `#GraphQL`.
   504  
   505  Now, let's find tweets that have either of the terms `Go` or `GraphQL` in them.
   506  
   507  
```graphql
{
  find_tweets(func: anyofterms(tweet, "Go GraphQL")) {
    tweet
  }
}
```
   515  
   516  {{% load-img "/images/tutorials/5/t-go-graphql-all.png" "Go Graphql" %}}
   517  
Oh wow, we have all three tweets in the result.
This means all three tweets have at least one of the terms `Go` or `GraphQL`.

Now, how about finding tweets that contain both the terms `Go` and `GraphQL`?
We can do it by using the `allofterms` function.
   523  
```graphql
{
  find_tweets(func: allofterms(tweet, "Go GraphQL")) {
    tweet
  }
}
```
   531  
   532  {{% load-img "/images/tutorials/5/u-allofterms.png" "Go Graphql" %}}
   533  
   534  We have an empty result.
   535  None of the tweets have both the terms `Go` and `GraphQL` in them.
   536  
   537  Besides `Go` and `Graph`, I'm also a big fan of `GraphQL` and `GraphDB`.
   538  
Let's find tweets that contain both the keywords `GraphQL` and `GraphDB`, again using the `allofterms` function.
   540  
   541  {{% load-img "/images/tutorials/5/v-graphdb-graphql.png" "Graphdb-GraphQL" %}}
   542  
The result has two tweets that contain both the terms `GraphQL` and `GraphDB`.
   544  
```
{
  "tweet": "Test tweet for the fifth episode of getting started series with @dgraphlabs. Wait for the video of the fourth one by @francesc the coming Wednesday!\n#GraphDB #GraphQL"
},
{
  "tweet": "So many good talks at #graphqlconf, next year I'll make sure to be *at least* in the audience!\nAlso huge thanks to the live tweeting by @dgraphlabs for alleviating the FOMO😊\n#GraphDB ♥️ #GraphQL"
}
```
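
The four term searches in this section can be reproduced with a rough model of term matching. This is a sketch under a simplifying assumption: the text is lowercased and split on anything that is not a word character, which approximates the case and special-character insensitivity described above but is not Dgraph's actual tokenizer. The Python functions `anyofterms` and `allofterms` only mimic their namesakes.

```python
import re

# The three tweets in our dataset, in the order they appear in the mutation.
tweets = [
    "Test tweet for the fifth episode of getting started series with @dgraphlabs. "
    "Wait for the video of the fourth one by @francesc the coming Wednesday!\n"
    "#GraphDB #GraphQL",
    "So many good talks at #graphqlconf, next year I'll make sure to be *at least* "
    "in the audience!\nAlso huge thanks to the live tweeting by @dgraphlabs for "
    "alleviating the FOMO😊\n#GraphDB ♥️ #GraphQL",
    "Let's Go and catch @francesc at @Gopherpalooza today, as he scans into Go source "
    "code by building its Graph in Dgraph!\nBe there, as he Goes through analyzing Go "
    "source code, using a Go program, that stores data in the GraphDB built in Go!\n"
    "#golang #GraphDB #Databases #Dgraph ",
]


def terms(text):
    """Lowercase the text and split it into a set of word-character terms."""
    return set(re.findall(r"\w+", text.lower()))


def anyofterms(text, query):
    """True if the text shares at least one term with the query."""
    return bool(terms(text) & terms(query))


def allofterms(text, query):
    """True if every term of the query occurs in the text."""
    return terms(query) <= terms(text)


def search(match, query):
    """Return the 0-based positions of the tweets that match the query."""
    return [i for i, tweet in enumerate(tweets) if match(tweet, query)]


print(search(anyofterms, "Go Graph"))         # [2]
print(search(anyofterms, "Go GraphQL"))       # [0, 1, 2]
print(search(allofterms, "Go GraphQL"))       # []
print(search(allofterms, "GraphQL GraphDB"))  # [0, 1]
```

Note that terms are matched whole: `Graph` does not match `GraphDB`, and `Go` does not match `Goes` or `golang`.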
   553  
   554  Before we wrap up, here's the table containing the three string indices we learned about, and their compatible built-in functions.
   555  
   556  | Index | Valid query functions      |
   557  |-------|----------------------------|
   558  | hash  | eq                         |
   559  | exact | eq, lt, gt, le, ge         |
   560  | term  | eq, allofterms, anyofterms |
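
Read the other way around, the table tells you which index a query function needs. The lookup below simply encodes the three rows above; the names `SUPPORTED` and `indices_for` are made up for this illustration and are not part of any Dgraph client.

```python
# Supported query functions per string index, as listed in the table above.
SUPPORTED = {
    "hash": {"eq"},
    "exact": {"eq", "lt", "gt", "le", "ge"},
    "term": {"eq", "allofterms", "anyofterms"},
}


def indices_for(function):
    """Return the string indices (sorted by name) that support a query function."""
    return sorted(name for name, functions in SUPPORTED.items() if function in functions)


print(indices_for("gt"))          # ['exact']
print(indices_for("eq"))          # ['exact', 'hash', 'term']
print(indices_for("anyofterms"))  # ['term']
```

This is why the `gt` query failed while `user_handle` only had the `hash` index, and succeeded once the index was switched to `exact`.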
   561  
   562  
   563  ## Summary
   564  
   565  In this tutorial, we modeled a series of tweets and set up the exact, term, and hash indices in order to query them.
   566  
   567  Did you know that Dgraph also offers more powerful search capabilities like full-text search and regular expressions based search?
   568  
   569  In the next tutorial, we'll explore these features and learn about more powerful ways of searching for your favorite tweets!
   570  
   571  Sounds interesting?
   572  Then see you all soon in the next tutorial. Till then, happy Graphing!
   573  
   574  Check out our next tutorial of the getting started series [here]({{< relref "tutorial-6/index.md" >}}).
   575  
   576  ## Need Help
   577  
   578  * Please use [discuss.dgraph.io](https://discuss.dgraph.io) for questions, feature requests and discussions.
* Please use [GitHub Issues](https://github.com/dgraph-io/dgraph/issues) if you encounter bugs or have feature requests.
   580  * You can also join our [Slack channel](http://slack.dgraph.io).
   581  
<style>
  /* blockquote styling */
  blockquote {
    font-size: 1;
    font-style: italic;
    margin: 0 3rem 1rem 3rem;
    text-align: justify;
  }
  blockquote p:last-child, blockquote ul:last-child, blockquote ol:last-child {
    margin-bottom: 0;
  }
  blockquote cite {
    font-size: 15px;
    font-size: 0.9375rem;
    line-height: 1.5;
    font-style: normal;
    color: #555;
  }
  blockquote footer, blockquote small {
    font-size: 18px;
    font-size: 1.125rem;
    display: block;
    line-height: 1.42857143;
  }
  blockquote footer:before, blockquote small:before {
    content: "\2014 \00A0";
  }
</style>