+++
title = "Get Started with Dgraph - String Indices and Modeling Tweet Graph"
+++

**Welcome to the fifth tutorial of getting started with Dgraph.**

In the [previous tutorial]({{< relref "tutorial-4/index.md" >}}), we learned about using multi-language strings and operations on them using [language tags](https://www.w3schools.com/tags/ref_language_codes.asp).

In this tutorial, we'll model tweets in Dgraph and use that model to learn more about string indices in Dgraph.

We'll specifically learn about:

- Modeling tweets in Dgraph.
- Using string indices in Dgraph.
- Querying twitter users using the `hash` index.
- Comparing strings using the `exact` index.
- Searching for tweets based on keywords using the `term` index.

Let's start by analyzing the anatomy of a real tweet and figuring out how to model it in Dgraph.

The accompanying video of the tutorial will be out shortly, so stay tuned to [our YouTube channel](https://www.youtube.com/channel/UCghE41LR8nkKFlR3IFTRO4w).

## Modeling a tweet in Dgraph

Here's a sample tweet.

{{< tweet 1194740206177402880>}}

Let's dissect the tweet above. Here are the components of the tweet:

- **The Author**

  The author of the tweet is the user `@hackintoshrao`.

- **The Body**

  This component is the content of the tweet.

  > Test tweet for the fifth episode of getting started series with @dgraphlabs.
  > Wait for the video of the fourth one by @francesc the coming Wednesday!
  > #GraphDB #GraphQL

- **The Hashtags**

  Here are the hashtags in the tweet: `#GraphQL` and `#GraphDB`.

- **The Mentions**

  A tweet can mention other twitter users.

  Here are the mentions in the tweet above: `@dgraphlabs` and `@francesc`.
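
The dissection above can be sketched in a few lines of Python. This is only an illustration: the `extract_components` helper and its regular expressions are assumptions made for this example (they are much simpler than Twitter's real parsing rules), and nothing here talks to Dgraph.

```python
import re

# Body of the sample tweet, as used in this tutorial.
tweet_body = (
    "Test tweet for the fifth episode of getting started series with @dgraphlabs.\n"
    "Wait for the video of the fourth one by @francesc the coming Wednesday!\n"
    "#GraphDB #GraphQL"
)


def extract_components(body):
    """Split a tweet body into the components discussed above.

    Hypothetical helper: '#word' is treated as a hashtag and '@word' as a mention.
    """
    return {
        "body": body,
        "hashtags": re.findall(r"#(\w+)", body),
        "mentions": re.findall(r"@(\w+)", body),
    }


components = extract_components(tweet_body)
print(components["hashtags"])  # hashtags found in the body, without the leading '#'
print(components["mentions"])  # handles found in the body, without the leading '@'
```

The author is not part of the body, so it has to come from the tweet's metadata rather than from a pattern match.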

Before we model tweets in Dgraph using these components, let's recap the design principles of a graph model:

> `Nodes` and `Edges` are the building blocks of a graph model.
> Be it a sale, a tweet, or user info, any concept or entity is represented as a node.
> If any two nodes are related, represent that by creating an edge between them.

With the above design principles in mind, let's go through the components of a tweet and see how we could fit them into Dgraph.

**The Author**

The author of a tweet is a twitter user. We should use a node to represent this.

**The Body**

We should represent every tweet as a node.

**The Hashtags**

It is advantageous to represent a hashtag as a node of its own.
It gives us better flexibility while querying.

Though you can search for hashtags from the body of a tweet, it's not efficient to do so.
Creating unique nodes to represent hashtags allows you to write performant queries like the following: _Hey Dgraph, give me all the tweets with hashtag #graphql_

**The Mentions**

A mention represents a twitter user, and we've already modeled a user as a node.
Therefore, we represent a mention as an edge between a tweet and the users mentioned.

### The Relationships

We have three types of nodes: `User`, `Tweet`, and `Hashtag`.

{{% load-img "/images/tutorials/5/a-nodes.jpg" "graph nodes" %}}

Let's look at how these nodes might be related to each other and model their relationship as an edge between them.

**The User and Tweet nodes**

There's a two-way relationship between a `Tweet` and a `User` node.

- Every tweet is authored by a user, and a user can author many tweets.

  Let's name the edge representing this relationship `authored`.

  An `authored` edge points from a `User` node to a `Tweet` node.

- A tweet can mention many users, and users can be mentioned in many tweets.

  Let's name the edge which represents this relationship `mentioned`.

  A `mentioned` edge points from a `Tweet` node to a `User` node.
  These users are the ones who are mentioned in the tweet.

{{% load-img "/images/tutorials/5/a-tweet-user.jpg" "graph nodes" %}}

**The tweet and the hashtag nodes**

A tweet can have one or more hashtags.
Let's name the edge which represents this relationship `tagged_with`.

A `tagged_with` edge points from a `Tweet` node to a `Hashtag` node.
These hashtag nodes correspond to the hashtags in the tweets.

{{% load-img "/images/tutorials/5/a-tagged.jpg" "graph nodes" %}}

**The Author and hashtag nodes**

There's no direct relationship between an author and a hashtag node.
Hence, we don't need a direct edge between them.

Our graph model of a tweet is ready! Here it is.

{{% load-img "/images/tutorials/5/a-graph-model.jpg" "tweet model" %}}

Here is the graph of our sample tweet.

{{% load-img "/images/tutorials/5/c-tweet-model.jpg" "tweet model" %}}

Let's add a couple of tweets to the list.

{{< tweet 1142124111650443273>}}

{{< tweet 1192822660679577602>}}

We'll be using these two tweets, along with the sample tweet from the beginning, as our dataset.
Open Ratel, go to the mutate tab, paste the mutation, and click Run.

```json
{
  "set": [
    {
      "user_handle": "hackintoshrao",
      "user_name": "Karthic Rao",
      "uid": "_:hackintoshrao",
      "authored": [
        {
          "tweet": "Test tweet for the fifth episode of getting started series with @dgraphlabs. Wait for the video of the fourth one by @francesc the coming Wednesday!\n#GraphDB #GraphQL",
          "tagged_with": [
            {
              "uid": "_:graphql",
              "hashtag": "GraphQL"
            },
            {
              "uid": "_:graphdb",
              "hashtag": "GraphDB"
            }
          ],
          "mentioned": [
            {
              "uid": "_:francesc"
            },
            {
              "uid": "_:dgraphlabs"
            }
          ]
        }
      ]
    },
    {
      "user_handle": "francesc",
      "user_name": "Francesc Campoy",
      "uid": "_:francesc",
      "authored": [
        {
          "tweet": "So many good talks at #graphqlconf, next year I'll make sure to be *at least* in the audience!\nAlso huge thanks to the live tweeting by @dgraphlabs for alleviating the FOMO😊\n#GraphDB ♥️ #GraphQL",
          "tagged_with": [
            {
              "uid": "_:graphql"
            },
            {
              "uid": "_:graphdb"
            },
            {
              "hashtag": "graphqlconf"
            }
          ],
          "mentioned": [
            {
              "uid": "_:dgraphlabs"
            }
          ]
        }
      ]
    },
    {
      "user_handle": "dgraphlabs",
      "user_name": "Dgraph Labs",
      "uid": "_:dgraphlabs",
      "authored": [
        {
          "tweet": "Let's Go and catch @francesc at @Gopherpalooza today, as he scans into Go source code by building its Graph in Dgraph!\nBe there, as he Goes through analyzing Go source code, using a Go program, that stores data in the GraphDB built in Go!\n#golang #GraphDB #Databases #Dgraph ",
          "tagged_with": [
            {
              "hashtag": "golang"
            },
            {
              "uid": "_:graphdb"
            },
            {
              "hashtag": "Databases"
            },
            {
              "hashtag": "Dgraph"
            }
          ],
          "mentioned": [
            {
              "uid": "_:francesc"
            },
            {
              "uid": "_:gopherpalooza"
            }
          ]
        }
      ]
    },
    {
      "uid": "_:gopherpalooza",
      "user_handle": "gopherpalooza",
      "user_name": "Gopherpalooza"
    }
  ]
}
```

_Note: If you're new to Dgraph and have yet to figure out how to run the database and use Ratel, we highly recommend reading the [first article of the series]({{< relref "tutorial-1/index.md" >}})._

Here is the graph we
built.

{{% load-img "/images/tutorials/5/x-all-tweets.png" "tweet graph" %}}

Our graph has:

- Five blue twitter user nodes.
- The green nodes are the tweets.
- The blue ones are the hashtags.

Let's start our tweet exploration by querying for the twitter users in the database.

```graphql
{
  tweet_graph(func: has(user_handle)) {
    user_handle
  }
}
```

{{% load-img "/images/tutorials/5/j-users.png" "tweet model" %}}

_Note: If the query syntax above doesn't look familiar to you, check out the [first tutorial]({{< relref "tutorial-1/index.md" >}})._

We have four twitter users: `@hackintoshrao`, `@francesc`, `@dgraphlabs`, and `@gopherpalooza`.

Now, let's find their tweets and hashtags too.

```graphql
{
  tweet_graph(func: has(user_handle)) {
    user_name
    authored {
      tweet
      tagged_with {
        hashtag
      }
    }
  }
}
```

{{% load-img "/images/tutorials/5/y-author-tweet.png" "tweet model" %}}

_Note: If the traversal query syntax in the above query is not familiar to you, [check out the third tutorial]({{< relref "tutorial-3/index.md" >}}) of the series._

Before we query our graph any further, let's learn a bit about database indices using a simple analogy.

### What are indices?

Indexing is a way to optimize the performance of a database by minimizing the number of disk accesses required when a query is processed.

Consider a "Book" of 600 pages, divided into 30 sections.
Let's say each section has a different number of pages in it.

Now, without an index page, to find a particular section that starts with the letter "F", you have no option other than scanning through the entire book, i.e., all 600 pages.

But an index page at the beginning makes it easier to access the intended information.
You just need to look over the index page. After finding the matching entry, you can jump straight to the section and skip the others.

But remember that the index page also takes disk space!
Use indices only when necessary.

In our next section, let's learn some interesting queries on our twitter graph.

## String indices and querying

### Hash index

Let's compose a query which says: _Hey Dgraph, find me the tweets of the user whose twitter handle equals `hackintoshrao`._

Before we do so, we first need to add an index to the `user_handle` predicate.
There are five types of string indices: `hash`, `exact`, `term`, `fulltext`, and `trigram`.

The type of string index to be used depends on the kind of queries you want to run on the string predicate.

In this case, we want to search for a node based on the exact string value of a predicate.
For a use case like this one, the `hash` index is recommended.

Let's first add the `hash` index to the `user_handle` predicate.

{{% load-img "/images/tutorials/5/k-hash.png" "tweet model" %}}

Now, let's use the `eq` comparator to find all the tweets of `hackintoshrao`.

Go to the query tab, type in the query, and click Run.

```graphql
{
  tweet_graph(func: eq(user_handle, "hackintoshrao")) {
    user_name
    authored {
      tweet
    }
  }
}
```

{{% load-img "/images/tutorials/5/z-exact.png" "tweet model" %}}

_Note: Refer to [the third tutorial]({{< relref "tutorial-3/index.md" >}}) if you want to know about comparator functions like `eq` in detail._

Let's extend the last query to also fetch the hashtags and the mentions.

```graphql
{
  tweet_graph(func: eq(user_handle, "hackintoshrao")) {
    user_name
    authored {
      tweet
      tagged_with {
        hashtag
      }
      mentioned {
        user_name
      }
    }
  }
}
```

{{% load-img "/images/tutorials/5/l-hash-query.png" "tweet model" %}}

_Note: If the traversal query syntax in the above query is not familiar to you, [check out the third tutorial]({{< relref "tutorial-3/index.md" >}}) of the series._

Did you know that string values in Dgraph can also be compared using comparators like greater-than or less-than?

In our next section, let's see how to run comparison functions other than `equals to (eq)` on string predicates.

### Exact Index

We discussed in the [third tutorial]({{< relref "tutorial-3/index.md" >}}) that there are five comparator functions in Dgraph.

Here's a quick recap:

| comparator function name | Full form                |
|--------------------------|--------------------------|
| eq                       | equals to                |
| lt                       | less than                |
| le                       | less than or equal to    |
| gt                       | greater than             |
| ge                       | greater than or equal to |

All five comparator functions can be applied to string predicates.

We have already used the `eq` operator.
The other four are useful for operations that depend on the alphabetical ordering of the strings.

Let's learn about them with a simple example.

Let's find the twitter accounts which come after `dgraphlabs` in alphabetically sorted order.

```graphql
{
  using_greater_than(func: gt(user_handle, "dgraphlabs")) {
    user_handle
  }
}
```

{{% load-img "/images/tutorials/5/n-exact-error.png" "tweet model" %}}

Oops, we have an error!

You can see from the error that the current `hash` index on the `user_handle` predicate doesn't support the `gt` function.
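
To build some intuition for why a hash-based index can answer `eq` but not `gt`, here is a small Python sketch. It is only an analogy and not Dgraph's implementation: the `digest`, `eq_hash`, and `gt_sorted` helpers and the use of SHA-256 are assumptions made for this example. Hashing a value throws away its alphabetical position, while an index that keeps the original values in sorted order can serve range lookups.

```python
import bisect
import hashlib

# The four user handles from our dataset.
handles = ["hackintoshrao", "francesc", "dgraphlabs", "gopherpalooza"]


def digest(value):
    """Hypothetical hash function standing in for a hash-style index key."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hash-style index: digest -> handles. Good for equality lookups only,
# because digests carry no information about alphabetical order.
hash_index = {}
for handle in handles:
    hash_index.setdefault(digest(handle), []).append(handle)


def eq_hash(value):
    """Equality lookup: hash the search value and look up the bucket."""
    return hash_index.get(digest(value), [])


# Sorted index: the original values kept in alphabetical order.
sorted_index = sorted(handles)


def gt_sorted(value):
    """Range lookup: binary-search the position, return everything after it."""
    return sorted_index[bisect.bisect_right(sorted_index, value):]


eq_result = eq_hash("hackintoshrao")
gt_result = gt_sorted("dgraphlabs")
print(eq_result)
print(gt_result)
```

The trade-off in this sketch is that the sorted index has to keep every full value around, whereas the hash-style index only needs fixed-size digests as keys.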

To be able to do string comparison operations like the one above, you first need to set the `exact` index on the string predicate.

The `exact` index is the only string index that allows you to use the `ge`, `gt`, `le`, and `lt` comparators on string predicates.

Keep in mind that the `exact` index also allows you to use the `equals to (eq)` comparator.
But if you only want to use the `equals to (eq)` comparator on string predicates, using the `exact` index would be overkill.
The `hash` index would be a better option, as it is, in general, much more space-efficient.

Let's set the `exact` index on the `user_handle` predicate and see it in action.

{{% load-img "/images/tutorials/5/o-exact-conflict.png" "set exact" %}}

We again have an error!

Though a string predicate can have more than one index, some of them are not compatible with each other.
One such example is the combination of the `hash` and the `exact` indices.

The `user_handle` predicate already has the `hash` index, so trying to set the `exact` index gives you an error.

Let's uncheck the `hash` index for the `user_handle` predicate, select the `exact` index, and click update.

{{% load-img "/images/tutorials/5/p-set-exact.png" "set exact" %}}

Though Dgraph allows you to change the index type of a predicate, do it only if it's necessary.
When the indices are changed, the data needs to be re-indexed. This takes some computing, so it could take a bit of time.
While the re-indexing operation is running, all mutations will be put on hold.

Now, let's re-run the query.

{{% load-img "/images/tutorials/5/q-exact-gt.png" "tweet model" %}}

The result contains three twitter handles: `francesc`, `gopherpalooza`, and `hackintoshrao`.

In alphabetically sorted order, these twitter handles are greater than `dgraphlabs`.

Some tweets appeal to us more than others.
For instance, I love `Graphs` and `Go`.
Hence, I would surely enjoy tweets that are related to these topics.
A keyword-based search is a useful way to find relevant information.

Can we search for tweets based on one or more keywords related to our interests?

Yes, we can! Let's do that in our next section.

### The Term index

The `term` index lets you search string predicates based on one or more keywords.
These keywords are called terms.

To be able to search tweets with specific keywords or terms, we first need to set the `term` index on the `tweet` predicate.

Adding the `term` index is similar to adding any other string index.

{{% load-img "/images/tutorials/5/r-term-set.png" "term set" %}}

Dgraph provides two built-in functions specifically to search for terms: `allofterms` and `anyofterms`.

Apart from these two functions, the `term` index only supports the `eq` comparator.
This means any other query function (like `lt`, `gt`, `le`...) fails when run on string predicates with only the `term` index.

We'll soon take a look at the table containing the string indices and their supported query functions.
But first, let's learn how to use the `anyofterms` and `allofterms` query functions.
Let's write a query to find all tweets with the terms or keywords `Go` or `Graph` in them.

Go to the query tab, paste the query, and click Run.

```graphql
{
  find_tweets(func: anyofterms(tweet, "Go Graph")) {
    tweet
  }
}
```

Here's the matched tweet from the query response:

```json
{
  "tweet": "Let's Go and catch @francesc at @Gopherpalooza today, as he scans into Go source code by building its Graph in Dgraph!\nBe there, as he Goes through analyzing Go source code, using a Go program, that stores data in the GraphDB built in Go!\n#golang #GraphDB #Databases #Dgraph "
}
```

{{% load-img "/images/tutorials/5/s-go-graph.png" "go graph set" %}}

_Note: Check out [the first tutorial]({{< relref "tutorial-1/index.md" >}}) if the query syntax, in general, is not familiar to you._

The `anyofterms` function returns tweets which contain either of the keywords `Go` or `Graph`.

In this case, we've used only two search terms (`Go` and `Graph`), but you can extend the search to any number of terms.

The result has one of the three tweets in the database.
The other two tweets don't make it to the result since they don't have either of the terms `Go` or `Graph`.

It's also important to notice that the term search functions (`anyofterms` and `allofterms`) are insensitive to case and special characters.

This means that if you search for the term `GraphQL`, the query returns a positive match for all of the following terms found in the tweets: `graphql`, `graphQL`, `#graphql`, `#GraphQL`.

Now, let's find tweets that have either of the terms `Go` or `GraphQL` in them.

```graphql
{
  find_tweets(func: anyofterms(tweet, "Go GraphQL")) {
    tweet
  }
}
```

{{% load-img "/images/tutorials/5/t-go-graphql-all.png" "Go Graphql" %}}

Oh wow, we have all three tweets in the result.
This means that all three tweets have either of the terms `Go` or `GraphQL`.
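
The term matching described above can be approximated with a short Python sketch. This is a simplified model under stated assumptions, not Dgraph's actual term tokenizer: the hypothetical `terms` helper lowercases the text and splits it on anything that is not a word character, and `any_of_terms` and `all_of_terms` are illustrative stand-ins for the real `anyofterms` and `allofterms` functions.

```python
import re

# Tweet bodies copied from the mutation used in this tutorial, keyed by author handle.
tweets = {
    "hackintoshrao": (
        "Test tweet for the fifth episode of getting started series with @dgraphlabs. "
        "Wait for the video of the fourth one by @francesc the coming Wednesday!\n"
        "#GraphDB #GraphQL"
    ),
    "francesc": (
        "So many good talks at #graphqlconf, next year I'll make sure to be *at least* "
        "in the audience!\nAlso huge thanks to the live tweeting by @dgraphlabs for "
        "alleviating the FOMO😊\n#GraphDB ♥️ #GraphQL"
    ),
    "dgraphlabs": (
        "Let's Go and catch @francesc at @Gopherpalooza today, as he scans into Go source "
        "code by building its Graph in Dgraph!\nBe there, as he Goes through analyzing Go "
        "source code, using a Go program, that stores data in the GraphDB built in Go!\n"
        "#golang #GraphDB #Databases #Dgraph "
    ),
}


def terms(text):
    """Simplified tokenizer (assumption): lowercase, then split on non-word characters."""
    return set(re.findall(r"\w+", text.lower()))


def any_of_terms(text, query):
    """True if the text shares at least one term with the query."""
    return bool(terms(text) & terms(query))


def all_of_terms(text, query):
    """True if every term of the query appears in the text."""
    return terms(query) <= terms(text)


def authors_matching(match, query):
    """Handles of the authors whose tweet satisfies the given match function."""
    return [author for author, body in tweets.items() if match(body, query)]


print(authors_matching(any_of_terms, "Go Graph"))    # only the @dgraphlabs tweet
print(authors_matching(any_of_terms, "Go GraphQL"))  # all three tweets
print(authors_matching(any_of_terms, "#graphql"))    # case and '#' are ignored
```

Note that whole terms are compared, not substrings: `Go` does not match `good` or `golang`, and `Graph` does not match `GraphDB`.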

Now, how about finding tweets that contain both the terms `Go` and `GraphQL`?
We can do it by using the `allofterms` function.

```graphql
{
  find_tweets(func: allofterms(tweet, "Go GraphQL")) {
    tweet
  }
}
```

{{% load-img "/images/tutorials/5/u-allofterms.png" "Go Graphql" %}}

We have an empty result.
None of the tweets have both the terms `Go` and `GraphQL` in them.

Besides `Go` and `Graph`, I'm also a big fan of `GraphQL` and `GraphDB`.

Let's find the tweets that contain both the keywords `GraphQL` and `GraphDB`.

{{% load-img "/images/tutorials/5/v-graphdb-graphql.png" "Graphdb-GraphQL" %}}

The result has two tweets that contain both the terms `GraphQL` and `GraphDB`.

```json
{
  "tweet": "Test tweet for the fifth episode of getting started series with @dgraphlabs. Wait for the video of the fourth one by @francesc the coming Wednesday!\n#GraphDB #GraphQL"
},
{
  "tweet": "So many good talks at #graphqlconf, next year I'll make sure to be *at least* in the audience!\nAlso huge thanks to the live tweeting by @dgraphlabs for alleviating the FOMO😊\n#GraphDB ♥️ #GraphQL"
}
```

Before we wrap up, here's the table containing the three string indices we learned about and their compatible built-in functions.

| Index | Valid query functions      |
|-------|----------------------------|
| hash  | eq                         |
| exact | eq, lt, gt, le, ge         |
| term  | eq, allofterms, anyofterms |

## Summary

In this tutorial, we modeled a series of tweets and set up the `hash`, `exact`, and `term` indices in order to query them.

Did you know that Dgraph also offers more powerful search capabilities like full-text search and regular-expression-based search?

In the next tutorial, we'll explore these features and learn about more powerful ways of searching for your favorite tweets!

Sounds interesting?
Then see you all soon in the next tutorial. Till then, happy Graphing!

Check out the next tutorial of the getting started series [here]({{< relref "tutorial-6/index.md" >}}).

## Need Help

* Please use [discuss.dgraph.io](https://discuss.dgraph.io) for questions, feature requests and discussions.
* Please use [GitHub Issues](https://github.com/dgraph-io/dgraph/issues) if you encounter bugs or have feature requests.
* You can also join our [Slack channel](http://slack.dgraph.io).

<style>
  /* blockquote styling */
  blockquote {
    font-size: 1;
    font-style: italic;
    margin: 0 3rem 1rem 3rem;
    text-align: justify;
  }
  blockquote p:last-child, blockquote ul:last-child, blockquote ol:last-child {
    margin-bottom: 0;
  }
  blockquote cite {
    font-size: 15px;
    font-size: 0.9375rem;
    line-height: 1.5;
    font-style: normal;
    color: #555;
  }
  blockquote footer, blockquote small {
    font-size: 18px;
    font-size: 1.125rem;
    display: block;
    line-height: 1.42857143;
  }
  blockquote footer:before, blockquote small:before {
    content: "\2014 \00A0";
  }
</style>