// Copyright 2016 The Kythe Authors. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

Writing a New Indexer
=====================
:Revision: 1.0
:toc2:
:toclevels: 3
:priority: 999

This document is an overview of the steps to take to add support for a new
language to Kythe. We assume that you have the
link:https://github.com/kythe/kythe/releases[Kythe release package] extracted
to `/opt/kythe`. You can also build the tools from source (but it is not
necessary to build Kythe to provide it with graph data). Sample code snippets
are written in JavaScript, but this document is not about indexing any
particular language.

In the Kythe pipeline, a language's *indexer* is responsible for building a
subgraph that represents a particular program. Complete indexers usually accept
link:https://www.kythe.io/docs/kythe-kzip.html[`.kzip`] files that contain a
program, all of its dependencies, and the arguments necessary for a compiler or
interpreter to understand it. This data is packaged by a separate component
called an *extractor*. Depending on the language and build system involved, it
may be possible to use a generic extractor to produce these hermetic
compilation units. We will not address extraction here.

For development and testing, it's useful for the indexer to accept program text
directly as input; this is how we will proceed in these instructions. We'll
begin by writing some scripts to insert file content into a small Kythe
graph. From there, we'll see how to encode Kythe nodes and edges into *entries*,
the unit of exchange between many of our tools. We'll see that certain kinds
of nodes are used to represent common sorts of semantic objects in programming
languages and that other nodes are used to represent syntactic spans of text.
We will add relationships as edges between these nodes to add cross-reference
data to the graph. This allows users to jump between definitions and references
in programs we've indexed. Finally, we'll discuss how to write tests for (and
how to debug) Kythe indexers.

== Bootstrapping Kythe support

Kythe indexers emit directed graph data as a stream of *entries* that can
represent either nodes or edges. These have various encodings, but for
simplicity we'll use JSON. To get started, let's write a script `kythe-browse.sh`
that will turn a stream of JSON-formatted Kythe entries into a format that our
example code browser can read. Put it in your Kythe root; it will clobber the
directories `//graphstore` and `//tables`.

[source,bash]
----
#!/bin/bash -e
set -o pipefail
BROWSE_PORT="${BROWSE_PORT:-8080}"
# You can find prebuilt binaries at https://github.com/kythe/kythe/releases.
# This script assumes that they are installed to /opt/kythe.
# If you build the tools yourself or install them to a different location,
# make sure to pass the correct public_resources directory to http_server.
rm -f -- graphstore/* tables/*
mkdir -p graphstore tables
# Read JSON entries from standard in to a graphstore.
/opt/kythe/tools/entrystream --read_json \
  | /opt/kythe/tools/write_entries -graphstore graphstore
# Convert the graphstore to serving tables.
/opt/kythe/tools/write_tables -graphstore graphstore -out=tables
# Host the browser UI.
/opt/kythe/tools/http_server -serving_table tables \
  -listen="localhost:${BROWSE_PORT}"  # ":${BROWSE_PORT}" allows access from other machines
----

TIP: The
link:https://github.com/kythe/kythe/blob/master/kythe/proto/storage.proto[protocol buffer]
encoding of Kythe facts is more efficient than the JSON encoding we're using
here. Kythe supports JSON because some languages do not have good support for
protocol buffers. The difference mainly matters for indexers that emit a large
amount of data, like the one for $$C++$$. The `entrystream` tool used in
`kythe-browse.sh` reads a stream of JSON entries from standard input and emits a
`varint32`-delimited stream of `kythe.proto.Entry` messages on standard output.

You can test this with a very short entry stream. The only tricky part here is
that Kythe fact values, when serialized to JSON, are base64-encoded. This
ensures that they can be properly deserialized later, since fact values may
contain arbitrary binary data, but JSON strings permit only UTF-8 characters.
`ZmlsZQ==` is `file` and `SGVsbG8sIHdvcmxkIQ==` is `Hello, world!`.

[source,bash]
----
echo '
{"source":{"corpus":"example","path":"hello"},
 "fact_name":"/kythe/node/kind","fact_value":"ZmlsZQ=="}
{"source":{"corpus":"example","path":"hello"},
 "fact_name":"/kythe/text","fact_value":"SGVsbG8sIHdvcmxkIQ=="}
' | ./kythe-browse.sh
----

You can check that http://localhost:8080/#hello?corpus=example shows
`Hello, world!`.

== Modeling Kythe entries

A Kythe graph can be encoded using two basic data types.
The first, called a
http://www.kythe.io/docs/kythe-storage.html#_a_id_termvname_a_vector_name_strong_vname_strong[VName],
uniquely picks out a node in the graph. VNames have five string-valued fields:
*signature*, *corpus*, *root*, *path*, and *language*. The second, the
http://www.kythe.io/docs/kythe-storage.html#_entry[entry],
records either a fact about an individual node or an edge between two nodes. As
described in the documentation, we only need to emit the forward versions of
edges (those that are described in the http://www.kythe.io/docs/schema[schema]);
the Kythe pipeline takes care of generating reverse edges as needed for
efficiency.

We'll encode VNames and entries in a straightforward way; in particular, we
represent entries as objects, where the presence or absence of a target
determines whether the entry represents an edge between nodes or a fact about a
single node, respectively. Our `fact` and `edge` convenience functions also
assume that all of the fact and edge names we'll use are underneath the `/kythe`
prefix, since we're following the Kythe schema. This prefix is a requirement of
the schema, not of the data model.

[source,js]
----
function vname(signature, path, language, root, corpus) {
  return {
    signature: signature,
    path: path,
    language: language,
    root: root,
    corpus: corpus,
  };
}
function fact(node, fact_name, fact_val) {
  return {
    source: node,
    fact_name: "/kythe/" + fact_name,
    fact_value: base64enc(fact_val),
  };
}
function edge(source, edge_name, target) {
  return {
    source: source,
    edge_kind: "/kythe/edge/" + edge_name,
    target: target,
    fact_name: "/",
  };
}
function ordinal_edge(source, edge_name, target, ordinal) {
  return {
    source: source,
    edge_kind: "/kythe/edge/" + edge_name + "." + ordinal,
    target: target,
    fact_name: "/",
  };
}
----

You can follow along at home with link:https://nodejs.org[node.js] and the
following definitions:

[source,js]
----
// Base64-encodes a string's UTF-8 bytes, as the JSON entry format requires.
// (Buffer.from replaces the deprecated `new Buffer(...)` constructor.)
function base64enc(string) {
  return Buffer.from(string).toString('base64');
}
// Prints each entry as one line of JSON on standard output.
function emitEntries(entries) {
  entries.forEach(function(v){console.log(JSON.stringify(v))});
}
----

With this representation, our example database becomes:

[source,js]
----
[
  fact(vname("", "hello", "", "", "example"), "node/kind", "file"),
  fact(vname("", "hello", "", "", "example"), "text", "Hello, world!")
]
----

VNames have an alternate
link:http://www.kythe.io/docs/kythe-uri-spec.html[URI-style encoding]. VNames
encoded in this way are called *tickets*; tickets and VNames are semantically
interchangeable. This encoding is used where it is inconvenient or not possible
to store VNames in a more structured format. You can use Kythe URIs when
interacting with the
link:http://www.kythe.io/docs/kythes-command-line-tool.html[Kythe command-line tool]:

[source,bash]
----
/opt/kythe/tools/kythe -api './tables' nodes 'kythe://example?path=hello'
----

.Output
----
kythe://example?path=hello
/kythe/node/kind file
/kythe/text Hello, world!
----

`kythe://example?path=hello` is the URI encoding of the VName used in the
example graph above.

== File content

Kythe stores file content in its graph. The `http_server` binary used in our
`kythe-browse.sh` script doesn't look in your filesystem for a file to present
to the Web browser; it instead reads the `text` fact off of a graph node.

Since every node in the graph has a VName, we'll need to be able to build one
for any source file your indexer might refer to. In our small example above,
our test file had the path `hello` in the corpus `example`.
It is up to you how to determine the corpus (and possibly root) to which a node
belongs. It is best to keep this configurable; other Kythe indexers use a
`vnames.json` file to choose the VName fields based on regular expressions over
paths.

All Kythe graph nodes should have a `node/kind` fact. For files, this kind is
`file`. This means that each file should have at least two associated facts:
`node/kind` and `text`. You can see the JSON representation of the resulting
entries above, where we used them to test the `kythe-browse.sh` script.

NOTE: The Kythe JSON representation requires fact values to be base64-encoded.
The protocol buffer representation does not, but it does store fact values as
`bytes` instead of the `string` type. The protocol buffer `string` type must be
valid UTF-8, and not every file in a graph is necessarily UTF-8 encoded (though
UTF-8 is the default). Alternate encodings may be specified using the `encoding`
fact.

== Cross-references

Imagine we have the following simple program:

[source,lua]
----
var foo = 1
print foo
----

We want to record the relationship between the reference to `foo` on the second
line and its definition on the first line. First, we should build a
representation for the variable `foo` itself. To summon a node into existence,
we need a VName and a node kind. The schema already defines a node kind
for http://www.kythe.io/docs/schema#variable[variables]. If there is no existing
way to model something in the schema, you're free to invent a kind of your own;
the schema is intended to be open-ended. Be aware that tools that consume Kythe
data may not be able to offer as much help with custom kinds, but they should
always be tolerant of them.

We've already seen that VNames for http://www.kythe.io/docs/schema#file[files]
contain *path*, *root*, and *corpus* components.
(In fact, the schema requires
that the other components of a file VName be empty.) We need to come up with
assignments to these, plus *signature* and *language*, that uniquely refer to
our variable `foo`. Getting this right can be subtle. Here are some guidelines:

* Indexing the same compilation unit twice should always produce the same data.
* VNames for objects that are accessible from multiple compilation units must
  be generated consistently. For example, if a module defines a public variable
  `Bar`, then `Bar`'s VName must be the same in all of the modules that use it.
* VNames should not be over-specific. For example, if your language has a
  builtin `string` type, you should only have a single `VName` for that type
  (which is probably of the http://www.kythe.io/docs/schema#tbuiltin[tbuiltin]
  kind). Structural types should also have single representations; if your
  language also has a builtin pair type, there should only be a single `VName`
  for `pair<string,string>` (that's probably a
  http://www.kythe.io/docs/schema#tapp[tapp]).
* Where possible, VNames should be generated without reference to source
  locations. This makes debugging your indexer easier and decreases the number
  of spurious changes to the graph when source text is modified.
* Take care that your *signature* fields aren't too long. In the $$C++$$
  indexer, signatures that are past a certain length are replaced with their
  hashes. Signature length has significant implications for the size of your
  graph and the I/O cost of your tools.
* The *language* component of a VName should be set to a well-known value.
  Java is `java`; $$C++$$ is `c++`; and so on. We'll use `ex` as our language.
* Avoid duplicating information that's elsewhere in the VName, like the corpus
  label, language label, or path (in cases where a path is appropriate).
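
The guideline about signature length can be handled with a small helper. The
following is a minimal sketch, not the $$C++$$ indexer's actual scheme: the
64-character limit and the name `compactSignature` are hypothetical, and the
hash is SHA-256 from Node's built-in `crypto` module. Because hashing is
deterministic, the same long signature always maps to the same digest, which
keeps VNames consistent across compilation units and repeated runs.

[source,js]
----
const crypto = require("crypto");

// Hypothetical cutoff chosen for illustration; the C++ indexer uses its own
// threshold and hashing scheme.
const MAX_SIGNATURE_LENGTH = 64;

// Returns `signature` unchanged if it is short enough; otherwise returns the
// hex-encoded SHA-256 digest of its UTF-8 bytes (always 64 characters).
// Equal inputs give equal outputs, so VNames built from the result stay
// deterministic.
function compactSignature(signature) {
  if (signature.length <= MAX_SIGNATURE_LENGTH) {
    return signature;
  }
  return crypto.createHash("sha256").update(signature, "utf8").digest("hex");
}
----

A short signature such as `foo#0` passes through unchanged. A hashed signature
is no longer human-readable, which makes debugging harder; that is one reason to
hash only the signatures that exceed the limit.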

We'll reuse the *path*, *root*, and *corpus* of `foo`'s defining file, the
*language* `ex`, and the *signature* `foo#0` (to mean "the zeroth binding of foo
at global scope"). Using the functions we've defined above, we emit the
following entry:

[source,js]
----
//          sig      path    lang  root corpus
fact(vname("foo#0", "hello", "ex", "", "example"), "node/kind", "variable");
----

We can see it in the graph with the `kythe` tool (after running
`kythe-browse.sh` to generate `./tables`):

[source,bash]
----
/opt/kythe/tools/kythe -api './tables' \
  nodes 'kythe://example?path=hello?lang=ex#foo#0'
----

.Output
----
kythe://example?lang=ex?path=hello#foo%230
/kythe/node/kind variable
----

Notice how the `#` was URI-encoded in the ticket.

=== Specifying spans of text

Spans of text in Kythe are represented by
http://www.kythe.io/docs/schema#anchor[anchor] nodes. Anchors may overlap.
If an anchor exactly overlaps another anchor (i.e., it has the same start
and end offsets), it is conventional (but not required) that they share a VName.
Contrary to the general advice for generating VNames, an anchor's VName *should*
be based on its location in a source file.

Besides the required `node/kind` fact, anchors should have `loc/start`
and `loc/end` facts that give their (inclusive) start and (exclusive) end
location offsets as base-10 stringified integers.

NOTE: In Kythe, offsets are always in units of bytes. If your programming
language specifies locations of syntactic objects in lines and columns or
codepoints, you will need to transform these to byte offsets.
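
As an illustration of that conversion, here is a minimal sketch that assumes
the front end reports 1-based line numbers and 0-based columns counted in
Unicode codepoints, with lines separated by `\n`. The function name
`byteOffset` is hypothetical. It measures UTF-8 lengths with Node's
`Buffer.byteLength`; if a source file uses a different encoding, lengths must be
measured in that encoding instead.

[source,js]
----
// Converts a (line, column) position into a byte offset within the UTF-8
// encoding of `text`. `line` is 1-based; `column` is 0-based and counts
// Unicode codepoints, not UTF-16 code units or bytes.
function byteOffset(text, line, column) {
  const lines = text.split("\n");
  if (line < 1 || line > lines.length) {
    throw new RangeError("line out of range: " + line);
  }
  // Sum the byte lengths of all preceding lines plus their "\n" terminators.
  let offset = 0;
  for (let i = 0; i < line - 1; i++) {
    offset += Buffer.byteLength(lines[i], "utf8") + 1;
  }
  // Array.from splits a string into codepoints, so a surrogate pair counts once.
  const codepoints = Array.from(lines[line - 1]);
  if (column < 0 || column > codepoints.length) {
    throw new RangeError("column out of range: " + column);
  }
  return offset + Buffer.byteLength(codepoints.slice(0, column).join(""), "utf8");
}
----

For the example program `var foo = 1\nprint foo`, `byteOffset(text, 2, 6)` is
`18`, the start of the reference to `foo`. Columns and byte offsets coincide
there only because the text is ASCII: in `var é = 1`, the `=` is at codepoint
column 6 but at byte offset 7.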

[source,js]
----
function anchorVName(file_vname, begin, end) {
  return vname("@" + begin + ":" + end, file_vname.path, "ex", file_vname.root,
               file_vname.corpus);
}
function anchor(file_vname, begin, end) {
  var anchor_vname = anchorVName(file_vname, begin, end);
  return [
    fact(anchor_vname, "node/kind", "anchor"),
    fact(anchor_vname, "loc/start", begin.toString()),
    fact(anchor_vname, "loc/end", end.toString()),
  ];
}
----

The anchor covering the definition of `foo` in our example file, assuming the
file has the same `VName` as the earlier file, is represented by these three
facts:

[source,js]
----
{"source":{"signature":"@4:7","path":"hello","language":"ex","corpus":"example"},
 "fact_name":"/kythe/node/kind","fact_value":"YW5jaG9y"}
{"source":{"signature":"@4:7","path":"hello","language":"ex","corpus":"example"},
 "fact_name":"/kythe/loc/start","fact_value":"NA=="}
{"source":{"signature":"@4:7","path":"hello","language":"ex","corpus":"example"},
 "fact_name":"/kythe/loc/end","fact_value":"Nw=="}
----

=== Linking anchors to semantic nodes

We can now link the definition and reference sites of `foo` back to the node
we created for the variable.
To do so, we'll add a
link:http://www.kythe.io/docs/schema/#definesbinding[defines/binding] edge
from the definition site and a link:http://www.kythe.io/docs/schema/#ref[ref]
edge from the use site:

[source,js]
----
edge(foo_def_anchor_vname, "defines/binding", foo_vname),
edge(foo_ref_anchor_vname, "ref", foo_vname)
----

Our full database, specified using the previously-defined functions, looks
like:

[source,js]
----
var hello_file_vname = vname("", "hello", "", "", "example");
var foo_vname = vname("foo#0", "hello", "ex", "", "example");
var foo_def_anchor_vname = anchorVName(hello_file_vname, 4, 7);
var foo_ref_anchor_vname = anchorVName(hello_file_vname, 18, 21);
var entries = [
  fact(hello_file_vname, "node/kind", "file"),
  fact(hello_file_vname, "text", "var foo = 1\nprint foo"),
  fact(foo_vname, "node/kind", "variable"),
  edge(foo_def_anchor_vname, "defines/binding", foo_vname),
  edge(foo_ref_anchor_vname, "ref", foo_vname)
].concat(anchor(hello_file_vname, 4, 7))
 .concat(anchor(hello_file_vname, 18, 21));
// Print the entries as JSON lines, ready to pipe into kythe-browse.sh.
emitEntries(entries);
----

NOTE: For pedagogical reasons, we're building our graph up as a big array
of entries. In practice, this is a bad idea; graphs can become very large,
and buffering all your data up to release it at the same time prevents
downstream consumers from working in parallel (even if you're just writing
to disk). Indexers should emit graph data as soon as practical (and should also
endeavor to avoid emitting duplicate data).
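
One way to follow that advice is to write each entry as soon as it is built
and to remember which entries have already been written. The sketch below
assumes entries are plain objects like those returned by `fact` and `edge`;
`makeEntryEmitter` is a hypothetical name. Deduplication keys on the JSON
serialization, which only works when equal entries are always built with the
same key order (as they are when they come from the same helper function).

[source,js]
----
// Returns a function that writes each entry as one line of JSON via `write`
// as soon as it is called, skipping entries this emitter already wrote.
// Returns true if the entry was written, false if it was a duplicate.
function makeEntryEmitter(write) {
  // Serialized entries seen so far; grows for the lifetime of the emitter.
  const seen = new Set();
  return function emit(entry) {
    const line = JSON.stringify(entry);
    if (seen.has(line)) {
      return false; // Duplicate: nothing is written.
    }
    seen.add(line);
    write(line + "\n");
    return true;
  };
}
----

In a command-line indexer, `write` could be `(s) => process.stdout.write(s)`,
so each entry reaches `kythe-browse.sh` (or any other consumer) without waiting
for the rest of the graph. Since the set of seen entries only grows, an indexer
processing very large inputs might prefer to deduplicate only the entries it
expects to repeat, such as file and type nodes.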

You can test it by piping the entries into `kythe-browse.sh` and then querying
the `kythe` tool for file decorations:

[source,bash]
----
/opt/kythe/tools/kythe -api './tables' decor 'kythe://example?path=hello'
----

.Output
----
/kythe/edge/defines/binding 1:4-1:7 variable kythe://example?lang=ex?path=hello#foo%230
/kythe/edge/ref 2:6-2:9 variable kythe://example?lang=ex?path=hello#foo%230
----

== Testing

Most of the work in testing a tool that produces Kythe data boils down to
checking that different anchors in example source text are linked to the correct
nodes and edges. From this starting point, you can make sure that other parts
of the semantic graph are properly formed.

Given a description of these anchors and their desired relationships,
performing the necessary checks doesn't require any information specific to the
language being analyzed. With this in mind, we built the
http://www.kythe.io/docs/kythe-verifier.html[Kythe verifier]. The verifier
accepts a stream of Kythe entries and source files, the latter of which have
been annotated with *goals*. Each goal describes entries that the verifier must
(or must not) find in its input stream. Since some parts of these entries are
uninteresting to test (for example, the exact encoding used for an anchor's
VName is unimportant), parts of a goal may be replaced with variables for which
the verifier will try to find an assignment.

Just as we were able to drive the Kythe pipeline with only a list of JSON
entries, so too can we drive the verifier with only those entries and a list
of goals. This script, `kythe-verify-json.sh`, reads JSON entries from
standard in and passes them (and its arguments) to the verifier:

[source,bash]
----
#!/bin/bash -e
set -o pipefail
# You can find prebuilt binaries at https://github.com/kythe/kythe/releases.
# This script assumes that they are installed to /opt/kythe.
# Read JSON entries from standard in and pass them to the verifier.
# The entrystream tool turns the JSON into length-delimited protocol buffers,
# described at http://godoc.org/kythe.io/kythe/go/platform/delimited
/opt/kythe/tools/entrystream --read_format=json | \
  /opt/kythe/tools/verifier --nofile_vnames "$@"
----

We can write a rule file that checks whether we have any file nodes at all and
call it `test.goals`:

[source,c]
----
//- FileNode?.node/kind file
----

The `//-` prefix tells the verifier which lines to look for goals on. It's meant
to be ignored as a comment by most languages. Of course, some languages (like
Python) use different character sequences to denote comments, so the prefix can
be changed with the `--goal_prefix` command-line flag.

We ask the verifier to check that the goals can be met with the entries we wrote
out earlier in this document:

[source,bash]
----
echo '
{"source":{"corpus":"example","path":"hello"},
 "fact_name":"/kythe/node/kind","fact_value":"ZmlsZQ=="}
{"source":{"corpus":"example","path":"hello"},
 "fact_name":"/kythe/text","fact_value":"SGVsbG8sIHdvcmxkIQ=="}
' | ./kythe-verify-json.sh test.goals
----

Since we do have a node with kind `file`, the verifier exits with status zero
and prints no diagnostics.

Suppose instead that we had written an unsatisfiable goal; let's say we made a
spelling mistake and asked for a node with kind `elif`:

[source,c]
----
//- FileNode?.node/kind elif
----

The verifier will then protest (and return a nonzero exit code):

.Output
----
Could not verify all goals. The furthest we reached was:
test.goals:2:5-2:28 FileNode.node/kind elif
----

If your graph is small, it can be useful to display it graphically:

[source,bash]
----
echo '
{"source":{"corpus":"example","path":"hello"},
 "fact_name":"/kythe/node/kind","fact_value":"ZmlsZQ=="}
{"source":{"corpus":"example","path":"hello"},
 "fact_name":"/kythe/text","fact_value":"SGVsbG8sIHdvcmxkIQ=="}
' | ./kythe-verify-json.sh -annotated_graphviz test.goals | xdot
----

This graph will render in https://github.com/jrfonseca/xdot.py[xdot.py] as
something like:

[kythe,dot,"one file node",0]
--------------------------------------------------------------------------------
digraph G {
"App(vname, (\"\", example, \"\", hello, \"\"))" [ label=<<TABLE><TR><TD COLSPAN="2">("", example, "", hello, "") = FileNode</TD></TR><TR><TD>/kythe/node/kind</TD><TD>file</TD></TR><TR><TD>/kythe/text</TD><TD>...</TD></TR></TABLE>> shape=plaintext color=blue ];
}
--------------------------------------------------------------------------------

There's only one node in this graph, but it's the file node that we asked the
verifier to find; notice how it is outlined in blue. The verifier applies this
highlighting to nodes that are matched against variables in the goals
(here, `FileNode`).

=== Testing for variable definitions and references

Most of the time, verifier rules are written down in the same file that the
rules are meant to check. For example, we can rewrite our example program
in the following way:

[source,lua]
----
--! @foo defines/binding VarFoo
--! VarFoo.node/kind variable
var foo = 1
--! @foo ref VarFoo
print foo
----

Let's start with the second line. To satisfy this goal, the verifier must find
the VName of a node with a `node/kind` fact with the value `variable`.
It will then use that VName wherever the variable `VarFoo` appears. `VarFoo` is
interpreted as a variable because it begins with a capital letter.

To satisfy the goal on the first line, the verifier must find two VNames:
one to substitute for `VarFoo` (the same `VarFoo` as previously discussed)
and one to use as the VName of the anchor spanning `foo` on the next
line of code. The `@foo` token generates a new VName variable and constrains
it to refer to an *anchor* node with the offsets of `foo`. Any additional
constraints on `@foo` act as constraints on that variable. In order for this
first goal to succeed, then, the verifier must find an anchor spanning the
text `foo` that is the source of a `defines/binding` edge whose target is some
other node (with VName `VarFoo`).

Similarly, to satisfy the goal on the fourth line, the verifier must find
a `ref` edge starting at an anchor covering `foo` on the next line of code
and ending at a node with VName `VarFoo`.

NOTE: The `@foo` on the fourth line does not refer to the same variable as the
`@foo` on the first line. Each `@` token creates a new anonymous variable.

The *full* problem that the verifier must solve is the *conjunction* of all
of these goals. If it chooses a VName to use for `VarFoo` that works for the
first goal but not the third, the verifier will backtrack and try a different
assignment. The test succeeds if there is an assignment that satisfies all
the goals. When our first example failed, the verifier couldn't find any
assignment to `FileNode` that would satisfy `FileNode.node/kind elif`.

Assuming we save the program as `test.program`, and that the entries in
`test.program.json` have been updated to reflect the inserted goal comments
(the two occurrences of `foo` now span `(66, 69)` and `(100, 103)`), we can
check our code:

[source,bash]
----
./kythe-verify-json.sh --goal_prefix="--!" test.program < test.program.json
----

We can also dump our graph:

[source,bash]
----
./kythe-verify-json.sh --goal_prefix="--!" --annotated_graphviz \
  test.program < test.program.json | xdot
----

This results in the following:

[kythe,dot,"tiny program",0]
----
digraph G {
"App(vname, (\"\", example, \"\", hello, \"\"))" [ label=<<TABLE><TR><TD COLSPAN="2">("", example, "", hello, "")</TD></TR><TR><TD>/kythe/node/kind</TD><TD>file</TD></TR><TR><TD>/kythe/text</TD><TD>...</TD></TR></TABLE>> shape=plaintext ];
"App(vname, (foo#0, example, \"\", hello, ex))" [ label=<<TABLE><TR><TD COLSPAN="2">(foo#0, example, "", hello, ex) = VarFoo</TD></TR><TR><TD>/kythe/node/kind</TD><TD>variable</TD></TR></TABLE>> shape=plaintext color=blue ];
"App(vname, (@66:69, example, \"\", hello, ex))" [ shape=circle, label="@foo:1.4", color="blue" ];
"App(vname, (@66:69, example, \"\", hello, ex))" -> "App(vname, (foo#0, example, \"\", hello, ex))" [ label="/kythe/edge/defines/binding" ];
"App(vname, (@100:103, example, \"\", hello, ex))" [ shape=circle, label="@foo:4.6", color="blue" ];
"App(vname, (@100:103, example, \"\", hello, ex))" -> "App(vname, (foo#0, example, \"\", hello, ex))" [ label="/kythe/edge/ref" ];
}
----

As before, the nodes we've matched are colored blue. In these diagrams, anchors
are presented as circles with `@` labels (unless they are matched to verifier
variables, in which case more information is provided). The vast majority of
the time, you will not be interested in seeing file offsets in these diagrams.
You can still test for facts on `@`-specified nodes as you would any other
node.

For more examples of the goal language, take a look at the code listings in the
http://www.kythe.io/docs/schema[schema document].
There are many more in the $$C++$$ indexer's
https://kythe.io/repo/kythe/cxx/indexer/cxx/testdata[testdata] and the Java
indexer's
https://kythe.io/repo/kythe/javatests/com/google/devtools/kythe/analyzers/java/testdata[testdata]
directories. Finally, there is a
http://www.kythe.io/docs/schema/verifierstyle.html[style guide] with helpful
tips.

Note how the verifier goals don't mention any of the internal implementation
decisions we've made about the VNames of anchors or variables. This means that
if we later choose to change those aspects of our implementation, the verifier
tests will not break. Also note that we didn't check for details about the
file itself (as we did in the first example). Tests using the Kythe verifier
rarely examine *all* of an indexer's output, just the subgraph that is relevant
for a particular feature. This makes the tests easier to read and keeps them
from breaking when unrelated features are added to the indexer.
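
Offsets such as `(66, 69)` are easy to get wrong by hand, and every goal line
added to a test file shifts them. While experimenting, it can help to compute
them from the annotated source instead. The following is a minimal,
hypothetical sketch for the `test.program` example only. It redefines small
versions of the earlier helpers so that it is self-contained, assumes goal lines
begin with `--!`, treats the first occurrence of `foo` outside a goal line as
the definition and the second as the reference, and keeps the `hello`/`example`
file VName used throughout. A real indexer would take these positions from its
parser rather than from a text search.

[source,js]
----
// Minimal copies of the helpers defined earlier in this document.
function vname(signature, path, language, root, corpus) {
  return { signature, path, language, root, corpus };
}
function fact(node, factName, value) {
  return {
    source: node,
    fact_name: "/kythe/" + factName,
    fact_value: Buffer.from(value).toString("base64"),
  };
}
function edge(source, edgeName, target) {
  return { source, edge_kind: "/kythe/edge/" + edgeName, target, fact_name: "/" };
}
function anchorVName(fileVName, begin, end) {
  return vname("@" + begin + ":" + end, fileVName.path, "ex",
               fileVName.root, fileVName.corpus);
}
function anchorFacts(fileVName, begin, end) {
  const a = anchorVName(fileVName, begin, end);
  return [
    fact(a, "node/kind", "anchor"),
    fact(a, "loc/start", String(begin)),
    fact(a, "loc/end", String(end)),
  ];
}

// Returns the [begin, end) byte offsets of each whole-word occurrence of
// `word` (assumed to be a plain identifier) on lines that do not start with
// `goalPrefix`.
function findOccurrences(text, word, goalPrefix) {
  const spans = [];
  let lineStart = 0; // Byte offset of the current line.
  for (const line of text.split("\n")) {
    if (!line.startsWith(goalPrefix)) {
      const re = new RegExp("\\b" + word + "\\b", "g");
      let m;
      while ((m = re.exec(line)) !== null) {
        const begin = lineStart + Buffer.byteLength(line.slice(0, m.index), "utf8");
        spans.push([begin, begin + Buffer.byteLength(word, "utf8")]);
      }
    }
    lineStart += Buffer.byteLength(line, "utf8") + 1; // +1 for "\n".
  }
  return spans;
}

// Builds the entries for the annotated `foo` program.
function testProgramEntries(text) {
  const file = vname("", "hello", "", "", "example");
  const foo = vname("foo#0", "hello", "ex", "", "example");
  const spans = findOccurrences(text, "foo", "--!");
  if (spans.length !== 2) {
    throw new Error("expected two occurrences of foo, got " + spans.length);
  }
  const [def, ref] = spans;
  return [
    fact(file, "node/kind", "file"),
    fact(file, "text", text),
    fact(foo, "node/kind", "variable"),
    edge(anchorVName(file, def[0], def[1]), "defines/binding", foo),
    edge(anchorVName(file, ref[0], ref[1]), "ref", foo),
    ...anchorFacts(file, def[0], def[1]),
    ...anchorFacts(file, ref[0], ref[1]),
  ];
}
----

With Node, `testProgramEntries(require("fs").readFileSync("test.program", "utf8"))`
returns the entries; printing each one with `JSON.stringify` on its own line
gives input in the format that `kythe-verify-json.sh` reads.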