kythe.io@v0.0.68-0.20240422202219-7225dbc01741/kythe/docs/schema/writing-an-indexer.txt

     1  // Copyright 2016 The Kythe Authors. All rights reserved.
     2  //
     3  // Licensed under the Apache License, Version 2.0 (the "License");
     4  // you may not use this file except in compliance with the License.
     5  // You may obtain a copy of the License at
     6  //
     7  //   http://www.apache.org/licenses/LICENSE-2.0
     8  //
     9  // Unless required by applicable law or agreed to in writing, software
    10  // distributed under the License is distributed on an "AS IS" BASIS,
    11  // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    12  // See the License for the specific language governing permissions and
    13  // limitations under the License.
    14  
    15  Writing a New Indexer
    16  =====================
    17  :Revision: 1.0
    18  :toc2:
    19  :toclevels: 3
    20  :priority: 999
    21  
    22  This document is an overview of the steps to take to add support for a new
    23  language to Kythe. We assume that you have the
    24  link:https://github.com/kythe/kythe/releases[Kythe release package] extracted
    25  to `/opt/kythe`. You can also build the tools from source (but it is not
    26  necessary to build Kythe to provide it with graph data). Sample code snippets
    27  are written in JavaScript, but this document is not about indexing any
    28  particular language.
    29  
    30  In the Kythe pipeline, a language's *indexer* is responsible for building a
    31  subgraph that represents a particular program. Complete indexers usually accept
    32  link:https://www.kythe.io/docs/kythe-kzip.html[`.kzip`] files that contain a
    33  program, all of its dependencies, and the arguments necessary for a compiler or
    34  interpreter to understand it. This data is packaged by a separate component
    35  called an *extractor*. Depending on the language and build system involved, it
    36  may be possible to use a generic extractor to produce these hermetic
    37  compilation units. We will not address extraction here.
    38  
    39  For development and testing, it's useful for the indexer to accept program text
    40  directly as input; this is how we will proceed in these instructions. First,
    41  we'll begin by writing some scripts to insert file content into a small Kythe
    42  graph. From there, we'll see how to encode Kythe nodes and edges into *entries*,
    43  the unit of exchange between many of our tools. We'll see that certain kinds
    44  of nodes are used to represent common sorts of semantic objects in programming
    45  languages and that other nodes are used to represent syntactic spans of text.
    46  We will add relationships as edges between these nodes to add cross-reference
    47  data to the graph. This allows users to jump between definitions and references
    48  in programs we've indexed. Finally, we'll discuss how to write tests for (and
    49  how to debug) Kythe indexers.
    50  
    51  == Bootstrapping Kythe support
    52  
    53  Kythe indexers emit directed graph data as a stream of *entries* that can
    54  represent either nodes or edges. These have various encodings, but for
    55  simplicity we'll use JSON. To get started, let's write a script `kythe-browse.sh`
    56  that will turn a stream of JSON-formatted Kythe entries into a format that our
    57  example code browser can read. Put it in your Kythe root; it will clobber the
    58  directories `//graphstore` and `//tables`.
    59  
    60  [source,bash]
    61  ----
    62  #!/bin/bash -e
    63  set -o pipefail
    64  BROWSE_PORT="${BROWSE_PORT:-8080}"
    65  # You can find prebuilt binaries at https://github.com/kythe/kythe/releases.
    66  # This script assumes that they are installed to /opt/kythe.
    67  # If you build the tools yourself or install them to a different location,
    68  # make sure to pass the correct public_resources directory to http_server.
    69  rm -f -- graphstore/* tables/*
    70  mkdir -p graphstore tables
    71  # Read JSON entries from standard in to a graphstore.
    72  /opt/kythe/tools/entrystream --read_format=json \
    73    | /opt/kythe/tools/write_entries -graphstore graphstore
    74  # Convert the graphstore to serving tables.
    75  /opt/kythe/tools/write_tables -graphstore graphstore -out=tables
    76  # Host the browser UI.
    77  /opt/kythe/tools/http_server -serving_table tables \
    78    -listen="localhost:${BROWSE_PORT}"  # ":${BROWSE_PORT}" allows access from other machines
    79  ----
    80  
    81  TIP: The
    82  link:https://github.com/kythe/kythe/blob/master/kythe/proto/storage.proto[protocol buffer]
    83  encoding of Kythe facts is more efficient than the JSON encoding we're using
    84  here. Kythe supports JSON because some languages do not have good support for
    85  protocol buffers. The difference matters mainly for indexers that emit a large
    86  amount of data, like the one for $$C++$$. In `kythe-browse.sh`, the `entrystream`
    87  tool reads a stream of JSON entries from standard input and emits a
    88  `varint32`-delimited stream of `kythe.proto.Entry` messages on standard output.
    89  
    90  You can test this with a very short entry stream. The only tricky part here is
    91  that Kythe fact values, when serialized to JSON, are base64-encoded. This
    92  ensures that they can be properly deserialized later, since fact values may
    93  contain arbitrary binary data, but JSON strings permit only UTF-8 characters.
    94  `ZmlsZQ==` is `file` and `SGVsbG8sIHdvcmxkIQ==` is `Hello, world!`.
    95  
    96  [source,bash]
    97  ----
    98  echo '
    99  {"source":{"corpus":"example","path":"hello"},
   100   "fact_name":"/kythe/node/kind","fact_value":"ZmlsZQ=="}
   101  {"source":{"corpus":"example","path":"hello"},
   102   "fact_name":"/kythe/text","fact_value":"SGVsbG8sIHdvcmxkIQ=="}
   103  ' | ./kythe-browse.sh
   104  ----
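The base64 strings in the entry stream above can be reproduced with Node.js. This is a minimal sketch; the helper names `encodeFactValue` and `decodeFactValue` are ours and are not part of any Kythe tool.

```javascript
// Encode a fact value (text) the way the JSON entry format expects it.
function encodeFactValue(text) {
  return Buffer.from(text, "utf8").toString("base64");
}

// Decode a base64 fact value back to text (assumes the value is UTF-8).
function decodeFactValue(encoded) {
  return Buffer.from(encoded, "base64").toString("utf8");
}

console.log(encodeFactValue("file"));                 // ZmlsZQ==
console.log(decodeFactValue("SGVsbG8sIHdvcmxkIQ==")); // Hello, world!
```

Decoding in this way is handy when reading raw JSON entries while debugging.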
   105  
   106  You can check that http://localhost:8080/#hello?corpus=example shows
   107  `Hello, world!`.
   108  
   109  == Modeling Kythe entries
   110  
   111  A Kythe graph can be encoded using two basic data types. The first, called a
   112  http://www.kythe.io/docs/kythe-storage.html#_a_id_termvname_a_vector_name_strong_vname_strong[VName],
   113  uniquely picks out a node in the graph. VNames have five string-valued fields: *signature*, *corpus*, *root*, *path*, and *language*.
   114  http://www.kythe.io/docs/kythe-storage.html#_entry[Entries]
   115  record both facts about individual nodes and edges between them. As described in
   116  the documentation, we only need to emit the forward versions of edges (those
   117  that are described in the http://www.kythe.io/docs/schema[schema]); the Kythe
   118  pipeline takes care of generating reverse edges as needed for efficiency.
   119  
   120  We'll encode VNames and entries in a straightforward way; in particular, we
   121  represent entries as objects, where the target's presence or absence determines
   122  if the entry represents an edge between nodes or a fact about a single node
   123  (respectively). Our `fact` and `edge` convenience functions also assume that all
   124  of the fact and edge names we'll use are underneath the `/kythe` prefix, since
   125  we're following the Kythe schema. This prefix is a requirement of the schema,
   126  not of the data model.
   127  
   128  [source,js]
   129  ----
   130  function vname(signature, path, language, root, corpus) {
   131    return {
   132      signature: signature,
   133      path: path,
   134      language: language,
   135      root: root,
   136      corpus: corpus,
   137    };
   138  }
   139  function fact(node, fact_name, fact_val) {
   140    return {
   141      source: node,
   142      fact_name: "/kythe/" + fact_name,
   143      fact_value: base64enc(fact_val),
   144    };
   145  }
   146  function edge(source, edge_name, target) {
   147    return {
   148      source: source,
   149      edge_kind: "/kythe/edge/" + edge_name,
   150      target: target,
   151      fact_name: "/",
   152    };
   153  }
   154  function ordinal_edge(source, edge_name, target, ordinal) {
   155    return {
   156      source: source,
   157      edge_kind: "/kythe/edge/" + edge_name + "." + ordinal,
   158      target: target,
   159      fact_name: "/",
   160    };
   161  }
   162  ----
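`ordinal_edge` is not exercised by the examples in this document. As an illustrative sketch, the schema's `param` edge is one kind that carries an ordinal. The helpers are restated so that the snippet runs on its own, and the two VNames (a function `f` and its parameter `x`) are made up for the example.

```javascript
function vname(signature, path, language, root, corpus) {
  return { signature, path, language, root, corpus };
}
function ordinal_edge(source, edge_name, target, ordinal) {
  return {
    source: source,
    edge_kind: "/kythe/edge/" + edge_name + "." + ordinal,
    target: target,
    fact_name: "/",
  };
}

// Hypothetical VNames for a function `f` and its first parameter `x`.
const f_vname = vname("f#0", "hello", "ex", "", "example");
const x_vname = vname("f#0/x", "hello", "ex", "", "example");

// The ordinal is appended to the edge kind after a dot.
const param_edge = ordinal_edge(f_vname, "param", x_vname, 0);
console.log(param_edge.edge_kind); // /kythe/edge/param.0
```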
   163  
   164  You can follow along at home with link:https://nodejs.org[Node.js] and the
   165  following definitions:
   166  
   167  [source,js]
   168  ----
   169  function base64enc(string) {
   170    return Buffer.from(string, 'utf8').toString('base64');
   171  }
   172  function emitEntries(entries) {
   173    entries.forEach(function(v){console.log(JSON.stringify(v))});
   174  }
   175  ----
   176  
   177  With this representation, our example database becomes:
   178  
   179  [source,js]
   180  ----
   181  [
   182    fact(vname("", "hello", "", "", "example"), "node/kind", "file"),
   183    fact(vname("", "hello", "", "", "example"), "text", "Hello, world!")
   184  ]
   185  ----
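Putting the pieces together, the following sketch prints the example database as one JSON entry per line. The helpers are restated so that the file runs on its own. It writes unused VName fields as empty strings, which we assume the tools treat the same as omitted fields.

```javascript
// Helpers restated from above so that this file is self-contained.
function base64enc(string) {
  return Buffer.from(string, "utf8").toString("base64");
}
function vname(signature, path, language, root, corpus) {
  return { signature, path, language, root, corpus };
}
function fact(node, fact_name, fact_val) {
  return {
    source: node,
    fact_name: "/kythe/" + fact_name,
    fact_value: base64enc(fact_val),
  };
}

const hello = vname("", "hello", "", "", "example");
const helloEntries = [
  fact(hello, "node/kind", "file"),
  fact(hello, "text", "Hello, world!"),
];

// Serialize each entry as a single line of JSON.
const helloLines = helloEntries.map((entry) => JSON.stringify(entry));
console.log(helloLines.join("\n"));
```

If this is saved under a hypothetical name such as `hello.js`, then `node hello.js | ./kythe-browse.sh` should serve the same graph as the hand-written stream from the first section.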
   186  
   187  VNames have an alternate
   188  link:http://www.kythe.io/docs/kythe-uri-spec.html[URI-style encoding]. VNames
   189  encoded in this way are called *tickets*; tickets and VNames are semantically
   190  interchangeable. This encoding is used where it is inconvenient or not possible
   191  to store VNames in a more structured format. You can use Kythe URIs when
   192  interacting with the
   193  link:http://www.kythe.io/docs/kythes-command-line-tool.html[Kythe command-line tool]:
   194  
   195  [source,bash]
   196  ----
   197  /opt/kythe/tools/kythe -api './tables' nodes 'kythe://example?path=hello'
   198  ----
   199  
   200  .Output
   201  ----
   202  kythe://example?path=hello
   203    /kythe/node/kind	file
   204    /kythe/text	Hello, world!
   205  ----
   206  
   207  `kythe://example?path=hello` is the URI encoding of the VName used in the
   208  example graph above.
   209  
   210  == File content
   211  
   212  Kythe stores file content in its graph. The `http_server` binary used in our
   213  `kythe-browse.sh` script doesn't look in your filesystem for a file to present
   214  to the Web browser; it instead reads the `text` fact off of a graph node.
   215  
   216  Since every node in the graph has a VName, we'll need to be able to build one
   217  for any source file your indexer might refer to. In our small example above,
   218  our test file had the path `hello` in the corpus `example`. It is up to you
   219  how to determine the corpus (and possibly root) to which a node belongs. It is
   220  best to keep this configurable; other Kythe indexers use a `vnames.json` file
   221  to choose the VName fields based on regular expressions over paths.
   222  
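One way to keep the corpus and root configurable is a small table of path patterns. The sketch below is only loosely modeled on the `vnames.json` idea: the rules, the `@N@` placeholder handling, and the function name `fileVName` are illustrative assumptions, not the behavior of any particular Kythe tool.

```javascript
// Hypothetical rules; the first rule whose pattern matches the whole path wins.
const vnameRules = [
  { pattern: "third_party/([^/]+)/(.*)",
    vname: { corpus: "third_party", root: "@1@", path: "@2@" } },
  { pattern: "(.*)",
    vname: { corpus: "example", path: "@1@" } },
];

// Build a file VName for `path`, or return null if no rule matches.
function fileVName(path, rules) {
  for (const rule of rules) {
    const match = new RegExp("^(?:" + rule.pattern + ")$").exec(path);
    if (!match) continue;
    // Replace @N@ with the text captured by the Nth group.
    const subst = (template) =>
      (template || "").replace(/@(\d+)@/g, (_, n) => match[Number(n)] || "");
    return {
      signature: "", // file VNames leave signature and language empty
      path: subst(rule.vname.path),
      language: "",
      root: subst(rule.vname.root),
      corpus: subst(rule.vname.corpus),
    };
  }
  return null;
}

console.log(fileVName("hello", vnameRules));
console.log(fileVName("third_party/zlib/inflate.c", vnameRules));
```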
   223  All Kythe graph nodes should have a `node/kind` fact. For files, this kind is
   224  `file`. This means that each file should have at least two associated facts.
   225  You can see the JSON representation of the resulting entries above, where we
   226  used them to test the `kythe-browse.sh` script.
   227  
   228  NOTE: The Kythe JSON representation requires fact values to be base64-encoded.
   229  The protocol buffer representation does not, but it does store fact values as
   230  `bytes` instead of the `string` type. The protocol buffer `string` type must be
   231  valid UTF-8, and a file in a graph need not be UTF-8 encoded (though that is the
   232  default). Alternate encodings may be specified using the `text/encoding` fact.
   233  
   234  == Cross-references
   235  
   236  Imagine we have the following simple program:
   237  
   238  [source,lua]
   239  ----
   240  var foo = 1
   241  print foo
   242  ----
   243  
   244  We want to record the relationship between the reference to `foo` on the second
   245  line and its definition on the first line. First, we should build a
   246  representation for the variable `foo` itself. To summon a node into existence,
   247  we need a VName and a node kind. The schema already defines a node kind
   248  for http://www.kythe.io/docs/schema#variable[variables]. If there were no existing
   249  way to model `foo` in the schema, you would be free to invent one of your own; the
   250  schema is intended to be open-ended. Be aware that tools that consume Kythe data
   251  may not be able to offer as much help with custom kinds, but should always be
   252  tolerant of them.
   253  
   254  We've already seen that VNames for http://www.kythe.io/docs/schema#file[files]
   255  contain *path*, *root*, and *corpus* components. (In fact, the schema requires
   256  that the other components of a file VName be empty.) We need to come up with
   257  assignments to these, plus *signature* and *language*, that uniquely refer to
   258  our variable `foo`. Getting this right can be subtle. Here are some guidelines:
   259  
   260  * Indexing the same compilation unit twice should always produce the same data.
   261  * VNames for objects that are accessible from multiple compilation units must
   262  be generated consistently. For example, if a module defines a public variable
   263  `Bar`, then `Bar`'s VName must be the same in all of the modules that use it.
   264  * VNames should not be over-specific. For example, if your language has a
   265  builtin `string` type, you should only have a single `VName` for that type
   266  (which is probably of the http://www.kythe.io/docs/schema#tbuiltin[tbuiltin]
   267  kind). Structural types should also have single representations; if your
   268  language also has a builtin pair type, there should only be a single `VName`
   269  for `pair<string,string>` (that's probably a
   270  http://www.kythe.io/docs/schema#tapp[tapp]).
   271  * Where possible, VNames should be generated without reference to source
   272  locations. This makes debugging your indexer easier and decreases the number
   273  of spurious changes to the graph when source text is modified.
   274  * Take care that your *signature* fields aren't too long, because signature
   275  length has significant implications for the size of your graph and the I/O
   276  cost of your tools. In the $$C++$$ indexer, signatures that are past a certain
   277  length are replaced with their hashes.
   278  * The *language* component of a VName should be set to a well-known value.
   279  Java is `java`; $$C++$$ is `c++`; and so on. We'll use `ex` as our language.
   280  * Avoid duplicating information that's elsewhere in the VName, like the corpus
   281  label, language label, or path (in cases where a path is appropriate).
   282  
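The guideline about signature length can be sketched as follows. The limit and the choice of SHA-256 are assumptions made for illustration; this document does not say which threshold or hash function the $$C++$$ indexer uses.

```javascript
const crypto = require("crypto");

// Hypothetical limit; choose one that suits your language and tooling.
const MAX_SIGNATURE_LENGTH = 64;

// Return `sig` unchanged when it is short enough; otherwise return a
// SHA-256 hex digest, which is deterministic and has a fixed length.
function boundedSignature(sig) {
  if (sig.length <= MAX_SIGNATURE_LENGTH) return sig;
  return crypto.createHash("sha256").update(sig, "utf8").digest("hex");
}

console.log(boundedSignature("foo#0"));                    // short: kept as-is
console.log(boundedSignature("ns::".repeat(40) + "foo#0")); // long: hashed
```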
   283  We'll use `foo`'s defining file's preset components, the *language* `ex`, and
   284  the *signature* `foo#0` (to mean "the zeroth binding of foo at global scope").
   285  Using the functions we've defined above, we emit the following entry:
   286  
   287  [source,js]
   288  ----
   289  //         sig      path     lang  root   corpus
   290  fact(vname("foo#0", "hello", "ex", "", "example"), "node/kind", "variable");
   291  ----
   292  
   293  We can see it in the graph with the `kythe` tool (after running
   294  `kythe-browse.sh` to generate `./tables`):
   295  
   296  [source,bash]
   297  ----
   298  /opt/kythe/tools/kythe -api './tables' \
   299      nodes 'kythe://example?path=hello?lang=ex#foo#0'
   300  ----
   301  
   302  .Output
   303  ----
   304  kythe://example?lang=ex?path=hello#foo%230
   305    /kythe/node/kind	variable
   306  ----
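The ticket printed above can be derived from the VName fields. The following is a simplified sketch: the function names are ours, and `encodeURIComponent` only approximates the escaping rules, so treat the URI specification as authoritative for unusual characters.

```javascript
// Percent-encode one component; "/" stays readable where keepSlash is set.
function escapeComponent(value, keepSlash) {
  const escaped = encodeURIComponent(value);
  return keepSlash ? escaped.replace(/%2F/g, "/") : escaped;
}

// Build a ticket from a VName, omitting empty fields.
function ticket(v) {
  let out = "kythe:";
  if (v.corpus) out += "//" + escapeComponent(v.corpus, true);
  if (v.language) out += "?lang=" + escapeComponent(v.language, false);
  if (v.path) out += "?path=" + escapeComponent(v.path, true);
  if (v.root) out += "?root=" + escapeComponent(v.root, true);
  if (v.signature) out += "#" + escapeComponent(v.signature, false);
  return out;
}

const fooVName = {
  signature: "foo#0", path: "hello", language: "ex", root: "", corpus: "example",
};
console.log(ticket(fooVName)); // kythe://example?lang=ex?path=hello#foo%230
```

The `#` inside the signature has to be escaped because an unescaped `#` marks the start of the signature component.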
   307  
   308  Notice how the `#` was URI-encoded in the ticket.
   309  
   310  === Specifying spans of text
   311  
   312  Spans of text in Kythe are represented by
   313  http://www.kythe.io/docs/schema#anchor[anchor] nodes. Anchors may overlap.
   314  If an anchor exactly overlaps another anchor (i.e., it shares the same start
   315  and end offsets), it is conventional (but not required) that they share a VName.
   316  Contrary to the general advice for generating VNames, an anchor's VName *should*
   317  be based on its location in a source file.
   318  
   319  Besides the required `node/kind` fact, anchors should have `loc/start`
   320  and `loc/end` facts that give their (inclusive) start and (exclusive) end
   321  location offsets as base-10 stringified integers.
   322  
   323  NOTE: In Kythe, offsets are always in units of bytes. If your programming
   324  language specifies locations of syntactic objects in lines and columns or
   325  codepoints, you will need to transform these to byte offsets.
   326  
   327  [source,js]
   328  ----
   329  function anchorVName(file_vname, begin, end) {
   330    return vname("@" + begin + ":" + end, file_vname.path, "ex", file_vname.root,
   331        file_vname.corpus);
   332  }
   333  function anchor(file_vname, begin, end) {
   334    var anchor_vname = anchorVName(file_vname, begin, end);
   335    return [
   336      fact(anchor_vname, "node/kind", "anchor"),
   337      fact(anchor_vname, "loc/start", begin.toString()),
   338      fact(anchor_vname, "loc/end", end.toString()),
   339    ];
   340  }
   341  ----
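Because offsets are in bytes, positions reported in other units have to be converted before they are passed to `anchor`. A minimal sketch, assuming UTF-8 source text, 1-based line numbers, and 0-based columns counted in UTF-16 code units (the unit JavaScript strings use); the function names are ours:

```javascript
// Byte offset of the character at string index `index` in `text`.
// Assumes `index` does not fall inside a surrogate pair.
function byteOffset(text, index) {
  return Buffer.byteLength(text.slice(0, index), "utf8");
}

// Convert a (line, column) position to a byte offset.
function lineColToByteOffset(text, line, column) {
  const lines = text.split("\n");
  let index = 0;
  for (let i = 0; i < line - 1; i++) {
    index += lines[i].length + 1; // +1 for the newline character
  }
  return byteOffset(text, index + column);
}

console.log(lineColToByteOffset("var foo = 1\nprint foo", 2, 6)); // 18
console.log(byteOffset("é = 1\nprint é", 12)); // 13, since "é" takes two bytes
```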
   342  
   343  The anchor covering the definition of `foo` in our example file, assuming the
   344  file has the same `VName` as the earlier file, is represented by these three
   345  facts:
   346  
   347  [source,js]
   348  ----
   349  {"source":{"signature":"@4:7","path":"hello","language":"ex","corpus":"example"},
   350   "fact_name":"/kythe/node/kind","fact_value":"YW5jaG9y"}
   351  {"source":{"signature":"@4:7","path":"hello","language":"ex","corpus":"example"},
   352   "fact_name":"/kythe/loc/start","fact_value":"NA=="}
   353  {"source":{"signature":"@4:7","path":"hello","language":"ex","corpus":"example"},
   354   "fact_name":"/kythe/loc/end","fact_value":"Nw=="}
   355  ----
   356  
   357  === Linking anchors to semantic nodes
   358  
   359  We can now link the definition and reference sites of `foo` back to the node
   360  we created for the variable. To do so, we'll add a
   361  link:http://www.kythe.io/docs/schema/#definesbinding[defines/binding] edge
   362  from the definition site and a link:http://www.kythe.io/docs/schema/#ref[ref]
   363  edge from the use site:
   364  
   365  [source,js]
   366  ----
   367  edge(foo_def_anchor_vname, "defines/binding", foo_vname),
   368  edge(foo_ref_anchor_vname, "ref", foo_vname)
   369  ----
   370  
   371  Our full database, specified using the previously-defined functions, looks
   372  like:
   373  
   374  [source,js]
   375  ----
   376  var hello_file_vname = vname("", "hello", "", "", "example");
   377  var foo_vname = vname("foo#0", "hello", "ex", "", "example");
   378  var foo_def_anchor_vname = anchorVName(hello_file_vname, 4, 7);
   379  var foo_ref_anchor_vname = anchorVName(hello_file_vname, 18, 21);
   380  var entries = [
   381    fact(hello_file_vname, "node/kind", "file"),
   382    fact(hello_file_vname, "text", "var foo = 1\nprint foo"),
   383    fact(foo_vname, "node/kind", "variable"),
   384    edge(foo_def_anchor_vname, "defines/binding", foo_vname),
   385    edge(foo_ref_anchor_vname, "ref", foo_vname)
   386  ].concat(anchor(hello_file_vname, 4, 7))
   387   .concat(anchor(hello_file_vname, 18, 21));
   388  ----
   389  
   390  NOTE: For pedagogical reasons, we're building our graph up as a big array
   391  of entries. In practice, this is a bad idea; graphs can become very large,
   392  and buffering all your data up to release it at the same time prevents
   393  downstream consumers from working in parallel (even if you're just writing
   394  to disk). Indexers should emit graph data as soon as practical (and should also
   395  endeavor to avoid emitting duplicate data).
   396  
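As a rough sketch of the streaming approach, an indexer can write each entry as soon as it is produced and remember what it has already written. The name `makeEmitter` is ours, and the unbounded `Set` is an assumption that is acceptable only for small inputs; a real indexer would probably bound or shard it.

```javascript
// Returns an emit(entry) function that writes each distinct entry once,
// as a line of JSON, through the supplied `write` callback.
function makeEmitter(write) {
  const seen = new Set();
  return function emit(entry) {
    const line = JSON.stringify(entry);
    if (seen.has(line)) return false; // exact duplicate: skip it
    seen.add(line);
    write(line + "\n");
    return true;
  };
}

// Usage: stream entries to standard output as they are discovered.
const emit = makeEmitter((line) => process.stdout.write(line));
const kindEntry = {
  source: { corpus: "example", path: "hello" },
  fact_name: "/kythe/node/kind",
  fact_value: "ZmlsZQ==",
};
emit(kindEntry); // written immediately
emit(kindEntry); // skipped as a duplicate
```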
   397  You can test it using `kythe-browse.sh` and by querying the `kythe` tool for
   398  file decorations:
   399  
   400  [source,bash]
   401  ----
   402  /opt/kythe/tools/kythe -api './tables' decor 'kythe://example?path=hello'
   403  ----
   404  
   405  .Output
   406  ----
   407  /kythe/edge/defines/binding	1:4-1:7	variable	kythe://example?lang=ex?path=hello#foo%230
   408  /kythe/edge/ref	2:6-2:9	variable	kythe://example?lang=ex?path=hello#foo%230
   409  ----
   410  
   411  == Testing
   412  
   413  Most of the work in testing a tool that produces Kythe data boils down to
   414  checking that different anchors in example source text are linked to the correct
   415  nodes and edges. From this starting point, you can make sure that other parts
   416  of the semantic graph are properly formed.
   417  
   418  Given a description of these anchors and their desired relationships,
   419  performing the necessary checks doesn't require any information specific to the
   420  language being analyzed. With this in mind, we built the
   421  http://www.kythe.io/docs/kythe-verifier.html[Kythe verifier]. The verifier
   422  accepts a stream of Kythe entries and source files, the latter of which have
   423  been annotated with *goals*. Each goal describes entries that the verifier must
   424  (or must not) find in its input stream. Since some parts of these entries are
   425  uninteresting to test--for example, the exact encoding used for an anchor's VName
   426  is unimportant--parts of a goal may be replaced with variables for which the
   427  verifier will try to find an assignment.
   428  
   429  Just as we were able to drive the Kythe pipeline with only a list of JSON
   430  entries, so too can we drive the verifier with only those entries and a list
   431  of goals. This script, `kythe-verify-json.sh`, reads JSON entries from
   432  standard in and passes them (and its arguments) to the verifier:
   433  
   434  [source,bash]
   435  ----
   436  #!/bin/bash -e
   437  set -o pipefail
   438  # You can find prebuilt binaries at https://github.com/kythe/kythe/releases.
   439  # This script assumes that they are installed to /opt/kythe.
   440  # Read JSON entries from standard in and pass them to the verifier.
   441  # The entrystream tool turns the JSON into length-delimited protocol buffers,
   442  # described at http://godoc.org/kythe.io/kythe/go/platform/delimited
   443  /opt/kythe/tools/entrystream --read_format=json | \
   444  /opt/kythe/tools/verifier --nofile_vnames "$@"
   445  ----
   446  
   447  We can write a goal file that checks whether we have any file nodes at all and
   448  call it `test.goals`:
   449  
   450  [source,c]
   451  ----
   452  //- FileNode?.node/kind file
   453  ----
   454  
   455  The `//-` prefix tells the verifier which lines to look for goals on. It's meant
   456  to be ignored as a comment by most languages. Of course, some languages (like
   457  Python) use different character sequences to denote comments, so the prefix
   458  can be changed with the `--goal_prefix` command-line flag.
   459  
   460  We ask the verifier to check that the goals can be met with the entries we wrote
   461  out earlier in this document:
   462  
   463  [source,bash]
   464  ----
   465  echo '
   466  {"source":{"corpus":"example","path":"hello"},
   467   "fact_name":"/kythe/node/kind","fact_value":"ZmlsZQ=="}
   468  {"source":{"corpus":"example","path":"hello"},
   469   "fact_name":"/kythe/text","fact_value":"SGVsbG8sIHdvcmxkIQ=="}
   470  ' | ./kythe-verify-json.sh test.goals
   471  ----
   472  
   473  Since we do have a node with kind `file`, the verifier returns a zero exit
   474  code without printing any diagnostics.
   475  
   476  If we write an unsatisfiable goal--let's say we make a spelling mistake and
   477  ask for a node with kind `elif` instead:
   478  
   479  [source,c]
   480  ----
   481  //- FileNode?.node/kind elif
   482  ----
   483  
   484  the verifier will protest (and return a nonzero exit code):
   485  
   486  .Output
   487  ----
   488  Could not verify all goals. The furthest we reached was:
   489    test.goals:2:5-2:28 FileNode.node/kind elif
   490  ----
   491  
   492  If your graph is small, it can be useful to display it graphically:
   493  
   494  [source,bash]
   495  ----
   496  echo '
   497  {"source":{"corpus":"example","path":"hello"},
   498   "fact_name":"/kythe/node/kind","fact_value":"ZmlsZQ=="}
   499  {"source":{"corpus":"example","path":"hello"},
   500   "fact_name":"/kythe/text","fact_value":"SGVsbG8sIHdvcmxkIQ=="}
   501  ' | ./kythe-verify-json.sh -annotated_graphviz test.goals | xdot
   502  ----
   503  
   504  This graph will render in https://github.com/jrfonseca/xdot.py[xdot.py] as
   505  something like:
   506  
   507  [kythe,dot,"one file node",0]
   508  --------------------------------------------------------------------------------
   509  digraph G {
   510  "App(vname, (\"\", example, \"\", hello, \"\"))" [ label=<<TABLE><TR><TD COLSPAN="2">(&quot;&quot;, example, &quot;&quot;, hello, &quot;&quot;) = FileNode</TD></TR><TR><TD>/kythe/node/kind</TD><TD>file</TD></TR><TR><TD>/kythe/text</TD><TD>...</TD></TR></TABLE>> shape=plaintext  color=blue ];
   511  }
   512  --------------------------------------------------------------------------------
   513  
   514  There's only one node in this graph, but it's the file node that we asked the
   515  verifier to find; notice how it is outlined in blue. The verifier applies this
   516  highlighting to nodes that are matched against variables in the goals
   517  (here, `FileNode`).
   518  
   519  === Testing for variable definitions and references
   520  
   521  Most of the time, verifier goals are written down in the same file that the
   522  goals are meant to check. For example, we can rewrite our example program
   523  in the following way:
   524  
   525  [source,lua]
   526  ----
   527  --! @foo defines/binding VarFoo
   528  --! VarFoo.node/kind variable
   529  var foo = 1
   530  --! @foo ref VarFoo
   531  print foo
   532  ----
   533  
   534  Let's start with the second line. To satisfy this goal, the verifier must find
   535  the VName of a node with a `node/kind` fact with the value `variable`. It will
   536  then use that VName wherever the variable `VarFoo` appears. `VarFoo` is
   537  interpreted as a variable because it begins with a capital letter.
   538  
   539  To satisfy the goal on the first line, the verifier must find two VNames:
   540  one to substitute for `VarFoo` (the same `VarFoo` as previously discussed)
   541  and one to use as the VName of the anchor spanning `foo` on the next
   542  line of code. The `@foo` token generates a new VName variable and constrains
   543  it to refer to an *anchor* node with the offsets of `foo`. Any additional
   544  constraints on `@foo` act as constraints on that variable. In order for this
   545  first goal to succeed, then, the verifier must find an anchor spanning the
   546  text `foo` that is the source of a `defines/binding` edge with some other
   547  node (with VName `VarFoo`) as a target.
   548  
   549  Similarly, to satisfy the goal on the fourth line, the verifier must find
   550  a `ref` edge starting at an anchor covering `foo` on the next line of code
   551  and ending at a node with VName `VarFoo`.
   552  
   553  NOTE: The `@foo` on the fourth line does not refer to the same variable as the
   554  `@foo` on the first line. Each `@` token creates a new anonymous variable.
   555  
   556  The *full* problem that the verifier must solve is the *conjunction* of all
   557  of these goals. If it chooses a VName to use for `VarFoo` that works for the
   558  first goal but not the third, the verifier will backtrack and try a different
   559  assignment. The test succeeds if there is an assignment that satisfies all
   560  the goals. When our first example failed, the verifier couldn't find any
   561  assignment to `FileNode` that would satisfy `FileNode.node/kind elif`.
   562  
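The search described above can be illustrated with a toy solver. This is not the verifier's implementation: the tuple tables, the goal format, and the names `solve` and `unify` are all invented for the example, and plain strings stand in for VNames. Terms that begin with a capital letter are treated as variables.

```javascript
// Toy graph. `bar#0` is listed first so that the solver must backtrack.
const toyFacts = [
  ["file", "node/kind", "file"],
  ["bar#0", "node/kind", "variable"],
  ["foo#0", "node/kind", "variable"],
];
const toyEdges = [
  ["@66:69", "defines/binding", "foo#0"],
  ["@100:103", "ref", "foo#0"],
];

function isVariable(term) {
  return /^[A-Z]/.test(term);
}

// Try to match `pattern` against `tuple`; return extended bindings or null.
function unify(pattern, tuple, bindings) {
  const next = { ...bindings };
  for (let i = 0; i < pattern.length; i++) {
    const term = pattern[i];
    if (isVariable(term)) {
      const bound = Object.prototype.hasOwnProperty.call(next, term);
      if (bound && next[term] !== tuple[i]) return null;
      next[term] = tuple[i];
    } else if (term !== tuple[i]) {
      return null;
    }
  }
  return next;
}

// Satisfy every goal in order, backtracking when a later goal fails.
function solve(goals, bindings = {}) {
  if (goals.length === 0) return bindings;
  const [goal, ...rest] = goals;
  for (const tuple of goal.table) {
    const next = unify(goal.pattern, tuple, bindings);
    const result = next && solve(rest, next);
    if (result) return result;
  }
  return null;
}

const toyGoals = [
  { table: toyFacts, pattern: ["VarFoo", "node/kind", "variable"] },
  { table: toyEdges, pattern: ["DefAnchor", "defines/binding", "VarFoo"] },
  { table: toyEdges, pattern: ["RefAnchor", "ref", "VarFoo"] },
];
console.log(solve(toyGoals));
```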
   563  Assuming we update the offsets in our output to reflect the comments
   564  (these are now `(66, 69)` and `(100, 103)`), we can now check our code:
   565  
   566  [source,bash]
   567  ----
   568  ./kythe-verify-json.sh --goal_prefix="--!" test.program < test.program.json
   569  ----
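The updated offsets quoted above can be double-checked mechanically. This sketch (the function name `findSpans` is ours) scans the annotated program, skips lines that start with the goal prefix, and reports the byte span of each occurrence of `foo`. It assumes UTF-8 text with `\n` line endings.

```javascript
const annotatedProgram = [
  "--! @foo defines/binding VarFoo",
  "--! VarFoo.node/kind variable",
  "var foo = 1",
  "--! @foo ref VarFoo",
  "print foo",
].join("\n");

// Return [begin, end) byte spans of `word` on lines that are not goals.
function findSpans(text, word, goalPrefix) {
  const spans = [];
  let lineStart = 0; // byte offset at which the current line begins
  for (const line of text.split("\n")) {
    if (!line.startsWith(goalPrefix)) {
      let idx = line.indexOf(word);
      while (idx !== -1) {
        const begin = lineStart + Buffer.byteLength(line.slice(0, idx), "utf8");
        spans.push([begin, begin + Buffer.byteLength(word, "utf8")]);
        idx = line.indexOf(word, idx + word.length);
      }
    }
    lineStart += Buffer.byteLength(line, "utf8") + 1; // +1 for "\n"
  }
  return spans;
}

console.log(findSpans(annotatedProgram, "foo", "--!")); // [ [ 66, 69 ], [ 100, 103 ] ]
```

A real indexer would take these positions from its parser rather than from a text search; the search is only a quick way to confirm the numbers.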
   570  
   571  We can also dump our graph:
   572  
   573  [source,bash]
   574  ----
   575  ./kythe-verify-json.sh --goal_prefix="--!" --annotated_graphviz \
   576      test.program < test.program.json | xdot
   577  ----
   578  
   579  This results in the following:
   580  
   581  [kythe,dot,"tiny program",0]
   582  ----
   583  digraph G {
   584  "App(vname, (\"\", example, \"\", hello, \"\"))" [ label=<<TABLE><TR><TD COLSPAN="2">(&quot;&quot;, example, &quot;&quot;, hello, &quot;&quot;)</TD></TR><TR><TD>/kythe/node/kind</TD><TD>file</TD></TR><TR><TD>/kythe/text</TD><TD>...</TD></TR></TABLE>> shape=plaintext ];
   585  "App(vname, (foo#0, example, \"\", hello, ex))" [ label=<<TABLE><TR><TD COLSPAN="2">(foo#0, example, &quot;&quot;, hello, ex) = VarFoo</TD></TR><TR><TD>/kythe/node/kind</TD><TD>variable</TD></TR></TABLE>> shape=plaintext  color=blue ];
   586  "App(vname, (@66:69, example, \"\", hello, ex))" [ shape=circle, label="@foo:1.4", color="blue" ];
   587  "App(vname, (@66:69, example, \"\", hello, ex))" -> "App(vname, (foo#0, example, \"\", hello, ex))" [ label="/kythe/edge/defines/binding" ];
   588  "App(vname, (@100:103, example, \"\", hello, ex))" [ shape=circle, label="@foo:4.6", color="blue" ];
   589  "App(vname, (@100:103, example, \"\", hello, ex))" -> "App(vname, (foo#0, example, \"\", hello, ex))" [ label="/kythe/edge/ref" ];
   590  }
   591  ----
   592  
   593  As before, the nodes we've matched are colored blue. In these diagrams, anchors
   594  are presented as circles with `@` labels (unless they are matched to verifier
   595  variables, in which case more information is provided). The vast majority of
   596  the time, you will not be interested in seeing file offsets in these diagrams.
   597  You can still test for facts on `@`-specified nodes as you would any other
   598  node.
   599  
   600  For more examples of the goal language, take a look at the code listings in the
   601  http://www.kythe.io/docs/schema[schema document]. Many more can be found in
   602  the $$C++$$ indexer's
   603  https://kythe.io/repo/kythe/cxx/indexer/cxx/testdata[testdata] and the Java
   604  indexer's
   605  https://kythe.io/repo/kythe/javatests/com/google/devtools/kythe/analyzers/java/testdata[testdata]
   606  directories. Finally, there is a
   607  http://www.kythe.io/docs/schema/verifierstyle.html[style guide] with helpful
   608  tips.
   609  
   610  Note how the verifier goals don't mention any of the internal implementation
   611  decisions we've made about the VNames of anchors or variables. This means that
   612  if we later choose to change those aspects of our implementation, the verifier
   613  tests will not break. Also note that we didn't check for details about the
   614  file itself (as we did in the first example). Tests using the Kythe verifier
   615  rarely examine *all* of an indexer's output, just the subgraph that is relevant
   616  for a particular feature. This makes the tests easier to read and guards against
   617  tests becoming sensitive to new features.