kythe.io@v0.0.68-0.20240422202219-7225dbc01741/kythe/docs/schema/indexing-generated-code.txt

     1  // Copyright 2016 The Kythe Authors. All rights reserved.
     2  //
     3  // Licensed under the Apache License, Version 2.0 (the "License");
     4  // you may not use this file except in compliance with the License.
     5  // You may obtain a copy of the License at
     6  //
     7  //   http://www.apache.org/licenses/LICENSE-2.0
     8  //
     9  // Unless required by applicable law or agreed to in writing, software
    10  // distributed under the License is distributed on an "AS IS" BASIS,
    11  // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    12  // See the License for the specific language governing permissions and
    13  // limitations under the License.
    14  
    15  = Indexing Generated Code
    16  
    17  :Revision: 1.0
    18  :toc2:
    19  :toclevels: 3
    20  :priority: 999
    21  
    22  Source code generators like link:https://www.gnu.org/software/flex/[Flex],
    23  link:https://www.gnu.org/software/bison/[GNU Bison], and
    24  link:http://www.swig.org/[SWIG] take a high-level description of a software
    25  component and generate the code necessary to realize that component in a
    26  lower-level or general-purpose programming language. Users browsing projects
    27  that use these components usually want cross-references to take them
    28  from use sites of a generated interface to the high-level code that brought
    29  that interface into being. They do not normally want to see the generated
    30  implementation, as this is often difficult (or uninteresting) to read. This
    31  document describes how to encode information about generated code to permit
    32  cross-language links.
    33  
    34  To make the discussion easier to follow, let's pretend we are working with
    35  two languages: SourceLang and TargetLang. SourceLang has `.source` files and
    36  TargetLang has `.target` files. We also have a tool (Generator) that can generate
    37  a `foo.target` file from a `foo.source` file. We have the following components:
    38  
    39  * Source Indexer - Kythe indexer that takes `.source` files and outputs index
    40    data.
    41  * Target Indexer - Kythe indexer that takes `.target` files and outputs index
    42    data.
    43  * Generator - tool that produces `.target` files from `.source` files.
    44  * Post processor - Kythe tool that takes all index data produced by all
    45    indexers, processes it and outputs final Kythe graph that contains data
    46    for both SourceLang and TargetLang.
    47  
    48  Now we want to teach Kythe how to create cross-references between the generated
    49  `foo.target` file and the original `foo.source` file. The main idea is simple:
    50  Generator has to output extra data that maps elements in `foo.target`
    51  to the original elements in `foo.source`. Then, when Target Indexer indexes
    52  `foo.target`, it uses that mapping to output *generates* or *imputes* edges.
    53  These edges connect nodes from `foo.target` with nodes in `foo.source`.
    54  
    55  Kythe doesn't require implementors to use one concrete approach for passing
    56  mapping metadata and outputting *generates* and *imputes* edges. Below we
    57  describe two different approaches, each with its own pros and cons. In
    58  both cases it is assumed that implementors can change Generator and Target
    59  Indexer. If possible, the *generates* approach is preferred, as it requires less
    60  post-processing work.
    61  
    62  TIP: You can find an example implementation at
    63  link:https://github.com/kythe/kythe/tree/master/kythe/examples/proto[GitHub].
    64  The current sample web UI does not interpret the parts of the schema we will
    65  use; this is a work in progress.
    66  
    67  == Java To JavaScript with *imputes* edges
    68  
    69  This approach is generic and works for any combination of SourceLang and
    70  TargetLang. In this example we generate JavaScript files from Java files, so
    71  SourceLang is Java and TargetLang is JavaScript. Given `Color.java`:
    72  
    73  [source,java]
    74  -------------------------------------------------------------------------------
    75  public enum Color {
    76    RED;
    77  }
    78  -------------------------------------------------------------------------------
    79  
    80  Generator produces `color.js`:
    81  [source,javascript]
    82  -------------------------------------------------------------------------------
    83  const Color = {
    84    RED: 0,
    85  };
    86  -------------------------------------------------------------------------------
    87  
    88  === Changes to Generator
    89  
    90  To support cross-references between `color.js` and `Color.java` we need to update
    91  Generator to output the following mapping data for the `Color` and `RED` elements.
    92  
    93  [source,json]
    94  -------------------------------------------------------------------------------
    95  {
    96      "type": "kythe0",
    97      "meta": [{
    98          "type": "anchor_anchor",
    99          "source_begin": 12,
   100          "source_end": 17,
   101          "target_begin": 6,
   102          "target_end": 11,
   103          "edge": "/kythe/edge/imputes",
   104          "source_vname": {
   105              "corpus": "corpus",
   106              "path": "path/to/Color.java"
   107          }
   108      }, {
   109          "type": "anchor_anchor",
   110          "source_begin": 22,
   111          "source_end": 25,
   112          "target_begin": 18,
   113          "target_end": 21,
   114          "edge": "/kythe/edge/imputes",
   115          "source_vname": {
   116              "corpus": "corpus",
   117              "path": "path/to/Color.java"
   118          }
   119      }]
   120  }
   121  -------------------------------------------------------------------------------
   122  
   123  This mapping has two `meta` entries: the first is for `Color`, the second is for
   124  `RED`. Note:
   125  
   126  * Entries don't contain the names of elements. Each entry contains only the
   127    positions of elements in the source (`Color.java`) and target (`color.js`)
   128    files.
   129  * Each position is a byte offset inside the file, not a line/column pair.
   130    This is required because Kythe anchors are defined using byte offsets and
   131    not line/column. In this example the JavaScript indexer processes this
   132    mapping and needs to output an *anchor* for `Color.java`, but the indexer
   133    doesn't have access to the `Color.java` file (it has access only to JS
   134    files). Because of that, the JS indexer can't translate line/column to byte
   135    offsets.
   136  * Entries don't contain the VNames of elements in `Color.java` or `color.js`;
   137    they contain positions instead. VNames of nodes are internal details of each
   138    indexer and are subject to change. Generator is usually a standalone tool that
   139    doesn't know the rules for producing VNames for a specific language, so it
   140    can't output VNames of nodes. If in your case VNames are stable and
   141    well-specified, you can use the simpler *generates* approach described in
   142    the Protocol Buffers section below.
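
As a rough illustration of the byte-offset requirement, the sketch below shows how a Generator might compute spans and assemble one `meta` entry. The helper names `byte_span` and `make_entry` are hypothetical and not part of Kythe. The sketch assumes offsets are zero-based UTF-8 byte positions with an exclusive end, and it locates tokens by text search only to stay short; a real generator would normally record positions while emitting code.

```python
import json
from typing import Dict, Tuple


def byte_span(text: str, token: str) -> Tuple[int, int]:
    """Return (begin, end) UTF-8 byte offsets of the first occurrence of token."""
    char_index = text.index(token)  # raises ValueError if the token is absent
    # Encode the prefix so multi-byte characters are counted as bytes, not chars.
    begin = len(text[:char_index].encode("utf-8"))
    return begin, begin + len(token.encode("utf-8"))


def make_entry(source_text: str, target_text: str, token: str,
               corpus: str, source_path: str) -> Dict:
    """Build one anchor_anchor entry (hypothetical helper, for illustration)."""
    source_begin, source_end = byte_span(source_text, token)
    target_begin, target_end = byte_span(target_text, token)
    return {
        "type": "anchor_anchor",
        "source_begin": source_begin,
        "source_end": source_end,
        "target_begin": target_begin,
        "target_end": target_end,
        "edge": "/kythe/edge/imputes",
        "source_vname": {"corpus": corpus, "path": source_path},
    }


JAVA = "public enum Color {\n  RED;\n}\n"
JS = "const Color = {\n  RED: 0,\n};\n"

red = make_entry(JAVA, JS, "RED", "corpus", "path/to/Color.java")
print(json.dumps({"type": "kythe0", "meta": [red]}))
```

For `RED` this yields source offsets 22 to 25 and target offsets 18 to 21, matching the second entry of the mapping above.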
   143  
   144  To pass this mapping to the JavaScript Indexer, Generator appends it
   145  as a comment on the last line of `color.js`:
   146  
   147  [source,javascript]
   148  -------------------------------------------------------------------------------
   149  const Color = {
   150    RED: 0,
   151  };
   152  
   153  // Kythe Indexing Metadata:
   154  // {"type":"kythe0","meta":[{"type":"anchor_anchor","source_begin":12,...
   155  -------------------------------------------------------------------------------
   156  
   157  Inlining metadata inside `color.js` avoids passing extra
   158  files to Indexer. All Indexer needs is to know that some JavaScript files can
   159  contain metadata on the last line, and to parse it.
   160  
   161  One downside is that it adds noise to `color.js`, but generated files are
   162  usually invisible to developers, so it's not a big concern.
   163  
   164  === Changes to JavaScript Indexer
   165  
   166  On the JavaScript Indexer side we need to parse the metadata and output *imputes*
   167  edges. To find the metadata, the indexer can check the last two lines of each `.js`
   168  file: if the second-to-last line is `// Kythe Indexing Metadata:`, it parses
   169  the JSON in the comment on the last line.
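
The check described above could look roughly like the following sketch. The function name `extract_inline_metadata` is hypothetical, and the sketch assumes the JSON payload sits in a single `//` comment on the last line, directly after the marker line.

```python
import json
from typing import Optional

MARKER = "// Kythe Indexing Metadata:"


def extract_inline_metadata(js_source: str) -> Optional[dict]:
    """Return the parsed metadata from the last two lines, or None if absent."""
    lines = js_source.rstrip("\n").split("\n")
    if len(lines) < 2 or lines[-2].strip() != MARKER:
        return None  # no marker on the second-to-last line
    payload = lines[-1].strip()
    if not payload.startswith("//"):
        return None  # the last line is not a comment
    try:
        return json.loads(payload[2:])  # drop the leading "//"
    except json.JSONDecodeError:
        return None  # treat malformed metadata as missing


SAMPLE = (
    "const Color = {\n"
    "  RED: 0,\n"
    "};\n"
    "\n"
    "// Kythe Indexing Metadata:\n"
    '// {"type":"kythe0","meta":[]}\n'
)
print(extract_inline_metadata(SAMPLE))
```

Returning `None` for files without the marker lets the indexer treat ordinary, hand-written `.js` files exactly as before.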
   170  
   171  For each `meta` entry indexer should do the following:
   172  
   173  1. Output an *anchor* using `source_begin` and `source_end`. `source_vname`
   174     identifies the file containing the anchor.
   175  2. Find the JavaScript node that has a *defines/binding* anchor at the
   176     `target_begin`/`target_end` position.
   177  3. Output one *imputes* edge from the *anchor* created in step 1 to the node
   178     found in step 2.
   179  
   180  Note that this only applies to meta entries with type `anchor_anchor`. For other
   181  types the structure might be different. See link:https://github.com/kythe/kythe/issues/3711[issue #3711].
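
The three steps can be sketched as follows. This is only an illustration under simplifying assumptions: anchors are modelled as plain tuples, `js_definitions` is a hypothetical lookup table from a `(begin, end)` span in the JavaScript file to the node bound there, and entries with no matching node are skipped entirely.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def impute_edges(metadata: dict, js_definitions: Dict[Span, str]) -> List[tuple]:
    """Turn anchor_anchor entries into (anchor, edge_kind, node) triples."""
    edges = []
    for entry in metadata.get("meta", []):
        if entry.get("type") != "anchor_anchor":
            continue  # other entry types may be structured differently
        # Step 2: find the JS node whose defines/binding anchor has this span.
        node = js_definitions.get((entry["target_begin"], entry["target_end"]))
        if node is None:
            continue  # nothing is defined at that position
        # Step 1: the anchor lives in the file named by source_vname.
        vname = entry["source_vname"]
        anchor = (vname["corpus"], vname["path"],
                  entry["source_begin"], entry["source_end"])
        # Step 3: connect the source anchor to the JS node.
        edges.append((anchor, entry["edge"], node))
    return edges


METADATA = {"type": "kythe0", "meta": [{
    "type": "anchor_anchor",
    "source_begin": 22, "source_end": 25,
    "target_begin": 18, "target_end": 21,
    "edge": "/kythe/edge/imputes",
    "source_vname": {"corpus": "corpus", "path": "path/to/Color.java"},
}]}

print(impute_edges(METADATA, {(18, 21): "RED node in JS"}))
```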
   182  
   183  Here is what the JavaScript indexer outputs for the `Color` and `RED` elements
   184  using the rules above:
   185  
   186  [kythe,dot,"JavaScript Indexer graph",0]
   187  --------------------------------------------------------------------------------
   188  digraph G {
   189  size="7,7";
   190  coloranchorjava [label="anchor\nColor.java:0:12-17"];
   191  redanchorjava [label="anchor\nColor.java:1:2-5"];
   192  coloranchorjs [label="anchor\ncolor.js:0:6-11"];
   193  redanchorjs [label="anchor\ncolor.js:1:2-5"];
   194  colornode [label="Color node\nin JS"];
   195  rednode [label="RED node\nin JS"];
   196  
   197  coloranchorjs -> colornode [label = "defines/binding"];
   198  redanchorjs -> rednode [label = "defines/binding"];
   199  coloranchorjava -> colornode [label = "imputes"];
   200  redanchorjava -> rednode [label = "imputes"];
   201  }
   202  --------------------------------------------------------------------------------
   203  
   204  Output of Java Indexer looks like this:
   205  
   206  [kythe,dot,"Java Indexer graph",0]
   207  --------------------------------------------------------------------------------
   208  digraph G {
   209  size="7,7";
   210  coloranchorjava [label="anchor\nColor.java:0:12-17"];
   211  redanchorjava [label="anchor\nColor.java:1:2-5"];
   212  colornodejava [label="Color node\nin Java"];
   213  rednodejava [label="RED node\nin Java"];
   214  
   215  coloranchorjava -> colornodejava [label = "defines/binding"];
   216  redanchorjava -> rednodejava [label = "defines/binding"];
   217  }
   218  --------------------------------------------------------------------------------
   219  
   220  === Post-processor
   221  
   222  Once the Java and JavaScript Indexers have finished, their output is merged. The
   223  post-processor finds all anchors that have both *defines/binding* and
   224  *imputes* edges and creates a *generates* edge for each:
   225  
   226  [kythe,dot,"Processed final graph",0]
   227  --------------------------------------------------------------------------------
   228  digraph G {
   229  size="7,7";
   230  coloranchorjava [label="anchor\nColor.java:0:12-17"];
   231  redanchorjava [label="anchor\nColor.java:1:2-5"];
   232  coloranchorjs [label="anchor\ncolor.js:0:6-11"];
   233  redanchorjs [label="anchor\ncolor.js:1:2-5"];
   234  colornode [label="Color node\nin JS"];
   235  rednode [label="RED node\nin JS"];
   236  colornodejava [label="Color node\nin Java"];
   237  rednodejava [label="RED node\nin Java"];
   238  
   239  coloranchorjs -> colornode [label = "defines/binding"];
   240  redanchorjs -> rednode [label = "defines/binding"];
   241  coloranchorjava -> colornode [label = "imputes"];
   242  redanchorjava -> rednode [label = "imputes"];
   243  coloranchorjava -> colornodejava [label = "defines/binding"];
   244  redanchorjava -> rednodejava [label = "defines/binding"];
   245  colornodejava -> colornode [label = "generates"];
   246  rednodejava -> rednode [label = "generates"];
   247  }
   248  --------------------------------------------------------------------------------
   249  
   250  This is the end state. Tools using the Kythe graph can now see that the `Color` enum
   251  in JS is generated by the `Color` enum in Java and act accordingly (for example,
   252  an IDE can respond to a click on `Color` in the JS file by going to the definition
   253  of the `Color` enum in the Java file).
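
The post-processing rule can be sketched in a few lines. This is not Kythe's actual post-processor; it assumes the merged graph is available as a list of `(source, kind, target)` triples with shortened edge names, and the node labels are made up for the example.

```python
from collections import defaultdict


def add_generates_edges(edges):
    """Derive generates edges from anchors with both edge kinds."""
    defined_by = defaultdict(set)  # anchor -> nodes it defines
    imputed_by = defaultdict(set)  # anchor -> nodes it imputes
    for source, kind, target in edges:
        if kind == "defines/binding":
            defined_by[source].add(target)
        elif kind == "imputes":
            imputed_by[source].add(target)

    generated = set()
    for anchor, source_nodes in defined_by.items():
        # Only anchors that also carry imputes edges contribute anything.
        for target_node in imputed_by.get(anchor, ()):
            for source_node in source_nodes:
                generated.add((source_node, "generates", target_node))
    return sorted(generated)


EDGES = [
    ("color.js Color anchor", "defines/binding", "Color node in JS"),
    ("color.js RED anchor", "defines/binding", "RED node in JS"),
    ("Color.java Color anchor", "imputes", "Color node in JS"),
    ("Color.java RED anchor", "imputes", "RED node in JS"),
    ("Color.java Color anchor", "defines/binding", "Color node in Java"),
    ("Color.java RED anchor", "defines/binding", "RED node in Java"),
]

for edge in add_generates_edges(EDGES):
    print(edge)
```

The JavaScript anchors have *defines/binding* edges but no *imputes* edges, so they produce nothing; only the two Java anchors yield *generates* edges.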
   254  
   255  == Protocol Buffers with *generates* edges
   256  
   257  This approach is easier to implement than the *imputes* approach described
   258  above, but it requires tighter integration between Indexer and Generator. When
   259  Generator outputs code it also adds a mapping, as in the *imputes* approach,
   260  but instead of mapping location to location it outputs the VNames of nodes from
   261  `foo.source`. This requires Generator to know exactly what VNames will be produced
   262  by the Source Indexer. The approach is feasible when VNames either
   263  have a simple, stable form or Generator can reuse code from Source Indexer to
   264  generate VNames.
   265  
   266  In this example we generate C++ files from Protocol Buffers definitions, so
   267  SourceLang is Protocol Buffers and TargetLang is C++.
   268  
   269  The Kythe project uses
   270  link:https://developers.google.com/protocol-buffers/[protocol buffers] for
   271  data interchange. The `protoc` compiler reads a domain-specific language
   272  that describes messages and synthesizes code that serializes, deserializes,
   273  and manipulates these messages. It can generate code in a number of different
   274  target languages by swapping out backend components. These accept an encoding
   275  of the message descriptions in the original source file and emit source text.
   276  
   277  [kythe,dot,"protoc architecture",0]
   278  --------------------------------------------------------------------------------
   279  digraph G {
   280  size="7,7";
   281  protosrc [label=".proto", shape=note];
   282  frontend [label="protoc", shape=rectangle];
   283  descriptor [label="descriptor", shape=note];
   284  backend [label="C++ language backend", shape=rectangle];
   285  ccsrc [label=".pb.h", shape=note];
   286  protosrc -> frontend;
   287  frontend -> descriptor;
   288  descriptor -> backend;
   289  backend -> ccsrc;
   290  }
   291  --------------------------------------------------------------------------------
   292  
   293  === Indexing `.proto` definitions
   294  
   295  `.proto` files are written in a domain-specific programming language for
   296  describing various properties about messages and other data. It is interesting
   297  to index these on their own, as messages in one `.proto` file may be used in
   298  another `.proto` file. Here is a very simple example of the language:
   299  
   300  [source,c]
   301  --------------------------------------------------------------------------------
   302  syntax = "proto3";
   303  package kythe.examples.proto.example;
   304  
   305  // A single proto message.
   306  message Foo {
   307  }
   308  --------------------------------------------------------------------------------
   309  
   310  This file describes the empty message `kythe.examples.proto.example.Foo`
   311  using features from version 3 of the language. When run through `protoc`
   312  with the appropriate options set, it will generate the interface `example.pb.h`
   313  and the implementation `example.pb.cc`. These may be used to interact with
   314  `Foo` messages in $$C++$$.
   315  
   316  As it turns out, `protoc` can be coerced into saving the descriptor that it
   317  passes to its backends. Ordinarily, this descriptor would merely be an
   318  abstract version of the `.proto` input file that discards syntax and records
   319  only the details necessary to generate source code. If asked, `protoc` will
   320  also keep track of source locations (`--include_source_info`) and data about
   321  the `.proto` files that are (transitively) imported (`--include_imports`).
   322  This information is sufficient to build a Kythe graph for a given `.proto`
   323  definition file. It will become important later that every object that the
   324  descriptor describes has an address, like "4.0", that corresponds (roughly)
   325  to its position in the descriptor's AST. These addresses are used as keys into
   326  the table that keeps track of source locations in the original `.proto` file.
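
To make the address concrete: in `descriptor.proto`, `message_type` is field number 4 of `FileDescriptorProto`, so "4.0" can be read as "the first top-level message". The sketch below walks a plain-dictionary stand-in for a descriptor (not the real protobuf API) and prints such addresses. The helper names are hypothetical, and joining path components with dots is an assumption based on the "4.0" form shown in this document.

```python
from typing import Iterator, List, Tuple

MESSAGE_TYPE_FIELD = 4  # FileDescriptorProto.message_type in descriptor.proto
NESTED_TYPE_FIELD = 3   # DescriptorProto.nested_type in descriptor.proto


def message_paths(messages: List[dict], prefix: List[int],
                  field: int) -> Iterator[Tuple[str, List[int]]]:
    """Yield (name, path) for each message, recursing into nested messages."""
    for index, message in enumerate(messages):
        path = prefix + [field, index]
        yield message["name"], path
        yield from message_paths(message.get("nested", []), path,
                                 NESTED_TYPE_FIELD)


def path_signature(path: List[int]) -> str:
    """Render a descriptor path as a dotted address such as "4.0"."""
    return ".".join(str(component) for component in path)


# Simplified stand-in for the descriptor of the example .proto file.
FILE = {"messages": [{"name": "Foo"}]}

for name, path in message_paths(FILE["messages"], [], MESSAGE_TYPE_FIELD):
    print(name, path_signature(path))
```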
   327  
   328  [kythe,dot,"protoc architecture with indexer",0]
   329  --------------------------------------------------------------------------------
   330  digraph G {
   331  size="7,7";
   332  protosrc [label=".proto", shape=note];
   333  frontend [label="protoc", shape=rectangle];
   334  descriptor [label="descriptor", shape=note];
   335  descriptorfile [label="FileDescriptorSet", shape=note, color=blue];
   336  indexer [label="Kythe proto_indexer", shape=rectangle, color=blue];
   337  backend [label="C++ language backend", shape=rectangle];
   338  ccsrc [label=".pb.h", shape=note];
   339  entries [label="Kythe entries", shape=note, color=blue];
   340  protosrc -> frontend;
   341  frontend -> descriptor;
   342  frontend -> descriptorfile [color=blue];
   343  protosrc -> indexer [color=blue];
   344  descriptorfile -> indexer [color=blue];
   345  descriptor -> backend;
   346  backend -> ccsrc;
   347  indexer -> entries [color=blue];
   348  }
   349  --------------------------------------------------------------------------------
   350  
   351  This extra information is stored as a file that contains a
   352  `proto2.FileDescriptorSet` message, which in turn is a list of the
   353  `proto2.FileDescriptorProto` messages used in the course of processing `.proto`
   354  input. Note that this message does not contain `.proto` source text, so the
   355  `proto_indexer` must have access to the original source files.
   356  
   357  We can add a verifier assertion to check that `Foo` declares a Kythe node:
   358  
   359  [source,c]
   360  --------------------------------------------------------------------------------
   361  syntax = "proto3";
   362  package kythe.examples.proto.example;
   363  
   364  // A single proto message.
   365  //- @Foo defines/binding MessageFoo?
   366  message Foo {
   367  }
   368  --------------------------------------------------------------------------------
   369  
   370  and see that it was unified with the appropriate VName:
   371  
   372  .Output
   373  ----
   374  MessageFoo: EVar(... = App(vname,
   375      (4.0, kythe, "", kythe/examples/proto/example.proto, protobuf)))
   376  ----
   377  
   378  == Using generated source code
   379  
   380  Imagine that we have a simple $$C++$$ user of our generated source code for
   381  `Foo`. Its code, with a verifier assertion, looks like this:
   382  
   383  [source,c]
   384  --------------------------------------------------------------------------------
   385  #include "kythe/examples/proto/example.pb.h"
   386  
   387  //- @Foo ref CxxFooDecl?
   388  void UseProto(kythe::examples::proto::example::Foo* foo) {
   389  }
   390  --------------------------------------------------------------------------------
   391  
   392  The Kythe pipeline for indexing our combined program looks like this:
   393  
   394  [kythe,dot,"first indexing pipeline",0]
   395  --------------------------------------------------------------------------------
   396  digraph G {
   397  size="7,7!";
   398  usersrc [label="proto_user.cc", shape=note, color=blue];
   399  ccextractor [label="C++ extractor", shape=rectangle, color=blue];
   400  kzip [label="proto_user.kzip", shape=note, color=blue];
   401  protosrc [label=".proto", shape=note];
   402  frontend [label="protoc", shape=rectangle];
   403  descriptor [label="descriptor", shape=note];
   404  descriptorfile [label="FileDescriptorSet", shape=note];
   405  indexer [label="Kythe proto_indexer", shape=rectangle];
   406  ccindexer [label="Kythe C++ indexer", shape=rectangle, color=blue];
   407  backend [label="C++ language backend", shape=rectangle];
   408  ccsrc [label=".pb.h", shape=note];
   409  entries [label="Kythe entries", shape=note];
   410  protosrc -> frontend;
   411  frontend -> descriptor;
   412  frontend -> descriptorfile;
   413  protosrc -> indexer;
   414  descriptorfile -> indexer;
   415  descriptor -> backend;
   416  backend -> ccsrc;
   417  indexer -> entries;
   418  usersrc -> ccextractor [color=blue];
   419  ccsrc -> ccextractor [color=blue];
   420  usersrc -> ccsrc [color=blue];
   421  ccextractor -> kzip [color=blue];
   422  kzip -> ccindexer [color=blue];
   423  ccindexer -> entries [color=blue];
   424  }
   425  --------------------------------------------------------------------------------
   426  
   427  When we use the verifier to inspect the resulting `CxxFooDecl`, we see that
   428  it has not been unified with the VName for `Foo`:
   429  
   430  .Output
   431  ----
   432  CxxFooDecl: EVar(... =
   433      App(vname, (srl0y/pwih+G6wsjFLMTVKQPC7lLH3/9MVK2d2aJHeE=,
   434                  kythe, bazel-out/genfiles, kythe/examples/proto/example.pb.h,
   435                  c++)))
   436  ----
   437  
   438  This is because the `kythe::examples::proto::example::Foo` type is a $$C++$$
   439  type defined in `example.pb.h`. That it was defined in some original `.proto`
   440  file has no meaning to the $$C++$$ compiler. Furthermore, the Kythe $$C++$$
   441  indexer has no understanding of the `protoc` language and the VNames that the
   442  Kythe proto_indexer produces.
   443  
   444  Our goal is to add edges in the graph between `CxxFooDecl` and `MessageFoo`
   445  so that clients can take into account their relationship when displaying
   446  cross-references or answering other queries. We do not want to unify them in the
   447  same node, as they are legitimately different objects. Users may wish to
   448  navigate to the generated $$C++$$ code for `CxxFooDecl` or to view uses of
   449  `MessageFoo` in other languages. To support these different uses, we will emit
   450  a link:/docs/schema#generates[generates] edge such that `MessageFoo`
   451  *generates* `CxxFooDecl`. Clients can choose to follow the edge or to disregard
   452  it.
   453  
   454  Observe that the $$C++$$ indexer and `protoc` backend both observe the same
   455  content in the `.pb.h` file; therefore, both programs see the same offsets
   456  for various tokens. If the `protoc` backend were to link those offsets back
   457  to the objects in the `FileDescriptorProto` using well-known names--and if the
   458  Kythe proto_indexer guaranteed a particular mechanism for generating VNames
   459  from those well-known names--we could close the loop in the $$C++$$ indexer by
   460  emitting *generates* edges to the proto_indexer's nodes whenever the $$C++$$
   461  indexer trips over the `protoc` backend's marked offsets.
   462  
   463  In other words, if the `.pb.h` contained code like:
   464  [source,c]
   465  --------------------------------------------------------------------------------
   466  ...
   467  class Foo {
   468  ...
   469  --------------------------------------------------------------------------------
   470  
   471  and the `protoc` backend that generated it reported that the text range
   472  `Foo` was associated with an object in its original `FileDescriptorProto` at
   473  some location encoded as "4.0"--and the proto_indexer guaranteed it would
   474  always emit objects with signatures based on their descriptor locations--the
   475  $$C++$$ indexer would only need to watch for *defines/binding* edges starting at
   476  that text range. Should such an edge be emitted, the $$C++$$ indexer would also
   477  emit a *generates* edge to the `proto` node.
   478  
   479  === Annotations in `protoc` backends
   480  
   481  We have already seen how to command the `protoc` frontend to emit location
   482  information for `.proto` source files. The frontend does not, however, know
   483  anything about the source code that its various backends emit. We must pass
   484  additional flags to these backends to get them to produce location information
   485  as `proto2.GeneratedCodeInfo` messages. These messages connect byte offsets
   486  in generated source code with paths in the `proto2.FileDescriptorProto` AST.
   487  These paths are the same ones used by the `proto2.SourceCodeInfo` message that
   488  the Kythe proto_indexer consumes; they are the paths we will use to link up
   489  `protobuf` language nodes with the nodes for generated source code.
   490  
   491  Each `protoc` backend must be individually instrumented to produce
   492  `proto2.GeneratedCodeInfo` messages. To turn annotation on for the $$C++$$
   493  backend, you can pass `--cpp_out=annotate_headers=1:normal/output/path` to
   494  `protoc`. In practice, you will also need to provide an `annotation_pragma_name`
   495  and an `annotation_guard_name`, so the full `cpp_out` value may look like
   496  `annotate_headers=1,annotation_pragma_name=kythe_metadata,annotation_guard_name=KYTHE_IS_RUNNING:normal/output/path`.
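
Build rules often assemble this flag programmatically. The following sketch shows one way to compose the value from its parts; `cpp_out_value` is a hypothetical helper, and the option names and output path are the ones quoted above.

```python
def cpp_out_value(output_path: str, options: dict) -> str:
    """Join backend options as key=value pairs, followed by ':' and the path."""
    joined = ",".join(f"{key}={value}" for key, value in options.items())
    return f"{joined}:{output_path}" if joined else output_path


OPTIONS = {
    "annotate_headers": 1,
    "annotation_pragma_name": "kythe_metadata",
    "annotation_guard_name": "KYTHE_IS_RUNNING",
}

flag = "--cpp_out=" + cpp_out_value("normal/output/path", OPTIONS)
print(flag)
```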
   497  
   498  When `annotate_headers=1` is asserted to the $$C++$$ backend, it will generate
   499  `.meta` files alongside any files with annotations. For example, in the same
   500  directory as `example.pb.h`, you will find an `example.pb.h.meta` file. This
   501  file contains a serialized `proto2.GeneratedCodeInfo` message. This message
   502  contains a series of spans in `example.pb.h`, the filenames to the `.proto`
   503  files that caused those spans to be generated, and the AST paths in the
   504  `FileDescriptorProto` for those `.proto` files. `example.pb.h` explicitly
   505  depends on `example.pb.h.meta` using a pragma and a preprocessor symbol:
   506  
   507  [source,c]
   508  --------------------------------------------------------------------------------
   509  // Generated by the protocol buffer compiler.  DO NOT EDIT!
   510  // source: kythe/examples/proto/example.proto
   511  
   512  ...
   513  
   514  #ifdef KYTHE_IS_RUNNING
   515  #pragma kythe_metadata "kythe/examples/proto/example.pb.h.meta"
   516  #endif  // KYTHE_IS_RUNNING
   517  
   518  ...
   519  --------------------------------------------------------------------------------
   520  
   521  The Kythe $$C++$$ extractor and indexer both understand what to do with this
   522  pragma (and both define `KYTHE_IS_RUNNING`). The extractor will add the `.meta`
   523  file to the `kzip` it produces; the indexer will load the `.meta` file,
   524  translate it from `protoc` annotations to generic Kythe metadata, and use it
   525  to append `generates` edges for `defines/binding` edges emitted from
   526  `example.pb.h`.
   527  
   528  [kythe,dot,"second indexing pipeline",0]
   529  --------------------------------------------------------------------------------
   530  digraph G {
   531  size="7,7!";
   532  usersrc [label="proto_user.cc", shape=note];
   533  ccextractor [label="C++ extractor", shape=rectangle];
   534  kzip [label="proto_user.kzip", shape=note];
   535  protosrc [label=".proto", shape=note];
   536  frontend [label="protoc", shape=rectangle];
   537  descriptor [label="descriptor", shape=note];
   538  descriptorfile [label="FileDescriptorSet", shape=note];
   539  indexer [label="Kythe proto_indexer", shape=rectangle];
   540  ccindexer [label="Kythe C++ indexer", shape=rectangle];
   541  backend [label="C++ language backend", shape=rectangle];
   542  ccsrc [label=".pb.h", shape=note];
   543  ccmeta [label=".pb.h.meta", shape=note, color=blue];
   544  entries [label="Kythe entries", shape=note];
   545  protosrc -> frontend;
   546  frontend -> descriptor;
   547  frontend -> descriptorfile;
   548  protosrc -> indexer;
   549  descriptorfile -> indexer;
   550  descriptor -> backend;
   551  backend -> ccsrc;
   552  backend -> ccmeta [color=blue];
   553  ccsrc -> ccmeta [color=blue];
   554  indexer -> entries;
   555  usersrc -> ccsrc;
   556  usersrc -> ccextractor;
   557  ccsrc -> ccextractor;
   558  ccmeta -> ccextractor [color=blue];
   559  ccextractor -> kzip;
   560  kzip -> ccindexer;
   561  ccindexer -> entries;
   562  }
   563  --------------------------------------------------------------------------------
   564  
   565  Now we can write verifier assertions that show we have established a link
   566  between the proto source and use sites of its generated code:
   567  
   568  [source,c]
   569  --------------------------------------------------------------------------------
   570  #include "kythe/examples/proto/example.pb.h"
   571  
   572  //- @Foo ref CxxFooDecl
   573  //- MessageFoo? generates CxxFooDecl
   574  //- vname(_, "kythe", "", "kythe/examples/proto/example.proto", "protobuf")
   575  //-     defines/binding MessageFoo
   576  void UseProto(kythe::examples::proto::example::Foo* foo) {
   577  }
   578  --------------------------------------------------------------------------------
   579  
   580  .Output
   581  ----
   582  MessageFoo: EVar(... = App(vname,
   583      (4.0, kythe, "", kythe/examples/proto/example.proto, protobuf)))
   584  ----
   585  
   586  Of course, Kythe clients need to understand that *generates* edges should be
   587  followed. Solving this problem is out of this document's scope.
   588  
   589  ==== Providing annotations for other languages
   590  
   591  To generate metadata for a different language backend, you must determine or
   592  implement the following:
   593  
   594  * The `protoc` backend for the language must be able to produce
   595    `proto2.GeneratedCodeInfo` buffers.
   596  * There must be some way to signal to your indexer and extractor that a
   597    `.meta` file is associated with a different source file.
   598  * That `.meta` file must be made available to the extractor during extraction.
   599    For hermetic build systems, this means that the target driving `protoc` must
   600    list the `.meta` file as an output. Any target that uses that `protoc`
   601    target must require the `.meta` file as an input.
   602  * The indexer must read the `.meta` file and use it to emit `generates`
   603    edges that connect up to the nodes produced by the Kythe proto_indexer.
   604  
   605  The method for annotating source code is designed such that it can
   606  be implemented purely at the output stage; for example, if you have an
   607  abstraction for emitting *defines/binding* edges from anchors, you can
   608  check at every edge (starting from a file with loaded metadata) whether you
   609  should emit an additional `generates` edge.
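
A minimal sketch of such an output-stage hook is shown below. It is not the Kythe C++ indexer's implementation: `EdgeEmitter`, `MetadataRule`, and the byte offsets 120 to 123 are hypothetical, and the loaded metadata is assumed to have already been translated into spans paired with the VNames of the proto nodes.

```python
from typing import Dict, List, NamedTuple


class VName(NamedTuple):
    signature: str
    corpus: str
    root: str
    path: str
    language: str


class MetadataRule(NamedTuple):
    begin: int     # byte offset where the generated identifier starts
    end: int       # byte offset just past the generated identifier
    source: VName  # node in the original file that produced the span


class EdgeEmitter:
    """Collects edges; adds generates edges for spans covered by metadata."""

    def __init__(self, rules_by_file: Dict[str, List[MetadataRule]]):
        self._rules = {
            path: {(rule.begin, rule.end): rule.source for rule in rules}
            for path, rules in rules_by_file.items()
        }
        self.edges = []

    def emit_defines_binding(self, file_path: str, begin: int, end: int, node):
        self.edges.append(((file_path, begin, end), "defines/binding", node))
        # Extra check, only relevant for files that have loaded metadata.
        source = self._rules.get(file_path, {}).get((begin, end))
        if source is not None:
            self.edges.append((source, "generates", node))


PB_H = "kythe/examples/proto/example.pb.h"
MESSAGE_FOO = VName("4.0", "kythe", "",
                    "kythe/examples/proto/example.proto", "protobuf")

emitter = EdgeEmitter({PB_H: [MetadataRule(120, 123, MESSAGE_FOO)]})
emitter.emit_defines_binding(PB_H, 120, 123, "CxxFooDecl")  # covered span
emitter.emit_defines_binding(PB_H, 200, 210, "Other")       # no metadata
for edge in emitter.edges:
    print(edge)
```

Because the check happens where *defines/binding* edges are emitted, the rest of the indexer does not need to know that metadata exists.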