// Copyright 2017 The Kythe Authors. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

Annotating nodes for display
============================
:Revision: 1.0
:toc2:
:toclevels: 3
:priority: 999

Semantic nodes in a Kythe graph may stand for objects with complex structure,
such as polymorphic functions bearing many type constraints. Representing these
nodes in a UI for human viewers is often complicated. Displaying only the
source text may omit important context (like types inferred by the compiler).
On the other hand, fully expanding the node's internal representation may result
in a very long, difficult-to-read string. Semantic information may also be lost,
as when programmers use transparent `typedef`s in the C family of languages: the
expanded form shows the underlying type rather than the name the programmer
wrote.

The schema provides a link:/docs/schema#code[code] fact that, when attached to
an arbitrary semantic node in the Kythe graph, instructs clients on how that
node can be presented to users. The fact's value is a serialized `MarkedSource`
protocol buffer message, defined in
link:https://github.com/kythe/kythe/blob/master/kythe/proto/common.proto[common.proto].
Unlike most facts in the Kythe graph, `MarkedSource` is a structured message
rather than a plain string, because clients have differing requirements for the
amount and level of detail they display. By including or excluding various
parts of this message, clients can precisely format a node's presentation
according to their requirements. The message can also link subspans to other
nodes and include other nodes' *code* by reference. Kythe indexers are
responsible for emitting `MarkedSource` messages.

== Experimenting with `MarkedSource`

The Kythe repository contains a sample utility for rendering documentation,
including any embedded `MarkedSource` messages. You can build it with:

[source,bash]
----
bazel build //kythe/cxx/doc
----

To run it in a mode that accepts and renders a `MarkedSource` message written
in ASCII (protobuf text format), use:

[source,bash]
----
./bazel-bin/kythe/cxx/doc/doc --common_signatures
----

An empty message produces the following output (shown between double quotes,
with HTML special characters escaped):

.Output
----
      RenderSimpleIdentifier: ""
RenderSimpleQualifiedName-ID: ""
RenderSimpleQualifiedName+ID: ""
----

== Generating `MarkedSource`

`MarkedSource` messages describe simplified parse trees for source code. The
parse tree represented by a `MarkedSource` message need not correspond exactly
to the surface syntax of the language, but is intended to be as similar as
possible so that a reader familiar with the language will understand the
structure that is represented. Each message is a node in the parse tree.
Messages have kinds (distinct from the `kind` facts on Kythe nodes) that apply
to themselves and their children, so a message with the `TYPE` kind applies the
type nature to itself and its entire subtree. When tools render `MarkedSource`,
they include or exclude parts of the parse tree by inspecting kinds. For a full
listing of valid kinds, refer to the message definition in
link:https://github.com/kythe/kythe/blob/master/kythe/proto/common.proto[common.proto].

Renderers traverse the tree depth-first. If a message is selected for
rendering, its `pre_text` is appended when it is first visited. Each of the
message's children is then traversed in order. After each child is rendered,
the parent's `post_child_text` is appended, unless that child is the last child;
if the parent sets `add_final_list_token`, `post_child_text` is appended after
the last child as well. Once all of the children have been traversed, the
parent's `post_text` is appended. For example:

[source,javascript]
----
kind: IDENTIFIER
pre_text: "pre"
post_child_text: "post_child"
post_text: "post"
----

(Here and elsewhere we show `MarkedSource` messages as text format protobuf
messages.)

.Output
----
      RenderSimpleIdentifier: "prepost"
RenderSimpleQualifiedName-ID: ""
RenderSimpleQualifiedName+ID: "prepost"
----

[source,javascript]
----
kind: IDENTIFIER
pre_text: "pre"
post_child_text: "post_child"
post_text: "post"
child {
  pre_text: "1"
}
child {
  pre_text: "2"
}
----

.Output
----
      RenderSimpleIdentifier: "pre1post_child2post"
RenderSimpleQualifiedName-ID: ""
RenderSimpleQualifiedName+ID: "pre1post_child2post"
----
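The traversal rules above can be sketched in $$C++$$ as follows. `MarkedSourceNode` is a hypothetical plain struct that stands in for the generated `MarkedSource` protobuf class, and `Render` is an illustrative name, not part of any Kythe API. The sketch ignores kinds and renders every node. For the two examples above the result is the same as the `RenderSimpleIdentifier` output, because their root message is itself an `IDENTIFIER`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical plain-struct stand-in for the MarkedSource proto message. Only
// the fields the traversal uses are modeled; `kind` is omitted because this
// sketch renders every node.
struct MarkedSourceNode {
  std::string pre_text;
  std::string post_child_text;
  std::string post_text;
  bool add_final_list_token = false;
  std::vector<MarkedSourceNode> child;
};

// Depth-first rendering: pre_text on entry, post_child_text after every child
// except the last (or after the last one too if add_final_list_token is set),
// then post_text once all children are done.
void RenderInto(const MarkedSourceNode& node, std::string* out) {
  out->append(node.pre_text);
  for (std::size_t i = 0; i < node.child.size(); ++i) {
    RenderInto(node.child[i], out);
    const bool is_last = i + 1 == node.child.size();
    if (!is_last || node.add_final_list_token) {
      out->append(node.post_child_text);
    }
  }
  out->append(node.post_text);
}

std::string Render(const MarkedSourceNode& node) {
  std::string out;
  RenderInto(node, &out);
  return out;
}
```

Suppose you build the second example as a node with `pre_text` `"pre"`, `post_child_text` `"post_child"`, and `post_text` `"post"`, plus two children whose `pre_text` values are `"1"` and `"2"`. Calling `Render` on it then yields `pre1post_child2post`.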

A `MarkedSource` representation of a typical $$C++$$ qualified name would be:

[source,javascript]
----
kind: BOX
child {
  kind: CONTEXT
  child {
    kind: IDENTIFIER
    pre_text: "std"
  }
  child {
    kind: IDENTIFIER
    pre_text: "experimental"
  }
  post_child_text: "::"
  add_final_list_token: true
}
child {
  kind: IDENTIFIER
  pre_text: "string_view"
}
----

.Output
----
      RenderSimpleIdentifier: "string_view"
RenderSimpleQualifiedName-ID: "std::experimental"
RenderSimpleQualifiedName+ID: "std::experimental::string_view"
----
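One way a renderer might select parts of the tree by kind is sketched below. This is a hypothetical reconstruction, not the code in `//kythe/cxx/doc`. `Kind`, `MarkedSourceNode`, `FindFirst`, `SimpleIdentifier`, and `SimpleQualifiedName` are illustrative names. The selection rule is an assumption, chosen because it reproduces the outputs shown above:

* the identifier is the first `IDENTIFIER` outside any `CONTEXT`, `PARAMETER`, or `TYPE` subtree;
* the qualifier is the first `CONTEXT` outside any `PARAMETER` or `TYPE` subtree.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical subset of the MarkedSource kinds used in this document.
enum class Kind { BOX, TYPE, PARAMETER, IDENTIFIER, CONTEXT };

// Hypothetical plain-struct stand-in for the MarkedSource proto message.
struct MarkedSourceNode {
  Kind kind = Kind::BOX;  // Messages without an explicit kind are BOXes.
  std::string pre_text;
  std::string post_child_text;
  std::string post_text;
  bool add_final_list_token = false;
  std::vector<MarkedSourceNode> child;
};

// Renders a whole subtree. When `allow_final_token` is false, this node's own
// add_final_list_token is ignored; descendants still honor theirs.
void RenderInto(const MarkedSourceNode& node, bool allow_final_token,
                std::string* out) {
  out->append(node.pre_text);
  for (std::size_t i = 0; i < node.child.size(); ++i) {
    RenderInto(node.child[i], /*allow_final_token=*/true, out);
    const bool is_last = i + 1 == node.child.size();
    if (!is_last || (allow_final_token && node.add_final_list_token)) {
      out->append(node.post_child_text);
    }
  }
  out->append(node.post_text);
}

// Pre-order search for the first node of kind `wanted`, never descending into
// a child whose kind is listed in `skip`.
const MarkedSourceNode* FindFirst(const MarkedSourceNode& node, Kind wanted,
                                  const std::vector<Kind>& skip) {
  if (node.kind == wanted) return &node;
  for (const MarkedSourceNode& c : node.child) {
    if (std::find(skip.begin(), skip.end(), c.kind) != skip.end()) continue;
    if (const MarkedSourceNode* found = FindFirst(c, wanted, skip)) {
      return found;
    }
  }
  return nullptr;
}

// The first IDENTIFIER that is not inside a CONTEXT, PARAMETER, or TYPE.
std::string SimpleIdentifier(const MarkedSourceNode& root) {
  std::string out;
  if (const MarkedSourceNode* id = FindFirst(
          root, Kind::IDENTIFIER, {Kind::CONTEXT, Kind::PARAMETER, Kind::TYPE})) {
    RenderInto(*id, /*allow_final_token=*/true, &out);
  }
  return out;
}

// The first CONTEXT outside PARAMETER and TYPE subtrees, optionally followed
// by the identifier. Without the identifier, the trailing separator that
// add_final_list_token would add is dropped.
std::string SimpleQualifiedName(const MarkedSourceNode& root, bool with_id) {
  std::string out;
  if (const MarkedSourceNode* ctx =
          FindFirst(root, Kind::CONTEXT, {Kind::PARAMETER, Kind::TYPE})) {
    RenderInto(*ctx, /*allow_final_token=*/with_id, &out);
  }
  if (with_id) out += SimpleIdentifier(root);
  return out;
}
```

In this sketch, `SimpleQualifiedName(root, false)` drops the final list token of the `CONTEXT`. That is how `std::experimental::` becomes `std::experimental` on the `-ID` line.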

A function prototype would look like:

[source,javascript]
----
child { kind: TYPE pre_text: "void" }
child { pre_text: " " }
child { kind: IDENTIFIER pre_text: "foo" }
child {
  kind: PARAMETER
  child {
    child { kind: TYPE pre_text: "int" }
    child { pre_text: " " }
    child {
      kind: CONTEXT
      child { kind: IDENTIFIER pre_text: "foo" }
      post_child_text: "::"
      add_final_list_token: true
    }
    child { kind: IDENTIFIER pre_text: "x" }
  }
  child {
    child { kind: TYPE pre_text: "int" }
    child { pre_text: " " }
    child {
      kind: CONTEXT
      child { kind: IDENTIFIER pre_text: "foo" }
      post_child_text: "::"
      add_final_list_token: true
    }
    child { kind: IDENTIFIER pre_text: "y" }
  }
  pre_text: "("
  post_child_text: ", "
  post_text: ")"
}
----

.Output
----
      RenderSimpleIdentifier: "foo"
          RenderSimpleParams: "x"
          RenderSimpleParams: "y"
RenderSimpleQualifiedName-ID: ""
RenderSimpleQualifiedName+ID: "foo"
----
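The `RenderSimpleParams` lines list one name per parameter. Below is a hypothetical sketch of how a renderer could derive them. It treats each child of the first `PARAMETER` node as one parameter and takes the first `IDENTIFIER` inside it that lies outside `CONTEXT` and `TYPE` subtrees. `MarkedSourceNode` is a plain struct standing in for the proto message. `Leaf`, `FindFirst`, `SimpleParams`, and `FooPrototype` are illustrative names. The selection rule is an assumption that matches the output above, not a description of `//kythe/cxx/doc`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical subset of the MarkedSource kinds used in this document.
enum class Kind { BOX, TYPE, PARAMETER, IDENTIFIER, CONTEXT };

// Hypothetical plain-struct stand-in for the MarkedSource proto message.
struct MarkedSourceNode {
  Kind kind = Kind::BOX;  // Messages without an explicit kind are BOXes.
  std::string pre_text;
  std::string post_child_text;
  std::string post_text;
  bool add_final_list_token = false;
  std::vector<MarkedSourceNode> child;
};

// Shorthand for a node such as `child { kind: TYPE pre_text: "int" }`.
MarkedSourceNode Leaf(Kind kind, const std::string& text) {
  MarkedSourceNode n;
  n.kind = kind;
  n.pre_text = text;
  return n;
}

// Depth-first rendering with pre_text, post_child_text, and post_text.
std::string Render(const MarkedSourceNode& node) {
  std::string out = node.pre_text;
  for (std::size_t i = 0; i < node.child.size(); ++i) {
    out += Render(node.child[i]);
    if (i + 1 < node.child.size() || node.add_final_list_token) {
      out += node.post_child_text;
    }
  }
  return out + node.post_text;
}

// Pre-order search for the first node of kind `wanted`, never descending into
// a child whose kind is listed in `skip`.
const MarkedSourceNode* FindFirst(const MarkedSourceNode& node, Kind wanted,
                                  const std::vector<Kind>& skip) {
  if (node.kind == wanted) return &node;
  for (const MarkedSourceNode& c : node.child) {
    if (std::find(skip.begin(), skip.end(), c.kind) != skip.end()) continue;
    if (const MarkedSourceNode* found = FindFirst(c, wanted, skip)) {
      return found;
    }
  }
  return nullptr;
}

// One name per child of the first PARAMETER node: the first IDENTIFIER in
// that child that is not inside a CONTEXT or TYPE.
std::vector<std::string> SimpleParams(const MarkedSourceNode& root) {
  std::vector<std::string> names;
  const MarkedSourceNode* params = FindFirst(root, Kind::PARAMETER, {});
  if (params == nullptr) return names;
  for (const MarkedSourceNode& param : params->child) {
    const MarkedSourceNode* id =
        FindFirst(param, Kind::IDENTIFIER, {Kind::CONTEXT, Kind::TYPE});
    names.push_back(id != nullptr ? Render(*id) : std::string());
  }
  return names;
}

// Rebuilds the prototype example above, with one "int foo::<name>" per name.
MarkedSourceNode FooPrototype(const std::vector<std::string>& names) {
  MarkedSourceNode params = Leaf(Kind::PARAMETER, "(");
  params.post_child_text = ", ";
  params.post_text = ")";
  for (const std::string& name : names) {
    MarkedSourceNode context = Leaf(Kind::CONTEXT, "");
    context.child = {Leaf(Kind::IDENTIFIER, "foo")};
    context.post_child_text = "::";
    context.add_final_list_token = true;
    MarkedSourceNode param;  // A BOX holding "int", " ", "foo::", and the name.
    param.child = {Leaf(Kind::TYPE, "int"), Leaf(Kind::BOX, " "), context,
                   Leaf(Kind::IDENTIFIER, name)};
    params.child.push_back(param);
  }
  MarkedSourceNode root;
  root.child = {Leaf(Kind::TYPE, "void"), Leaf(Kind::BOX, " "),
                Leaf(Kind::IDENTIFIER, "foo"), params};
  return root;
}
```

For `FooPrototype({"x", "y"})`, `SimpleParams` returns `x` and `y`. Rendering the whole tree with the traversal rules gives `void foo(int foo::x, int foo::y)`.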

=== Including `MarkedSource` by reference

In the function prototype example above, the `MarkedSource` for `x` and `y`
appears twice in the indexer output: once in the `code` fact for each variable,
and again in the `code` fact for `foo`. You can avoid this duplication by
including the `code` of another node in the Kythe graph through a `LOOKUP`
message kind. For example, with a `PARAMETER_LOOKUP_BY_PARAM` node standing in
for the `code` of the parameters of `foo`, the prototype could equivalently have
been written:

[source,javascript]
----
child { kind: TYPE pre_text: "void" }
child { pre_text: " " }
child { kind: IDENTIFIER pre_text: "foo" }
child {
  kind: PARAMETER_LOOKUP_BY_PARAM
  pre_text: "("
  post_child_text: ", "
  post_text: ")"
}
----

WARNING: There is a tradeoff between size and speed in the use of the `LOOKUP`
kinds. They avoid storing the same `MarkedSource` more than once, but each
reference has to be dereferenced when the data is served. You should not expect
more than one `LOOKUP` level to be dereferenced by the serving infrastructure on
your behalf.
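The following hypothetical sketch makes the one-level limit concrete. It shows how a consumer could expand `PARAMETER_LOOKUP_BY_PARAM` before rendering. Everything here is illustrative: `Graph`, `GraphNode`, `ExpandLookups`, `ExampleGraph`, and the tickets `foo`, `x`, and `y` are made up for this example. The sketch also assumes that the lookup resolves to the `code` of the nodes reached through the owner's `param.N` edges, in index order. Kythe's serving infrastructure works on its own data structures, not on an in-memory map like this one.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical subset of MarkedSource kinds, including one LOOKUP kind.
enum class Kind { BOX, TYPE, PARAMETER, IDENTIFIER, CONTEXT,
                  PARAMETER_LOOKUP_BY_PARAM };

// Hypothetical plain-struct stand-in for the MarkedSource proto message.
struct MarkedSourceNode {
  Kind kind = Kind::BOX;
  std::string pre_text;
  std::string post_child_text;
  std::string post_text;
  bool add_final_list_token = false;
  std::vector<MarkedSourceNode> child;
};

// Hypothetical in-memory graph keyed by ticket: each node's decoded code fact
// and the tickets of its parameters in param.N order.
struct GraphNode {
  MarkedSourceNode code;
  std::vector<std::string> params;
};
using Graph = std::map<std::string, GraphNode>;

// Replaces PARAMETER_LOOKUP_BY_PARAM nodes with PARAMETER nodes whose children
// are the code facts of `owner`'s parameters. Substituted code is not expanded
// again, so at most one LOOKUP level is resolved.
MarkedSourceNode ExpandLookups(const MarkedSourceNode& node,
                               const std::string& owner, const Graph& graph) {
  MarkedSourceNode out = node;
  if (node.kind == Kind::PARAMETER_LOOKUP_BY_PARAM) {
    out.kind = Kind::PARAMETER;
    out.child.clear();
    auto owner_it = graph.find(owner);
    if (owner_it == graph.end()) return out;
    for (const std::string& ticket : owner_it->second.params) {
      auto param_it = graph.find(ticket);
      if (param_it != graph.end()) out.child.push_back(param_it->second.code);
    }
    return out;
  }
  for (MarkedSourceNode& c : out.child) c = ExpandLookups(c, owner, graph);
  return out;
}

// Depth-first rendering with pre_text, post_child_text, and post_text.
std::string Render(const MarkedSourceNode& node) {
  std::string out = node.pre_text;
  for (std::size_t i = 0; i < node.child.size(); ++i) {
    out += Render(node.child[i]);
    if (i + 1 < node.child.size() || node.add_final_list_token) {
      out += node.post_child_text;
    }
  }
  return out + node.post_text;
}

MarkedSourceNode Leaf(Kind kind, const std::string& text) {
  MarkedSourceNode n;
  n.kind = kind;
  n.pre_text = text;
  return n;
}

// Builds the hypothetical tickets "x", "y", and "foo" from the examples above.
Graph ExampleGraph() {
  Graph graph;
  for (const std::string name : {"x", "y"}) {
    MarkedSourceNode context = Leaf(Kind::CONTEXT, "");
    context.child = {Leaf(Kind::IDENTIFIER, "foo")};
    context.post_child_text = "::";
    context.add_final_list_token = true;
    graph[name].code.child = {Leaf(Kind::TYPE, "int"), Leaf(Kind::BOX, " "),
                              context, Leaf(Kind::IDENTIFIER, name)};
  }
  MarkedSourceNode lookup = Leaf(Kind::PARAMETER_LOOKUP_BY_PARAM, "(");
  lookup.post_child_text = ", ";
  lookup.post_text = ")";
  graph["foo"].code.child = {Leaf(Kind::TYPE, "void"), Leaf(Kind::BOX, " "),
                             Leaf(Kind::IDENTIFIER, "foo"), lookup};
  graph["foo"].params = {"x", "y"};
  return graph;
}
```

Before expansion, the `foo` prototype renders as `void foo()`. After `ExpandLookups` it renders as `void foo(int foo::x, int foo::y)`, the same text as the spelled-out version. `ExpandLookups` does not recurse into the substituted `code`, so any `LOOKUP` node inside a parameter's own `code` would stay unresolved. This mirrors the warning above.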

== Testing `MarkedSource` facts

The link:../kythe-verifier.html[verifier] supports checking `MarkedSource`
subtrees by exploding the protocol buffer into a subgraph. Because this
behavior can add many facts to its database, it is disabled by default.
Enable it using the `--convert_marked_source` flag. If some node `X` has
a fact such that `X.code` is an encoded `MarkedSource`, that fact is replaced
with a synthesized `code` edge from `X` to a node representing the root
`MarkedSource` message. Each synthesized node carries facts named after the
fields in the `MarkedSource` proto definition (such as `kind` and `pre_text`).
Child messages are attached via `ParentMS child.N ChildMS` edges, where `N` is
the zero-based index of the child in the parent. For example, the following
test script checks the `MarkedSource` attached to a $$C++$$ variable:

[kythe,C++,"Variable source.",1,"","--convert_marked_source"]
--------------------------------------------------------------------------------
//- @x defines/binding VarX
//- VarX code VXRoot
//- VXRoot child.0 VXType
//- VXType.pre_text int
//- VXType.kind "TYPE"
//- VXRoot child.1 VXSpaceBox
//- VXSpaceBox.pre_text " "
//- VXRoot child.2 VXIdentifier
//- VXIdentifier.kind "IDENTIFIER"
//- VXIdentifier.pre_text x
int x;
--------------------------------------------------------------------------------
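The explosion that the verifier performs can be modeled roughly as follows. This is a hypothetical sketch, not the verifier's implementation. The synthesized node names (`VarX/code`, `VarX/code.0`, ...), the `Fact` and `Edge` structs, and the choice to emit facts only for the kind and non-empty text fields are all assumptions. The sketch does follow the shape described above: a `code` edge from the owning node to the root message, facts named after `MarkedSource` fields, and `child.N` edges with a zero-based `N`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical plain-struct stand-in for the MarkedSource proto message; the
// kind is kept as the enum name string that the verifier script compares.
struct MarkedSourceNode {
  std::string kind = "BOX";
  std::string pre_text;
  std::string post_child_text;
  std::string post_text;
  std::vector<MarkedSourceNode> child;
};

struct Fact { std::string node, name, value; };
struct Edge { std::string source, label, target; };

// Emits one synthesized node per message: a fact for its kind and for each
// non-empty text field, plus a child.N edge to each child (N is zero-based).
void Explode(const MarkedSourceNode& ms, const std::string& id,
             std::vector<Fact>* facts, std::vector<Edge>* edges) {
  facts->push_back({id, "kind", ms.kind});
  if (!ms.pre_text.empty()) facts->push_back({id, "pre_text", ms.pre_text});
  if (!ms.post_child_text.empty()) {
    facts->push_back({id, "post_child_text", ms.post_child_text});
  }
  if (!ms.post_text.empty()) facts->push_back({id, "post_text", ms.post_text});
  for (std::size_t i = 0; i < ms.child.size(); ++i) {
    const std::string child_id = id + "." + std::to_string(i);
    edges->push_back({id, "child." + std::to_string(i), child_id});
    Explode(ms.child[i], child_id, facts, edges);
  }
}

// Replaces `owner`'s code fact with a code edge to the exploded root message.
void ConvertCodeFact(const std::string& owner, const MarkedSourceNode& code,
                     std::vector<Fact>* facts, std::vector<Edge>* edges) {
  const std::string root_id = owner + "/code";
  edges->push_back({owner, "code", root_id});
  Explode(code, root_id, facts, edges);
}

// Lookup helpers for checking the result; both return "" when nothing matches.
std::string Target(const std::vector<Edge>& edges, const std::string& source,
                   const std::string& label) {
  for (const Edge& e : edges) {
    if (e.source == source && e.label == label) return e.target;
  }
  return "";
}

std::string FactValue(const std::vector<Fact>& facts, const std::string& node,
                      const std::string& name) {
  for (const Fact& f : facts) {
    if (f.node == node && f.name == name) return f.value;
  }
  return "";
}
```

Converting the `MarkedSource` for `int x;` yields a root reached by a `code` edge from `VarX`. Its `child.0`, `child.1`, and `child.2` edges point at nodes whose `pre_text` facts are `int`, a single space, and `x`. These are the relationships that the `//-` assertions in the test script check.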