// Copyright 2017 The Kythe Authors. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

Annotating nodes for display
============================
:Revision: 1.0
:toc2:
:toclevels: 3
:priority: 999

Semantic nodes in a Kythe graph may stand for objects with complex structure,
such as polymorphic functions bearing many type constraints. Representing these
nodes in a UI for human viewers is often complicated. Displaying only the
source text may omit important context (like types inferred by the compiler).
On the other hand, fully expanding the node's internal representation may result
in a very long, difficult-to-read string. Semantic information may also be lost,
as in the case where programmers use transparent `typedef`s in the C family
of languages.

The schema provides a link:/docs/schema#code[code] fact that, when attached to
an arbitrary semantic node in the Kythe graph, instructs clients on how that
node can be presented to users. The fact's value is a serialized `MarkedSource`
protocol buffer message, defined in
link:https://github.com/kythe/kythe/blob/master/kythe/proto/common.proto[common.proto].
Unlike most facts in the Kythe graph, `MarkedSource` is a structured message
rather than a plain string, because clients have differing requirements for the
amount and level of detail they display. By including or excluding various
parts of this message, clients can precisely format a node's presentation
according to their requirements. The message also offers the ability to link
subspans to other nodes and to include other nodes' *code* by reference. Kythe
indexers are responsible for emitting `MarkedSource` messages.

== Experimenting with `MarkedSource`

The Kythe repository contains a sample utility for rendering documentation,
including any embedded `MarkedSource` messages. You can build it with:

[source,bash]
----
bazel build //kythe/cxx/doc
----

To run it in a mode that will accept and render an ASCII `MarkedSource` message,
use:

[source,bash]
----
./bazel-bin/kythe/cxx/doc/doc --common_signatures
----

An empty message produces the following output (shown between double-quotes with
HTML special characters escaped):

.Output
----
      RenderSimpleIdentifier: ""
RenderSimpleQualifiedName-ID: ""
RenderSimpleQualifiedName+ID: ""
----

== Generating `MarkedSource`

`MarkedSource` messages describe simplified parse trees for source code. The
parse tree represented by a `MarkedSource` message need not correspond exactly
to the surface syntax of the language, but is intended to be as similar as
possible so that a reader familiar with the language will understand the
structure that is represented. Each message is a node in the parse tree.
Messages have kinds (distinct from the `kind` facts on Kythe nodes) that apply
to themselves and their children, so a message with the `TYPE` kind applies the
type nature to itself and its subtree. When tools render `MarkedSource`, they
include or exclude parts of the parse tree by inspecting kinds.
For a full listing of valid kinds, refer to the message definition in
link:https://github.com/kythe/kythe/blob/master/kythe/proto/common.proto[common.proto].

Renderers traverse the tree in order. If a message is selected for rendering,
its `pre_text` is appended when it is first visited. Each of the message's
children is then traversed. After each child is rendered, the parent's
`post_child_text` is appended, unless that child is the last child and
`add_final_list_token` is not set. Once all of the children have been
traversed, the parent's `post_text` is appended. For example:

[source,javascript]
----
kind: IDENTIFIER
pre_text: "pre"
post_child_text: "post_child"
post_text: "post"
----

(Here and elsewhere we show `MarkedSource` messages as text format protobuf
messages.)

.Output
----
      RenderSimpleIdentifier: "prepost"
RenderSimpleQualifiedName-ID: ""
RenderSimpleQualifiedName+ID: "prepost"
----

With two children, `post_child_text` appears between them:

[source,javascript]
----
kind: IDENTIFIER
pre_text: "pre"
post_child_text: "post_child"
post_text: "post"
child {
  pre_text: "1"
}
child {
  pre_text: "2"
}
----

.Output
----
      RenderSimpleIdentifier: "pre1post_child2post"
RenderSimpleQualifiedName-ID: ""
RenderSimpleQualifiedName+ID: "pre1post_child2post"
----

A `MarkedSource` representation of a typical $$C++$$ qualified name would be:

[source,javascript]
----
kind: BOX
child {
  kind: CONTEXT
  child {
    kind: IDENTIFIER
    pre_text: "std"
  }
  child {
    kind: IDENTIFIER
    pre_text: "experimental"
  }
  post_child_text: "::"
  add_final_list_token: true
}
child {
  kind: IDENTIFIER
  pre_text: "string_view"
}
----

.Output
----
      RenderSimpleIdentifier: "string_view"
RenderSimpleQualifiedName-ID: "std::experimental"
RenderSimpleQualifiedName+ID: "std::experimental::string_view"
----

A function prototype
would look like:

[source,javascript]
----
child { kind: TYPE pre_text: "void" }
child { pre_text: " " }
child { kind: IDENTIFIER pre_text: "foo" }
child {
  kind: PARAMETER
  child {
    child { kind: TYPE pre_text: "int" }
    child { pre_text: " " }
    child {
      kind: CONTEXT
      child { kind: IDENTIFIER pre_text: "foo" }
      post_child_text: "::"
      add_final_list_token: true
    }
    child { kind: IDENTIFIER pre_text: "x" }
  }
  child {
    child { kind: TYPE pre_text: "int" }
    child { pre_text: " " }
    child {
      kind: CONTEXT
      child { kind: IDENTIFIER pre_text: "foo" }
      post_child_text: "::"
      add_final_list_token: true
    }
    child { kind: IDENTIFIER pre_text: "y" }
  }
  pre_text: "("
  post_child_text: ", "
  post_text: ")"
}
----

.Output
----
      RenderSimpleIdentifier: "foo"
          RenderSimpleParams: "x"
          RenderSimpleParams: "y"
RenderSimpleQualifiedName-ID: ""
RenderSimpleQualifiedName+ID: "foo"
----

=== Including `MarkedSource` by reference

In the function prototype example above, the `MarkedSource` for `x` and `y` will
appear duplicated in the indexer output: once for each variable, then again
in the `code` fact for `foo`. You can avoid this duplication by including the
`code` of another node in the Kythe graph by reference, using one of the
`LOOKUP` message kinds. For example, the prototype could have been equivalently
written:

[source,javascript]
----
child { kind: TYPE pre_text: "void" }
child { pre_text: " " }
child { kind: IDENTIFIER pre_text: "foo" }
child {
  kind: PARAMETER_LOOKUP_BY_PARAM
  pre_text: "("
  post_child_text: ", "
  post_text: ")"
}
----

WARNING: There is a tradeoff between size and speed in the use of the `LOOKUP`
kinds. You should not expect more than one `LOOKUP` level to be dereferenced
by the serving infrastructure on your behalf.

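
As a rough illustration of the equivalence claimed above, the following Python
sketch expands a `PARAMETER_LOOKUP_BY_PARAM` message one level and then renders
the whole tree using the traversal rules from the previous section. It is a
sketch under stated assumptions, not Kythe's implementation: the `MarkedSource`
dataclass is a hypothetical stand-in for the proto message, `render_all` and
`expand_param_lookup` are illustrative names, and the parameters' `code` values
are passed in directly instead of being fetched from the graph. `render_all`
ignores kinds, so it is not one of the `RenderSimple*` renderers shown earlier.

```python
from dataclasses import dataclass, field, replace
from typing import List


@dataclass
class MarkedSource:
    """Hypothetical stand-in for the MarkedSource proto (subset of fields)."""
    kind: str = "BOX"
    pre_text: str = ""
    post_child_text: str = ""
    post_text: str = ""
    add_final_list_token: bool = False
    child: List["MarkedSource"] = field(default_factory=list)


def render_all(node: MarkedSource) -> str:
    """Render every message in the tree, ignoring kinds."""
    parts = [node.pre_text]
    last = len(node.child) - 1
    for index, child in enumerate(node.child):
        parts.append(render_all(child))
        # post_child_text separates children; it follows the last child
        # only when add_final_list_token is set.
        if index < last or node.add_final_list_token:
            parts.append(node.post_child_text)
    parts.append(node.post_text)
    return "".join(parts)


def expand_param_lookup(node: MarkedSource,
                        param_codes: List[MarkedSource]) -> MarkedSource:
    """Expand PARAMETER_LOOKUP_BY_PARAM children of `node` by one level.

    Assumption: the lookup becomes a PARAMETER message that keeps its own
    text fields and takes the parameters' code as its children.
    """
    children = []
    for child in node.child:
        if child.kind == "PARAMETER_LOOKUP_BY_PARAM":
            child = replace(child, kind="PARAMETER", child=list(param_codes))
        children.append(child)
    return replace(node, child=children)


def ident(text: str) -> MarkedSource:
    return MarkedSource(kind="IDENTIFIER", pre_text=text)


def param_code(name: str) -> MarkedSource:
    """The code of an `int` parameter of `foo`, as in the prototype example."""
    return MarkedSource(child=[
        MarkedSource(kind="TYPE", pre_text="int"),
        MarkedSource(pre_text=" "),
        MarkedSource(kind="CONTEXT", post_child_text="::",
                     add_final_list_token=True, child=[ident("foo")]),
        ident(name),
    ])


prototype = MarkedSource(child=[
    MarkedSource(kind="TYPE", pre_text="void"),
    MarkedSource(pre_text=" "),
    ident("foo"),
    MarkedSource(kind="PARAMETER_LOOKUP_BY_PARAM",
                 pre_text="(", post_child_text=", ", post_text=")"),
])

expanded = expand_param_lookup(prototype, [param_code("x"), param_code("y")])
print(render_all(expanded))  # void foo(int foo::x, int foo::y)
```

The expansion is deliberately one level deep, in line with the warning above:
a `LOOKUP` kind nested inside a parameter's own `code` would be left
unexpanded by this sketch.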

== Testing `MarkedSource` facts

The link:../kythe-verifier.html[verifier] supports checking `MarkedSource`
subtrees by exploding the protocol buffer into a subgraph. Because this
behavior can add many facts to its database, it is disabled by default.
Enable it using the `--convert_marked_source` flag. If some node `V` has
a fact such that `V.code` is an encoded `MarkedSource`, that fact is
replaced with a synthesized `code` edge to a root `MarkedSource` node, whose
facts are named after the fields in the `MarkedSource` proto definition.
Child messages are attached via `ParentMS child.N ChildMS` edges, where `N`
is the zero-based index of the child in the parent. For example, the
following test script checks the `MarkedSource` attached to a $$C++$$
variable:

[kythe,C++,"Variable source.",1,"","--convert_marked_source"]
--------------------------------------------------------------------------------
//- @x defines/binding VarX
//- VarX code VXRoot
//- VXRoot child.0 VXType
//- VXType.pre_text int
//- VXType.kind "TYPE"
//- VXRoot child.1 VXSpaceBox
//- VXSpaceBox.pre_text " "
//- VXRoot child.2 VXIdentifier
//- VXIdentifier.kind "IDENTIFIER"
//- VXIdentifier.pre_text x
int x;
--------------------------------------------------------------------------------
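
To make the shape of the exploded subgraph concrete, the following Python
sketch flattens a message tree into facts and `child.N` edges in the way the
paragraph above describes. It is only an illustration, not the verifier's
code: the `MarkedSource` dataclass is a hypothetical stand-in for the proto
message, the node names (`ms0`, `ms1`, ...) are invented, and the choice to
emit `kind` for every node while skipping empty text fields is an assumption.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Tuple


@dataclass
class MarkedSource:
    """Hypothetical stand-in for the MarkedSource proto (subset of fields)."""
    kind: str = "BOX"
    pre_text: str = ""
    post_child_text: str = ""
    post_text: str = ""
    child: List["MarkedSource"] = field(default_factory=list)


Fact = Tuple[str, str, str]  # (node, fact name, value)
Edge = Tuple[str, str, str]  # (parent node, "child.N", child node)


def explode(root: MarkedSource) -> Tuple[str, List[Fact], List[Edge]]:
    """Flatten a MarkedSource tree into facts and child.N edges."""
    ids = count()
    facts: List[Fact] = []
    edges: List[Edge] = []

    def visit(node: MarkedSource) -> str:
        node_id = f"ms{next(ids)}"  # invented node name
        facts.append((node_id, "kind", node.kind))
        for name in ("pre_text", "post_child_text", "post_text"):
            value = getattr(node, name)
            if value:  # assumption: empty fields produce no fact
                facts.append((node_id, name, value))
        for index, child in enumerate(node.child):
            # N in child.N is the zero-based index of the child.
            edges.append((node_id, f"child.{index}", visit(child)))
        return node_id

    return visit(root), facts, edges


# The MarkedSource that the test script above expects for `int x;`.
var_x = MarkedSource(child=[
    MarkedSource(kind="TYPE", pre_text="int"),
    MarkedSource(pre_text=" "),
    MarkedSource(kind="IDENTIFIER", pre_text="x"),
])

root_id, facts, edges = explode(var_x)
for parent, label, target in edges:
    print(parent, label, target)
```

Here `root_id` plays the role of `VXRoot`, and the three edges correspond to
the `child.0`, `child.1`, and `child.2` assertions in the script.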