// Copyright 2013 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// +build !go1.5

/*

Package pointer implements Andersen's analysis, an inclusion-based
pointer analysis algorithm first described in (Andersen, 1994).

A pointer analysis relates every pointer expression in a whole program
to the set of memory locations to which it might point. This
information can be used to construct a call graph of the program that
precisely represents the destinations of dynamic function and method
calls. It can also be used to determine, for example, which pairs of
channel operations operate on the same channel.

The package allows the client to request a set of expressions of
interest for which the points-to information will be returned once the
analysis is complete. In addition, the client may request that a
callgraph is constructed. The example program in example_test.go
demonstrates both of these features. Clients should not request more
information than they need since it may increase the cost of the
analysis significantly.


CLASSIFICATION

Our algorithm is INCLUSION-BASED: the points-to sets for x and y will
be related by pts(y) ⊇ pts(x) if the program contains the statement
y = x.

It is FLOW-INSENSITIVE: it ignores all control flow constructs and the
order of statements in a program. It is therefore a "MAY ALIAS"
analysis: its facts are of the form "P may/may not point to L",
not "P must point to L".

It is FIELD-SENSITIVE: it builds separate points-to sets for distinct
fields, such as x and y in struct { x, y *int }.
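
The inclusion-based and flow-insensitive properties can be illustrated
with a toy solver. This is only a sketch and not the package's solver:
the names copyConstraint, solve, labels and example are hypothetical,
and variables and labels are plain strings. It models the statements
x = &a; y = x; x = &b and shows that pts(y) includes b even though no
execution assigns &b to y, because statement order is ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// copyConstraint records an assignment dst = src, i.e. pts(dst) ⊇ pts(src).
// (Hypothetical type, for illustration only.)
type copyConstraint struct{ dst, src string }

// solve propagates labels along copy constraints until nothing changes.
// The order of the constraints does not affect the final result.
func solve(pts map[string]map[string]bool, copies []copyConstraint) {
	for changed := true; changed; {
		changed = false
		for _, c := range copies {
			for label := range pts[c.src] {
				if pts[c.dst] == nil {
					pts[c.dst] = make(map[string]bool)
				}
				if !pts[c.dst][label] {
					pts[c.dst][label] = true
					changed = true
				}
			}
		}
	}
}

// labels returns the sorted contents of pts(v).
func labels(pts map[string]map[string]bool, v string) []string {
	var out []string
	for l := range pts[v] {
		out = append(out, l)
	}
	sort.Strings(out)
	return out
}

// example models the statements: x = &a; y = x; x = &b.
func example() map[string]map[string]bool {
	pts := map[string]map[string]bool{"x": {"a": true, "b": true}}
	solve(pts, []copyConstraint{{dst: "y", src: "x"}})
	return pts
}

func main() {
	pts := example()
	fmt.Println(labels(pts, "x"), labels(pts, "y")) // [a b] [a b]
}
```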

It is mostly CONTEXT-INSENSITIVE: most functions are analyzed once,
so values can flow in at one call to the function and return out at
another. Only some smaller functions are analyzed with consideration
of their calling context.

It has a CONTEXT-SENSITIVE HEAP: objects are named by both allocation
site and context, so the objects returned by two distinct calls to f:
   func f() *T { return new(T) }
are distinguished up to the limits of the calling context.

It is a WHOLE PROGRAM analysis: it requires SSA-form IR for the
complete Go program and summaries for native code.

See the (Hind, PASTE'01) survey paper for an explanation of these terms.


SOUNDNESS

The analysis is fully sound when invoked on pure Go programs that do not
use reflection or unsafe.Pointer conversions. In other words, if there
is any possible execution of the program in which pointer P may point to
object O, the analysis will report that fact.


REFLECTION

By default, the "reflect" library is ignored by the analysis, as if all
its functions were no-ops, but if the client enables the Reflection flag,
the analysis will make a reasonable attempt to model the effects of
calls into this library. However, this comes at a significant
performance cost, and not all features of that library are yet
implemented. In addition, some simplifying approximations must be made
to ensure that the analysis terminates; for example, reflection can be
used to construct an infinite set of types and values of those types,
but the analysis arbitrarily bounds the depth of such types.

Most but not all reflection operations are supported.
In particular, addressable reflect.Values are not yet implemented, so
operations such as (reflect.Value).Set have no analytic effect.
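
To see why the depth of reflectively constructed types must be
bounded, consider the following ordinary Go program. It is only an
illustration of the problem, not part of the analysis; the helper
names nestedSliceType and nestedName are hypothetical. Each call to
reflect.SliceOf yields a new, deeper type, so a loop with no fixed
bound can produce an unbounded set of types.

```go
package main

import (
	"fmt"
	"reflect"
)

// nestedSliceType wraps elem in depth levels of slice,
// e.g. [][]int for elem int and depth 2.
func nestedSliceType(elem reflect.Type, depth int) reflect.Type {
	t := elem
	for i := 0; i < depth; i++ {
		t = reflect.SliceOf(t)
	}
	return t
}

// nestedName returns the name of the int-based slice type of the given depth.
func nestedName(depth int) string {
	return nestedSliceType(reflect.TypeOf(0), depth).String()
}

func main() {
	// Every iteration creates a type that did not exist at the previous depth.
	for depth := 0; depth <= 3; depth++ {
		fmt.Println(nestedName(depth))
	}
}
```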


UNSAFE POINTER CONVERSIONS

The pointer analysis makes no attempt to understand aliasing between the
operand x and result y of an unsafe.Pointer conversion:
   y = (*T)(unsafe.Pointer(x))
It is as if the conversion allocated an entirely new object:
   y = new(T)


NATIVE CODE

The analysis cannot model the aliasing effects of functions written in
languages other than Go, such as runtime intrinsics in C or assembly, or
code accessed via cgo. The result is as if such functions are no-ops.
However, various important intrinsics are understood by the analysis,
along with built-ins such as append.

The analysis currently provides no way for users to specify the aliasing
effects of native code.

------------------------------------------------------------------------

IMPLEMENTATION

The remaining documentation is intended for package maintainers and
pointer analysis specialists. Maintainers should have a solid
understanding of the referenced papers (especially those by H&L and PKH)
before making significant changes.

The implementation is similar to that described in (Pearce et al,
PASTE'04). Unlike many algorithms which interleave constraint
generation and solving, constructing the callgraph as they go, this
implementation for the most part observes a phase ordering (generation
before solving), with only simple (copy) constraints being generated
during solving. (The exception is reflection, which creates various
constraints during solving as new types flow to reflect.Value
operations.) This improves the traction of presolver optimisations,
but imposes certain restrictions, e.g. potential context sensitivity
is limited since all variants must be created a priori.


TERMINOLOGY

A type is said to be "pointer-like" if it is a reference to an object.
Pointer-like types include pointers and also interfaces, maps, channels,
functions and slices.

We occasionally use C's x->f notation to distinguish the case where x
is a struct pointer from x.f where x is a struct value.

Pointer analysis literature (and our comments) often uses the notation
dst=*src+offset to mean something different than what it means in Go.
It means: for each node index p in pts(src), the node index p+offset is
in pts(dst). Similarly *dst+offset=src is used for store constraints
and dst=src+offset for offset-address constraints.


NODES

Nodes are the key data structure of the analysis, and have a dual role:
they represent both constraint variables (equivalence classes of
pointers) and members of points-to sets (things that can be pointed
at, i.e. "labels").

Nodes are naturally numbered. The numbering enables compact
representations of sets of nodes such as bitvectors (or BDDs); and the
ordering enables a very cheap way to group related nodes together. For
example, passing n parameters consists of generating n parallel
constraints from caller+i to callee+i for 0<=i<n.

The zero nodeid means "not a pointer". For simplicity, we generate flow
constraints even for non-pointer types such as int. The pointer
equivalence (PE) presolver optimization detects which variables cannot
point to anything; this includes not only all variables of non-pointer
types (such as int) but also variables of pointer-like types if they are
always nil, or are parameters to a function that is never called.

Each node represents a scalar part of a value or object.
Aggregate types (structs, tuples, arrays) are recursively flattened
out into a sequential list of scalar component types, and all the
elements of an array are represented by a single node. (The
flattening of a basic type is a list containing a single node.)
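
The flattening of aggregates can be sketched as follows. This is a
simplified model, not the package's code: the typ struct and the
flatten and example functions are hypothetical, types are described
by hand rather than by go/types, and only the scalar components are
listed (any extra bookkeeping nodes are omitted).

```go
package main

import "fmt"

// typ is a hypothetical, simplified description of a Go type.
type typ struct {
	name   string // name of a scalar type, e.g. "int" or "*T"
	fields []typ  // components, if the type is a struct or tuple
	elem   *typ   // element type, if the type is an array
}

// flatten lists the scalar components of t in order. All elements of
// an array share the nodes of a single element.
func flatten(t typ) []string {
	if t.elem != nil {
		return flatten(*t.elem)
	}
	if len(t.fields) == 0 {
		return []string{t.name}
	}
	var out []string
	for _, f := range t.fields {
		out = append(out, flatten(f)...)
	}
	return out
}

// example describes struct{ a int; b [4]struct{ p *T; q string }; c chan int }.
func example() typ {
	pair := typ{fields: []typ{{name: "*T"}, {name: "string"}}}
	return typ{fields: []typ{{name: "int"}, {elem: &pair}, {name: "chan int"}}}
}

func main() {
	fmt.Println(flatten(example())) // [int *T string chan int]
}
```

Note that the four-element array contributes the nodes of one element
only, so the whole struct flattens to four scalar components.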

Nodes are connected into a graph with various kinds of labelled edges:
simple edges (or copy constraints) represent value flow. Complex
edges (load, store, etc) trigger the creation of new simple edges
during the solving phase.


OBJECTS

Conceptually, an "object" is a contiguous sequence of nodes denoting
an addressable location: something that a pointer can point to. The
first node of an object has a non-nil obj field containing information
about the allocation: its size, context, and ssa.Value.

Objects include:
   - functions and globals;
   - variable allocations in the stack frame or heap;
   - maps, channels and slices created by calls to make();
   - allocations to construct an interface;
   - allocations caused by conversions, e.g. []byte(str);
   - arrays allocated by calls to append().

Many objects have no Go types. For example, the func, map and chan type
kinds in Go are all varieties of pointers, but their respective objects
are actual functions (executable code), maps (hash tables), and channels
(synchronized queues). Given the way we model interfaces, they too are
pointers to "tagged" objects with no Go type. And an *ssa.Global denotes
the address of a global variable, but the object for a Global is the
actual data. So, the type of an ssa.Value that creates an object is
"off by one indirection": a pointer to the object.

The individual nodes of an object are sometimes referred to as "labels".

For uniformity, all objects have a non-zero number of fields, even those
of the empty type struct{}. (All arrays are treated as if of length 1,
so there are no empty arrays. The empty tuple is never address-taken,
so is never an object.)


TAGGED OBJECTS

A tagged object has the following layout:

    T          -- obj.flags ⊇ {otTagged}
    v
    ...

The T node's typ field is the dynamic type of the "payload": the value
v which follows, flattened out. The T node's obj has the otTagged
flag.

Tagged objects are needed when generalizing across types: interfaces,
reflect.Values, reflect.Types. Each of these three types is modelled
as a pointer that exclusively points to tagged objects.

Tagged objects may be indirect (obj.flags ⊇ {otIndirect}) meaning that
the value v is not of type T but *T; this is used only for
reflect.Values that represent lvalues. (These are not implemented yet.)


ANALYSIS ABSTRACTION OF EACH TYPE

Variables of the following "scalar" types may be represented by a
single node: basic types, pointers, channels, maps, slices, 'func'
pointers, interfaces.

Pointers
  Nothing to say here, oddly.

Basic types (bool, string, numbers, unsafe.Pointer)
  Currently all fields in the flattening of a type, including
  non-pointer basic types such as int, are represented in objects and
  values. Though non-pointer nodes within values are uninteresting,
  non-pointer nodes in objects may be useful (if address-taken)
  because they permit the analysis to deduce, in this example,

     var s struct{ ...; x int; ... }
     p := &s.x

  that p points to s.x. If we ignored such object fields, we could only
  say that p points somewhere within s.

  All other basic types are ignored. Expressions of these types have
  zero nodeid, and fields of these types within other aggregate types
  are omitted.

  unsafe.Pointers are not modelled as pointers, so a conversion of an
  unsafe.Pointer to *T is (unsoundly) treated as equivalent to new(T).

Channels
  An expression of type 'chan T' is a kind of pointer that points
  exclusively to channel objects, i.e. objects created by MakeChan (or
  reflection).

  'chan T' is treated like *T.
  *ssa.MakeChan is treated as equivalent to new(T).
  *ssa.Send and receive (*ssa.UnOp(ARROW)) are equivalent to store
  and load.

Maps
  An expression of type 'map[K]V' is a kind of pointer that points
  exclusively to map objects, i.e. objects created by MakeMap (or
  reflection).

  map[K]V is treated like *M where M = struct{k K; v V}.
  *ssa.MakeMap is equivalent to new(M).
  *ssa.MapUpdate is equivalent to *y=x where *y and x have type M.
  *ssa.Lookup is equivalent to y=x.v where x has type *M.

Slices
  A slice []T, which dynamically resembles a struct{array *T, len, cap int},
  is treated as if it were just a *T pointer; the len and cap fields are
  ignored.

  *ssa.MakeSlice is treated like new([1]T): an allocation of a
   singleton array.
  *ssa.Index on a slice is equivalent to a load.
  *ssa.IndexAddr on a slice returns the address of the sole element of the
  slice, i.e. the same address.
  *ssa.Slice is treated as a simple copy.

Functions
  An expression of type 'func...' is a kind of pointer that points
  exclusively to function objects.

  A function object has the following layout:

     identity         -- typ:*types.Signature; obj.flags ⊇ {otFunction}
     params_0         -- (the receiver, if a method)
     ...
     params_n-1
     results_0
     ...
     results_m-1

  There may be multiple function objects for the same *ssa.Function
  due to context-sensitive treatment of some functions.

  The first node is the function's identity node.
  Associated with every callsite is a special "targets" variable,
  whose pts() contains the identity node of each function to which
  the call may dispatch. Identity nodes are not otherwise used during
  the analysis, but we construct the call graph from the pts()
  solution for such nodes.
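
The layout above implies simple index arithmetic, sketched below.
This is illustrative only: funcObject, param, result and callees are
hypothetical names, node indices are plain ints, and each parameter
and result is assumed to flatten to a single node. The callees helper
shows how a call graph edge could be read off the identity nodes
found in the points-to set of a "targets" variable.

```go
package main

import "fmt"

// funcObject is a hypothetical view of a function object's node block.
type funcObject struct {
	identity int // index of the identity node
	nparams  int // number of parameter nodes
	nresults int // number of result nodes
}

// param returns the node index of the i'th parameter node.
func (f funcObject) param(i int) int { return f.identity + 1 + i }

// result returns the node index of the j'th result node.
func (f funcObject) result(j int) int { return f.identity + 1 + f.nparams + j }

// callees maps the identity nodes in pts(targets) back to function names.
func callees(targets []int, names map[int]string) []string {
	var out []string
	for _, id := range targets {
		if name, ok := names[id]; ok {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	f := funcObject{identity: 100, nparams: 2, nresults: 1}
	fmt.Println(f.param(0), f.param(1), f.result(0)) // 101 102 103

	names := map[int]string{100: "f", 200: "g"}
	fmt.Println(callees([]int{100, 200}, names)) // [f g]
}
```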

  The following block of contiguous nodes represents the flattened-out
  types of the parameters ("P-block") and results ("R-block") of the
  function object.

  The treatment of free variables of closures (*ssa.FreeVar) is like
  that of global variables; it is not context-sensitive.
  *ssa.MakeClosure instructions create copy edges to Captures.

  A Go value of type 'func' (i.e. a pointer to one or more functions)
  is a pointer whose pts() contains function objects. The valueNode()
  for an *ssa.Function returns a singleton for that function.

Interfaces
  An expression of type 'interface{...}' is a kind of pointer that
  points exclusively to tagged objects. All tagged objects pointed to
  by an interface are direct (the otIndirect flag is clear) and
  concrete (the tag type T is not itself an interface type). The
  associated ssa.Value for an interface's tagged objects may be an
  *ssa.MakeInterface instruction, or nil if the tagged object was
  created by an intrinsic (e.g. reflection).

  Constructing an interface value causes generation of constraints for
  all of the concrete type's methods; we can't tell a priori which
  ones may be called.

  TypeAssert y = x.(T) is implemented by a dynamic constraint
  triggered by each tagged object O added to pts(x): a typeFilter
  constraint if T is an interface type, or an untag constraint if T is
  a concrete type. A typeFilter tests whether O.typ implements T; if
  so, O is added to pts(y). An untag constraint tests whether O.typ is
  assignable to T, and if so, a copy edge O.v -> y is added.

  ChangeInterface is a simple copy because the representation of
  tagged objects is independent of the interface type (in contrast
  to the "method tables" approach used by the gc runtime).

  y := Invoke x.m(...)
  is implemented by allocating contiguous P/R blocks for the callsite
  and adding a dynamic rule triggered by each tagged object added to
  pts(x). The rule adds param/results copy edges to/from each
  discovered concrete method.

  (Q. Why do we model an interface as a pointer to a pair of type and
  value, rather than as a pair of a pointer to type and a pointer to
  value?
  A. Control-flow joins would merge interfaces ({T1}, {V1}) and ({T2},
  {V2}) to make ({T1,T2}, {V1,V2}), leading to the infeasible and
  type-unsafe combination (T1,V2). Treating the value and its concrete
  type as inseparable makes the analysis type-safe.)

reflect.Value
  A reflect.Value is modelled very similarly to an interface{}, i.e. as
  a pointer exclusively to tagged objects, but with two generalizations.

  1) a reflect.Value that represents an lvalue points to an indirect
     (obj.flags ⊇ {otIndirect}) tagged object, which has a similar
     layout to a tagged object except that the value is a pointer to
     the dynamic type. Indirect tagged objects preserve the correct
     aliasing so that mutations made by (reflect.Value).Set can be
     observed.

     Indirect objects only arise when an lvalue is derived from an
     rvalue by indirection, e.g. the following code:

        type S struct { X T }
        var s S
        var i interface{} = &s    // i points to a *S-tagged object (from MakeInterface)
        v1 := reflect.ValueOf(i)  // v1 points to same *S-tagged object as i
        v2 := v1.Elem()           // v2 points to an indirect S-tagged object, pointing to s
        v3 := v2.FieldByName("X") // v3 points to an indirect T-tagged object, pointing to s.X
        v3.Set(y)                 // pts(s.X) ⊇ pts(y)

     Whether indirect or not, the concrete type of the tagged object
     corresponds to the user-visible dynamic type, and the existence
     of a pointer is an implementation detail.
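
The aliasing that indirect tagged objects are meant to capture can be
observed at run time with an ordinary program. The sketch below is a
runnable variant of the fragment above, under the assumption that T
is a named pointer type; the functions setViaReflection and aliased
are hypothetical helpers. After the reflective Set, s.X holds the
same pointer as y, which is the fact pts(s.X) ⊇ pts(y).

```go
package main

import (
	"fmt"
	"reflect"
)

// T is assumed here to be a pointer type so that the Set call
// creates an alias; the original fragment leaves T unspecified.
type T *int

type S struct{ X T }

// setViaReflection stores y into s.X using only reflect operations.
func setViaReflection(s *S, y T) {
	var i interface{} = s
	v1 := reflect.ValueOf(i)   // a *S value
	v2 := v1.Elem()            // the addressable S that s points to
	v3 := v2.FieldByName("X")  // the addressable field s.X
	v3.Set(reflect.ValueOf(y)) // s.X = y
}

// aliased reports whether s.X points to n after the reflective store.
func aliased() bool {
	n := 7
	var s S
	setViaReflection(&s, &n)
	return s.X == T(&n)
}

func main() {
	fmt.Println(aliased()) // true
}
```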

     (NB: indirect tagged objects are not yet implemented)

  2) The dynamic type tag of a tagged object pointed to by a
     reflect.Value may be an interface type; it need not be concrete.

     This arises in code such as this:
        tEface := reflect.TypeOf(new(interface{})).Elem() // interface{}
        eface := reflect.Zero(tEface)
     pts(eface) is a singleton containing an interface{}-tagged
     object. That tagged object's payload is an interface{} value,
     i.e. the pts of the payload contains only concrete-tagged
     objects, although in this example it's the zero interface{} value,
     so its pts is empty.

reflect.Type
  Just as in the real "reflect" library, we represent a reflect.Type
  as an interface whose sole implementation is the concrete type,
  *reflect.rtype. (This choice is forced on us by go/types: clients
  cannot fabricate types with arbitrary method sets.)

  rtype instances are canonical: there is at most one per dynamic
  type. (rtypes are in fact large structs but since identity is all
  that matters, we represent them by a single node.)

  The payload of each *rtype-tagged object is an *rtype pointer that
  points to exactly one such canonical rtype object. We exploit this
  by setting the node.typ of the payload to the dynamic type, not
  '*rtype'. This saves us an indirection in each resolution rule. As
  an optimisation, *rtype-tagged objects are canonicalized too.


Aggregate types:

Aggregate types are treated as if all directly contained
aggregates are recursively flattened out.

Structs
  *ssa.Field y = x.f creates a simple edge to y from x's node at f's offset.

  *ssa.FieldAddr y = &x->f requires a dynamic closure rule to create
   simple edges for each struct discovered in pts(x).
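
The rule for y = &x->f is an offset-address constraint of the form
dst=src+offset. A minimal sketch follows; it is not the package's
implementation. The function fieldAddr is hypothetical, node indices
are plain ints, and the example assumes that each struct object
begins with one extra node followed by one node per field, so that
field number k sits at offset k+1.

```go
package main

import "fmt"

// fieldAddr applies the rule for y = &x->f: for each node index p in
// pts(x), the node index p+offset is added to the result set pts(y).
func fieldAddr(ptsX []int, offset int) []int {
	out := make([]int, 0, len(ptsX))
	for _, p := range ptsX {
		out = append(out, p+offset)
	}
	return out
}

func main() {
	// Two struct objects start at nodes 10 and 20 (assumed numbering).
	// With the assumed layout, the second field is at offset 2.
	ptsX := []int{10, 20}
	fmt.Println(fieldAddr(ptsX, 2)) // [12 22]
}
```

Because pts(x) grows during solving, the real rule must be re-applied
to each newly discovered object, which is why it is described as
dynamic.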

  The nodes of a struct consist of a special 'identity' node (whose
  type is that of the struct itself), followed by the nodes for all
  the struct's fields, recursively flattened out. A pointer to the
  struct is a pointer to its identity node. That node allows us to
  distinguish a pointer to a struct from a pointer to its first field.

  Field offsets are logical field offsets (plus one for the identity
  node), so the sizes of the fields can be ignored by the analysis.

  (The identity node is non-traditional but enables the distinction
  described above, which is valuable for code comprehension tools.
  Typical pointer analyses for C, whose purpose is compiler
  optimization, must soundly model unsafe.Pointer (void*) conversions,
  and this requires fidelity to the actual memory layout using physical
  field offsets.)

Arrays
  We model an array by an identity node (whose type is that of the
  array itself) followed by a node representing all the elements of
  the array; the analysis does not distinguish elements with different
  indices. Effectively, an array is treated like struct{elem T}, a
  load y=x[i] like y=x.elem, and a store x[i]=y like x.elem=y; the
  index i is ignored.

  A pointer to an array is a pointer to its identity node. (A slice is
  also a pointer to an array's identity node.) The identity node
  allows us to distinguish a pointer to an array from a pointer to one
  of its elements, but it is rather costly because it introduces more
  offset constraints into the system. Furthermore, sound treatment of
  unsafe.Pointer would require us to dispense with this node.

  Arrays may be allocated by Alloc, by make([]T), by calls to append,
  and via reflection.
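
The effect of ignoring array indices can be sketched with a toy
model. The type arrayObject and its store and load methods are
hypothetical and track only one points-to set, standing for the
single element node; labels are plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// arrayObject models all elements of an array with one points-to set.
type arrayObject struct{ elem map[string]bool }

// store models x[index] = y where y points to label; the index is ignored.
func (a *arrayObject) store(index int, label string) {
	if a.elem == nil {
		a.elem = make(map[string]bool)
	}
	a.elem[label] = true
}

// load models y = x[index]; it returns every label stored at any index.
func (a *arrayObject) load(index int) []string {
	out := make([]string, 0, len(a.elem))
	for l := range a.elem {
		out = append(out, l)
	}
	sort.Strings(out)
	return out
}

func main() {
	var a arrayObject
	a.store(0, "p")
	a.store(1, "q")
	fmt.Println(a.load(5)) // [p q]
}
```

A load from any index therefore sees the labels stored at every
index, which is imprecise but keeps the analysis sound.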

Tuples (T, ...)
  Tuples are treated like structs with naturally numbered fields.
  *ssa.Extract is analogous to *ssa.Field.

  However, tuples have no identity field since by construction, they
  cannot be address-taken.


FUNCTION CALLS

  There are three kinds of function call:
  (1) static "call"-mode calls of functions.
  (2) dynamic "call"-mode calls of functions.
  (3) dynamic "invoke"-mode calls of interface methods.
  Cases 1 and 2 apply equally to methods and standalone functions.

  Static calls.
    A static call consists of three steps:
    - finding the function object of the callee;
    - creating copy edges from the actual parameter value nodes to the
      P-block in the function object (this includes the receiver if
      the callee is a method);
    - creating copy edges from the R-block in the function object to
      the value nodes for the result of the call.

    A static function call is little more than two struct value copies
    between the P/R blocks of caller and callee:

       callee.P = caller.P
       caller.R = callee.R

    Context sensitivity

      Static calls (alone) may be treated context sensitively,
      i.e. each callsite may cause a distinct re-analysis of the
      callee, improving precision. Our current context-sensitivity
      policy treats all intrinsics and getter/setter methods in this
      manner since such functions are small and seem like an obvious
      source of spurious confluences, though this has not yet been
      evaluated.

  Dynamic function calls

    Dynamic calls work in a similar manner except that the creation of
    copy edges occurs dynamically, in a similar fashion to a pair of
    struct copies in which the callee is indirect:

       callee->P = caller.P
       caller.R = callee->R

    (Recall that the function object's P- and R-blocks are contiguous.)
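
The two block copies of a static call can be sketched as the
generation of parallel copy edges. The names copyEdge, copyBlock and
staticCall are hypothetical, node indices are plain ints, and each
block is assumed to be a run of contiguous nodes. For a dynamic call
the same edges would be generated during solving, once for each
function object that reaches the callee pointer.

```go
package main

import "fmt"

// copyEdge records a simple constraint pts(dst) ⊇ pts(src).
type copyEdge struct{ dst, src int }

// copyBlock returns n parallel copy edges from src+i to dst+i, 0 <= i < n.
func copyBlock(dst, src, n int) []copyEdge {
	edges := make([]copyEdge, 0, n)
	for i := 0; i < n; i++ {
		edges = append(edges, copyEdge{dst: dst + i, src: src + i})
	}
	return edges
}

// staticCall returns the edges for callee.P = caller.P and
// caller.R = callee.R, given the first node of each block.
func staticCall(callerP, callerR, calleeP, calleeR, nparams, nresults int) []copyEdge {
	edges := copyBlock(calleeP, callerP, nparams)
	return append(edges, copyBlock(callerR, calleeR, nresults)...)
}

func main() {
	// Caller blocks start at nodes 10 (P) and 12 (R); the callee's
	// function object has its P-block at 101 and its R-block at 103.
	for _, e := range staticCall(10, 12, 101, 103, 2, 1) {
		fmt.Printf("n%d <- n%d\n", e.dst, e.src)
	}
}
```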

  Interface method invocation

    For invoke-mode calls, we create a params/results block for the
    callsite and attach a dynamic closure rule to the interface. For
    each new tagged object that flows to the interface, we look up
    the concrete method, find its function object, and connect its P/R
    blocks to the callsite's P/R blocks, adding copy edges to the graph
    during solving.

  Recording call targets

    The analysis notifies its clients of each callsite it encounters,
    passing a CallSite interface. Among other things, the CallSite
    contains a synthetic constraint variable ("targets") whose
    points-to solution includes the set of all function objects to
    which the call may dispatch.

    It is via this mechanism that the callgraph is made available.
    Clients may also elect to be notified of callgraph edges directly;
    internally this just iterates all "targets" variables' pts(·)s.


PRESOLVER

We implement Hash-Value Numbering (HVN), a pre-solver constraint
optimization described in Hardekopf & Lin, SAS'07. This is documented
in more detail in hvn.go. We intend to add its cousins HR and HU in
future.


SOLVER

The solver is currently a naive Andersen-style implementation; it does
not perform online cycle detection, though we plan to add solver
optimisations such as Hybrid- and Lazy- Cycle Detection from (Hardekopf
& Lin, PLDI'07).

It uses difference propagation (Pearce et al, SQC'04) to avoid
redundant re-triggering of closure rules for values already seen.

Points-to sets are represented using sparse bit vectors (similar to
those used in LLVM and gcc), which are more space- and time-efficient
than sets based on Go's built-in map type or dense bit vectors.

Nodes are permuted prior to solving so that object nodes (which may
appear in points-to sets) are lower numbered than non-object (var)
nodes.
This improves the density of the set over which the PTSs
range, and thus the efficiency of the representation.

Partly thanks to avoiding map iteration, the execution of the solver is
100% deterministic, a great help during debugging.


FURTHER READING

Andersen, L. O. 1994. Program analysis and specialization for the C
programming language. Ph.D. dissertation. DIKU, University of
Copenhagen.

David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Efficient
field-sensitive pointer analysis for C. In Proceedings of the 5th ACM
SIGPLAN-SIGSOFT workshop on Program analysis for software tools and
engineering (PASTE '04). ACM, New York, NY, USA, 37-42.
http://doi.acm.org/10.1145/996821.996835

David J. Pearce, Paul H. J. Kelly, and Chris Hankin. 2004. Online
Cycle Detection and Difference Propagation: Applications to Pointer
Analysis. Software Quality Control 12, 4 (December 2004), 311-337.
http://dx.doi.org/10.1023/B:SQJO.0000039791.93071.a2

David Grove and Craig Chambers. 2001. A framework for call graph
construction algorithms. ACM Trans. Program. Lang. Syst. 23, 6
(November 2001), 685-746.
http://doi.acm.org/10.1145/506315.506316

Ben Hardekopf and Calvin Lin. 2007. The ant and the grasshopper: fast
and accurate pointer analysis for millions of lines of code. In
Proceedings of the 2007 ACM SIGPLAN conference on Programming language
design and implementation (PLDI '07). ACM, New York, NY, USA, 290-299.
http://doi.acm.org/10.1145/1250734.1250767

Ben Hardekopf and Calvin Lin. 2007. Exploiting pointer and location
equivalence to optimize pointer analysis. In Proceedings of the 14th
international conference on Static Analysis (SAS'07), Hanne Riis
Nielson and Gilberto Filé (Eds.). Springer-Verlag, Berlin, Heidelberg,
265-280.

Atanas Rountev and Satish Chandra. 2000.
Off-line variable substitution
for scaling points-to analysis. In Proceedings of the ACM SIGPLAN 2000
conference on Programming language design and implementation (PLDI '00).
ACM, New York, NY, USA, 47-56. DOI=10.1145/349299.349310
http://doi.acm.org/10.1145/349299.349310

*/
package pointer // import "golang.org/x/tools/go/pointer"