# Go Tools -> Go Pointer Analysis

## This branch handles callback functions that appear in application code but are only invoked after several levels of library calls

We want to skip analyzing library calls, because doing so is too expensive. To create synthetic SSA for library functions instead, we start here:

1. Write a ```native.xml``` file containing all the IRs we want.
2. Preload all IRs in ```native.xml```.
3. When ```CreatePackage()``` in ```ssa/create.go``` is reached, check whether the package was preloaded; if so, use its synthetic IR.
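
A minimal sketch of steps 2 and 3, assuming a hypothetical ```native.xml``` schema in which each ```package``` element has a ```path``` attribute and holds the synthetic IR for one library package as raw text. The names ```nativeFile```, ```loadNative``` and ```lookupSynthetic``` are made up for illustration and are not part of ```ssa/create.go```; in this branch the lookup would happen inside ```CreatePackage()```.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// nativeFile is a HYPOTHETICAL schema for native.xml: one package element
// per library package whose SSA should be replaced by synthetic IR.
type nativeFile struct {
	Packages []struct {
		Path string `xml:"path,attr"`
		IR   string `xml:",chardata"` // synthetic IR, kept as raw text here
	} `xml:"package"`
}

// loadNative parses native.xml content once (step 2) and indexes the
// synthetic IR by package path.
func loadNative(data []byte) (map[string]string, error) {
	var f nativeFile
	if err := xml.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	preload := make(map[string]string, len(f.Packages))
	for _, p := range f.Packages {
		preload[p.Path] = p.IR
	}
	return preload, nil
}

// lookupSynthetic models the check made when CreatePackage() is reached
// (step 3): report the synthetic IR if the package was preloaded.
func lookupSynthetic(preload map[string]string, pkgPath string) (string, bool) {
	ir, ok := preload[pkgPath]
	return ir, ok
}

func main() {
	data := []byte(`<native>
  <package path="sort">synthetic IR for sort</package>
</native>`)
	preload, err := loadNative(data)
	if err != nil {
		panic(err)
	}
	if ir, ok := lookupSynthetic(preload, "sort"); ok {
		fmt.Println("use synthetic:", ir)
	}
	if _, ok := lookupSynthetic(preload, "fmt"); !ok {
		fmt.Println("fmt: build SSA as usual")
	}
}
```

Keeping the preload table as a plain map keyed by package path makes the step-3 check a single lookup, so packages without synthetic IR fall through to the normal SSA builder unchanged.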

Key files:
- ```ssa/create.go```
- ```ssa/builder.go```

Standard libraries: https://pkg.go.dev/std

#### *Update*

====================================================================================

Cloned from https://github.com/golang/tools, starting from commit 146a0deefdd11b942db7520f68c117335329271a (around v0.5.0-pre1).

The default Go pointer analysis algorithm (v0.5.0-pre1) is in ```go_tools/go/pointer_default```.

If you hit a panic, please submit an issue and paste the full crash stack. Thanks.

## How to Use?
Go to ```go_tools/main``` and run ```go build```. Then run ```./main``` with the flags below, followed by the directory of the Go project you want to analyze.
It will find all of the main files in that directory and analyze them one by one.

#### Flags
- *path*: default value = "". Path of the project to analyze.
- *doLog*: default value = false. Log every step of the analysis (very verbose).
- *doCompare*: default value = false. Compare performance and results with the default PTA.

For example,

```./main -doLog -doCompare ../grpc-go/benchmark/server```

This runs the origin-sensitive pointer analysis on all main files under the directory ```../grpc-go/benchmark/server```,
and also generates a full log and a comparison of performance and results against the default algorithm.

*Note* that ```-doLog``` is very verbose and significantly slows down the analysis.

## User APIs (for detector)
See https://github.com/april1989/origin-go-tools/main/main.go for how to use the call graph and queries.

## Origin-sensitive

#### What is an Origin?
We treat each ```go``` instruction (the instruction that starts a goroutine) as an origin entry point. All variables and function calls inside that goroutine share the same context as the goroutine they belong to.

#### Main Changes from Default
Instead of pre-computing all cgnodes and their constraints before propagating changes among points-to constraints,
we start from the reachable cgnodes ```init``` and ```main``` and compute the reachable cgnodes and their constraints incrementally.

## kCFA

#### Main Changes from Default
- Create k-callsite-sensitive contexts for static and invoke calls.
- Generate constraints/cgnodes on the fly for invoke calls and their targets, only when necessary.
- Currently, skip the creation of reflection and dynamic calls, because there are too many of them.

========================================================================
## Doc of Default Algorithm

The most recent doc is at https://pkg.go.dev/golang.org/x/tools/go/pointer#pkg-overview, which says:

"SOUNDNESS

The analysis is fully sound when invoked on pure Go programs that do not use reflection or unsafe.Pointer conversions. In other words, if there is any possible execution of the program in which pointer P may point to object O, the analysis will report that fact."

However, this degree of soundness is more than we need.

========================================================================
## Major differences between my results and the default results

#### 1. Queries and Points-to Sets (pts)
All examples below are based on ```race_checker/tests/cg.go```.

#### Why my query has two pointers for one ssa.Value:
There are two pointers involved in one constraint, and both are stored under the same ssa.Value in my queries.

#### Why the default query is empty but mine is not:
The default tracks fewer types than mine, so the constraints and invoke calls involving the untracked types are missing from the default result.
Hence, the default has an empty pts while mine is non-empty. For example,
```
SSA:  &t57[41:int]
My Query: (#obj:  1 )
   n597&[0:shared contour; ] : [slicelit[*]]
   n2266&[0:shared contour; ] : []
Default Query: (#obj:  0 )
   n36563 : []

In default log:
; t181 = &t57[41:int]
	localobj[t181] = n37034
	type not tracked: *strconv.leftCheat

In my log:
; t181 = &t57[41:int]
	localobj[t181] = n2266
	addr n597 <- {&n2266}
```

#### Why my query is empty but the default is not:
Because the default algorithm pre-computes constraints for all functions, it generates many unreachable functions/cgnodes (they have no callers), along with their constraints.
These can also pollute the pts of the reachable part of the cg.
For example,
```
SSA:  (*internal/reflectlite.rtype).Size  //-> (*internal/reflectlite.rtype).Size is not a reachable function
My Query: (#obj:  0 )
   n8971&(Global/Local) : []
Default Query: (#obj:  1 )
   n6354 : [(*internal/reflectlite.rtype).Size]
```

#### Why my query is non-empty but the default has no corresponding pointer:
The default does not create queries for untracked types.
For example,
```
654.
SSA:  io.ErrClosedPipe
My Query: (#obj:  1 )
   n4448&(Global/Local) : [makeinterface:*errors.errorString]
Default Query: nil)

In default log:
; *ErrClosedPipe = t10
	copy n19413 <- n39471

In my log:
; *ErrClosedPipe = t10
	create n4448 error for global
	globalobj[io.ErrClosedPipe] = n4448
	copy n4448 <- n4431
```

#### Why the default query is empty but mine has no corresponding pointer:
We don't know yet.
For example,
```
SSA:  &r.peekRune [#4]
My Query: nil
Default Query: (#obj:  0 )
   n44378 : []
and
SSA:  ssa:wrapnilchk(v, "internal/reflectl...":string, "IsNil":string)
My Query: nil)
Default Query: (#obj:  0 )
   n44378 : []
and
	val[t115] = n44377  (*ssa.FieldAddr)
	create n44378 *[16]byte for query
	copy n44378 <- n44377
```

#### Why my query has fewer objs in its pts than the default:
All objs missing from my pts come from objs and constraints introduced by unreachable functions.
This is the pollution mentioned above.
For example,
```
SSA:  *t49
My Query: (#obj:  27 )
   n11815&[0:shared contour; ] : [makeinterface:int makeinterface:[]int makeinterface:int makeinterface:*internal/reflectlite.ValueError makeinterface:*internal/reflectlite.ValueError makeinterface:string makeinterface:string makeinterface:*internal/reflectlite.ValueError makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string]
Default Query: (#obj:  79 )
   n18076 : [makeinterface:int makeinterface:[]int makeinterface:string makeinterface:*internal/reflectlite.ValueError makeinterface:*internal/reflectlite.ValueError makeinterface:*internal/reflectlite.ValueError makeinterface:*internal/reflectlite.ValueError makeinterface:string makeinterface:*internal/reflectlite.ValueError makeinterface:string makeinterface:string makeinterface:string makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:*internal/reflectlite.ValueError makeinterface:string makeinterface:string makeinterface:*internal/reflectlite.ValueError makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:int makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:*internal/reflectlite.ValueError makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:fmt.scanError makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:*errors.errorString makeinterface:string makeinterface:string makeinterface:*internal/reflectlite.ValueError makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string makeinterface:string]
```

#### 2. CG

#### *Why are the cgs from default and my pta different?*

The default algorithm creates cgnodes for functions that are not reachable from the main entry.
For example, when analyzing the main entry ```google.golang.org/grpc/benchmark/server```,
the default algorithm pre-generates constraints and cgnodes for the function:
```go
(*google.golang.org/grpc/credentials.tlsCreds).ServerHandshake
```
which is not reachable from the main entry (it has no caller in the cg).

This is reflected in the default analysis data:
```
Call Graph: (function based)
#Nodes:  14740
#Edges:  45550
#Unreach Nodes:  7698
#Reach Nodes:  7042
#Unreach Functions:  7698
#Reach Functions:  7042

Done  -- PTA/CG Build; Using  5m13.058385739s .
```
The default generates 14740 functions and their constraints; however, at most 7042 of them are reachable from main, so 14740 - 7042 = 7698 (about 52%) are unreachable.

My analysis data is:
```
Call Graph: (cgnode based: function + context)
#Nodes:  6306
#Edges:  23291
#Unreach Nodes:  39
#Reach Nodes:  6267
#Unreach Functions:  39
#Reach Functions:  5870

#Unreach Nodes from Pre-Gen Nodes:  39
#Unreach Functions from Pre-Gen Nodes:  39
#(Pre-Gen are created for reflections)

Done  -- PTA/CG Build; Using 10.279403083s.
```
After extending the set of tracked types, my analysis traverses 6267 reachable cgnodes, covering 5870 distinct functions.

This not only introduces differences in the cg, but also introduces unreachable constraints and objs, which can propagate to the cgnodes and constraints that are reachable from the main entry.
This causes false call edges and callees in the default cg.

Most CG diffs between my result and the default result are due to this.

#### Why are unreachable functions/cgnodes generated?
This is because the default algorithm creates nodes and constraints for all methods of all types
that are dynamically accessible via reflection or interfaces, whether or not those methods are ever reached.

#### Why doesn't the default cg include some of the callees that mine has?
Because the default algorithm tracks fewer types than mine, no constraints are generated for those types and hence nothing propagates through them.
As a result, some invoke calls lack a base instance that would exist if those types were tracked.
Consequently, no callee functions/cgnodes are generated for them, and neither are their constraints.
   236     
   237   
   238