golang.org/x/tools@v0.21.0/internal/refactor/inline/doc.go

// Copyright 2023 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

/*
Package inline implements inlining of Go function calls.

The client provides information about the caller and callee,
including the source text, syntax tree, and type information, and
the inliner returns the modified source file for the caller, or an
error if the inlining operation is invalid (for example because the
function body refers to names that are inaccessible to the caller).

Although this interface demands more information from the client
than might seem necessary, it enables smoother integration with
existing batch and interactive tools that have their own ways of
managing the processes of reading, parsing, and type-checking
packages. In particular, this package does not assume that the
caller and callee belong to the same token.FileSet or
types.Importer realms.

There are many aspects to a function call. It is the only construct
that can simultaneously bind multiple variables of different
explicit types, with implicit assignment conversions. (Neither var
nor := declarations can do that.) It defines the scope of control
labels, of return statements, and of defer statements. Arguments
and results of function calls may be tuples even though tuples are
not first-class values in Go, and a tuple-valued call expression
may be "spread" across the argument list of a call or the operands
of a return statement. All these unique features mean that in the
general case, not everything that can be expressed by a function
call can be expressed without one.
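
As a small illustration of these aspects (the functions describe,
pair, consume, and forward are hypothetical names chosen for this
sketch and are not part of this package), the following program binds
parameters of three different types in one call and spreads a
tuple-valued call across an argument list and a return statement:

```go
package main

import (
	"errors"
	"fmt"
)

// describe binds three parameters of different explicit types in one step.
func describe(n int32, v any, f float64) string {
	return fmt.Sprintf("%T %T %T", n, v, f)
}

// pair returns a tuple; tuples are not first-class values in Go.
func pair() (int, error) { return 42, errors.New("boom") }

// consume accepts the two results of pair as its two parameters.
func consume(n int, err error) string { return fmt.Sprintf("%d %v", n, err) }

// forward spreads a tuple-valued call across the operands of a return.
func forward() (int, error) { return pair() }

func main() {
	// Each untyped constant is implicitly converted on assignment
	// to its parameter: int32, any (holding an int), and float64.
	fmt.Println(describe(1, 2, 3)) // int32 int float64

	// The tuple-valued call pair() is spread across consume's arguments.
	fmt.Println(consume(pair())) // 42 boom

	n, _ := forward()
	fmt.Println(n) // 42
}
```

No single var or := declaration could reproduce the binding performed
by describe(1, 2, 3), since each variable needs its own type.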

So, in general, inlining consists of modifying a function or method
call expression f(a1, ..., an) so that the name of the function f
is replaced ("literalized") by a literal copy of the function
declaration, with free identifiers suitably modified to use the
locally appropriate identifiers or perhaps constant argument
values.
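
A minimal hand-written sketch of literalization follows. The callee
scaled and the variable scale are hypothetical; the rewritten form is
what one might write by hand and is not claimed to be the inliner's
exact output. Here caller and callee share a package, so the free
identifier scale needs no change; across packages it would have to
become a qualified identifier.

```go
package main

import "fmt"

// scale is a package-level variable that the callee refers to freely.
var scale = 10

// scaled is the hypothetical callee.
func scaled(x, y int) int {
	sum := x + y
	return sum * scale
}

// viaCall is the caller before inlining.
func viaCall(a int) int { return scaled(a, 2) }

// literalized is the caller after the name scaled has been replaced by
// a literal copy of its declaration. The call and its argument list
// are unchanged, so binding, conversions, and control flow are too.
func literalized(a int) int {
	return func(x, y int) int {
		sum := x + y
		return sum * scale
	}(a, 2)
}

func main() {
	fmt.Println(viaCall(3), literalized(3)) // 50 50
}
```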

Inlining must not change the semantics of the call. Semantics
preservation is crucial for clients such as codebase maintenance
tools that automatically inline all calls to designated functions
on a large scale. Such tools must not introduce subtle behavior
changes. (Fully inlining a call is dynamically observable using
reflection over the call stack, but this exception to the rule is
explicitly allowed.)
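
The parenthetical exception can be observed with the standard
runtime package. In this sketch (depth and helper are hypothetical
names), counting logical stack frames distinguishes a call made
through helper from the same call after helper has been reduced by
hand:

```go
package main

import (
	"fmt"
	"runtime"
)

// depth reports the number of logical frames on the stack, counting
// depth itself. CallersFrames accounts for compiler-inlined functions,
// so the result does not depend on the compiler's own inlining.
func depth() int {
	pcs := make([]uintptr, 64)
	n := runtime.Callers(1, pcs) // skip the frame of runtime.Callers
	frames := runtime.CallersFrames(pcs[:n])
	count := 0
	for {
		_, more := frames.Next()
		count++
		if !more {
			return count
		}
	}
}

// helper is a hypothetical callee whose body is just "return depth()".
func helper() int { return depth() }

func main() {
	before := helper() // the call as written
	after := depth()   // the same call once helper is inlined by hand
	fmt.Println(before - after) // the difference is helper's frame
}
```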

In many cases it is possible to entirely replace ("reduce") the
call by a copy of the function's body in which parameters have been
replaced by arguments. The inliner supports a number of reduction
strategies, and we expect this set to grow. Nonetheless, sound
reduction is surprisingly tricky.
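
In the simplest case, sketched below with a hypothetical callee named
area, the body is a single return statement, each parameter is
referenced exactly once, and the arguments have no effects, so the
call can be replaced by the result expression. Even here the
substituted argument side+1 needs parentheses to keep its meaning.

```go
package main

import "fmt"

// area is a hypothetical callee whose body is a single return
// statement and whose parameters are each referenced exactly once.
func area(w, h int) int { return w * h }

// viaCall is the caller before inlining.
func viaCall(side int) int { return area(side, side+1) }

// reduced is the caller after the call has been replaced by the body
// of area, with w replaced by side and h replaced by (side + 1).
func reduced(side int) int { return side * (side + 1) }

func main() {
	fmt.Println(viaCall(4), reduced(4)) // 20 20
}
```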

The inliner is in some ways like an optimizing compiler. A compiler
is considered correct if it doesn't change the meaning of the
program in translation from source language to target language. An
optimizing compiler exploits the particulars of the input to
generate better code, where "better" usually means more efficient.
When a case is found in which it emits suboptimal code, the
compiler is improved to recognize more cases, or more rules, and
more exceptions to rules; this process has no end. Inlining is
similar except that "better" code means tidier code. The baseline
translation (literalization) is correct, but there are endless
rules--and exceptions to rules--by which the output can be
improved.

The following section lists some of the challenges, and ways in
which they can be addressed.

  - All effects of the call argument expressions must be preserved,
    both in their number (they must not be eliminated or repeated),
    and in their order (both with respect to other arguments, and any
    effects in the callee function).

    This must be the case even if the corresponding parameters are
    never referenced, are referenced multiple times, referenced in
    a different order from the arguments, or referenced within a
    nested function that may be executed an arbitrary number of
    times.

    Currently, parameter replacement is not applied to arguments
    with effects, but with further analysis of the sequence of
    strict effects within the callee we could relax this constraint.

  - When not all parameters can be substituted by their arguments
    (e.g. due to possible effects), if the call appears in a
    statement context, the inliner may introduce a var declaration
    that declares the parameter variables (with the correct types)
    and initializes them with their corresponding argument values.
    The rest of the function body may then follow.
    For example, the call

    f(1, 2)

    to the function

    func f(x, y int32) { stmts }

    may be reduced to

    { var x, y int32 = 1, 2; stmts }.

    There are many reasons why this is not always possible. For
    example, true parameters are statically resolved in the same
    scope, and are dynamically assigned their arguments in
    parallel; but each spec in a var declaration is statically
    resolved in sequence and dynamically executed in sequence, so
    earlier parameters may shadow references in later ones.

  - Even an argument expression as simple as ptr.x may not be
    referentially transparent, because another argument may have the
    effect of changing the value of ptr.

    This constraint could be relaxed by some kind of alias or
    escape analysis that proves that ptr cannot be mutated during
    the call.

  - Although constants are referentially transparent, as a matter of
    style we do not wish to duplicate literals that are referenced
    multiple times in the body because this undoes proper factoring.
    Also, string literals may be arbitrarily large.

  - If the function body consists of statements other than just
    "return expr", in some contexts it may be syntactically
    impossible to reduce the call. Consider:

    if x := f(); cond { ... }

    Go has no equivalent to Lisp's progn or Rust's blocks,
    nor ML's let expressions (let param = arg in body);
    its closest equivalent is func(param){body}(arg).
    Reduction strategies must therefore consider the syntactic
    context of the call.

    In such situations we could work harder to extract a statement
    context for the call, by transforming it to:

    { x := f(); if cond { ... } }

  - Similarly, without the equivalent of Rust-style blocks and
    first-class tuples, there is no general way to reduce a call
    to a function such as

    func(params)(args)(results) { stmts; return expr }

    to an expression such as

    { var params = args; stmts; expr }

    or even a statement such as

    results = { var params = args; stmts; expr }

    Consequently the declaration and scope of the result variables,
    and the assignment and control-flow implications of the return
    statement, must be dealt with by cases.

  - A standalone call statement that calls a function whose body is
    "return expr" cannot be simply replaced by the body expression
    if it is not itself a call or channel receive expression; it is
    necessary to explicitly discard the result using "_ = expr".

    Similarly, if the body is a call expression, only calls to some
    built-in functions (such as copy or panic) are permitted as
    statements, whereas others (such as append) return a result
    that must be used, even if just by discarding.

  - If a parameter or result variable is updated by an assignment
    within the function body, it cannot always be safely replaced
    by a variable in the caller. For example, given

    func f(a int) int { a++; return a }

    the call y = f(x) cannot be replaced by { x++; y = x } because
    this would change the value of the caller's variable x.
    Only if the caller is finished with x is this safe.

    A similar argument applies to parameter or result variables
    that escape: by eliminating a variable, inlining would change
    the identity of the variable that escapes.

  - If the function body uses 'defer' and the inlined call is not a
    tail-call, inlining may delay the deferred effects.

  - Because the scope of a control label is the entire function, a
    call cannot be reduced if the caller and callee have intersecting
    sets of control labels. (It is possible to α-rename any
    conflicting ones, but our colleagues building C++ refactoring
    tools report that, when tools must choose new identifiers, they
    generally do a poor job.)

  - Given

    func f() uint8 { return 0 }

    var x any = f()

    reducing the call to var x any = 0 is unsound because it
    discards the implicit conversion to uint8. We may need to make
    each argument-to-parameter conversion explicit if the types
    differ. Assignments to variadic parameters may need to
    explicitly construct a slice.

    An analogous problem applies to the implicit assignments in
    return statements:

    func g() any { return f() }

    Replacing the call f() with 0 would silently lose a
    conversion to uint8 and change the behavior of the program.

  - When inlining a call f(1, x, g()) where those parameters are
    unreferenced, we should be able to avoid evaluating 1 and x
    since they are pure and thus have no effect. But x may be the
    last reference to a local variable in the caller, so removing
    it would cause a compilation error. Parameter substitution must
    avoid making the caller's local variables unreferenced (or must
    be prepared to eliminate the declaration too---this is where an
    iterative framework for simplification would really help).

  - An expression such as s[i] may be valid if s and i are
    variables but invalid if either or both of them are constants.
    For example, a negative constant index s[-1] is always out of
    bounds, and even a non-negative constant index may be out of
    bounds depending on the particular string constant (e.g.
    "abc"[4]).

    So, if a parameter participates in any expression that is
    subject to additional compile-time checks when its operands are
    constant, it may be unsafe to substitute that parameter by a
    constant argument value (#62664).
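
The first challenge in the list above, preserving the number and
order of argument effects, can be made concrete with a small trace.
In this sketch every name (trace, arg, twice, naive, bound, run) is
hypothetical. The callee references one parameter twice and the other
not at all, so naive textual substitution both repeats and eliminates
effects, whereas binding the parameters with a var declaration keeps
them intact.

```go
package main

import "fmt"

// trace records the order in which argument expressions are evaluated.
var trace []string

// arg records that it was evaluated and then yields v.
func arg(name string, v int) int {
	trace = append(trace, name)
	return v
}

// twice references x two times and never references unused.
func twice(x, unused int) int { return x + x }

// viaCall is the caller before inlining.
func viaCall() int { return twice(arg("a", 1), arg("b", 2)) }

// naive substitutes argument expressions for parameter references:
// arg("a", 1) now runs twice and arg("b", 2) not at all.
func naive() int { return arg("a", 1) + arg("a", 1) }

// bound evaluates each argument exactly once, in order, before the body.
func bound() int {
	var x, unused int = arg("a", 1), arg("b", 2)
	_ = unused
	return x + x
}

// run resets the trace, calls f, and returns what f recorded.
func run(f func() int) string {
	trace = nil
	f()
	return fmt.Sprint(trace)
}

func main() {
	fmt.Println(run(viaCall)) // [a b]
	fmt.Println(run(naive))   // [a a]
	fmt.Println(run(bound))   // [a b]
}
```

All three functions return 2, so the difference shows up only in the
effects, which is what makes this class of bug easy to miss.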

More complex callee functions are inlinable with more elaborate and
invasive changes to the statements surrounding the call expression.
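
As a hand-written sketch of such a change (clamp is a hypothetical
callee, and the rewritten form is one possible shape rather than the
inliner's actual output), a call in the init clause of an if
statement can be given a statement context by hoisting the binding
into an enclosing block. The parameters become local variables
because v is assigned within the body.

```go
package main

import "fmt"

// clamp is a hypothetical callee whose body is more than "return expr".
func clamp(v, limit int) int {
	if v > limit {
		v = limit
	}
	return v
}

// viaCall is the caller before inlining; the call has no statement
// context of its own because it sits in the if statement's init clause.
func viaCall(n int) string {
	if x := clamp(n*2, 10); x == 10 {
		return "capped"
	}
	return "ok"
}

// rewritten hoists the declaration of x out of the if statement so
// that the callee's statements have somewhere to go.
func rewritten(n int) string {
	{
		var x int
		{
			var v, limit int = n * 2, 10 // parameters bound to arguments
			if v > limit {
				v = limit
			}
			x = v // stands in for "return v"
		}
		if x == 10 {
			return "capped"
		}
	}
	return "ok"
}

func main() {
	fmt.Println(viaCall(3), rewritten(3)) // ok ok
	fmt.Println(viaCall(7), rewritten(7)) // capped capped
}
```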

TODO(adonovan): future work:

  - Handle more of the above special cases by careful analysis,
    thoughtful factoring of the large design space, and thorough
    test coverage.

  - Compute precisely (not conservatively) when parameter
    substitution would remove the last reference to a caller local
    variable, and blank out the local instead of retreating from
    the substitution.

  - Afford the client more control such as a limit on the total
    increase in line count, or a refusal to inline using the
    general approach (replacing name by function literal). This
    could be achieved by returning metadata alongside the result
    and having the client conditionally discard the change.

  - Support inlining of generic functions, replacing type parameters
    by their instantiations.

  - Support inlining of calls to function literals ("closures").
    But note that the existing algorithm makes widespread assumptions
    that the callee is a package-level function or method.

  - Eliminate explicit conversions of "untyped" literals inserted
    conservatively when they are redundant. For example, the
    conversion int32(1) is redundant when this value is used only as a
    slice index; but it may be crucial if it is used in x := int32(1)
    as it changes the type of x, which may have further implications.
    The conversions may also be important to the falcon analysis.

  - Allow non-'go' build systems such as Bazel/Blaze a chance to
    decide whether an import is accessible using logic other than
    "/internal/" path segments. This could be achieved by returning
    the list of added import paths instead of a text diff.

  - Inlining a function from another module may change the
    effective version of the Go language spec that governs it. We
    should probably make the client responsible for rejecting
    attempts to inline from newer callees to older callers, since
    there's no way for this package to access module versions.

  - Use an alternative implementation of the import-organizing
    operation that doesn't require operating on a complete file
    (and reformatting). Then return the results in a higher-level
    form as a set of import additions and deletions plus a single
    diff that encloses the call expression. This interface could
    perhaps be implemented atop imports.Process by post-processing
    its result to obtain the abstract import changes and discarding
    its formatted output.
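
The item above on explicit conversions, and the related challenge of
implicit conversions in assignments and return statements, can be
checked with a few lines of Go. In this sketch zero and typeOf are
hypothetical helpers; the point is only when a conversion is
observable and when it is not.

```go
package main

import "fmt"

// zero mirrors the callee from the challenge list: its return
// statement implicitly converts the untyped constant 0 to uint8.
func zero() uint8 { return 0 }

// typeOf reports the dynamic type of v.
func typeOf(v any) string { return fmt.Sprintf("%T", v) }

func main() {
	var viaCall any = zero()
	var naive any = 0           // conversion lost: the dynamic type is int
	var explicit any = uint8(0) // conversion made explicit
	fmt.Println(typeOf(viaCall), typeOf(naive), typeOf(explicit)) // uint8 int uint8

	// As a slice index the conversion is redundant...
	s := []string{"a", "b", "c"}
	fmt.Println(s[int32(1)] == s[1]) // true

	// ...but in a short variable declaration it determines the type.
	x := int32(1)
	y := 1
	fmt.Println(typeOf(x), typeOf(y)) // int32 int
}
```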
*/
package inline