Architecture of Gazelle
=======================

.. All external links are here.

.. Godoc links
.. _buildifier build: https://godoc.org/github.com/bazelbuild/buildtools/build
.. _config: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/config
.. _go/build: https://godoc.org/go/build
.. _go/parser: https://godoc.org/go/parser
.. _merger: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/merger
.. _packages: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/packages
.. _resolve: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/resolve
.. _rules: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/rules
.. _CallExpr: https://godoc.org/github.com/bazelbuild/buildtools/build#CallExpr
.. _golang.org/x/tools/go/vcs: https://godoc.org/golang.org/x/tools/go/vcs

.. Other documentation links
.. _buildifier: https://github.com/bazelbuild/buildtools/tree/master/buildifier
.. _config_setting: https://docs.bazel.build/versions/master/be/general.html#config_setting
.. _Fix command transformations: README.rst#fix-command-transformations
.. _full list of directives: README.rst#Directives
.. _select: https://docs.bazel.build/versions/master/skylark/lib/globals.html#select

.. Issues
.. _#5: https://github.com/bazelbuild/bazel-gazelle/issues/5
.. _#7: https://github.com/bazelbuild/bazel-gazelle/issues/7

.. Actual content is below

Gazelle is a tool that generates and updates Bazel build files for Go projects
that follow the conventional "go build" project layout. It is intended to
simplify the maintenance of Bazel Go projects as much as possible.

This document describes how Gazelle works. It should help users understand why
Gazelle behaves as it does, and it should help developers understand
how to modify Gazelle and how to write similar tools.

.. contents::

Overview
--------

Gazelle generates and updates build files according to the algorithm outlined
below. Each step is described in more detail in the sections that follow.

* Build a configuration from command line arguments and special comments
  in the top-level build file. See Configuration_.

* For each directory in the repository:

  * Read the build file if one is present.

  * If the build file should be updated (based on configuration):

    * Apply transformations to the build file to migrate away from deprecated
      APIs. See `Fixing build files`_.

    * Scan the source files and collect metadata needed to generate rules
      for the directory. See `Scanning source files`_.

    * Generate new rules from the build metadata collected earlier. See
      `Generating rules`_.

    * Merge the new rules into the directory's build file. Delete any rules
      which are now empty. See `Merging and deleting rules`_.

  * Add the library rules in the directory's build file to a global table,
    indexed by import path.

* For each updated build file:

  * Use the library table to map import paths to Bazel labels for rules that
    were added or merged earlier. See `Resolving dependencies`_.

  * Merge the resolved rules back into the file.

  * Format the file using buildifier_ and emit it according to the output mode:
    write to disk, print the whole file, or print the diff.
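
Resolution is a separate pass because a file may import a library from a
directory that has not been visited yet, so the library table must be complete
first. The Go sketch below shows only that ordering. The ``dir`` type, the
``run`` function, and the label format are hypothetical simplifications, not
Gazelle's API. Imports missing from the table are simply skipped here.

.. code:: go

  package main

  import (
      "fmt"
      "sort"
  )

  // dir is a hypothetical stand-in for the metadata collected in one directory.
  type dir struct {
      rel        string   // directory relative to the repository root
      importPath string   // import path of the library in this directory
      imports    []string // import paths used by the sources
  }

  // run indexes every library first, then resolves imports against the index.
  func run(dirs []dir) map[string][]string {
      // Pass 1: build the library table, keyed by import path.
      index := make(map[string]string)
      for _, d := range dirs {
          index[d.importPath] = "//" + d.rel + ":go_default_library"
      }

      // Pass 2: map imports to labels. Libraries in directories that come
      // later in the walk are already in the table at this point.
      deps := make(map[string][]string)
      for _, d := range dirs {
          for _, imp := range d.imports {
              if label, ok := index[imp]; ok {
                  deps[d.rel] = append(deps[d.rel], label)
              }
          }
          sort.Strings(deps[d.rel])
      }
      return deps
  }

  func main() {
      dirs := []dir{
          {rel: "a", importPath: "example.com/project/a", imports: []string{"example.com/project/b"}},
          {rel: "b", importPath: "example.com/project/b"},
      }
      fmt.Println(run(dirs)["a"]) // [//b:go_default_library]
  }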

Configuration
-------------

Godoc: config_

Gazelle stores configuration information in ``Config`` objects. These objects
contain settings that affect the behavior of most packages in the program.
For example:

* The list of directories that Gazelle should update.
* The path of the repository root directory. Bazel package names are based
  on paths relative to this location.
* The current import path prefix and the directory where it was set.
  Gazelle uses this to infer import paths for ``go_library`` rules.
* A list of build tags that Gazelle considers to be true on all platforms.

``Config`` objects apply to individual directories. Each directory inherits
the ``Config`` from its parent. Values in a ``Config`` may be modified within
a directory using *directives* written in the directory's build file. A
directive is a special comment formatted like this:

::

  # gazelle:key value

Here are a few examples. See the `full list of directives`_.

* ``# gazelle:prefix`` - sets the Go import path prefix for the current
  directory.
* ``# gazelle:build_tags`` - sets the list of build tags which Gazelle considers
  to be true on all platforms.
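
Directive parsing and per-directory inheritance can be sketched in a few lines
of Go. The ``Config`` fields and the ``parseDirectives`` and ``childConfig``
functions are hypothetical. Only the ``prefix`` and ``build_tags`` directives
are handled, and ``build_tags`` is assumed to take a comma-separated list.

.. code:: go

  package main

  import (
      "fmt"
      "strings"
  )

  // Config is a hypothetical, trimmed-down version of the settings above.
  type Config struct {
      Prefix    string   // Go import path prefix
      PrefixRel string   // directory where the prefix was set
      BuildTags []string // tags considered true on all platforms
  }

  // parseDirectives returns key/value pairs from "# gazelle:key value" lines.
  func parseDirectives(buildFile string) [][2]string {
      var out [][2]string
      for _, line := range strings.Split(buildFile, "\n") {
          line = strings.TrimSpace(line)
          if !strings.HasPrefix(line, "# gazelle:") {
              continue
          }
          key, value, _ := strings.Cut(strings.TrimPrefix(line, "# gazelle:"), " ")
          out = append(out, [2]string{key, strings.TrimSpace(value)})
      }
      return out
  }

  // childConfig copies the parent's Config, then applies the directives in
  // the directory's build file. The parent is left unchanged.
  func childConfig(parent Config, rel, buildFile string) Config {
      c := parent
      c.BuildTags = append([]string(nil), parent.BuildTags...) // don't share the slice
      for _, d := range parseDirectives(buildFile) {
          switch d[0] {
          case "prefix":
              c.Prefix, c.PrefixRel = d[1], rel
          case "build_tags":
              c.BuildTags = strings.Split(d[1], ",")
          }
      }
      return c
  }

  func main() {
      root := childConfig(Config{}, "", "# gazelle:prefix example.com/project\n")
      sub := childConfig(root, "cmd/tool", "# gazelle:build_tags integration\n")
      fmt.Printf("%q %q %q\n", sub.Prefix, sub.PrefixRel, sub.BuildTags)
  }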

There are a few directives that are not applied to the ``Config`` object but
are interpreted directly in the packages where they are relevant.

* ``# gazelle:ignore`` - the build file should not be updated by Gazelle.
  Gazelle may still index its contents so it can resolve dependencies in other
  build files.
* ``# gazelle:exclude path/to/file`` - the named file should not be read by
  Gazelle and should not be included in ``srcs`` lists. If this refers to
  a directory, Gazelle won't recurse into the directory. This directive may
  appear multiple times.

Fixing build files
------------------

Godoc: merger_

From time to time, APIs in rules_go are changed or updated. Gazelle helps
users stay up to date with these changes by automatically fixing deprecated
usage.

Minor fixes are applied by Gazelle automatically every time it runs. However,
some fixes may delete or rename existing rules. Users must run ``gazelle fix``
to apply these fixes. By default, Gazelle will only *warn* users that
``gazelle fix`` should be run.

Here are a few of the fixes Gazelle performs. See `Fix command transformations`_
for a full list.

* **Squash cgo libraries:** Gazelle will remove ``cgo_library`` rules and
  merge their attributes into ``go_library`` rules that reference them.
  This is a major fix and is only applied with ``gazelle fix``.
* **Migrate library attributes:** Gazelle replaces ``library`` attributes
  with ``embed`` attributes. The only difference between these is that
  ``library`` (which is now deprecated) accepts a single label, while ``embed``
  accepts a list. This is a minor fix and is always applied.
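
The ``library`` to ``embed`` migration is a local rewrite of a single rule.
This sketch uses a hypothetical map-based ``Rule`` type (Gazelle itself edits
buildtools syntax trees) and ignores ``# keep`` comments for brevity.

.. code:: go

  package main

  import "fmt"

  // Rule is a hypothetical, simplified rule: a kind, a name, and attributes.
  type Rule struct {
      Kind  string
      Name  string
      Attrs map[string]any
  }

  // migrateLibraryToEmbed moves a single-label "library" attribute into the
  // "embed" list, unless the label is already embedded.
  func migrateLibraryToEmbed(r *Rule) {
      lib, ok := r.Attrs["library"].(string)
      if !ok {
          return
      }
      embed, _ := r.Attrs["embed"].([]string)
      found := false
      for _, e := range embed {
          if e == lib {
              found = true
              break
          }
      }
      if !found {
          embed = append(embed, lib)
      }
      r.Attrs["embed"] = embed
      delete(r.Attrs, "library")
  }

  func main() {
      r := &Rule{
          Kind:  "go_test",
          Name:  "go_default_test",
          Attrs: map[string]any{"library": ":go_default_library"},
      }
      migrateLibraryToEmbed(r)
      fmt.Println(r.Attrs) // map[embed:[:go_default_library]]
  }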

Users can prevent Gazelle from modifying rules, attributes, or individual
values by writing ``# keep`` comments above them.

Scanning source files
---------------------

Godoc: packages_

Nearly all of the information needed to build a program with the standard Go SDK
is implied by directory structure, file names, and file contents. This is why
``go build`` doesn't require any sort of build file. The `go/build`_ package in
the standard library collects this information.

Unfortunately, `go/build`_ can only collect information for one platform at
a time. Gazelle needs to generate build files that work on all platforms, so
we have our own implementation of this logic.

Information extracted from files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Gazelle extracts build metadata from file names and file contents in much the
same way that the standard `go/build`_ package does. It gets the following
information from file names:

* File extension (e.g., .go, .c, .proto). Normally, only .go, .s, and .h files
  are included in Go rules. If any cgo code is present, then C/C++ files are
  also included. .proto files are also used to build proto rules. Other files
  (e.g., .txt) are ignored.
* Test suffix. For example, if a file is named ``foo_test.go``, it will be
  included in a test target instead of a library or binary target.
* OS and architecture suffixes. For example, a file named ``foo_linux_amd64.go``
  will be listed in the ``linux_amd64`` section of the target it belongs to.
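
The file-name rules can be sketched as follows. The OS and architecture tables
are deliberately small subsets, an assumption made for brevity. The splitting
follows the `go/build`_ convention that the part of a name before the first
underscore is never treated as a suffix, so ``linux_amd64.go`` is constrained
only by architecture.

.. code:: go

  package main

  import (
      "fmt"
      "path/filepath"
      "strings"
  )

  // Small illustrative subsets of the GOOS and GOARCH values known to Go.
  var knownOS = map[string]bool{"darwin": true, "freebsd": true, "linux": true, "windows": true}
  var knownArch = map[string]bool{"386": true, "amd64": true, "arm": true, "arm64": true}

  // fileInfo is a hypothetical summary of what a file name implies.
  type fileInfo struct {
      ext    string // e.g. ".go"
      isTest bool   // name ends in _test.go
      goos   string // OS suffix, if any
      goarch string // architecture suffix, if any
  }

  // classify drops the extension and any _test suffix, then looks for
  // _GOOS_GOARCH, _GOOS, or _GOARCH at the end of the remaining name.
  func classify(name string) fileInfo {
      info := fileInfo{ext: filepath.Ext(name)}
      stem := strings.TrimSuffix(name, info.ext)
      if info.ext == ".go" && strings.HasSuffix(stem, "_test") {
          info.isTest = true
          stem = strings.TrimSuffix(stem, "_test")
      }
      parts := strings.Split(stem, "_")
      n := len(parts)
      switch {
      case n >= 3 && knownOS[parts[n-2]] && knownArch[parts[n-1]]:
          info.goos, info.goarch = parts[n-2], parts[n-1]
      case n >= 2 && knownOS[parts[n-1]]:
          info.goos = parts[n-1]
      case n >= 2 && knownArch[parts[n-1]]:
          info.goarch = parts[n-1]
      }
      return info
  }

  func main() {
      for _, name := range []string{"foo_test.go", "foo_linux_amd64.go", "util_windows.go", "linux_amd64.go"} {
          fmt.Printf("%-20s %+v\n", name, classify(name))
      }
  }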

Gazelle gets the following information from file contents:

* Package name. This is syntactically the first part of every .go file. All
  files in the same directory must have the same package name (except for
  external test sources, which have a package name ending with ``_test``). If
  there are multiple packages, Gazelle will choose one that matches the
  directory name (if present) or report an error.
* Imported libraries. Go import paths usually look like URLs. Imports in
  platform-specific source files are also platform-specific.
* Build tags. The Go toolchain recognizes build constraint comments before the
  package declaration: ``//go:build`` lines (introduced in Go 1.17) and the
  older ``// +build`` lines. These tags tell the build system that a file
  should only be built for specific platforms. See `this article
  <https://dave.cheney.net/2013/10/12/how-to-use-conditional-compilation-with-the-go-build-tool>`_
  for more information.
* Whether cgo code is present. This affects how packages are built and
  whether C/C++ files are included.
* C/C++ compile and link options (specified in ``#cgo`` directives in cgo
  comments). These may be platform-specific.
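
The key difference from `go/build`_ is that constraints are evaluated for every
platform, not only the host. The sketch below uses the standard
``go/build/constraint`` package (Go 1.16 and later), which parses both
``//go:build`` and ``// +build`` lines. The ``platform`` list and the
``extraTags`` parameter are illustrative. A complete version would also need
implied tags such as ``unix``, ``cgo``, and release tags.

.. code:: go

  package main

  import (
      "fmt"
      "go/build/constraint"
  )

  // platform is a GOOS/GOARCH pair.
  type platform struct{ os, arch string }

  // matchingPlatforms evaluates one constraint line against every platform in
  // the list. Tags in extraTags count as true everywhere, much like the
  // "# gazelle:build_tags" directive.
  func matchingPlatforms(line string, platforms []platform, extraTags map[string]bool) ([]platform, error) {
      expr, err := constraint.Parse(line)
      if err != nil {
          return nil, err
      }
      var out []platform
      for _, p := range platforms {
          ok := expr.Eval(func(tag string) bool {
              return tag == p.os || tag == p.arch || extraTags[tag]
          })
          if ok {
              out = append(out, p)
          }
      }
      return out, nil
  }

  func main() {
      platforms := []platform{{"linux", "amd64"}, {"linux", "arm64"}, {"darwin", "arm64"}, {"windows", "amd64"}}
      got, err := matchingPlatforms("//go:build linux || darwin", platforms, nil)
      if err != nil {
          panic(err)
      }
      fmt.Println(got) // [{linux amd64} {linux arm64} {darwin arm64}]
  }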

In most cases, only the top of the file is parsed. For Go files, we use the
standard `go/parser`_ package. For proto files, we use regular expressions that
match ``package``, ``go_package``, and ``import`` statements.
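
For Go files, reading only the top of the file can be done with the
``parser.ImportsOnly`` mode of `go/parser`_, which stops after the import
declarations. This is a minimal sketch. The ``fileMeta`` type and ``readTop``
function are hypothetical, and extraction of ``#cgo`` flags is left out.

.. code:: go

  package main

  import (
      "fmt"
      "go/parser"
      "go/token"
      "strconv"
  )

  // fileMeta is a hypothetical record of what is read from the top of a file.
  type fileMeta struct {
      pkg     string   // package name
      imports []string // imported paths, excluding "C"
      isCgo   bool     // true if the file imports "C"
  }

  // readTop parses only the package clause and import declarations.
  // ParseComments keeps comments, which is where cgo preambles live.
  func readTop(filename string, src []byte) (fileMeta, error) {
      fset := token.NewFileSet()
      f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly|parser.ParseComments)
      if err != nil {
          return fileMeta{}, err
      }
      m := fileMeta{pkg: f.Name.Name}
      for _, spec := range f.Imports {
          path, err := strconv.Unquote(spec.Path.Value)
          if err != nil {
              return fileMeta{}, err
          }
          if path == "C" {
              m.isCgo = true
              continue
          }
          m.imports = append(m.imports, path)
      }
      return m, nil
  }

  func main() {
      src := []byte("package terminal\n\n// #include <stdio.h>\nimport \"C\"\n\nimport \"golang.org/x/sys/unix\"\n\nfunc f() {}\n")
      m, err := readTop("terminal.go", src)
      if err != nil {
          panic(err)
      }
      fmt.Printf("%+v\n", m) // {pkg:terminal imports:[golang.org/x/sys/unix] isCgo:true}
  }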

The ``Package`` object
~~~~~~~~~~~~~~~~~~~~~~

Gazelle stores build metadata in a ``Package`` object. Currently, we only
support one ``Package`` per directory (which is also what the Go SDK supports),
but this will be expanded in the future. ``Package`` objects contain some
top-level metadata (like the package name and directory path), along with
several target objects (``GoTarget`` and ``ProtoTarget``).

Target objects correspond directly to rules that will be generated later. They
store lists of sources, imports, and flags in ``PlatformStrings`` objects.

``PlatformStrings`` objects store strings in four sections: a generic list, an
OS-specific dictionary, an architecture-specific dictionary, and an
OS-and-architecture-specific dictionary. The keys in the dictionaries are OS
names, architecture names, or OS-and-architecture pairs; the values are lists of
strings. The same string may not appear more than once in a list and may not
appear in more than one section. This is due to a Bazel requirement: the same
label may not appear more than once in a ``deps`` list.
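
A minimal Go sketch of this structure and its uniqueness rule follows. The
field and method names are illustrative and may not match Gazelle's own type.
The same string may appear under several keys of one dictionary. For example,
``util.go`` is listed for both ``darwin`` and ``linux`` in the example under
`Generating rules`_. It is only forbidden within a single list or across
sections.

.. code:: go

  package main

  import "fmt"

  // PlatformStrings sketches the four sections described above.
  type PlatformStrings struct {
      Generic  []string            // strings for all platforms
      OS       map[string][]string // keyed by OS, e.g. "linux"
      Arch     map[string][]string // keyed by architecture, e.g. "amd64"
      Platform map[string][]string // keyed by OS-and-architecture, e.g. "linux_amd64"
  }

  // validate checks that no string is repeated within a list or shared
  // between sections. Repeats across keys of one dictionary are allowed.
  func (ps PlatformStrings) validate() error {
      owner := make(map[string]string) // string -> section it was first seen in
      check := func(section string, list []string) error {
          seen := make(map[string]bool)
          for _, s := range list {
              if seen[s] {
                  return fmt.Errorf("%q repeated in a %s list", s, section)
              }
              seen[s] = true
              if prev, ok := owner[s]; ok && prev != section {
                  return fmt.Errorf("%q in both %s and %s sections", s, prev, section)
              }
              owner[s] = section
          }
          return nil
      }
      if err := check("generic", ps.Generic); err != nil {
          return err
      }
      dicts := map[string]map[string][]string{"os": ps.OS, "arch": ps.Arch, "platform": ps.Platform}
      for section, dict := range dicts {
          for _, list := range dict {
              if err := check(section, list); err != nil {
                  return err
              }
          }
      }
      return nil
  }

  func main() {
      srcs := PlatformStrings{
          Generic: []string{"terminal.go"},
          OS: map[string][]string{
              "darwin":  {"util.go", "util_bsd.go"},
              "linux":   {"util.go", "util_linux.go"},
              "windows": {"util_windows.go"},
          },
      }
      fmt.Println(srcs.validate()) // <nil>
      srcs.Generic = append(srcs.Generic, "util.go")
      fmt.Println(srcs.validate() != nil) // true
  }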

Generating rules
----------------

Godoc: rules_

Once build metadata has been extracted from the sources in a directory,
Gazelle generates rules for building those sources.

Generated rules are represented as CallExpr_ objects. CallExpr_ is defined in
the `buildifier build`_ library, which is the same library used to parse and
format build files. This lets us manipulate newly generated rules and existing
rules with the same code.

We may generate the following rules:

* ``proto_library`` and ``go_proto_library`` are generated if there was at
  least one .proto source file.
* ``go_library`` is generated if there was at least one non-test source. This
  may embed the ``go_proto_library`` if there was one.
* ``go_test`` rules are generated for internal and external tests. Internal
  tests embed the ``go_library`` while external tests depend on the
  ``go_library`` as a separate package.
* ``go_binary`` is generated if the package name was ``main``. It embeds the
  ``go_library``.

Rules are named according to a pluggable naming policy, but there is currently
only one policy: libraries are named ``go_default_library``, tests are
named ``go_default_test``, and binaries are named after the directory. The
``go_default_library`` name is a historical artifact from before we had
index-based dependency resolution. We'll need to move away from this naming
scheme in the future (`#5`_) before we support multiple packages (`#7`_).

Sources, imports, and flags within each target are converted to expressions in a
straightforward fashion. The lists within ``PlatformStrings`` are converted to
list expressions. Dictionaries are converted to `select`_ calls
(when Bazel evaluates a `select`_ expression, it chooses one of several
provided lists, based on `config_setting`_ rules). Lists and select expressions
may be added together. For example:

.. code:: bzl

  go_library(
      name = "go_default_library",
      srcs = [
          "terminal.go",
      ] + select({
          "@io_bazel_rules_go//go/platform:darwin": [
              "util.go",
              "util_bsd.go",
          ],
          "@io_bazel_rules_go//go/platform:linux": [
              "util.go",
              "util_linux.go",
          ],
          "@io_bazel_rules_go//go/platform:windows": [
              "util_windows.go",
          ],
          "//conditions:default": [],
      }),
      ...
  )
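
Rendering such an expression from the generic and OS-specific sections can be
sketched as plain string building. This is only an illustration. Gazelle
builds buildtools syntax trees and lets buildifier format them. The
``listExpr`` and ``srcsExpr`` helpers are hypothetical, and the architecture
and OS-and-architecture sections are omitted.

.. code:: go

  package main

  import (
      "fmt"
      "sort"
      "strings"
  )

  const platformPkg = "@io_bazel_rules_go//go/platform:"

  // listExpr renders a list literal with one quoted element per line.
  func listExpr(items []string, indent string) string {
      if len(items) == 0 {
          return "[]"
      }
      var b strings.Builder
      b.WriteString("[\n")
      for _, s := range items {
          fmt.Fprintf(&b, "%s    %q,\n", indent, s)
      }
      b.WriteString(indent + "]")
      return b.String()
  }

  // srcsExpr renders the generic list, plus a select() over the OS-specific
  // lists when there are any. Keys are sorted so the output is deterministic.
  func srcsExpr(generic []string, byOS map[string][]string) string {
      expr := listExpr(generic, "")
      if len(byOS) == 0 {
          return expr
      }
      oses := make([]string, 0, len(byOS))
      for goos := range byOS {
          oses = append(oses, goos)
      }
      sort.Strings(oses)
      var b strings.Builder
      b.WriteString(expr + " + select({\n")
      for _, goos := range oses {
          fmt.Fprintf(&b, "    %q: %s,\n", platformPkg+goos, listExpr(byOS[goos], "    "))
      }
      b.WriteString("    \"//conditions:default\": [],\n})")
      return b.String()
  }

  func main() {
      fmt.Println(srcsExpr([]string{"terminal.go"}, map[string][]string{
          "darwin":  {"util.go", "util_bsd.go"},
          "linux":   {"util.go", "util_linux.go"},
          "windows": {"util_windows.go"},
      }))
  }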

At this point, Gazelle does not have enough information to generate
expressions for ``deps`` attributes. We only have a list of import strings
extracted from source files. These imports are stored temporarily in a special
``_gazelle_imports`` attribute in each rule. Later, the imports are converted
to Bazel labels (see `Resolving dependencies`_), and this attribute is replaced
with ``deps``.

Merging and deleting rules
--------------------------

Godoc: merger_

Merging is the process of combining generated rules with the corresponding
rules in an existing build file. If no build file exists in a directory, a
new file is created with generated rules, and no merging is performed.

Merging occurs in two phases: pre-resolve and post-resolve. This is due to an
interdependence with dependency resolution. Dependency resolution uses a table
of *merged* library rules, so it can't be performed until the pre-resolve merge
has occurred. After dependency resolution, we need to merge newly generated
``deps`` attributes; this is done in the post-resolve merge. The two phases use
the same algorithm.

During the merge process, Gazelle attempts to match generated rules with
existing rules that have the same name and same kind. Rules are only merged if
both name and kind match. If an existing rule has the same name as a generated
rule but a different kind, the generated rule will not be merged. If no
existing rule matches a generated rule, the generated rule is simply appended to
the end of the file. Existing rules that don't match any generated rule are not
modified.
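
The matching step can be sketched with a hypothetical ``Rule`` type reduced to
a kind and a name. The text above does not say what happens to a generated rule
whose name is already used by an existing rule of another kind. This sketch
assumes such a rule is skipped, since appending it would create a duplicate
name.

.. code:: go

  package main

  import "fmt"

  // Rule is a hypothetical rule reduced to the two fields used for matching.
  type Rule struct{ Kind, Name string }

  // match returns the index of the existing rule with the same name and kind,
  // or -1. conflict is true if the name is taken by a rule of another kind.
  func match(existing []Rule, gen Rule) (idx int, conflict bool) {
      for i, r := range existing {
          if r.Name == gen.Name {
              if r.Kind == gen.Kind {
                  return i, false
              }
              return -1, true
          }
      }
      return -1, false
  }

  // mergeFile returns the rules of the resulting file and the pairs
  // (existing index -> generated rule) that go on to attribute merging.
  // Existing rules without a match are kept as they are.
  func mergeFile(existing, generated []Rule) ([]Rule, map[int]Rule) {
      file := append([]Rule(nil), existing...)
      pairs := make(map[int]Rule)
      for _, g := range generated {
          i, conflict := match(existing, g)
          switch {
          case i >= 0:
              pairs[i] = g
          case conflict:
              // Assumption: skip the generated rule to avoid a duplicate name.
          default:
              file = append(file, g) // no match: append to the end of the file
          }
      }
      return file, pairs
  }

  func main() {
      existing := []Rule{{"go_library", "go_default_library"}, {"genrule", "go_default_test"}}
      generated := []Rule{{"go_library", "go_default_library"}, {"go_test", "go_default_test"}, {"go_binary", "tool"}}
      file, pairs := mergeFile(existing, generated)
      fmt.Println(len(file), len(pairs)) // 3 1
  }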

When Gazelle identifies a matching pair of rules, it combines each attribute
according to the algorithm below. If an attribute is present in the generated
rule but not in the existing rule, it is copied to the merged rule verbatim. If
an attribute is present in the existing rule but not the generated rule, Gazelle
behaves as if the generated attribute were present but empty.

* For each value in the existing rule's attribute:

  * If the value also appears in the generated rule's attribute or is marked
    with a ``# keep`` comment, preserve it. Otherwise, delete it.

* For each value in the generated rule's attribute:

  * If the value also appears in the existing rule's attribute, ignore it,
    since it was already handled above. Otherwise, add it to the merged rule.

* If the merged attribute is empty, delete it.

When a value is present in both the existing and generated attributes, we use
the existing value instead of the generated value, since this preserves
comments.

Some attributes are considered *unmergeable*, for example, ``visibility`` and
``gc_goopts``. Gazelle may add these attributes to existing rules if they are
not already present, but existing values won't be modified or deleted.
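
The per-attribute algorithm, including ``# keep`` comments and unmergeable
attributes, fits in one function. This sketch models each list value as a
string plus a keep flag, which is a simplifying assumption. The ``present``
parameter distinguishes an absent attribute from an empty one. All names are
hypothetical.

.. code:: go

  package main

  import "fmt"

  // Value is one list element plus whether a "# keep" comment is attached.
  type Value struct {
      S    string
      Keep bool
  }

  // unmergeable holds the two examples named above; the real set is larger.
  var unmergeable = map[string]bool{"visibility": true, "gc_goopts": true}

  // mergeAttr merges one list attribute. present reports whether the existing
  // rule had the attribute; ok is false when the attribute should be deleted.
  func mergeAttr(name string, existing []Value, present bool, generated []Value) (merged []Value, ok bool) {
      if unmergeable[name] && present {
          return existing, true // existing values are never modified
      }
      inGen := make(map[string]bool)
      for _, v := range generated {
          inGen[v.S] = true
      }
      inExisting := make(map[string]bool)
      // Existing values win when both sides have them, which keeps comments.
      for _, v := range existing {
          inExisting[v.S] = true
          if inGen[v.S] || v.Keep {
              merged = append(merged, v)
          }
      }
      // Generated values that the existing attribute lacked are appended.
      for _, v := range generated {
          if !inExisting[v.S] {
              merged = append(merged, v)
          }
      }
      return merged, len(merged) > 0
  }

  func main() {
      existing := []Value{{S: "a.go"}, {S: "old.go"}, {S: "manual.go", Keep: true}}
      generated := []Value{{S: "a.go"}, {S: "b.go"}}
      merged, ok := mergeAttr("srcs", existing, true, generated)
      fmt.Println(merged, ok) // [{a.go false} {manual.go true} {b.go false}] true
  }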

Preserving customizations
~~~~~~~~~~~~~~~~~~~~~~~~~

Gazelle has several mechanisms for preserving manual modifications to build
files. Some of these mechanisms work automatically; others require explicit
comments.

* Gazelle will not modify or delete rules that don't appear to have been
  generated by Gazelle.
* As mentioned above, some attributes are considered unmergeable. Gazelle may
  set initial values for these but won't delete or replace existing values.
* ``# keep`` comments may be attached to any rule, attribute, or value
  to prevent Gazelle from modifying it.
* ``# gazelle:exclude <file>`` directives can be used to prevent Gazelle from
  adding files to source lists (for example, checked-in .pb.go files). They
  can also prevent Gazelle from recursing into directories that contain
  unbuildable code (e.g., ``testdata``).
* ``# gazelle:ignore`` directives prevent Gazelle from making any modifications
  to build files that contain them.

Deleting rules
~~~~~~~~~~~~~~

Deletion is a special case of the merging algorithm.

When Gazelle generates rules for a package (see `Generating rules`_), it
actually produces two lists of rules: a list of rules for buildable targets,
and a list of empty rules that may be deleted. The empty rules have no
attributes other than ``name``.

The empty rules are merged using the same algorithm as the other generated
rules. If, after merging, an empty rule has no attributes that would make the
rule buildable (for example, ``srcs`` or ``deps``), the rule will be deleted.
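
The final deletion check can be sketched as follows. The set of attributes
that make a rule buildable is an assumption limited to the two examples named
above. ``shouldDelete`` is a hypothetical helper that receives the attributes
left after merging.

.. code:: go

  package main

  import "fmt"

  // buildableAttrs is an assumed set based on the two examples named above.
  var buildableAttrs = []string{"srcs", "deps"}

  // shouldDelete reports whether a rule that was merged with an empty
  // generated rule (name only) has nothing left that makes it buildable.
  // attrs maps attribute names to their merged list values.
  func shouldDelete(attrs map[string][]string) bool {
      for _, a := range buildableAttrs {
          if len(attrs[a]) > 0 {
              return false
          }
      }
      return true
  }

  func main() {
      // All test sources were removed, so merging with the empty generated
      // rule dropped srcs and deps entirely.
      gone := map[string][]string{"name": {"go_default_test"}}
      // A "# keep" value survived the merge, so the rule is still buildable.
      kept := map[string][]string{"name": {"go_default_test"}, "srcs": {"manual_test.go"}}
      fmt.Println(shouldDelete(gone), shouldDelete(kept)) // true false
  }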

Resolving dependencies
----------------------

Godoc: resolve_

When Gazelle generates rules for a package (see `Generating rules`_), it
stores the import paths used by each rule in a special ``_gazelle_imports``
attribute. During dependency resolution, Gazelle maps these imports to Bazel
labels and replaces ``_gazelle_imports`` with ``deps``.

Before dependency resolution starts, Gazelle builds a table of all known
libraries. This includes ``go_library``, ``go_proto_library``, and
``proto_library`` rules. The table is populated by scanning build files after
the pre-resolve merge, so existing and newly generated rules are included
in the table, and deleted rules are excluded. Once all library rules have been
added, Gazelle indexes the table by language-specific import path.

Gazelle resolves each import string in ``_gazelle_imports`` as follows:

* If the import is part of the standard library, it is dropped. Standard
  library dependencies are implicit.

* If the import is provided by exactly one rule in the library table, the label
  for that rule is used.

* If the import is provided by multiple libraries, we attempt to resolve
  the ambiguity.

  * For Go, we apply the vendoring algorithm. Vendored libraries aren't visible
    outside of the vendor directory's parent.

  * Go libraries that are embedded by other Go libraries are not considered.
    Embedded libraries may be incomplete.

  * When an ambiguity can't be resolved, Gazelle logs an error and skips
    the dependency.

* If the import is not provided by any rule in the library table, we attempt
  to resolve the dependency using heuristics:

  * If the import path starts with the current prefix (set with a
    ``# gazelle:prefix`` directive or on the command line), we construct a label
    by concatenating the prefix directory and the portion of the import path
    below the prefix into a package name.

  * Otherwise, the import path is considered external and is resolved
    according to the external mode set on the command line.

    * In ``external`` mode, Gazelle determines the portion of the import path
      that corresponds to a repository using `golang.org/x/tools/go/vcs`_. This
      part of the path is converted into a repository name (for example,
      ``golang.org/x/tools`` becomes ``@org_golang_x_tools``), and the rest of
      the path (for example, ``go/vcs``) is converted to a package name.

    * In ``vendored`` mode, Gazelle constructs a label by prepending ``vendor/``
      to the import path.
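
The resolution steps can be combined into one function. In this sketch all
types and names are hypothetical. ``repoRoot`` is a stub standing in for the
`golang.org/x/tools/go/vcs`_ lookup, which needs the network. ``repoName`` is a
simplified reversed-domain naming scheme that turns ``golang.org/x/tools`` into
``org_golang_x_tools``.

.. code:: go

  package main

  import (
      "fmt"
      "path"
      "strings"
  )

  // lib is one entry in a hypothetical library table.
  type lib struct {
      label    string // e.g. "//foo:go_default_library"
      pkg      string // Bazel package of the rule, e.g. "a/vendor/github.com/x/y"
      embedded bool   // true if another go_library embeds this one
  }

  // resolver bundles the inputs described above; all names are illustrative.
  type resolver struct {
      index     map[string][]lib        // import path -> libraries providing it
      std       map[string]bool         // standard library import paths
      prefix    string                  // from "# gazelle:prefix" or the command line
      prefixRel string                  // directory where the prefix was set
      mode      string                  // "external" or "vendored"
      repoRoot  func(imp string) string // stub for the golang.org/x/tools/go/vcs lookup
  }

  // visible applies the vendoring rule: a library under ".../vendor/" is only
  // visible from the vendor directory's parent and its subdirectories.
  func visible(fromPkg, libPkg string) bool {
      s := "/" + libPkg
      i := strings.LastIndex(s, "/vendor/")
      if i < 0 {
          return true
      }
      parent := strings.TrimPrefix(s[:i], "/")
      return parent == "" || fromPkg == parent || strings.HasPrefix(fromPkg, parent+"/")
  }

  // repoName turns "golang.org/x/tools" into "org_golang_x_tools".
  func repoName(root string) string {
      host, rest, _ := strings.Cut(root, "/")
      parts := strings.Split(host, ".")
      for i, j := 0, len(parts)-1; i < j; i, j = i+1, j-1 {
          parts[i], parts[j] = parts[j], parts[i]
      }
      if rest != "" {
          parts = append(parts, rest)
      }
      return strings.NewReplacer("/", "_", "-", "_", ".", "_").Replace(strings.Join(parts, "_"))
  }

  // resolve maps one import to a label. ok is false when no dependency is
  // needed; err reports an ambiguity that could not be resolved.
  func (r *resolver) resolve(fromPkg, imp string) (label string, ok bool, err error) {
      if r.std[imp] {
          return "", false, nil // standard library deps are implicit
      }
      cands := r.index[imp]
      if len(cands) > 1 { // filter by vendoring and embedding
          var kept []lib
          for _, l := range cands {
              if !l.embedded && visible(fromPkg, l.pkg) {
                  kept = append(kept, l)
              }
          }
          if len(kept) != 1 {
              return "", false, fmt.Errorf("ambiguous import %q", imp)
          }
          cands = kept
      }
      if len(cands) == 1 {
          return cands[0].label, true, nil
      }
      // Not in the table: fall back to heuristics.
      if r.prefix != "" && (imp == r.prefix || strings.HasPrefix(imp, r.prefix+"/")) {
          rel := strings.TrimPrefix(strings.TrimPrefix(imp, r.prefix), "/")
          return "//" + path.Join(r.prefixRel, rel) + ":go_default_library", true, nil
      }
      if r.mode == "vendored" {
          return "//vendor/" + imp + ":go_default_library", true, nil
      }
      root := r.repoRoot(imp) // "external" mode
      rel := strings.TrimPrefix(strings.TrimPrefix(imp, root), "/")
      return "@" + repoName(root) + "//" + rel + ":go_default_library", true, nil
  }

  func main() {
      r := &resolver{
          index:    map[string][]lib{"example.com/p/foo": {{label: "//foo:go_default_library", pkg: "foo"}}},
          std:      map[string]bool{"fmt": true},
          prefix:   "example.com/p",
          mode:     "external",
          repoRoot: func(string) string { return "golang.org/x/tools" }, // assumed vcs answer
      }
      for _, imp := range []string{"fmt", "example.com/p/foo", "example.com/p/bar", "golang.org/x/tools/go/vcs"} {
          fmt.Println(r.resolve("cmd", imp))
      }
  }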

Note that ``visibility`` attributes are not considered when resolving imports.
This was part of an initial prototype, but it was confusing in many situations.

Building and running Gazelle
----------------------------

Gazelle is a regular Go program. It can be built, installed, and run without
Bazel, using the regular Go SDK.

.. code:: bash

  $ go install github.com/bazelbuild/bazel-gazelle/cmd/gazelle@latest
  $ gazelle -go_prefix example.com/project

We lightly discourage this method of running Gazelle. All developers on a
project should use the same version of Gazelle to ensure the build files
they generate are consistent. The easiest way to accomplish this is to build
and run Gazelle through Bazel. Gazelle may be added to a WORKSPACE file,
built as a normal ``go_binary``, then installed or run from the ``bazel-bin/``
directory.

.. code:: bash

  $ bazel build @bazel_gazelle//cmd/gazelle
  $ bazel-bin/external/bazel_gazelle/cmd/gazelle/gazelle -go_prefix example.com/project

It's usually better to invoke Gazelle through a wrapper script, though. This
saves typing and ensures Gazelle is run with a consistent set of arguments.
We provide a Bazel rule that generates such a wrapper script. Developers may
add a snippet like the one below to a build file:

.. code:: bzl

  load("@bazel_gazelle//:def.bzl", "gazelle")

  gazelle(
      name = "gazelle",
      command = "fix",
      external = "vendored",
      prefix = "example.com/project",
  )

This script may be built and executed in a single command with ``bazel run``.

.. code:: bash

  $ bazel run //:gazelle

This is the most convenient way to run Gazelle, and it's what we recommend to
users. However, there are two issues with running Gazelle in this
fashion. First, binaries executed by ``bazel run`` are run in the Bazel
execroot, not the user's current directory. The wrapper script uses a hack
(dereferencing symlinks) to jump to the top of the workspace source tree before
running Gazelle. Second, ``bazel run`` holds a lock on the Bazel output
directory. This means Gazelle cannot invoke Bazel without deadlocking. Commands
like ``bazel query`` would be helpful for detecting generated code, but it's not
safe to use them.

To avoid these limitations, the wrapper script may be copied to the workspace
and optionally checked into version control. When the wrapper script is run
directly (without ``bazel run``), it will rebuild itself to ensure no changes
are needed. If the rebuilt script differs from the running script, it will
prompt the user to copy the rebuilt script into the workspace again.

.. code:: bash

  $ bazel build //:gazelle
  Target //:gazelle up-to-date:
    bazel-bin/gazelle.bash
  ____Elapsed time: 1.326s, Critical Path: 0.00s
  $ cp bazel-bin/gazelle.bash gazelle.bash
  $ ./gazelle.bash

Dependencies
------------

Gazelle has the following dependencies:

github.com/bazelbuild/bazel-skylib
  Skylark utility library used to generate the wrapper script in the
  ``gazelle`` rule.
github.com/bazelbuild/buildtools/build
  Used to parse and rewrite build files.
github.com/bazelbuild/rules_go
  Used to build and test Gazelle through Bazel. Gazelle can also be built on its
  own with the Go SDK.
golang.org/x/tools/go/vcs
  Used during dependency resolution to determine the repository prefix for a
  given import path. This uses the network.