
Architecture of Gazelle
=======================

Gazelle is a tool that generates and updates Bazel build files for Go projects
that follow the conventional "go build" project layout. It is intended to
simplify the maintenance of Bazel Go projects as much as possible.

This document describes how Gazelle works. It should help users understand why
Gazelle behaves as it does, and it should help developers understand
how to modify Gazelle and how to write similar tools.

.. contents::

Overview
--------

Gazelle generates and updates build files according to the algorithm outlined
below. Each step is described in more detail in the sections that follow.

* Build a configuration from command line arguments and special comments
  in the top-level build file. See Configuration_.

* For each directory in the repository:

  * Read the build file if one is present.

  * If the build file should be updated (based on configuration):

    * Apply transformations to the build file to migrate away from deprecated
      APIs. See `Fixing build files`_.

    * Scan the source files and collect metadata needed to generate rules
      for the directory. See `Scanning source files`_.

    * Generate new rules from the build metadata collected earlier. See
      `Generating rules`_.

    * Merge the new rules into the directory's build file. Delete any rules
      which are now empty. See `Merging and deleting rules`_.

  * Add the library rules in the directory's build file to a global table,
    indexed by import path.

* For each updated build file:

  * Use the library table to map import paths to Bazel labels for rules that
    were added or merged earlier. See `Resolving dependencies`_.

  * Merge the resolved rules back into the file.

  * Format the file using buildifier_ and emit it according to the output mode:
    write to disk, print the whole file, or print the diff.

Configuration
-------------

Godoc: config_

Gazelle stores configuration information in ``Config`` objects. These objects
contain settings that affect the behavior of most packages in the program.
For example:

* The list of directories that Gazelle should update.
* The path of the repository root directory. Bazel package names are based
  on paths relative to this location.
* The current import path prefix and the directory where it was set.
  Gazelle uses this to infer import paths for ``go_library`` rules.
* A list of build tags that Gazelle considers to be true on all platforms.

``Config`` objects apply to individual directories. Each directory inherits
the ``Config`` from its parent. Values in a ``Config`` may be modified within
a directory using *directives* written in the directory's build file. A
directive is a special comment formatted like this:

::

  # gazelle:key value

Here are a few examples. See the `full list of directives`_.

* ``# gazelle:prefix`` - sets the Go import path prefix for the current
  directory.
* ``# gazelle:build_tags`` - sets the list of build tags which Gazelle considers
  to be true on all platforms.

There are a few directives which are not applied to the ``Config`` object but
are interpreted directly in packages where they are relevant.

* ``# gazelle:ignore`` - the build file should not be updated by Gazelle.
  Gazelle may still index its contents so it can resolve dependencies in other
  build files.
* ``# gazelle:exclude path/to/file`` - the named file should not be read by
  Gazelle and should not be included in ``srcs`` lists. If this refers to
  a directory, Gazelle won't recurse into the directory. This directive may
  appear multiple times.

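
As a rough illustration of the directive format and of per-directory
inheritance, the sketch below parses ``# gazelle:key value`` comments and
applies them to a copy of the parent directory's configuration. The ``Config``
struct shown here, its fields, and the ``parseDirective`` and ``apply``
helpers are hypothetical names chosen for this example and are not Gazelle's
actual API. The comma-separated form of ``build_tags`` is also an assumption.

.. code:: go

  package main

  import (
      "fmt"
      "strings"
  )

  // Config is a hypothetical, heavily simplified configuration object.
  type Config struct {
      Prefix    string   // Go import path prefix
      BuildTags []string // build tags considered true on all platforms
  }

  // parseDirective splits a comment of the form "# gazelle:key value".
  // ok is false when the comment is not a directive.
  func parseDirective(comment string) (key, value string, ok bool) {
      const marker = "# gazelle:"
      comment = strings.TrimSpace(comment)
      if !strings.HasPrefix(comment, marker) {
          return "", "", false
      }
      parts := strings.SplitN(strings.TrimPrefix(comment, marker), " ", 2)
      key = parts[0]
      if len(parts) == 2 {
          value = strings.TrimSpace(parts[1])
      }
      return key, value, key != ""
  }

  // apply returns a copy of the parent configuration with the directives
  // from one directory's build file applied. The parent is not modified,
  // so sibling directories are unaffected.
  func apply(parent Config, comments []string) Config {
      c := parent
      c.BuildTags = append([]string(nil), parent.BuildTags...)
      for _, comment := range comments {
          key, value, ok := parseDirective(comment)
          if !ok {
              continue
          }
          switch key {
          case "prefix":
              c.Prefix = value
          case "build_tags":
              // Assumption: tags are written as a comma-separated list.
              c.BuildTags = strings.Split(value, ",")
          }
      }
      return c
  }

  func main() {
      root := Config{Prefix: "example.com/repo"}
      child := apply(root, []string{
          "# an ordinary comment",
          "# gazelle:build_tags integration,debug",
      })
      fmt.Println(child.Prefix, child.BuildTags)
  }

Copying the parent before applying directives mirrors the inheritance rule
described above: a directive affects its own directory and everything below
it, but never the parent.
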
Fixing build files
------------------

Godoc: merger_

From time to time, APIs in rules_go are changed or updated. Gazelle helps
users stay up to date with these changes by automatically fixing deprecated
usage.

Minor fixes are applied by Gazelle automatically every time it runs. However,
some fixes may delete or rename existing rules. Users must run ``gazelle fix``
to apply these fixes. By default, Gazelle will only *warn* users that
``gazelle fix`` should be run.

Here are a few of the fixes Gazelle performs. See `Fix command transformations`_
for a full list.

* **Squash cgo libraries:** Gazelle will remove ``cgo_library`` rules and
  merge their attributes into ``go_library`` rules that reference them.
  This is a major fix and is only applied with ``gazelle fix``.
* **Migrate library attributes:** Gazelle replaces ``library`` attributes
  with ``embed`` attributes. The only difference between these is that
  ``library`` (which is now deprecated) accepts a single label, while ``embed``
  accepts a list. This is a minor fix and is always applied.

Users can prevent Gazelle from modifying rules, attributes, or individual
values by writing ``# keep`` comments above them.

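
The ``library`` to ``embed`` migration can be pictured with a small sketch.
The ``Rule`` type below is a hypothetical simplification: Gazelle itself
operates on parsed build file expressions, not on maps. The ``Keep`` field
stands in for a ``# keep`` comment on the rule.

.. code:: go

  package main

  import "fmt"

  // Rule is a hypothetical, simplified rule. Single-valued attributes live
  // in Attrs and list-valued attributes live in Lists.
  type Rule struct {
      Kind  string
      Keep  bool // true if the rule carries a "# keep" comment
      Attrs map[string]string
      Lists map[string][]string
  }

  // migrateLibraryToEmbed moves the single label stored in the deprecated
  // "library" attribute into the list-valued "embed" attribute. It reports
  // whether the rule was changed.
  func migrateLibraryToEmbed(r *Rule) bool {
      if r.Keep {
          return false
      }
      label, ok := r.Attrs["library"]
      if !ok {
          return false
      }
      delete(r.Attrs, "library")
      if r.Lists == nil {
          r.Lists = map[string][]string{}
      }
      r.Lists["embed"] = append(r.Lists["embed"], label)
      return true
  }

  func main() {
      r := &Rule{
          Kind:  "go_test",
          Attrs: map[string]string{"library": ":go_default_library"},
      }
      changed := migrateLibraryToEmbed(r)
      fmt.Println(changed, r.Lists["embed"])
  }
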
Scanning source files
---------------------

Godoc: packages_

Nearly all of the information needed to build a program with the standard Go SDK
is implied by directory structure, file names, and file contents. This is why
``go build`` doesn't require any sort of build file. The `go/build`_ package in
the standard library collects this information.

Unfortunately, `go/build`_ can only collect information for one platform at
a time. Gazelle needs to generate build files that work on all platforms, so
we have our own implementation of this logic.

Information extracted from files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Gazelle extracts build metadata from source files and contents in much the
same way that the standard `go/build`_ package does. It gets the following
information from file names:

* File extension (e.g., .go, .c, .proto). Normally, only .go, .s, and .h files
  are included in Go rules. If any cgo code is present, then C/C++ files are
  also included. .proto files are also used to build proto rules. Other files
  (e.g., .txt) are ignored.
* Test suffix. For example, if a file is named ``foo_test.go``, it will be
  included in a test target instead of a library or binary target.
* OS and architecture suffixes. For example, a file named ``foo_linux_amd64.go``
  will be listed in the ``linux_amd64`` section of the target it belongs to.

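
The file name rules above can be sketched as follows. This is a minimal
illustration, not Gazelle's implementation: the ``fileInfo`` type and
``classify`` function are hypothetical, and the OS and architecture tables
contain only a small subset of the names the Go toolchain recognizes.

.. code:: go

  package main

  import (
      "fmt"
      "path"
      "strings"
  )

  // Illustrative subsets of the OS and architecture names recognized in
  // file name suffixes.
  var knownOS = map[string]bool{"darwin": true, "linux": true, "windows": true}
  var knownArch = map[string]bool{"386": true, "amd64": true, "arm64": true}

  // fileInfo holds the metadata that can be derived from a file name alone.
  type fileInfo struct {
      Ext    string
      IsTest bool
      OS     string
      Arch   string
  }

  // classify derives the extension, test suffix, and platform suffixes.
  func classify(name string) fileInfo {
      info := fileInfo{Ext: path.Ext(name)}
      stem := strings.TrimSuffix(name, info.Ext)
      if info.Ext == ".go" && strings.HasSuffix(stem, "_test") {
          info.IsTest = true
          stem = strings.TrimSuffix(stem, "_test")
      }
      // The first element is never treated as a suffix, so a file named
      // "linux.go" is not platform-specific.
      parts := strings.Split(stem, "_")
      n := len(parts)
      switch {
      case n >= 3 && knownOS[parts[n-2]] && knownArch[parts[n-1]]:
          info.OS, info.Arch = parts[n-2], parts[n-1]
      case n >= 2 && knownOS[parts[n-1]]:
          info.OS = parts[n-1]
      case n >= 2 && knownArch[parts[n-1]]:
          info.Arch = parts[n-1]
      }
      return info
  }

  func main() {
      for _, name := range []string{"foo_linux_amd64.go", "foo_test.go", "linux.go"} {
          fmt.Printf("%s: %+v\n", name, classify(name))
      }
  }
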
Gazelle gets the following information from file contents:

* Package name. This is syntactically the first part of every .go file. All
  files in the same directory must have the same package name (except for
  external test sources, which have a package name ending with ``_test``). If
  there are multiple packages, Gazelle will choose one that matches the
  directory name (if present) or report an error.
* Imported libraries. Go import paths are usually URLs. Imports in
  platform-specific source files are also platform-specific.
* Build tags. The Go toolchain recognizes comments beginning with ``// +build``
  before the package declaration. These tags tell the build system that a file
  should only be built for specific platforms. See `this article <>`_
  for more information.
* Whether cgo code is present. This affects how packages are built and
  whether C/C++ files are included.
* C/C++ compile and link options (specified in ``#cgo`` directives in cgo
  comments). These may be platform-specific.

In most cases, only the top of the file is parsed. For Go files, we use the
standard `go/parser`_ package. For proto files, we use regular expressions that
match ``package``, ``go_package``, and ``import`` statements.

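
To illustrate parsing only the top of a Go file, the sketch below uses the
standard ``go/parser`` package in ``ImportsOnly`` mode, which stops after the
import declarations. The ``goFileInfo`` type and ``scanGoSource`` function are
hypothetical names for this example. Build tags and ``#cgo`` directives are
not handled here.

.. code:: go

  package main

  import (
      "fmt"
      "go/parser"
      "go/token"
      "strconv"
  )

  // goFileInfo holds metadata read from the top of a .go file.
  type goFileInfo struct {
      Package string
      Imports []string
      UsesCgo bool
  }

  // scanGoSource parses the package clause and import declarations only.
  func scanGoSource(filename, src string) (goFileInfo, error) {
      fset := token.NewFileSet()
      f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
      if err != nil {
          return goFileInfo{}, err
      }
      info := goFileInfo{Package: f.Name.Name}
      for _, spec := range f.Imports {
          // Import paths are stored as quoted string literals.
          p, err := strconv.Unquote(spec.Path.Value)
          if err != nil {
              return goFileInfo{}, err
          }
          if p == "C" {
              // The pseudo-package "C" marks a cgo file.
              info.UsesCgo = true
              continue
          }
          info.Imports = append(info.Imports, p)
      }
      return info, nil
  }

  func main() {
      src := "package foo\n\n" +
          "import (\n" +
          "\t\"fmt\"\n" +
          "\t\"example.com/repo/bar\"\n" +
          ")\n\n" +
          "func Hello() { fmt.Println(bar.Name) }\n"
      info, err := scanGoSource("foo.go", src)
      if err != nil {
          fmt.Println("error:", err)
          return
      }
      fmt.Println(info.Package, info.Imports, info.UsesCgo)
  }
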
The ``Package`` object
~~~~~~~~~~~~~~~~~~~~~~

Gazelle stores build metadata in a ``Package`` object. Currently, we only
support one ``Package`` per directory (which is also what the Go SDK supports),
but this will be expanded in the future. ``Package`` objects contain some
top-level metadata (like the package name and directory path), along with
several target objects (``GoTarget`` and ``ProtoTarget``).

Target objects correspond directly to rules that will be generated later. They
store lists of sources, imports, and flags in ``PlatformStrings`` objects.

``PlatformStrings`` objects store strings in four sections: a generic list, an
OS-specific dictionary, an architecture-specific dictionary, and an
OS-and-architecture-specific dictionary. The keys in the dictionaries are OS
names, architecture names, or OS-and-architecture pairs; the values are lists of
strings. The same string may not appear more than once in a list and may not
appear in more than one section. This is due to a Bazel requirement: the same
label may not appear more than once in a ``deps`` list.

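
A minimal sketch of a structure with these four sections and the uniqueness
rules is shown below. The field names, the ``Add`` method, and the
``os_arch`` key format are assumptions made for illustration; the real
``PlatformStrings`` type may differ.

.. code:: go

  package main

  import "fmt"

  // PlatformStrings is a simplified sketch of the structure described above.
  type PlatformStrings struct {
      Generic []string
      OS      map[string][]string // keyed by OS name, e.g. "linux"
      Arch    map[string][]string // keyed by architecture, e.g. "amd64"
      OSArch  map[string][]string // keyed by "os_arch", e.g. "linux_amd64"
  }

  func contains(list []string, s string) bool {
      for _, item := range list {
          if item == s {
              return true
          }
      }
      return false
  }

  func inDict(d map[string][]string, s string) bool {
      for _, list := range d {
          if contains(list, s) {
              return true
          }
      }
      return false
  }

  // Add records s for the given platform. Empty goos and goarch select the
  // generic list. An error is returned if s already appears in the target
  // list or in any other section.
  func (ps *PlatformStrings) Add(s, goos, goarch string) error {
      inGeneric := contains(ps.Generic, s)
      inOS, inArch, inOSArch := inDict(ps.OS, s), inDict(ps.Arch, s), inDict(ps.OSArch, s)
      dup := fmt.Errorf("%q is already present", s)

      var section *map[string][]string
      var key string
      var elsewhere bool
      switch {
      case goos == "" && goarch == "":
          if inGeneric || inOS || inArch || inOSArch {
              return dup
          }
          ps.Generic = append(ps.Generic, s)
          return nil
      case goarch == "":
          section, key, elsewhere = &ps.OS, goos, inGeneric || inArch || inOSArch
      case goos == "":
          section, key, elsewhere = &ps.Arch, goarch, inGeneric || inOS || inOSArch
      default:
          section, key, elsewhere = &ps.OSArch, goos+"_"+goarch, inGeneric || inOS || inArch
      }
      if elsewhere || contains((*section)[key], s) {
          return dup
      }
      if *section == nil {
          *section = map[string][]string{}
      }
      (*section)[key] = append((*section)[key], s)
      return nil
  }

  func main() {
      var srcs PlatformStrings
      fmt.Println(srcs.Add("terminal.go", "", ""))
      fmt.Println(srcs.Add("util.go", "darwin", ""))
      fmt.Println(srcs.Add("util.go", "linux", ""))
      fmt.Println(srcs.Add("util.go", "", "")) // rejected: already OS-specific
  }

Note that the same string may appear under several keys of one dictionary
(``util.go`` is listed for both ``darwin`` and ``linux``), since only one of
those lists is selected on any given platform.
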
Generating rules
----------------

Godoc: rules_

Once build metadata has been extracted from the sources in a directory,
Gazelle generates rules for building those sources.

Generated rules are formatted as CallExpr_ objects. CallExpr_ is defined in the
`buildifier build`_ library. This is the same library used to parse and format
build files. This lets us manipulate newly generated rules and existing rules
with the same code.

We may generate the following rules:

* ``proto_library`` and ``go_proto_library`` are generated if there was at
  least one .proto source file.
* ``go_library`` is generated if there was at least one non-test source. This
  may embed the ``go_proto_library`` if there was one.
* ``go_test`` rules are generated for internal and external tests. Internal
  tests embed the ``go_library`` while external tests depend on the
  ``go_library`` as a separate package.
* ``go_binary`` is generated if the package name was ``main``. It embeds the
  ``go_library``.

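
The conditions in the list above can be summarized in a short sketch. The
``pkgInfo`` and ``rule`` types and the ``plan`` function are hypothetical, and
rules refer to each other by kind here purely to keep the example small;
generated rules refer to each other by label.

.. code:: go

  package main

  import "fmt"

  // pkgInfo is a hypothetical summary of the metadata collected for one
  // directory.
  type pkgInfo struct {
      Name             string // Go package name
      HasProto         bool   // at least one .proto source
      HasLibrarySrcs   bool   // at least one non-test source
      HasInternalTests bool
      HasExternalTests bool
  }

  // rule describes one rule to generate.
  type rule struct {
      Kind  string
      Embed string // kind of the rule embedded by this one, if any
      Dep   string // kind of a rule used as a separate package, if any
  }

  // plan lists the rules that would be generated for a package.
  func plan(p pkgInfo) []rule {
      var rules []rule
      if p.HasProto {
          rules = append(rules, rule{Kind: "proto_library"}, rule{Kind: "go_proto_library"})
      }
      if p.HasLibrarySrcs {
          lib := rule{Kind: "go_library"}
          if p.HasProto {
              lib.Embed = "go_proto_library"
          }
          rules = append(rules, lib)
      }
      if p.HasInternalTests {
          rules = append(rules, rule{Kind: "go_test", Embed: "go_library"})
      }
      if p.HasExternalTests {
          rules = append(rules, rule{Kind: "go_test", Dep: "go_library"})
      }
      if p.Name == "main" {
          rules = append(rules, rule{Kind: "go_binary", Embed: "go_library"})
      }
      return rules
  }

  func main() {
      for _, r := range plan(pkgInfo{Name: "main", HasLibrarySrcs: true, HasInternalTests: true}) {
          fmt.Printf("%+v\n", r)
      }
  }
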
Rules are named according to a pluggable naming policy, but there is currently
only one policy: libraries are named ``go_default_library``, tests are
named ``go_default_test``, and binaries are named after the directory. The
``go_default_library`` name is a historical artifact from before we had
index-based dependency resolution. We'll need to move away from this naming
scheme in the future (`#5`_) before we support multiple packages (`#7`_).

Sources, imports, and flags within each target are converted to expressions in a
straightforward fashion. The lists within ``PlatformStrings`` are converted to
list expressions. Dictionaries are converted to `select`_ expressions
(when Bazel evaluates a `select`_ expression, it will choose one of several
provided lists, based on `config_setting`_ rules). Lists and select expressions
may be added together. For example:

.. code:: bzl

  go_library(
      name = "go_default_library",
      srcs = [
          "terminal.go",
      ] + select({
          "@io_bazel_rules_go//go/platform:darwin": [
              "util.go",
              "util_bsd.go",
          ],
          "@io_bazel_rules_go//go/platform:linux": [
              "util.go",
              "util_linux.go",
          ],
          "@io_bazel_rules_go//go/platform:windows": [
              "util_windows.go",
          ],
          "//conditions:default": [],
      }),
      ...
  )

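
The conversion illustrated by the example above could be sketched as plain
string building. This is only an approximation for illustration: Gazelle
builds expression objects with the build file library and lets the formatter
handle layout. The ``renderList`` and ``quoteList`` helpers are hypothetical,
and only the generic list and the OS-specific dictionary are handled.

.. code:: go

  package main

  import (
      "fmt"
      "sort"
      "strings"
  )

  // platformPrefix is the label prefix used in the example above.
  const platformPrefix = "@io_bazel_rules_go//go/platform:"

  // quoteList renders a list of strings as a single-line list expression.
  func quoteList(items []string) string {
      quoted := make([]string, len(items))
      for i, s := range items {
          quoted[i] = fmt.Sprintf("%q", s)
      }
      return "[" + strings.Join(quoted, ", ") + "]"
  }

  // renderList renders a generic list plus an OS-specific dictionary as a
  // list expression added to a select expression.
  func renderList(generic []string, byOS map[string][]string) string {
      expr := quoteList(generic)
      if len(byOS) == 0 {
          return expr
      }
      // Sort the keys so that the output is deterministic.
      names := make([]string, 0, len(byOS))
      for name := range byOS {
          names = append(names, name)
      }
      sort.Strings(names)

      var b strings.Builder
      b.WriteString(expr + " + select({\n")
      for _, name := range names {
          fmt.Fprintf(&b, "    %q: %s,\n", platformPrefix+name, quoteList(byOS[name]))
      }
      b.WriteString("    \"//conditions:default\": [],\n})")
      return b.String()
  }

  func main() {
      fmt.Println(renderList(
          []string{"terminal.go"},
          map[string][]string{
              "linux":  {"util.go", "util_linux.go"},
              "darwin": {"util.go", "util_bsd.go"},
          },
      ))
  }
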
At this point, Gazelle does not have enough information to generate expressions
for ``deps`` attributes. We only have a list of import strings extracted from
source files. These imports are stored temporarily in a special
``_gazelle_imports`` attribute in each rule. Later, the imports are converted to
Bazel labels (see `Resolving dependencies`_), and this attribute is replaced
with ``deps``.

Merging and deleting rules
--------------------------

Godoc: merger_

Merging is the process of combining generated rules with the corresponding
rules in an existing build file. If no build file exists in a directory, a
new file is created with generated rules, and no merging is performed.

Merging occurs in two phases: pre-resolve, and post-resolve. This is due to an
interdependence with dependency resolution. Dependency resolution uses a table
of *merged* library rules, so it can't be performed until the pre-resolve merge
has occurred. After dependency resolution, we need to merge newly generated
``deps`` attributes; this is done in the post-resolve merge. The two phases use
the same algorithm.

During the merge process, Gazelle attempts to match generated rules with
existing rules that have the same name and same kind. Rules are only merged if
both name and kind match. If an existing rule has the same name as a generated
rule but a different kind, the generated rule will not be merged. If no
existing rule matches a generated rule, the generated rule is simply appended to
the end of the file. Existing rules that don't match any generated rule are not
modified.

When Gazelle identifies a matching pair of rules, it combines each attribute
according to the algorithm below. If an attribute is present in the generated
rule but not in the existing rule, it is copied to the merged rule verbatim. If
an attribute is present in the existing rule but not the generated rule, Gazelle
behaves as if the generated attribute were present but empty.

* For each value in the existing rule's attribute:

  * If the value also appears in the generated rule's attribute or is marked
    with a ``# keep`` comment, preserve it. Otherwise, delete it.

* For each value in the generated rule's attribute:

  * If the value appears in the existing rule's attribute, ignore it.
    Otherwise, add it to the merged rule.

* If the merged attribute is empty, delete it.

When a value is present in both the existing and generated attributes, we use
the existing value instead of the generated value, since this preserves
any comments attached to it.

Some attributes are considered *unmergeable*, for example, ``visibility`` and
``gc_goopts``. Gazelle may add these attributes to existing rules if they are
not already present, but existing values won't be modified or deleted.

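
The per-attribute merge for list values can be sketched as follows. This is a
simplified model under stated assumptions: values are plain strings, the
``Keep`` flag stands in for a ``# keep`` comment, and the ``value`` type and
``mergeAttr`` function are hypothetical names. Unmergeable attributes are not
modeled.

.. code:: go

  package main

  import "fmt"

  // value is one element of a list attribute.
  type value struct {
      Text string
      Keep bool // true if the value is marked with "# keep"
  }

  // mergeAttr combines the existing and generated values of one attribute.
  // A nil result means the merged attribute is empty and should be deleted.
  func mergeAttr(existing, generated []value) []value {
      inGenerated := make(map[string]bool, len(generated))
      for _, g := range generated {
          inGenerated[g.Text] = true
      }

      var merged []value
      inMerged := map[string]bool{}
      // Preserve existing values that were regenerated or explicitly kept.
      // The existing value is used, so its annotations survive.
      for _, e := range existing {
          if inGenerated[e.Text] || e.Keep {
              merged = append(merged, e)
              inMerged[e.Text] = true
          }
      }
      // Add generated values that the existing attribute did not contain.
      for _, g := range generated {
          if !inMerged[g.Text] {
              merged = append(merged, g)
              inMerged[g.Text] = true
          }
      }
      return merged
  }

  func main() {
      existing := []value{{Text: ":a"}, {Text: ":b"}, {Text: ":manual", Keep: true}}
      generated := []value{{Text: ":b"}, {Text: ":c"}}
      for _, v := range mergeAttr(existing, generated) {
          fmt.Println(v.Text)
      }
  }

In this example ``:a`` is dropped because it was not regenerated, ``:b`` and
``:manual`` are preserved, and ``:c`` is added.
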
Preserving customizations
~~~~~~~~~~~~~~~~~~~~~~~~~

Gazelle has several mechanisms for preserving manual modifications to build
files. Some of these mechanisms work automatically; others require explicit
comments or directives.

* Gazelle will not modify or delete rules that don't appear to have been
  generated by Gazelle.
* As mentioned above, some attributes are considered unmergeable. Gazelle may
  set initial values for these but won't delete or replace existing values.
* ``# keep`` comments may be attached to any rule, attribute, or value
  to prevent Gazelle from modifying it.
* ``# gazelle:exclude <file>`` directives can be used to prevent Gazelle from
  adding files to source lists (for example, checked-in .pb.go files). They
  can also prevent Gazelle from recursing into directories that contain
  unbuildable code (e.g., ``testdata``).
* ``# gazelle:ignore`` directives prevent Gazelle from making any modifications
  to build files that contain them.

Deleting rules
~~~~~~~~~~~~~~

Deletion is a special case of the merging algorithm.

When Gazelle generates rules for a package (see `Generating rules`_), it
actually produces two lists of rules: a list of rules for buildable targets,
and a list of empty rules that may be deleted. The empty rules have no
attributes other than ``name``.

The empty rules are merged using the same algorithm as the other generated
rules. If, after merging, an empty rule has no attributes that would make the
rule buildable (for example, ``srcs`` or ``deps``), the rule will be deleted.

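
The following sketch shows why merging an empty generated rule leads to
deletion. It is a simplified model: the ``rule`` and ``value`` types, the
``mergeEmpty`` function, and the two attribute tables are hypothetical, and
the tables list only the attributes named in this document.

.. code:: go

  package main

  import "fmt"

  // value is one element of a list attribute.
  type value struct {
      Text string
      Keep bool // true if the value is marked with "# keep"
  }

  // rule is a hypothetical, simplified rule with list attributes only.
  type rule struct {
      Kind  string
      Name  string
      Attrs map[string][]value
  }

  // Illustrative subsets of buildable and unmergeable attributes.
  var buildableAttrs = []string{"srcs", "deps"}
  var unmergeableAttrs = map[string]bool{"visibility": true, "gc_goopts": true}

  // mergeEmpty merges an empty generated rule into an existing rule. Every
  // mergeable attribute is treated as generated but empty, so its values are
  // dropped unless they are marked keep. The rule is reported as deleted if
  // no buildable attribute survives.
  func mergeEmpty(existing rule) (merged rule, deleted bool) {
      merged = rule{Kind: existing.Kind, Name: existing.Name, Attrs: map[string][]value{}}
      for name, vals := range existing.Attrs {
          if unmergeableAttrs[name] {
              merged.Attrs[name] = vals
              continue
          }
          var kept []value
          for _, v := range vals {
              if v.Keep {
                  kept = append(kept, v)
              }
          }
          if len(kept) > 0 {
              merged.Attrs[name] = kept
          }
      }
      for _, name := range buildableAttrs {
          if len(merged.Attrs[name]) > 0 {
              return merged, false
          }
      }
      return merged, true
  }

  func main() {
      stale := rule{Kind: "go_test", Name: "go_default_test", Attrs: map[string][]value{
          "srcs":       {{Text: "old_test.go"}},
          "visibility": {{Text: "//visibility:public"}},
      }}
      _, deleted := mergeEmpty(stale)
      fmt.Println("stale rule deleted:", deleted)

      pinned := rule{Kind: "go_test", Name: "go_default_test", Attrs: map[string][]value{
          "srcs": {{Text: "gen_test.go", Keep: true}},
      }}
      _, deleted = mergeEmpty(pinned)
      fmt.Println("pinned rule deleted:", deleted)
  }
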
Resolving dependencies
----------------------

Godoc: resolve_

When Gazelle generates rules for a package (see `Generating
rules`_), it stores names of the libraries imported by each rule in a special
``_gazelle_imports`` attribute. During dependency resolution, Gazelle maps these
imports to Bazel labels and replaces ``_gazelle_imports`` with ``deps``.

Before dependency resolution starts, Gazelle builds a table of all known
libraries. This includes ``go_library``, ``go_proto_library``, and
``proto_library`` rules. The table is populated by scanning build files after
the pre-resolve merge, so existing and newly generated rules are included
in the table, and deleted rules are excluded. Once all library rules have been
added, Gazelle indexes the table by language-specific import path.

Gazelle resolves each import string in ``_gazelle_imports`` as follows:

* If the import is part of the standard library, it is dropped. Standard
  library dependencies are implicit.

* If the import is provided by exactly one rule in the library table, the label
  for that rule is used.

* If the import is provided by multiple libraries, we attempt to resolve
  the ambiguity.

  * For Go, we apply the vendoring algorithm. Vendored libraries aren't visible
    outside of the vendor directory's parent.

  * Go libraries that are embedded by other Go libraries are not considered.
    Embedded libraries may be incomplete.

  * When an ambiguity can't be resolved, Gazelle logs an error and skips
    the dependency.

* If the import is not provided by any rule in the library table, we attempt
  to resolve the dependency using heuristics:

  * If the import path starts with the current prefix (set with a
    ``# gazelle:prefix`` directive or on the command line), we construct a label
    by concatenating the prefix directory and the portion of the import path
    below the prefix into a package name.

  * Otherwise, the import path is considered external and is resolved
    according to the external mode set on the command line.

    * In ``external`` mode, Gazelle determines the portion of the import path
      that corresponds to a repository using ``_. This
      part of the path is converted into a repository name (for example,
      ``@org_golang_x_tools``), and the rest is converted to a package name.

    * In ``vendored`` mode, Gazelle constructs a label by prepending ``vendor/``
      to the import path.

Note that ``visibility`` attributes are not considered when resolving imports.
This was part of an initial prototype, but it was confusing in many situations.

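
The resolution order described in this section can be sketched as follows.
This is a simplified model, not Gazelle's resolver: the ``resolver`` type and
its fields are hypothetical, ambiguous imports are simply reported as errors
(no vendoring or embed analysis), and ``RepoRoots`` is a fixed list standing
in for the network lookup of repository roots. The repository naming scheme in
``repoName`` is an assumption inferred from the ``@org_golang_x_tools``
example.

.. code:: go

  package main

  import (
      "fmt"
      "path"
      "strings"
  )

  const libName = ":go_default_library"

  // resolver is a hypothetical, simplified dependency resolver.
  type resolver struct {
      Std       map[string]bool     // standard library import paths
      Index     map[string][]string // import path -> labels providing it
      Prefix    string              // current import path prefix
      PrefixDir string              // directory where the prefix was set
      Vendored  bool                // true for "vendored" external mode
      RepoRoots []string            // stand-in for repository root lookup
  }

  // repoName converts a repository root such as "golang.org/x/tools" into
  // a repository name such as "org_golang_x_tools" (assumed scheme).
  func repoName(root string) string {
      parts := strings.Split(root, "/")
      host := strings.Split(parts[0], ".")
      for i, j := 0, len(host)-1; i < j; i, j = i+1, j-1 {
          host[i], host[j] = host[j], host[i]
      }
      name := strings.Join(append(host, parts[1:]...), "_")
      return strings.NewReplacer(".", "_", "-", "_").Replace(name)
  }

  // below returns the part of imp below root, and whether imp is at or
  // below root.
  func below(imp, root string) (string, bool) {
      if imp == root {
          return "", true
      }
      if strings.HasPrefix(imp, root+"/") {
          return strings.TrimPrefix(imp, root+"/"), true
      }
      return "", false
  }

  // resolve maps an import path to a label. An empty label with a nil
  // error means the import is dropped.
  func (r *resolver) resolve(imp string) (string, error) {
      if r.Std[imp] {
          return "", nil
      }
      labels := r.Index[imp]
      if len(labels) == 1 {
          return labels[0], nil
      }
      if len(labels) > 1 {
          return "", fmt.Errorf("import %q is ambiguous: %v", imp, labels)
      }
      if rel, ok := below(imp, r.Prefix); ok {
          return "//" + path.Join(r.PrefixDir, rel) + libName, nil
      }
      if r.Vendored {
          return "//vendor/" + imp + libName, nil
      }
      for _, root := range r.RepoRoots {
          if rel, ok := below(imp, root); ok {
              return "@" + repoName(root) + "//" + rel + libName, nil
          }
      }
      return "", fmt.Errorf("no repository root known for %q", imp)
  }

  func main() {
      r := &resolver{
          Std:       map[string]bool{"fmt": true},
          Index:     map[string][]string{"example.com/repo/a": {"//a:go_default_library"}},
          Prefix:    "example.com/repo",
          RepoRoots: []string{"golang.org/x/tools"},
      }
      imports := []string{"fmt", "example.com/repo/a", "example.com/repo/b/c", "golang.org/x/tools/go/vcs"}
      for _, imp := range imports {
          label, err := r.resolve(imp)
          fmt.Printf("%s -> %q (err: %v)\n", imp, label, err)
      }
  }
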
Building and running Gazelle
----------------------------

Gazelle is a regular Go program. It can be built, installed, and run without
Bazel, using the regular Go SDK.

.. code:: bash

  $ go get -u
  $ gazelle -go_prefix

We lightly discourage this method of running Gazelle. All developers on a
project should use the same version of Gazelle to ensure the build files
they generate are consistent. The easiest way to accomplish this is to build
and run Gazelle through Bazel. Gazelle may be added to a WORKSPACE file,
built as a normal ``go_binary``, then installed or run from the ``bazel-bin/``
directory.

.. code:: bash

  $ bazel build @bazel_gazelle//cmd/gazelle
  $ bazel-bin/external/bazel_gazelle/cmd/gazelle/gazelle -go_prefix

It's usually better to invoke Gazelle through a wrapper script though. This
saves typing and ensures Gazelle is run with a consistent set of arguments.
We provide a Bazel rule that generates such a wrapper script. Developers may
add a snippet like the one below to a build file:

.. code:: bzl

  load("@bazel_gazelle//:def.bzl", "gazelle")

  gazelle(
      name = "gazelle",
      command = "fix",
      external = "vendored",
      prefix = "",
  )

This script may be built and executed in a single command with ``bazel run``.

.. code:: bash

  $ bazel run //:gazelle

This is the most convenient way to run Gazelle, and it's what we recommend to
users. However, there are two issues with running Gazelle in this
fashion. First, binaries executed by ``bazel run`` are run in the Bazel
execroot, not the user's current directory. The wrapper script uses a hack
(dereferencing symlinks) to jump to the top of the workspace source tree before
running Gazelle. Second, ``bazel run`` holds a lock on the Bazel output
directory. This means Gazelle cannot invoke Bazel without deadlocking. Commands
like ``bazel query`` would be helpful for detecting generated code, but it's not
safe to use them.

To avoid these limitations, the wrapper script may be copied to the workspace
and optionally checked into version control. When the wrapper script is run
directly (without ``bazel run``), it will rebuild itself to ensure no changes
are needed. If the rebuilt script differs from the running script, it will
prompt the user to copy the rebuilt script into the workspace again.

.. code:: bash

  $ bazel build //:gazelle
  Target //:gazelle up-to-date:
    bazel-bin/gazelle.bash
  ____Elapsed time: 1.326s, Critical Path: 0.00s
  $ cp bazel-bin/gazelle.bash gazelle.bash
  $ ./gazelle.bash

Dependencies
------------

Gazelle has the following dependencies:

  Skylark utility used to generate the wrapper script in the ``gazelle`` rule.

  Used to parse and rewrite build files.

  Used to build and test Gazelle through Bazel. Gazelle can also be built on its
  own with the Go SDK.

  Used to import dependencies from dep Gopkg.lock files.

  Used during dependency resolution to determine the repository prefix for a
  given import path. This uses the network.