Architecture of Gazelle
=======================

.. All external links are here.

.. Godoc links
.. _buildifier build: https://godoc.org/github.com/bazelbuild/buildtools/build
.. _config: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/config
.. _go/build: https://godoc.org/go/build
.. _go/parser: https://godoc.org/go/parser
.. _merger: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/merger
.. _packages: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/packages
.. _resolve: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/resolve
.. _rules: https://godoc.org/github.com/bazelbuild/bazel-gazelle/internal/rules
.. _CallExpr: https://godoc.org/github.com/bazelbuild/buildtools/build#CallExpr
.. _golang.org/x/tools/go/vcs: https://godoc.org/golang.org/x/tools/go/vcs

.. Other documentation links
.. _buildifier: https://github.com/bazelbuild/buildtools/tree/master/buildifier
.. _config_setting: https://docs.bazel.build/versions/master/be/general.html#config_setting
.. _Fix command transformations: README.rst#fix-command-transformations
.. _full list of directives: README.rst#Directives
.. _select: https://docs.bazel.build/versions/master/skylark/lib/globals.html#select

.. Issues
.. _#5: https://github.com/bazelbuild/bazel-gazelle/issues/5
.. _#7: https://github.com/bazelbuild/bazel-gazelle/issues/7

.. Actual content is below

Gazelle is a tool that generates and updates Bazel build files for Go projects
that follow the conventional "go build" project layout. It is intended to
simplify the maintenance of Bazel Go projects as much as possible.

This document describes how Gazelle works.
It should help users understand why
Gazelle behaves as it does, and it should help developers understand
how to modify Gazelle and how to write similar tools.

.. contents::

Overview
--------

Gazelle generates and updates build files according to the algorithm outlined
below. Each step is described in more detail in the sections that follow.

* Build a configuration from command line arguments and special comments
  in the top-level build file. See Configuration_.

* For each directory in the repository:

  * Read the build file if one is present.

  * If the build file should be updated (based on configuration):

    * Apply transformations to the build file to migrate away from deprecated
      APIs. See `Fixing build files`_.

    * Scan the source files and collect metadata needed to generate rules
      for the directory. See `Scanning source files`_.

    * Generate new rules from the build metadata collected earlier. See
      `Generating rules`_.

    * Merge the new rules into the directory's build file. Delete any rules
      that are now empty. See `Merging and deleting rules`_.

  * Add the library rules in the directory's build file to a global table,
    indexed by import path.

* For each updated build file:

  * Use the library table to map import paths to Bazel labels for rules that
    were added or merged earlier. See `Resolving dependencies`_.

  * Merge the resolved rules back into the file.

  * Format the file using buildifier_ and emit it according to the output mode:
    write to disk, print the whole file, or print the diff.

Configuration
-------------

Godoc: config_

Gazelle stores configuration information in ``Config`` objects. These objects
contain settings that affect the behavior of most packages in the program.
For example:

* The list of directories that Gazelle should update.
* The path of the repository root directory.
  Bazel package names are based on paths relative to this location.
* The current import path prefix and the directory where it was set.
  Gazelle uses this to infer import paths for ``go_library`` rules.
* A list of build tags that Gazelle considers to be true on all platforms.

``Config`` objects apply to individual directories. Each directory inherits
the ``Config`` from its parent. Values in a ``Config`` may be modified within
a directory using *directives* written in the directory's build file. A
directive is a special comment formatted like this:

::

    # gazelle:key value

Here are a few examples. See the `full list of directives`_.

* ``# gazelle:prefix`` - sets the Go import path prefix for the current
  directory.
* ``# gazelle:build_tags`` - sets the list of build tags that Gazelle considers
  to be true on all platforms.

A few directives are not applied to the ``Config`` object; instead, they are
interpreted directly by the packages where they are relevant.

* ``# gazelle:ignore`` - the build file should not be updated by Gazelle.
  Gazelle may still index its contents so it can resolve dependencies in other
  build files.
* ``# gazelle:exclude path/to/file`` - the named file should not be read by
  Gazelle and should not be included in ``srcs`` lists. If this refers to
  a directory, Gazelle won't recurse into the directory. This directive may
  appear multiple times.

Fixing build files
------------------

Godoc: merger_

From time to time, APIs in rules_go are changed or updated. Gazelle helps
users stay up to date with these changes by automatically fixing deprecated
usage.

Minor fixes are applied by Gazelle automatically every time it runs. However,
some fixes may delete or rename existing rules. Users must run ``gazelle fix``
to apply these fixes.
By default, Gazelle will only *warn* users that
``gazelle fix`` should be run.

Here are a few of the fixes Gazelle performs. See `Fix command transformations`_
for a full list.

* **Squash cgo libraries:** Gazelle will remove ``cgo_library`` rules and
  merge their attributes into ``go_library`` rules that reference them.
  This is a major fix and is only applied with ``gazelle fix``.
* **Migrate library attributes:** Gazelle replaces ``library`` attributes
  with ``embed`` attributes. The only difference between these is that
  ``library`` (which is now deprecated) accepts a single label, while ``embed``
  accepts a list. This is a minor fix and is always applied.

Users can prevent Gazelle from modifying rules, attributes, or individual
values by writing ``# keep`` comments above them.

Scanning source files
---------------------

Godoc: packages_

Nearly all of the information needed to build a program with the standard Go SDK
is implied by directory structure, file names, and file contents. This is why
``go build`` doesn't require any sort of build file. The `go/build`_ package in
the standard library collects this information.

Unfortunately, `go/build`_ can only collect information for one platform at
a time. Gazelle needs to generate build files that work on all platforms, so
we have our own implementation of this logic.

Information extracted from files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Gazelle extracts build metadata from source file names and contents in much the
same way that the standard `go/build`_ package does. It gets the following
information from file names:

* File extension (e.g., .go, .c, .proto). Normally, only .go, .s, and .h files
  are included in Go rules. If any cgo code is present, then C/C++ files are
  also included. .proto files are also used to build proto rules. Other files
  (e.g., .txt) are ignored.
* Test suffix. For example, if a file is named ``foo_test.go``, it will be
  included in a test target instead of a library or binary target.
* OS and architecture suffixes. For example, a file named ``foo_linux_amd64.go``
  will be listed in the ``linux_amd64`` section of the target it belongs to.

Gazelle gets the following information from file contents:

* Package name. This is syntactically the first part of every .go file. All
  files in the same directory must have the same package name (except for
  external test sources, which have a package name ending with ``_test``). If
  there are multiple packages, Gazelle will choose the one that matches the
  directory name (if present) or report an error.
* Imported libraries. Go import paths usually resemble URLs. Imports in
  platform-specific source files are also platform-specific.
* Build tags. The Go toolchain recognizes comments beginning with ``// +build``
  before the package declaration. These tags tell the build system that a file
  should only be built for specific platforms. See `this article
  <https://dave.cheney.net/2013/10/12/how-to-use-conditional-compilation-with-the-go-build-tool>`_
  for more information.
* Whether cgo code is present. This affects how packages are built and
  whether C/C++ files are included.
* C/C++ compile and link options (specified in ``#cgo`` directives in cgo
  comments). These may be platform-specific.

In most cases, only the top of the file is parsed. For Go files, we use the
standard `go/parser`_ package. For proto files, we use regular expressions that
match ``package``, ``go_package``, and ``import`` statements.

The ``Package`` object
~~~~~~~~~~~~~~~~~~~~~~

Gazelle stores build metadata in a ``Package`` object. Currently, we only
support one ``Package`` per directory (which is also what the Go SDK supports),
but this will be expanded in the future.
``Package`` objects contain some
top-level metadata (like the package name and directory path), along with
several target objects (``GoTarget`` and ``ProtoTarget``).

Target objects correspond directly to rules that will be generated later. They
store lists of sources, imports, and flags in ``PlatformStrings`` objects.

``PlatformStrings`` objects store strings in four sections: a generic list, an
OS-specific dictionary, an architecture-specific dictionary, and an
OS-and-architecture-specific dictionary. The keys in the dictionaries are OS
names, architecture names, or OS-and-architecture pairs; the values are lists of
strings. The same string may not appear more than once in a list and may not
appear in more than one section. This is due to a Bazel requirement: the same
label may not appear more than once in a ``deps`` list. Because the sections are
added together when rules are generated, a string stored in two sections would
appear twice on some platforms.

Generating rules
----------------

Godoc: rules_

Once build metadata has been extracted from the sources in a directory,
Gazelle generates rules for building those sources.

Generated rules are represented as CallExpr_ objects. CallExpr_ is defined in the
`buildifier build`_ library. This is the same library used to parse and format
build files. This lets us manipulate newly generated rules and existing rules
with the same code.

We may generate the following rules:

* ``proto_library`` and ``go_proto_library`` are generated if there was at
  least one .proto source file.
* ``go_library`` is generated if there was at least one non-test source. This
  may embed the ``go_proto_library`` if there was one.
* ``go_test`` rules are generated for internal and external tests. Internal
  tests embed the ``go_library`` while external tests depend on the
  ``go_library`` as a separate package.
* ``go_binary`` is generated if the package name was ``main``. It embeds the
  ``go_library``.
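
The rule-selection criteria listed above can be sketched in Go. This is a
simplified illustration, not Gazelle's implementation: the ``pkgMetadata`` and
``genRule`` types and the ``rulesFor`` function are hypothetical, only non-test
Go files are counted as library sources, and internal and external tests are
modeled as two separate ``go_test`` rules.

.. code:: go

    package main

    import "fmt"

    // pkgMetadata is a hypothetical, simplified stand-in for the metadata
    // Gazelle collects in a Package object. It is not Gazelle's real type.
    type pkgMetadata struct {
        Name             string // package name from the package clause
        ProtoSrcs        int    // number of .proto files
        LibSrcs          int    // number of non-test Go sources
        InternalTestSrcs int    // _test.go files in package Name
        ExternalTestSrcs int    // _test.go files in package Name + "_test"
    }

    // genRule is a hypothetical summary of one rule to generate.
    type genRule struct {
        Kind  string
        Embed string // kind of the embedded rule, if any
        Dep   string // kind of the rule depended on as a separate package, if any
    }

    // rulesFor applies the selection criteria to the metadata of one directory.
    func rulesFor(p pkgMetadata) []genRule {
        var rules []genRule
        if p.ProtoSrcs > 0 {
            rules = append(rules,
                genRule{Kind: "proto_library"},
                genRule{Kind: "go_proto_library"})
        }
        hasLib := p.LibSrcs > 0
        if hasLib {
            lib := genRule{Kind: "go_library"}
            if p.ProtoSrcs > 0 {
                lib.Embed = "go_proto_library" // the library may embed the proto library
            }
            rules = append(rules, lib)
        }
        if p.InternalTestSrcs > 0 {
            t := genRule{Kind: "go_test"}
            if hasLib {
                t.Embed = "go_library" // internal tests are compiled into the library's package
            }
            rules = append(rules, t)
        }
        if p.ExternalTestSrcs > 0 {
            t := genRule{Kind: "go_test"}
            if hasLib {
                t.Dep = "go_library" // external tests import the library as a separate package
            }
            rules = append(rules, t)
        }
        if p.Name == "main" {
            b := genRule{Kind: "go_binary"}
            if hasLib {
                b.Embed = "go_library"
            }
            rules = append(rules, b)
        }
        return rules
    }

    func main() {
        p := pkgMetadata{Name: "main", LibSrcs: 2, InternalTestSrcs: 1}
        for _, r := range rulesFor(p) {
            fmt.Printf("%s embed=%q dep=%q\n", r.Kind, r.Embed, r.Dep)
        }
    }

Running ``main`` prints a ``go_library`` with no embed, followed by a
``go_test`` and a ``go_binary`` that both embed the ``go_library``.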

Rules are named according to a pluggable naming policy, but there is currently
only one policy: libraries are named ``go_default_library``, tests are
named ``go_default_test``, and binaries are named after the directory. The
``go_default_library`` name is a historical artifact from before we had
index-based dependency resolution. We'll need to move away from this naming
scheme in the future (`#5`_) before we support multiple packages (`#7`_).

Sources, imports, and flags within each target are converted to expressions in a
straightforward fashion. The lists within ``PlatformStrings`` are converted to
list expressions. Dictionaries are converted to `select`_ expressions
(when Bazel evaluates a `select`_ expression, it chooses one of several
provided lists, based on `config_setting`_ rules). Lists and select expressions
may be added together. For example:

.. code:: bzl

    go_library(
        name = "go_default_library",
        srcs = [
            "terminal.go",
        ] + select({
            "@io_bazel_rules_go//go/platform:darwin": [
                "util.go",
                "util_bsd.go",
            ],
            "@io_bazel_rules_go//go/platform:linux": [
                "util.go",
                "util_linux.go",
            ],
            "@io_bazel_rules_go//go/platform:windows": [
                "util_windows.go",
            ],
            "//conditions:default": [],
        }),
        ...
    )

At this point, Gazelle does not have enough information to generate expressions
for ``deps`` attributes. We only have a list of import strings extracted from
source files. These imports are stored temporarily in a special
``_gazelle_imports`` attribute in each rule. Later, the imports are converted to
Bazel labels (see `Resolving dependencies`_), and this attribute is replaced
with ``deps``.

Merging and deleting rules
--------------------------

Godoc: merger_

Merging is the process of combining generated rules with the corresponding
rules in an existing build file.
If no build file exists in a directory, a
new file is created with generated rules, and no merging is performed.

Merging occurs in two phases: pre-resolve and post-resolve. This is due to an
interdependence with dependency resolution. Dependency resolution uses a table
of *merged* library rules, so it can't be performed until the pre-resolve merge
has occurred. After dependency resolution, we need to merge newly generated
``deps`` attributes; this is done in the post-resolve merge. The two phases use
the same algorithm.

During the merge process, Gazelle attempts to match generated rules with
existing rules that have the same name and same kind. Rules are only merged if
both name and kind match. If an existing rule has the same name as a generated
rule but a different kind, the generated rule will not be merged. If no
existing rule matches a generated rule, the generated rule is simply appended to
the end of the file. Existing rules that don't match any generated rule are not
modified.

When Gazelle identifies a matching pair of rules, it combines each attribute
according to the algorithm below. If an attribute is present in the generated
rule but not in the existing rule, it is copied to the merged rule verbatim. If
an attribute is present in the existing rule but not the generated rule, Gazelle
behaves as if the generated attribute were present but empty.

* For each value in the existing rule's attribute:

  * If the value also appears in the generated rule's attribute or is marked
    with a ``# keep`` comment, preserve it. Otherwise, delete it.

* For each value in the generated rule's attribute:

  * If the value also appears in the existing rule's attribute, ignore it
    (it was already handled above). Otherwise, add it to the merged rule.

* If the merged attribute is empty, delete it.
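
The per-attribute algorithm above can be sketched in Go. This is a minimal
sketch, not the merger_ package's code: it assumes list attributes whose values
are plain strings, represents a ``# keep`` comment as a boolean flag, and
ignores unmergeable attributes and comment handling. ``attrValue`` and
``mergeAttr`` are hypothetical names.

.. code:: go

    package main

    import "fmt"

    // attrValue is a hypothetical representation of one list element in an
    // attribute, such as a source file name or a dependency label.
    type attrValue struct {
        Value string
        Keep  bool // true if the value is marked with a "# keep" comment
    }

    // mergeAttr merges one list attribute. A missing attribute is passed as nil.
    // A nil result means the merged attribute is empty and should be deleted.
    func mergeAttr(existing, generated []attrValue) []attrValue {
        inGenerated := make(map[string]bool, len(generated))
        for _, v := range generated {
            inGenerated[v.Value] = true
        }
        inExisting := make(map[string]bool, len(existing))
        var merged []attrValue
        for _, v := range existing {
            inExisting[v.Value] = true
            // Existing values survive if they were regenerated or marked "# keep".
            if inGenerated[v.Value] || v.Keep {
                merged = append(merged, v)
            }
        }
        for _, v := range generated {
            // Values already present in the existing attribute were handled above.
            if !inExisting[v.Value] {
                merged = append(merged, v)
            }
        }
        return merged
    }

    func main() {
        existing := []attrValue{{Value: "a.go"}, {Value: "old.go"}, {Value: "gen.go", Keep: true}}
        generated := []attrValue{{Value: "a.go"}, {Value: "b.go"}}
        for _, v := range mergeAttr(existing, generated) {
            fmt.Println(v.Value)
        }
    }

Here the merged attribute is ``a.go``, ``gen.go``, ``b.go``: ``old.go`` is
dropped because it was neither regenerated nor marked ``# keep``. The sketch
places surviving existing values first and appends new values after them, which
keeps the existing order stable across runs.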

When a value is present in both the existing and generated attributes, we use
the existing value instead of the generated value, since this preserves
comments.

Some attributes are considered *unmergeable*, for example, ``visibility`` and
``gc_goopts``. Gazelle may add these attributes to existing rules if they are
not already present, but existing values won't be modified or deleted.

Preserving customizations
~~~~~~~~~~~~~~~~~~~~~~~~~

Gazelle has several mechanisms for preserving manual modifications to build
files. Some of these mechanisms work automatically; others require explicit
comments.

* Gazelle will not modify or delete rules that don't appear to have been
  generated by Gazelle.
* As mentioned above, some attributes are considered unmergeable. Gazelle may
  set initial values for these but won't delete or replace existing values.
* ``# keep`` comments may be attached to any rule, attribute, or value
  to prevent Gazelle from modifying it.
* ``# gazelle:exclude <file>`` directives can be used to prevent Gazelle from
  adding files to source lists (for example, checked-in .pb.go files). They
  can also prevent Gazelle from recursing into directories that contain
  unbuildable code (e.g., ``testdata``).
* ``# gazelle:ignore`` directives prevent Gazelle from making any modifications
  to build files that contain them.

Deleting rules
~~~~~~~~~~~~~~

Deletion is a special case of the merging algorithm.

When Gazelle generates rules for a package (see `Generating rules`_), it
actually produces two lists of rules: a list of rules for buildable targets,
and a list of empty rules that may be deleted. The empty rules have no
attributes other than ``name``.

The empty rules are merged using the same algorithm as the other generated
rules.
If, after merging, an empty rule has no attributes that would make the
rule buildable (for example, ``srcs`` or ``deps``), the rule will be deleted.

Resolving dependencies
----------------------

Godoc: resolve_

When Gazelle generates rules for a package (see `Generating rules`_), it
stores the import paths of the libraries imported by each rule in a special
``_gazelle_imports`` attribute. During dependency resolution, Gazelle maps these
imports to Bazel labels and replaces ``_gazelle_imports`` with ``deps``.

Before dependency resolution starts, Gazelle builds a table of all known
libraries. This includes ``go_library``, ``go_proto_library``, and
``proto_library`` rules. The table is populated by scanning build files after
the pre-resolve merge, so existing and newly generated rules are included
in the table, and deleted rules are excluded. Once all library rules have been
added, Gazelle indexes the table by language-specific import path.

Gazelle resolves each import string in ``_gazelle_imports`` as follows:

* If the import is part of the standard library, it is dropped. Standard
  library dependencies are implicit.

* If the import is provided by exactly one rule in the library table, the label
  for that rule is used.

* If the import is provided by multiple libraries, we attempt to resolve
  the ambiguity.

  * For Go, we apply the vendoring algorithm. Vendored libraries aren't visible
    outside of the vendor directory's parent.

  * Go libraries that are embedded by other Go libraries are not considered.
    Embedded libraries may be incomplete.

  * When an ambiguity can't be resolved, Gazelle logs an error and skips
    the dependency.

* If the import is not provided by any rule in the library table, we attempt
  to resolve the dependency using heuristics:

  * If the import path starts with the current prefix (set with a
    ``# gazelle:prefix`` directive or on the command line), we construct a label
    by concatenating the prefix directory and the portion of the import path
    below the prefix into a package name.

  * Otherwise, the import path is considered external and is resolved
    according to the external mode set on the command line.

    * In ``external`` mode, Gazelle determines the portion of the import path
      that corresponds to a repository using `golang.org/x/tools/go/vcs`_. This
      part of the path is converted into a repository name (for example,
      ``@org_golang_x_tools_go_vcs``), and the rest is converted to a package
      name.

    * In ``vendored`` mode, Gazelle constructs a label by prepending ``vendor/``
      to the import path.

Note that ``visibility`` attributes are not considered when resolving imports.
This was part of an initial prototype, but it was confusing in many situations.

Building and running Gazelle
----------------------------

Gazelle is a regular Go program. It can be built, installed, and run without
Bazel, using the regular Go SDK.

.. code:: bash

    $ go install github.com/bazelbuild/bazel-gazelle/cmd/gazelle@latest
    $ gazelle -go_prefix example.com/project

We lightly discourage this method of running Gazelle. All developers on a
project should use the same version of Gazelle to ensure the build files
they generate are consistent. The easiest way to accomplish this is to build
and run Gazelle through Bazel. Gazelle may be added to a WORKSPACE file,
built as a normal ``go_binary``, then installed or run from the ``bazel-bin/``
directory.

.. code:: bash

    $ bazel build @bazel_gazelle//cmd/gazelle
    $ bazel-bin/external/bazel_gazelle/cmd/gazelle/gazelle -go_prefix example.com/project

It's usually better to invoke Gazelle through a wrapper script, though. This
saves typing and ensures Gazelle is run with a consistent set of arguments.
We provide a Bazel rule that generates such a wrapper script. Developers may
add a snippet like the one below to a build file:

.. code:: bzl

    load("@bazel_gazelle//:def.bzl", "gazelle")

    gazelle(
        name = "gazelle",
        command = "fix",
        external = "vendored",
        prefix = "example.com/project",
    )

This script may be built and executed in a single command with ``bazel run``.

.. code:: bash

    $ bazel run //:gazelle

This is the most convenient way to run Gazelle, and it's what we recommend to
users. However, there are two issues with running Gazelle in this
fashion. First, binaries executed by ``bazel run`` are run in the Bazel
execroot, not the user's current directory. The wrapper script uses a hack
(dereferencing symlinks) to jump to the top of the workspace source tree before
running Gazelle. Second, ``bazel run`` holds a lock on the Bazel output
directory. This means Gazelle cannot invoke Bazel without deadlocking. Commands
like ``bazel query`` would be helpful for detecting generated code, but it's not
safe to use them.

To avoid these limitations, the wrapper script may be copied to the workspace
and optionally checked into version control. When the wrapper script is run
directly (without ``bazel run``), it rebuilds itself to check whether it is
up to date. If the rebuilt script differs from the running script, it will
prompt the user to copy the rebuilt script into the workspace again.

.. code:: bash

    $ bazel build //:gazelle
    Target //:gazelle up-to-date:
      bazel-bin/gazelle.bash
    ____Elapsed time: 1.326s, Critical Path: 0.00s
    $ cp bazel-bin/gazelle.bash gazelle.bash
    $ ./gazelle.bash

Dependencies
------------

Gazelle has the following dependencies:

github.com/bazelbuild/bazel-skylib
  Skylark utility used to generate the wrapper script in the ``gazelle`` rule.
github.com/bazelbuild/buildtools/build
  Used to parse and rewrite build files.
github.com/bazelbuild/rules_go
  Used to build and test Gazelle through Bazel. Gazelle can also be built on its
  own with the Go SDK.
golang.org/x/tools/go/vcs
  Used during dependency resolution to determine the repository prefix for a
  given import path. This uses the network.
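
In ``external`` mode, only locating the repository root needs the network (the
role the vcs package plays above); the rest of the label construction described
in `Resolving dependencies`_ is string manipulation. The following Go sketch
assumes the root is already known and uses the naming scheme implied by the
``@org_golang_x_tools_go_vcs`` example: reverse the host name's components, then
replace every character that is not a letter or digit with ``_``.
``repoName`` and ``externalLabel`` are hypothetical helpers, not Gazelle APIs,
and the ``go_default_library`` target name follows the naming policy described
in `Generating rules`_.

.. code:: go

    package main

    import (
        "fmt"
        "strings"
    )

    // repoName converts a repository root import path into a Bazel repository
    // name: the host name's components are reversed, the result is lowercased,
    // and every character that is not a letter or digit becomes '_'.
    func repoName(repoRoot string) string {
        host, rest, _ := strings.Cut(repoRoot, "/")
        parts := strings.Split(host, ".")
        for i, j := 0, len(parts)-1; i < j; i, j = i+1, j-1 {
            parts[i], parts[j] = parts[j], parts[i]
        }
        name := strings.Join(parts, ".")
        if rest != "" {
            name += "/" + rest
        }
        return strings.Map(func(r rune) rune {
            if ('a' <= r && r <= 'z') || ('0' <= r && r <= '9') {
                return r
            }
            return '_'
        }, strings.ToLower(name))
    }

    // externalLabel builds a label for importPath, assuming repoRoot is a prefix
    // of importPath that was determined beforehand (for example, by the vcs package).
    func externalLabel(importPath, repoRoot string) string {
        pkg := strings.TrimPrefix(strings.TrimPrefix(importPath, repoRoot), "/")
        return "@" + repoName(repoRoot) + "//" + pkg + ":go_default_library"
    }

    func main() {
        fmt.Println(repoName("golang.org/x/tools/go/vcs"))
        fmt.Println(externalLabel("golang.org/x/tools/go/vcs", "golang.org/x/tools"))
    }

For example, ``externalLabel("golang.org/x/tools/go/vcs", "golang.org/x/tools")``
returns ``@org_golang_x_tools//go/vcs:go_default_library``.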