[TOC]

## vpython - simple and easy Virtualenv Python

`vpython` is a tool, written in Go, which enables the simple and easy invocation
of Python code in [Virtualenv](https://virtualenv.pypa.io/en/stable/)
environments.

`vpython` is a simple Python bootstrap which (almost) transparently wraps a
Python interpreter invocation to run in a tailored Virtualenv environment. The
environment is expressed by a script-specific configuration file. This allows
each Python script to trivially express its own package-level dependencies and
run in a hermetic world consisting of just those dependencies.

When invoking such a script via `vpython`, the tool downloads its dependencies
and prepares an immutable Virtualenv containing them. It then invokes the
script, now running in that Virtualenv, through the preferred Python
interpreter.

`vpython` does its best not to use hacky mechanisms to achieve this. It uses
an unmodified Virtualenv package, standard setup methods, and local system
resources. The result is transparent, canonical Virtualenv environment
bootstrapping that meets the expectations of standard Python packages. `vpython`
is also safe for concurrent invocation, using filesystem-level locking to
perform any environment setup and management.

`vpython` itself is very fast. The wheel downloads and Virtualenvs may also be
cached and re-used, ideally limiting the runtime overhead of `vpython` to just
one initial setup per unique environment.

### Setup and Invocation

For the standard case, employing `vpython` is as simple as:

1. Create a `vpython` Virtualenv specification (or don't, if no additional
   packages are needed).
2. Invoke your script through `vpython` instead of `python`.

If additional Python libraries are needed, you may create new packages for those
libraries.
This is done in an implementation-specific way (e.g., upload wheels
as packages to CIPD).

Once the packages are available:

* Add `vpython` to `PATH`.
* Write an environment specification naming packages.
* Change tool invocation from `python` to `vpython`.

Using `vpython` offers several benefits over direct Python invocation, especially
when vendoring packages. Notably:

* It trivially enables hermetic Python everywhere, greatly increasing control
  and removing per-system differences in Python packages and environment.
* It handles situations that system-level packages cannot accommodate, such as
  different scripts that need different versions of the same package.
* No `sys.path` manipulation is needed to load vendored or imported packages.
* Any tool can define which package(s) it needs without requiring coordination
  or cooperation from other tools. (Note that the package must be made available
  for download first.)
* Adding new Python dependencies to a project is non-invasive and immediate.
* Package downloading and deployment are baked into `vpython` and built on
  fast and secure Google Cloud Platform technologies.
* No more custom bootstraps. Several projects and tools, including multiple
  places within Chrome's infra code base, have bootstrap scripts that vendor
  packages or mimic a Virtualenv. These are at best repetitive and, at worst,
  buggy and insecure.
* Dependencies are explicitly stated, not assumed, and consistent between
  deployments.

### Why Virtualenv?

Virtualenv offers several benefits over system Python. Primarily, it is the
*de facto* encapsulated environment method used by the Python community and is
generally used as the standard for a functional deployable package.

By using the same environment everywhere, Python invocations become
reproducible.
A tool run on a developer's system will load the same versions
of the same libraries as it will on a production system. A production system
will no longer fail because it is missing a package, because it has the
wrong version of that package, or because a package is incompatible with another
installed package.

A direct mechanism for vendoring, `sys.path` manipulation, is nuanced, buggy,
and unsupported by the Python community. It is difficult to do correctly on all
platforms in all environments for all packages. A notorious example of this is
`protobuf` and other domain-bound packages, which actively fight `sys.path`
inclusion and require special non-intuitive hacks to work. Using Virtualenv
means that any compliant Python package can trivially be included into a
project.

### Why CIPD?

[CIPD](https://github.com/luci/luci-go/tree/master/cipd) is a cross-platform
service and associated tooling and packages used to securely fetch and deploy
immutable "packages" (~= zip files) into the local file system. Unlike package
managers, it avoids platform-specific assumptions, executable hooks, or the
complexities of dependency resolution. `vpython` uses this as a mechanism for
housing and deploying wheels.

Unlike `pip`, a CIPD package is defined by its content, enabling precise package
matching instead of fuzzy version matching (e.g., `numpy >= 1.2`, and
`numpy == 1.2` both can match multiple `numpy` packages in `pip`).

CIPD also supports ACLs, enabling privileged Python projects to easily vendor
sensitive packages.

### Why wheels?

A Python [wheel](https://www.python.org/dev/peps/pep-0427/) is a simple binary
distribution of Python code. A wheel can be generic (pure Python) or system-
and architecture-bound (e.g., 64-bit Mac OSX).

Wheels are preferred over Python eggs because they come packaged with compiled
binaries.
This makes their deployment fast and simple: unpack via `pip`. It also
reduces system requirements and variation, since local compilation, headers,
and build tools are not enlisted during installation.

The increased management burden of maintaining separate wheels for the same
package, one for each architecture, is handled naturally by CIPD, removing the
only real pain point.

## Wheel Guidance

This section contains recommendations for building or uploading wheel CIPD
packages, including platform-specific guidance.

CIPD wheel packages are CIPD packages that contain Python wheels. A given CIPD
package can contain multiple wheels for multiple platforms, but should only
contain one version of any given package for any given architecture/platform.

For example, you can bundle a Windows, Linux, and Mac OSX version of `numpy` and
`coverage` in the same CIPD package, but you should not bundle `numpy==1.11` and
`numpy==1.12` in the same package.

The reason for this is that `vpython` identifies which wheels to install by
scanning the contents of the CIPD package, and if multiple versions appear,
there is no clear guidance about which should be used.

## Setup and Invocation

`vpython` can be invoked by replacing `python3` on the command line with
`vpython3`.

`vpython` works with a default Python environment out of the box. To add
vendored packages, you need to define an environment specification file that
describes which wheels to install.

An environment specification file is a text protobuf defined as `Spec`
[here](./api/vpython/spec.proto). An example is:

```
# Any 3.11 interpreter will do.
python_version: "3.11"

# Include "cffi" for the current architecture.
wheel: <
  name: "infra/python/wheels/cffi/${vpython_platform}"
  version: "version:1.14.5.chromium.7"
>
```

This specification can be supplied in one of four ways:

* Explicitly, as a command-line option to `vpython` (`-vpython-spec`).
* Implicitly, as a file alongside your entry point. For example, if you are
  running `test_runner.py`, `vpython` will look for `test_runner.py.vpython`
  next to it and load the environment from there.
* Implicitly, inlined in your main file. `vpython` will scan the main entry
  point for sentinel text and, if present, load the specification from that.
* Implicitly, through the `VPYTHON_DEFAULT_SPEC` environment variable.

### Optimization and Caching

`vpython` has several levels of caching that it employs to optimize setup and
invocation overhead.

#### Virtualenv

Once a Virtualenv specification has been resolved, its resulting pinned
specification is hashed and used as a key to that Virtualenv. Other `vpython`
invocations expressing the same environment will naturally re-use that
Virtualenv instead of creating their own.

#### Download Caching

Download mechanisms (e.g., CIPD) can optionally include a package cache to avoid
the overhead of downloading and/or resolving a package multiple times.

### Migration

#### Command-line

`vpython3` is a natural replacement for `python3` on the command line:

```sh
python3 ./foo/bar/baz.py -d --flag value arg arg whatever
```

Becomes:

```sh
vpython3 ./foo/bar/baz.py -d --flag value arg arg whatever
```

The `vpython` tool accepts its own command-line arguments.
In this case, use
a `--` separator to differentiate between `vpython` options and `python` options:

```sh
vpython3 -vpython-spec /path/to/spec.vpython -- ./foo/bar/baz.py
```

#### Shebang (POSIX)

If your script uses implicit specification (file or inline), replacing `python`
with `vpython` in your shebang line will automatically work.

```sh
#!/usr/bin/env vpython3
```

## Configuration

A number of environment variables affect `vpython`'s behavior:

* `VPYTHON_BYPASS`: If set to `manually managed python not supported by chrome
  operations`, `vpython` will do nothing and will instead directly invoke the
  next `python` on `PATH`. It has no effect if set to anything else.
* `VPYTHON_DEFAULT_SPEC`: Specifies the path to a `vpython` spec file that will
  be used if none is provided or found through probing.
* `VPYTHON_LOG_TRACE`: Specifies the log level of `vpython`. Can also be
  specified via the `-vpython-log-level` command-line flag.
* `VPYTHON_VIRTUALENV_ROOT`: Specifies the Virtualenv root. Default is
  `~/.vpython-root`. Can also be specified via the `-vpython-root` command-line
  flag.
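
To make the interaction between `-vpython-spec`, the implicit spec locations,
and `VPYTHON_DEFAULT_SPEC` concrete, here is a minimal Python sketch of one
plausible lookup order. It is an illustration only, not `vpython`'s
implementation (which is written in Go). The helper name `find_spec_source`,
the returned labels, and the `[VPYTHON:BEGIN]` sentinel value are assumptions
made for this example. The relative order of the sidecar file and the inline
spec simply follows the order in which this README lists them.

```python
import os

# Assumed marker for an inline specification. This README only says
# "sentinel text", so treat this value as a placeholder.
INLINE_SENTINEL = "[VPYTHON:BEGIN]"


def find_spec_source(entry_point, explicit_spec=None, environ=None):
    """Return (kind, path) for the spec this sketch would pick.

    Hypothetical helper for illustration; it is not part of vpython.
    """
    environ = os.environ if environ is None else environ

    # 1. An explicit -vpython-spec value takes priority.
    if explicit_spec:
        return ("flag", explicit_spec)

    # 2. A file next to the entry point, e.g. test_runner.py.vpython.
    sidecar = entry_point + ".vpython"
    if os.path.isfile(sidecar):
        return ("sidecar", sidecar)

    # 3. A specification inlined in the entry point itself.
    if os.path.isfile(entry_point):
        with open(entry_point, encoding="utf-8", errors="replace") as f:
            if INLINE_SENTINEL in f.read():
                return ("inline", entry_point)

    # 4. The fallback named by VPYTHON_DEFAULT_SPEC, used only when
    #    nothing was provided or found through probing.
    default = environ.get("VPYTHON_DEFAULT_SPEC")
    if default:
        return ("default", default)

    # Nothing found: the default environment would be used.
    return ("none", None)
```

Passing `environ` explicitly keeps the function easy to exercise without
touching the real process environment.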