[TOC]

## vpython - simple and easy Virtualenv Python

`vpython` is a tool, written in Go, which enables the simple and easy invocation
of Python code in [Virtualenv](https://virtualenv.pypa.io/en/stable/)
environments.

`vpython` is a simple Python bootstrap which (almost) transparently wraps a
Python interpreter invocation to run in a tailored Virtualenv environment. The
environment is expressed by a script-specific configuration file. This allows
each Python script to trivially express its own package-level dependencies and
run in a hermetic world consisting of just those dependencies.

When invoking such a script via `vpython`, the tool downloads its dependencies
and prepares an immutable Virtualenv containing them. It then invokes the
script, now running in that Virtualenv, through the preferred Python
interpreter.

`vpython` does its best not to use hacky mechanisms to achieve this. It uses
an unmodified Virtualenv package, standard setup methods, and local system
resources. The result is transparent, canonical Virtualenv environment
bootstrapping that meets the expectations of standard Python packages. `vpython`
is also safe for concurrent invocation, using filesystem-level locking to
perform any environment setup and management.

`vpython` itself is very fast. The wheel downloads and Virtualenvs may also be
cached and re-used, optimally limiting the runtime overhead of `vpython` to just
one initial setup per unique environment.

### Setup and Invocation

For the standard case, employing `vpython` is as simple as:

1. Create a `vpython` Virtualenv specification (or don't, if no additional
   packages are needed).
2. Invoke your script through `vpython` instead of `python`.

If additional Python libraries are needed, you may create new packages for those
libraries. This is done in an implementation-specific way (e.g., upload wheels
as packages to CIPD).

Once the packages are available:

* Add `vpython` to `PATH`.
* Write an environment specification naming packages.
* Change tool invocation from `python` to `vpython`.

Using `vpython` offers several benefits over direct Python invocation,
especially when vendoring packages. Notably, with `vpython`:

* It trivially enables hermetic Python everywhere, greatly increasing control
  and removing per-system differences in Python packages and environment.
* It handles situations that system-level packages cannot accommodate, such as
  different scripts with different versions of packages running in them.
* No `sys.path` manipulation is needed to load vendored or imported packages.
* Any tool can define which package(s) it needs without requiring coordination
  or cooperation from other tools. (Note that the package must be made available
  for download first.)
* Adding new Python dependencies to a project is non-invasive and immediate.
* Package downloading and deployment are baked into `vpython` and built on
  fast and secure Google Cloud Platform technologies.
* No more custom bootstraps. Several projects and tools, including multiple
  places within Chrome's infra code base, have bootstrap scripts that vendor
  packages or mimic a Virtualenv. These are at best repetitive and, at worst,
  buggy and insecure.
* Dependencies are explicitly stated, not assumed, and consistent between
  deployments.

### Why Virtualenv?

Virtualenv offers several benefits over system Python. Primarily, it is the
*de facto* encapsulated environment method used by the Python community and is
generally used as the standard for a functional deployable package.

By using the same environment everywhere, Python invocations become
reproducible. A tool run on a developer's system will load the same versions
of the same libraries as it will on a production system. A production system
will no longer fail because it is missing a package, because it has the
wrong version of that package, or because a package is incompatible with another
installed package.

A direct mechanism for vendoring, `sys.path` manipulation, is nuanced, buggy,
and unsupported by the Python community. It is difficult to do correctly on all
platforms in all environments for all packages. A notorious example of this is
`protobuf` and other domain-bound packages, which actively fight `sys.path`
inclusion and require special non-intuitive hacks to work. Using Virtualenv
means that any compliant Python package can trivially be included into a
project.

### Why CIPD?

[CIPD](https://github.com/luci/luci-go/tree/master/cipd) is a cross-platform
service and associated tooling and packages used to securely fetch and deploy
immutable "packages" (~= zip files) into the local file system. Unlike package
managers, it avoids platform-specific assumptions, executable hooks, or the
complexities of dependency resolution. `vpython` uses this as a mechanism for
housing and deploying wheels.

Unlike a `pip` requirement, a CIPD package is defined by its content, enabling
precise package matching instead of fuzzy version matching (e.g., `numpy >= 1.2`
and `numpy == 1.2` can both match multiple `numpy` packages in `pip`).

CIPD also supports ACLs, enabling privileged Python projects to easily vendor
sensitive packages.

### Why wheels?

A Python [wheel](https://www.python.org/dev/peps/pep-0427/) is a simple binary
distribution of Python code. A wheel can be generic (pure Python) or system-
and architecture-bound (e.g., 64-bit Mac OSX).

Wheels are preferred over Python eggs because they come packaged with compiled
binaries. This makes their deployment fast and simple: unpack via `pip`. It also
reduces system requirements and variation, since local compilation, headers,
and build tools are not enlisted during installation.

The increased management burden of maintaining separate wheels for the same
package, one for each architecture, is handled naturally by CIPD, removing the
only real pain point.

## Wheel Guidance

This section contains recommendations for building or uploading wheel CIPD
packages, including platform-specific guidance.

CIPD wheel packages are CIPD packages that contain Python wheels. A given CIPD
package can contain multiple wheels for multiple platforms, but should only
contain one version of any given package for any given architecture/platform.

For example, you can bundle a Windows, Linux, and Mac OSX version of `numpy` and
`coverage` in the same CIPD package, but you should not bundle `numpy==1.11` and
`numpy==1.12` in the same package.

The reason for this is that `vpython` identifies which wheels to install by
scanning the contents of the CIPD package, and if multiple versions appear,
there is no clear guidance about which should be used.
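
The one-version-per-platform rule can be checked before a package is uploaded. The sketch below is illustrative only: `parse_wheel_filename` and `find_version_conflicts` are hypothetical helpers, not part of `vpython` or CIPD, and they assume that wheel file names follow the PEP 427 naming convention. Wheels are grouped by distribution name and compatibility tags, and any group holding more than one version is reported.

```python
from collections import defaultdict


def parse_wheel_filename(filename):
    """Split a PEP 427 wheel file name into (name, version, tags).

    Layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel file name: {filename}")
    name, version = parts[0], parts[1]
    tags = tuple(parts[-3:])  # (python tag, abi tag, platform tag)
    return name.lower(), version, tags


def find_version_conflicts(filenames):
    """Return {(name, tags): sorted versions} for groups with >1 version."""
    versions = defaultdict(set)
    for filename in filenames:
        name, version, tags = parse_wheel_filename(filename)
        versions[(name, tags)].add(version)
    return {key: sorted(v) for key, v in versions.items() if len(v) > 1}
```

An empty result means that no platform would see two versions of the same package.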

## Setup and Invocation

`vpython` can be invoked by replacing `python3` on the command line with
`vpython3`.

`vpython` works with a default Python environment out of the box. To add
vendored packages, you need to define an environment specification file that
describes which wheels to install.

An environment specification file is a text protobuf defined as `Spec`
[here](./api/vpython/spec.proto). An example is:

```
# Any 3.11 interpreter will do.
python_version: "3.11"

# Include "cffi" for the current architecture.
wheel: <
  name: "infra/python/wheels/cffi/${vpython_platform}"
  version: "version:1.14.5.chromium.7"
>
```

This specification can be supplied in one of four ways:

* Explicitly, as a command-line option to `vpython` (`-vpython-spec`).
* Implicitly, as a file alongside your entry point. For example, if you are
  running `test_runner.py`, `vpython` will look for `test_runner.py.vpython`
  next to it and load the environment from there.
* Implicitly, inlined in your main file. `vpython` will scan the main entry
  point for sentinel text and, if present, load the specification from that.
* Implicitly, through the `VPYTHON_DEFAULT_SPEC` environment variable.
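
The four sources can be thought of as a lookup chain. The sketch below is a rough illustration, not `vpython`'s actual implementation: the function names, the order in which the sibling file and the inline block are tried, and the sentinel strings `[VPYTHON:BEGIN]` and `[VPYTHON:END]` are all assumptions made for this example. The only ordering taken from this document is that `VPYTHON_DEFAULT_SPEC` is consulted when no spec is provided or found through probing.

```python
import os

# Hypothetical sentinel markers; the real markers may differ.
BEGIN, END = "[VPYTHON:BEGIN]", "[VPYTHON:END]"


def extract_inline_spec(source):
    """Return the text between the sentinel lines, or None if absent."""
    collected, inside = [], False
    for line in source.splitlines():
        if BEGIN in line:
            inside = True
        elif END in line and inside:
            return "\n".join(collected)
        elif inside:
            # Drop the leading comment marker that hides the spec from Python.
            collected.append(line.lstrip("#").strip())
    return None


def find_spec(script_path, explicit_spec=None, environ=os.environ):
    """Return (source kind, value) for the first spec source that matches."""
    if explicit_spec:  # -vpython-spec on the command line
        return ("explicit", explicit_spec)
    sibling = script_path + ".vpython"  # e.g. test_runner.py.vpython
    if os.path.isfile(sibling):
        return ("file", sibling)
    with open(script_path, encoding="utf-8") as f:
        inline = extract_inline_spec(f.read())
    if inline is not None:
        return ("inline", inline)
    default = environ.get("VPYTHON_DEFAULT_SPEC")
    if default:
        return ("default", default)
    return ("none", None)  # no spec: the default environment is used
```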

### Optimization and Caching

`vpython` has several levels of caching that it employs to optimize setup and
invocation overhead.

#### Virtualenv

Once a Virtualenv specification has been resolved, its resulting pinned
specification is hashed and used as a key to that Virtualenv. Other `vpython`
invocations expressing the same environment will naturally re-use that
Virtualenv instead of creating their own.
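
Keying a Virtualenv on a hash of its pinned specification can be sketched as follows. This is an illustration under stated assumptions: `env_key` is a hypothetical helper, the spec is modelled as a plain dictionary rather than the `Spec` protobuf, and SHA-256 over canonical JSON stands in for whatever hashing `vpython` actually uses.

```python
import hashlib
import json


def env_key(pinned_spec):
    """Derive a stable cache key from a fully pinned specification.

    Wheels are sorted and serialized in a fixed order so that two specs
    listing the same wheels in a different order map to the same key
    (an assumption of this sketch, not a documented vpython guarantee).
    """
    canonical = {
        "python_version": pinned_spec["python_version"],
        "wheels": sorted(
            (w["name"], w["version"]) for w in pinned_spec["wheels"]
        ),
    }
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two invocations whose specs produce the same key could then share one directory under the Virtualenv root, while any change to a wheel version or to the Python version yields a new key and therefore a new environment.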

#### Download Caching

Download mechanisms (e.g., CIPD) can optionally include a package cache to avoid
the overhead of downloading and/or resolving a package multiple times.

### Migration

#### Command-line

`vpython3` is a natural replacement for `python3` on the command line:

```sh
python3 ./foo/bar/baz.py -d --flag value arg arg whatever
```

Becomes:

```sh
vpython3 ./foo/bar/baz.py -d --flag value arg arg whatever
```

The `vpython` tool also accepts its own command-line arguments. When passing
them, use a `--` separator to differentiate between `vpython` options and
`python` options:

```sh
vpython3 -vpython-spec /path/to/spec.vpython -- ./foo/bar/baz.py
```
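
When `vpython3` is launched from another Python program, the same `--` convention applies. The helper below is a hypothetical convenience (`build_vpython_command` is not part of `vpython`). It only assembles the argument list, using the `-vpython-spec` flag shown above, and leaves execution to the caller.

```python
def build_vpython_command(script, script_args=(), spec_path=None):
    """Assemble an argv list for vpython3.

    vpython's own options go before "--"; everything after it is passed
    to the Python interpreter unchanged.
    """
    command = ["vpython3"]
    if spec_path is not None:
        command += ["-vpython-spec", spec_path, "--"]
    command.append(script)
    command.extend(script_args)
    return command
```

The resulting list can be handed to `subprocess.run(..., check=True)`, which avoids shell quoting issues for arguments that contain spaces.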

#### Shebang (POSIX)

If your script uses implicit specification (file or inline), replacing `python3`
with `vpython3` in your shebang line will automatically work.

```sh
#!/usr/bin/env vpython3
```

## Configuration

There are a number of environment variables that can affect `vpython`'s
behavior:

*   `VPYTHON_BYPASS`: If set to `manually managed python not supported by chrome
    operations`, `vpython` will do nothing and will instead directly invoke the
    next `python` on `PATH`. It has no effect if set to anything else.
*   `VPYTHON_DEFAULT_SPEC`: Specifies the path to a `vpython` spec file that
    will be used if none is provided or found through probing.
*   `VPYTHON_LOG_TRACE`: Specifies the log level of `vpython`. Can also be
    specified via the `-vpython-log-level` command-line flag.
*   `VPYTHON_VIRTUALENV_ROOT`: Specifies the Virtualenv root. Default is
    `~/.vpython-root`. Can also be specified via the `-vpython-root`
    command-line flag.
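
The rules for `VPYTHON_BYPASS` and `VPYTHON_VIRTUALENV_ROOT` can be summarized in a few lines. The sketch below is not `vpython`'s code: the function names are hypothetical, and letting the command-line flag take precedence over the environment variable is an assumption, since the list above does not say which one wins.

```python
import os

# The exact value that enables the bypass, as listed above.
BYPASS_SENTINEL = "manually managed python not supported by chrome operations"


def bypass_requested(environ):
    """True only when VPYTHON_BYPASS holds the exact sentinel string."""
    return environ.get("VPYTHON_BYPASS") == BYPASS_SENTINEL


def virtualenv_root(environ, flag_value=None):
    """Pick the Virtualenv root: flag, then environment, then the default."""
    if flag_value:  # -vpython-root (assumed here to take precedence)
        return flag_value
    return environ.get("VPYTHON_VIRTUALENV_ROOT") or os.path.expanduser(
        "~/.vpython-root"
    )
```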