github.com/telepresenceio/telepresence/v2@v2.20.0-pro.6.0.20240517030216-236ea954e789/DEVELOPING.md


# Developing Telepresence 2

## Set up your environment

### Development environment

 - `TELEPRESENCE_REGISTRY` (required) is the Docker registry that
   `make push-images` pushes the `tel2` and `telepresence` images to.
   For most developers, the easiest thing is to set it to
   `docker.io/USERNAME`.

 - `TELEPRESENCE_VERSION` (optional) is the "vSEMVER" string to
   compile into the binary and Docker image, if set.  Otherwise,
   `make` will automatically set this based on the current Git commit
   and the current time.

 - `DTEST_KUBECONFIG` (optional) is the cluster that is used by tests,
   if set.  Otherwise the tests will automatically use a K3s cluster
   running locally in Docker.  It is not normally necessary to set
   this, but it is useful to set it in order to test against different
   Kubernetes versions/configurations than what
   https://github.com/datawire/dtest uses.

 - `DTEST_REGISTRY` (optional) is the Docker registry that images are
   pushed to by the tests, if set.  Otherwise, the tests will
   automatically use a registry running locally in Docker
   ("localhost:5000").  The tests will push images named `tel2` with
   various version tags.  It is not necessary to set this unless you
   have set `DTEST_KUBECONFIG`.

   If `DTEST_KUBECONFIG` is pointing to a pre-existing cluster, and you
   would like the `DTEST_REGISTRY` to point to a private registry that is
   hosted in that cluster, then you can use `make private-registry`. It
   will deploy a registry and set it up so that it is reachable at
   `localhost:5000`, both from the cluster and from the local workstation.

 - `DEV_TELEPRESENCE_VERSION` (optional): if set to a version such as
   `v2.12.1-alpha.0`, the integration tests will assume that this version
   is pre-built and available, both as a CLI client (accessible from the
   current runtime path) and pre-pushed into a pre-existing cluster
   accessible from `DTEST_KUBECONFIG`. In other words, if this is set, no
   binaries will be built or pushed, so the development and test cycle
   can be quite rapid.

 - `DEV_CLIENT_IMAGE` (optional) can be set to the fully qualified name of
   an alternative image for the containerized daemon that is used when
   running in Docker mode.

 - `DEV_MANAGER_IMAGE` (optional) can be set to the fully qualified name of
   an alternative image to use for the traffic manager.

 - `DEV_AGENT_IMAGE` (optional) can be set to the fully qualified name of
   an alternative image to use for the traffic agent.

 - `DEV_USERD_PROFILING_PORT` and `DEV_ROOTD_PROFILING_PORT` (optional), if
   set, cause the `telepresence connect` calls in the integration tests
   to start daemons with pprof enabled (see
   [Profiling the daemons](#profiling-the-daemons) below).
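
The defaulting rules above can be summarized in a small Go sketch. The `testEnv` type and `resolveTestEnv` function are hypothetical illustrations, not the actual integration-test code; they only restate the documented behavior (for example, that an unset `DTEST_REGISTRY` means a local registry at `localhost:5000`).

```go
package main

import (
	"fmt"
	"os"
)

// testEnv is a hypothetical summary of how the variables above affect a test run.
type testEnv struct {
	Registry        string // where the tests push images
	UseLocalK3s     bool   // true when DTEST_KUBECONFIG is unset
	PrebuiltVersion string // value of DEV_TELEPRESENCE_VERSION, if any
	SkipBuild       bool   // true when a prebuilt version is named
}

// resolveTestEnv restates the documented defaults. getenv is injected so the
// function can be exercised without touching the real environment.
func resolveTestEnv(getenv func(string) string) testEnv {
	env := testEnv{Registry: getenv("DTEST_REGISTRY")}
	if env.Registry == "" {
		// Documented default: a registry running locally in Docker.
		env.Registry = "localhost:5000"
	}
	env.UseLocalK3s = getenv("DTEST_KUBECONFIG") == ""
	env.PrebuiltVersion = getenv("DEV_TELEPRESENCE_VERSION")
	env.SkipBuild = env.PrebuiltVersion != ""
	return env
}

func main() {
	fmt.Printf("%+v\n", resolveTestEnv(os.Getenv))
}
```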

The above environment can optionally be provided in an `itest.yml` file
placed next to the normal `config.yml` file used to configure
Telepresence. The `itest.yml` currently has only one entry,
`Env`, which is a map. It can look something like this:

```yaml
Env:
  DEV_TELEPRESENCE_VERSION: v2.12.1-alpha.0
  DTEST_KUBECONFIG: /home/thhal/.kube/testconfig
```

The output of `make help` has a bit more information.

### Running integration tests

Integration tests can be run using `go test ./integration_test/...`. For
individual tests, use the `-testify.m=<pattern>` flag. Verbose output using
the `-v` flag is also recommended, because the tests are built with
human-readable output in mind, and their timestamps can be compared to
timestamps found in the Telepresence logs.

Example of running one test with an existing cluster and registry:
```
make private-registry
export DTEST_KUBECONFIG=<your kubeconfig>
export DTEST_REGISTRY=localhost:5000
go test ./integration_test/... -v -testify.m=Test_InterceptDetailedOutput
```
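
The `-testify.m` value is a regular expression that testify matches against the names of suite methods, so a partial pattern selects every method whose name matches. The sketch below mimics that selection with the standard `regexp` package; `selectMethods` is hypothetical, and the method names other than `Test_InterceptDetailedOutput` are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// selectMethods mimics how testify's -testify.m flag filters suite methods:
// the pattern is an unanchored regular expression matched against each name.
func selectMethods(pattern string, names []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var selected []string
	for _, n := range names {
		if re.MatchString(n) {
			selected = append(selected, n)
		}
	}
	return selected, nil
}

func main() {
	// Hypothetical suite method names, for illustration only.
	names := []string{"Test_InterceptDetailedOutput", "Test_InterceptGlobal", "Test_List"}
	sel, _ := selectMethods("Test_Intercept", names)
	fmt.Println(sel)
	// Anchoring the pattern selects exactly one test.
	sel, _ = selectMethods("^Test_InterceptDetailedOutput$", names)
	fmt.Println(sel)
}
```

In other words, `-testify.m=Test_Intercept` would run both intercept methods in this made-up list, while `^...$` anchors restrict the run to a single test.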

If you run these tests on a Mac, `localhost` won't work. Use Docker Hub, or use this value for the registry:

```cli
export DTEST_REGISTRY=host.docker.internal:5000
```

You must also add it to the `insecure-registries` list in your Docker Engine settings:

```json
{
  "insecure-registries": [
    "host.docker.internal:5000"
  ]
}
```

The test takes about a minute to complete when using an existing cluster
and a private registry created by `make private-registry`. During that time
it:
- builds the traffic-manager image
- pushes the image to the registry
- builds the client binary
- creates two namespaces for the test
- performs a helm install of a namespace-scoped traffic-manager
- runs the test
- uninstalls the traffic-manager
- deletes the namespaces

The first three steps can be omitted (and are omitted when the tests run
from CI) by building the client binary and the images ahead of time with
`make build` and `make push-images`.
Example of running tests with an existing client and traffic-manager:

```
make private-registry
export TELEPRESENCE_VERSION=v2.12.1-alpha.0
export TELEPRESENCE_REGISTRY=localhost:5000
make build
make push-images
export DTEST_KUBECONFIG=<your kubeconfig>
export DTEST_REGISTRY=$TELEPRESENCE_REGISTRY
export DEV_TELEPRESENCE_VERSION=$TELEPRESENCE_VERSION

# Run any number of individual tests with this setup
go test ./integration_test/... -v -testify.m=Test_InterceptDetailedOutput
```

The `DEV_TELEPRESENCE_VERSION` variable tells the integration tests that a client and
a traffic-manager of that version have been prebuilt and pushed. This usually
shortens each test run by about 20 seconds.

### Runtime environment

 - The main thing is that in your `~/.config/telepresence/config.yml`
   (`~/Library/Application Support/telepresence/config.yml` on macOS)
   file you set `images.registry` to match the `TELEPRESENCE_REGISTRY`
   environment variable. See
   https://www.getambassador.io/docs/telepresence/latest/reference/config/
   for more information.

 - `TELEPRESENCE_VERSION` is the "vSEMVER" string used by the
   `telepresence` binary *if* one was not compiled in (for example, if
   you're running it with `go run ./cmd/telepresence` rather than
   having built it with `make build`).

 - `TELEPRESENCE_AGENT_IMAGE` is the "name:vSEMVER" string used when
   Telepresence auto-installs the traffic-manager, unless `config.yml`
   overrides it by defining `images.agentImage`.

 - You will need to have a `~/.kube/config` file (or set `KUBECONFIG` to
   point to a different file) in order to connect to a cluster, the
   same as with any other Kubernetes tool.

 - You will need to have [mockgen](https://github.com/golang/mock) installed
   to generate new or updated testing mocks for interfaces.

## Blocking Ambassador telemetry
Telemetry to Ambassador Labs can be disabled by having your OS resolve `metriton.datawire.io` to `127.0.0.1`.

### Windows
From an elevated (administrator) Command Prompt:
`echo 127.0.0.1 metriton.datawire.io >> C:\Windows\System32\drivers\etc\hosts`

### Linux and macOS
`echo "127.0.0.1 metriton.datawire.io" | sudo tee -a /etc/hosts`
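
To confirm that the entry took effect, a minimal Go sketch can scan the hosts file for a line mapping `metriton.datawire.io` to `127.0.0.1`. The `blocksHost` helper is hypothetical and handles only the basic `address name [aliases...]` format with `#` comments.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strings"
)

// blocksHost reports whether hosts-file content maps host to 127.0.0.1.
func blocksHost(content, host string) bool {
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // drop comments
		}
		fields := strings.Fields(line) // also strips a trailing '\r' from CRLF files
		if len(fields) < 2 || fields[0] != "127.0.0.1" {
			continue
		}
		for _, name := range fields[1:] {
			if strings.EqualFold(name, host) {
				return true
			}
		}
	}
	return false
}

func main() {
	path := "/etc/hosts"
	if runtime.GOOS == "windows" {
		path = `C:\Windows\System32\drivers\etc\hosts`
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("metriton blocked:", blocksHost(string(data), "metriton.datawire.io"))
}
```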

## Build the binary, push the image

The easiest way to get going:

```console
$ TELEPRESENCE_REGISTRY=docker.io/thhal make build push-images # use .\build-aux\winmake.bat build on Windows
[make] TELEPRESENCE_VERSION=v2.12.1-19-g37085c2d7-1655891839
... # Lots of output
2.12.1-19-g37085c2d7-1655891839: digest: sha256:40fe852f8d8026a89f196293f37ae8c462c765c85572150d26263d78c43cdd4b size: 1157
```

This has 3 primary outputs:
 1. The `./build-output/bin/telepresence` executable binary
 2. The `${TELEPRESENCE_REGISTRY}/tel2` Docker image
 3. The `${TELEPRESENCE_REGISTRY}/telepresence` Docker image

It essentially does 4 separate tasks:
 1. `make build` to build the `./build-output/bin/telepresence`
    executable binary.
 2. `make tel2-image` to build the `${TELEPRESENCE_REGISTRY}/tel2` Docker
    image.
 3. `make client-image` to build the `${TELEPRESENCE_REGISTRY}/telepresence`
    Docker image.
 4. `make push-images` to push the `${TELEPRESENCE_REGISTRY}/tel2` and
    `${TELEPRESENCE_REGISTRY}/telepresence` Docker images.

You can run any of those tasks separately, but be warned: The
`TELEPRESENCE_VERSION` for all 4 needs to agree, and `make` includes a
timestamp in the default `TELEPRESENCE_VERSION`; if you run the tasks
separately you will need to explicitly set the `TELEPRESENCE_VERSION`
environment variable so that they all agree.

When working on just the command-line binary, it is often useful to
run it simply using `go run ./cmd/telepresence` rather than compiling
it first; but be warned: When run this way it won't know its own
version number (`telepresence version` will report "v0.0.0-devel")
unless you set the `TELEPRESENCE_VERSION` environment variable. You
will want to set it to the version of a previously-pushed Docker
image.
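
The fallback order described here and under *Runtime environment* can be sketched as follows. The `compiledVersion` variable and `clientVersion` function are hypothetical; in Go, a value like this is commonly injected at link time with `-ldflags "-X ..."`, but whether Telepresence uses that exact mechanism is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// compiledVersion is hypothetical: empty under `go run`, and set at link time
// in a release-style build, e.g. -ldflags "-X main.compiledVersion=v2.12.1".
var compiledVersion string

// clientVersion returns the compiled-in version, then TELEPRESENCE_VERSION,
// then the "v0.0.0-devel" placeholder that `telepresence version` reports.
func clientVersion(compiled string, getenv func(string) string) string {
	if compiled != "" {
		return compiled
	}
	if v := getenv("TELEPRESENCE_VERSION"); v != "" {
		return v
	}
	return "v0.0.0-devel"
}

func main() {
	fmt.Println(clientVersion(compiledVersion, os.Getenv))
}
```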

Running `make build push-images` every time (so that every build gets new,
matching version numbers) may sound terribly slow, but it is not as slow
as you might think: both `go` and `docker` are very good about reusing
existing builds and avoiding unnecessary work.

## Run the tests

Running the tests does *not* require having previously built or pushed
anything.

The tests make use of `sudo`; it is useful to get in the habit of
running a no-op `sudo` command to pre-emptively prompt for your
password, so that you don't have to watch for the prompt in the test
output.

```console
$ sudo id
[sudo] password for lukeshu:
uid=0(root) gid=0(root) groups=0(root)

$ make check-unit
[make] TELEPRESENCE_VERSION=v2.6.7-20-g9de10e316-1655892249
...
```

The first time you run the tests, you should use `make check`, so that
`make` automatically creates the requisite `helm` tool binaries. After
that initial run, you can instead use `gotestsum` or `go test` if you
prefer.

### Test metric collection

**When running in CI,** `make check-unit` and `make check-integration` will report the result of test
runs to metriton, Ambassador Labs' metrics store. These reports include test name, running time, and
result. They are reported by the tool at `tools/src/test-report`. This `test-report` tool will also
visually modify test output; this happens even when running locally, since the JSON output of `go test`
is piped through the tool anyway:

```console
$ make check-unit
```

## Building for Release

See https://www.notion.so/datawire/To-Release-Telepresence-2-x-x-2752ef26968444b99d807979cde06f2f

## Updating license documentation

Run `make generate` and commit the changes to `DEPENDENCY_LICENSES.md` and `DEPENDENCIES.md`.

## Developing on Windows

### Building on Windows

We do not currently support using `make` directly to build on Windows. Instead, use `build-aux\winmake.bat` and pass it the same parameters
you would pass to `make`. `winmake.bat` will run `make` from inside a Docker container, with appropriate parameters to build Windows binaries.

## Debugging and Troubleshooting

### Log output

There are two logs:
 - the `connector.log` log file, which contains output from the
   background-daemon parts of Telepresence that run as your regular
   user: the interaction with the traffic-manager and the cluster
   (traffic-manager and traffic-agent installs, intercepts, port
   forwards, etc.), and
 - the `daemon.log` log file, which contains output from the parts of
   Telepresence that run as the "root" administrator user: the
   networking changes and services that happen on your workstation.

The location of both logs is:

 - on macOS: `~/Library/Logs/telepresence/`
 - on GNU/Linux: `~/.cache/telepresence/logs/`
 - on Windows: `"%USERPROFILE%\AppData\Local\logs"`
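
For scripts that need to find these files, a small Go helper can pick the directory for the current platform. The `logDir` function below is a hypothetical sketch that hard-codes the locations listed above, taking the Windows path exactly as written there, with `%USERPROFILE%` supplied by `os.UserHomeDir`.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
)

// logDir returns the documented Telepresence log directory for goos, given
// the user's home directory (which %USERPROFILE% points to on Windows).
func logDir(goos, home string) string {
	switch goos {
	case "darwin":
		return home + "/Library/Logs/telepresence"
	case "windows":
		return home + `\AppData\Local\logs`
	default: // GNU/Linux
		return home + "/.cache/telepresence/logs"
	}
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	dir := logDir(runtime.GOOS, home)
	for _, name := range []string{"connector.log", "daemon.log"} {
		fmt.Println(dir + string(os.PathSeparator) + name)
	}
}
```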

The logs rotate, and a new log is created every time Telepresence
creates a new connection to the cluster, e.g. on `telepresence
connect` after a `telepresence quit` that terminated the last session.

#### Watching the logs

A convenient way to watch rotating logs is to use `tail -F
<filename>`.  It will automatically and seamlessly follow the
rotation.

#### Debugging early-initialization errors

If there's an error from the connector or daemon during early
initialization, it might quit before the logfiles are set up.  Perhaps
the problem is even with setting up the logfile itself.

You can run the `connector-foreground` or `daemon-foreground` commands
directly, to see what they write to stderr before dying:

```console
$ telepresence connector-foreground    # or daemon-foreground
```

If stdout is a TTY, they don't set up logfiles and instead log
to stderr.  In order to debug the logfile setup, simply pipe the
command's output to `cat`, which triggers the usual logfile setup:

```console
$ telepresence connector-foreground | cat
```

### Profiling the daemons

The daemons can be profiled using [pprof](https://pkg.go.dev/net/http/pprof).
Profiling is enabled using the following flags:

```console
$ telepresence quit -s
$ telepresence connect --userd-profiling-port 6060 --rootd-profiling-port 6061
```

If a daemon is started with pprof, then the goroutine stacks and much other
information can be found by pointing your browser to http://localhost:6060/debug/pprof/
(swap 6060 for whatever port you used with the flags).

#### Dumping the goroutine stacks

Killing a daemon with a SIGQUIT signal makes it write a dump of its goroutine
stacks to its log. On Windows, however, profiling is the only option.

### RBAC issues

If you are debugging or working on RBAC-related feature work with
Telepresence, it can be helpful to have a user with limited RBAC
privileges/roles.  There are many ways you can do this, but the way we
do it in our tests is like so:

```console
$ kubectl apply -f k8s/client_rbac.yaml
serviceaccount/telepresence-test-developer created
clusterrole.rbac.authorization.k8s.io/telepresence-role created
clusterrolebinding.rbac.authorization.k8s.io/telepresence-clusterrolebinding created

$ kubectl get sa telepresence-test-developer -o "jsonpath={.secrets[0].name}"
telepresence-test-developer-token-<hash>

$ kubectl get secret telepresence-test-developer-token-<hash> -o "jsonpath={.data.token}" > b64_token
$ cat b64_token | base64 --decode
<plaintext token>

$ kubectl config set-credentials telepresence-test-developer --token <plaintext token>
$ kubectl config set-context telepresence-test-developer --cluster <your cluster> --user telepresence-test-developer
```

This creates a ServiceAccount, ClusterRole, and ClusterRoleBinding, plus
kubeconfig credentials and a context for the ServiceAccount, which can be
used with kubectl (`kubectl config use-context
telepresence-test-developer`) to work in an RBAC-restricted
environment.

### Errors from `make generate`

#### Missing go.sum entries
If you get an error like this:

```
cd tools/src/go-mkopensource && GOOS= GOARCH= go build -o /home/andres/source/production/telepresence/tools/bin/go-mkopensource $(sed -En 's,^import "(.*)".*,\1,p' pin.go)
missing go.sum entry for module providing package github.com/datawire/go-mkopensource; to add:
	go mod download github.com/datawire/go-mkopensource
```

Add the missing entries by going to the folder where the failing command ran (in this case
`tools/src/go-mkopensource`, as shown by the `cd` at the start of the command) and running the command suggested by Go:

```
go mod download github.com/datawire/go-mkopensource
```