---
description: Learn to use the built-in network debugger to debug overlay networking problems
keywords: network, troubleshooting, debug
title: Debug overlay or swarm networking issues
---

**WARNING**
This tool can change the internal state of the libnetwork API. Be very careful
when using it, and read the following guide carefully before you start.
Improper use can damage or permanently destroy the network configuration.

Docker CE 17.12 and higher include a network debugging tool designed to help
debug issues with overlay networks and swarm services running on Linux hosts.
When enabled, a network diagnostic server listens on the specified port and
provides diagnostic information. Start the network debugging tool only to debug
specific issues, and do not leave it running all the time.

Information about networks is stored in the network database, which can be
examined using the API. Currently the database contains information about the
overlay network as well as the service discovery data.

The diagnostic server exposes HTTP endpoints to query and control the network
debugging tool. CLI integration is provided as a preview, but the implementation
is not yet considered stable, and commands and options may change without notice.

The tool is available in two forms:

1.  Client only: `dockereng/network-diagnostic:onlyclient`
2.  Docker in Docker: `dockereng/network-diagnostic:17.12-dind`

The latter allows you to use the tool with a cluster running an engine older
than 17.12.

## Enable the diagnostic server

The tool currently works only on Docker hosts running Linux. To enable it on a
node, follow the steps below.

1.  Set `network-diagnostic-port` to a free port on the Docker host in the
    `/etc/docker/daemon.json` configuration file. This guide uses port 2000,
    which is the port the API examples assume:

    ```json
    {
        "network-diagnostic-port": 2000
    }
    ```

    If `daemon.json` already contains other settings, add the key to the
    existing JSON object instead of replacing the file.

2.  Get the process ID (PID) of the `dockerd` process. It is the second field in
    the output, and is typically a number from 2 to 6 digits long.

    ```bash
    $ ps aux | grep dockerd | grep -v grep
    ```

3.  Reload the Docker configuration without restarting Docker by sending the
    `HUP` signal to the PID you found in the previous step.

    ```bash
    $ kill -HUP <pid-of-dockerd>
    ```

If the host uses systemd, `systemctl reload docker` is enough to reload the
configuration.

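Steps 2 and 3 can also be scripted. The following is a minimal sketch, not part of Docker: `dockerd_pids` is a hypothetical helper that reads `ps aux` output on standard input and prints the second field (the PID) of each line whose command is `dockerd`. It assumes the usual `ps aux` layout, where the command starts in the eleventh field. Because the `grep dockerd` process has `grep` in that field, no `grep -v grep` filter is needed.

```bash
# dockerd_pids: hypothetical helper, not part of Docker.
# Reads `ps aux` output on stdin and prints the PID (second field) of every
# process whose command (field 11 in the usual layout) is dockerd, either
# bare or as a path ending in /dockerd. The header line is skipped.
dockerd_pids() {
    awk 'NR > 1 && ($11 == "dockerd" || $11 ~ /\/dockerd$/) { print $2 }'
}

# Example usage on a live host (sends SIGHUP so dockerd reloads daemon.json):
#   for pid in $(ps aux | dockerd_pids); do kill -HUP "$pid"; done
```

Reading from standard input instead of calling `ps` inside the function keeps the parsing testable against captured output.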

A message like the following will appear in the Docker host logs:

```none
Starting the diagnostic server listening on <port> for commands
```

## Disable the diagnostic tool

Repeat these steps for each node participating in the swarm.

1.  Remove the `network-diagnostic-port` key from the `/etc/docker/daemon.json`
    configuration file.

2.  Get the process ID (PID) of the `dockerd` process. It is the second field in
    the output, and is typically a number from 2 to 6 digits long.

    ```bash
    $ ps aux | grep dockerd | grep -v grep
    ```

3.  Reload the Docker configuration without restarting Docker by sending the
    `HUP` signal to the PID you found in the previous step.

    ```bash
    $ kill -HUP <pid-of-dockerd>
    ```

A message like the following will appear in the Docker host logs:

```none
Disabling the diagnostic server
```

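To check which state a node is in, you can scan the daemon logs for the two messages shown in the enable and disable procedures. This is a minimal sketch, assuming the messages appear with the wording shown in this guide (the surrounding log formatting may differ). `diag_server_state` is a hypothetical helper that prints the state implied by the most recent matching message.

```bash
# diag_server_state: hypothetical helper, not part of Docker.
# Reads daemon log lines on stdin and prints "enabled" or "disabled"
# according to the most recent diagnostic-server message, or "unknown"
# if neither message appears.
diag_server_state() {
    awk '
        /Starting the diagnostic server listening on/ { state = "enabled" }
        /Disabling the diagnostic server/             { state = "disabled" }
        END { if (state == "") state = "unknown"; print state }
    '
}

# Example usage on a systemd host that sends dockerd output to the journal:
#   journalctl -u docker.service --no-pager | diag_server_state
```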
## Access the diagnostic tool's API

The network diagnostic tool exposes its own RESTful API. To access the API,
send an HTTP request to the port where the tool is listening. The following
commands assume the tool is listening on port 2000.

Examples are not given for every endpoint.

### Get help

```bash
$ curl localhost:2000/help

OK
/updateentry
/getentry
/gettable
/leavenetwork
/createentry
/help
/clusterpeers
/ready
/joinnetwork
/deleteentry
/networkpeers
/
/join
```

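The `/help` response is a plain list of paths, one per line, after an `OK` status line, so a script can check that an endpoint is available before calling it. A minimal sketch; `diag_has_endpoint` is a hypothetical helper name.

```bash
# diag_has_endpoint: hypothetical helper, not part of Docker or libnetwork.
# Reads the /help output on stdin and succeeds (exit status 0) only if the
# given path appears as a whole line, compared literally (no regex).
diag_has_endpoint() {
    grep -qxF -- "$1"
}

# Example, assuming the server listens on port 2000:
#   curl -s localhost:2000/help | diag_has_endpoint /gettable && echo "gettable is available"
```

Matching whole lines avoids false positives such as `/get` matching `/getentry`.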
### Join or leave the network database cluster

```bash
$ curl "localhost:2000/join?members=ip1,ip2,..."
```

```bash
$ curl "localhost:2000/leave?members=ip1,ip2,..."
```

`ip1`, `ip2`, ... are the IP addresses of swarm nodes (usually one is enough).

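Because `members` is a single comma-separated parameter, a small helper can build the `/join` and `/leave` URLs from a list of node IPs. A minimal sketch; `diag_cluster_url` and the `HOST` and `PORT` environment variables are hypothetical, with defaults matching the port-2000 examples in this guide.

```bash
# diag_cluster_url: hypothetical helper, not part of Docker or libnetwork.
# Builds the /join or /leave URL from an action and one or more swarm node
# IPs. HOST and PORT are hypothetical overrides defaulting to localhost:2000.
diag_cluster_url() {
    if [ "$#" -lt 2 ]; then
        echo "usage: diag_cluster_url join|leave IP..." >&2
        return 1
    fi
    local action=$1 members
    shift
    case $action in
        join|leave) ;;
        *) echo "action must be join or leave" >&2; return 1 ;;
    esac
    # "$*" joins the remaining arguments with the first character of IFS.
    members=$(IFS=,; printf '%s' "$*")
    printf 'http://%s:%s/%s?members=%s\n' \
        "${HOST:-localhost}" "${PORT:-2000}" "$action" "$members"
}

# Usage:
#   curl "$(diag_cluster_url join 10.0.0.2)"
```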
### Join or leave a network

```bash
$ curl "localhost:2000/joinnetwork?nid=<network id>"
```

```bash
$ curl "localhost:2000/leavenetwork?nid=<network id>"
```

`<network id>` must be the full-length network identifier, which you can
retrieve on a manager node with `docker network ls --no-trunc`.

### List cluster peers

```bash
$ curl localhost:2000/clusterpeers
```

### List nodes connected to a given network

```bash
$ curl "localhost:2000/networkpeers?nid=<network id>"
```

`<network id>` must be the full-length network identifier, which you can
retrieve on a manager node with `docker network ls --no-trunc`.

### Dump database tables

The tables are called `endpoint_table` and `overlay_peer_table`. The
`overlay_peer_table` contains all the overlay forwarding information, and the
`endpoint_table` contains all the service discovery information.

```bash
$ curl "localhost:2000/gettable?nid=<network id>&tname=<table name>"
```

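In a shell, the `&` in the `/gettable` URL must be quoted: unquoted, it ends the command and runs `curl` in the background without the `tname` parameter. The following minimal sketch builds the URL and rejects table names other than the two listed above; `diag_gettable_url`, `HOST`, and `PORT` are hypothetical names.

```bash
# diag_gettable_url: hypothetical helper, not part of Docker or libnetwork.
# Prints the /gettable URL for a full network ID and one of the two table
# names described above. HOST and PORT are hypothetical environment
# overrides that default to localhost:2000.
diag_gettable_url() {
    local nid=$1 tname=$2
    if [ -z "$nid" ]; then
        echo "a full network ID is required" >&2
        return 1
    fi
    case $tname in
        endpoint_table|overlay_peer_table) ;;
        *) echo "unknown table: $tname" >&2; return 1 ;;
    esac
    printf 'http://%s:%s/gettable?nid=%s&tname=%s\n' \
        "${HOST:-localhost}" "${PORT:-2000}" "$nid" "$tname"
}

# Usage: keep the command substitution in double quotes so curl receives the
# whole URL as a single argument.
#   curl "$(diag_gettable_url <network id> endpoint_table)"
```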
### Interact with a specific database table

The tables are called `endpoint_table` and `overlay_peer_table`. `<method>` is
one of the entry endpoints listed by `/help`: `createentry`, `getentry`,
`updateentry`, or `deleteentry`.

```bash
$ curl "localhost:2000/<method>?nid=<network id>&tname=<table name>&key=<key>[&value=<value>]"
```

Note:
Table entries are owned by the node that created them. An entry persists as
long as the node that inserted it remains part of the cluster.

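The entry endpoints take several query parameters, so a small helper can assemble the URL. This minimal sketch, `diag_entry_url`, is hypothetical. Requiring a value for `createentry` and `updateentry`, and dropping it for `getentry` and `deleteentry`, is an assumption based on the optional `[&value=<value>]` parameter, not documented server behavior. Keys and values are inserted unencoded, so they must not contain characters such as `&`, `=`, or spaces.

```bash
# diag_entry_url: hypothetical helper, not part of Docker or libnetwork.
# Prints the URL for one of the entry endpoints listed by /help. Requiring a
# value for createentry/updateentry and omitting it for getentry/deleteentry
# is an assumption, not documented server behavior. Keys and values are
# inserted unencoded. HOST and PORT default to localhost:2000.
diag_entry_url() {
    local method=$1 nid=$2 tname=$3 key=$4 value=$5 url
    case $method in
        createentry|updateentry)
            if [ -z "$value" ]; then
                echo "$method needs a value" >&2
                return 1
            fi
            ;;
        getentry|deleteentry)
            value=    # never send a value for read or delete
            ;;
        *)
            echo "unknown method: $method" >&2
            return 1
            ;;
    esac
    url=$(printf 'http://%s:%s/%s?nid=%s&tname=%s&key=%s' \
        "${HOST:-localhost}" "${PORT:-2000}" "$method" "$nid" "$tname" "$key")
    if [ -n "$value" ]; then
        url="$url&value=$value"
    fi
    printf '%s\n' "$url"
}

# Usage:
#   curl "$(diag_entry_url getentry <network id> endpoint_table <key>)"
```

Given the ownership rule in the note, `getentry` is the safer choice while exploring, because entries you create stay in the table while your node remains in the cluster. For keys or values with special characters, `curl -G --data-urlencode` can encode the parameters instead.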
## Access the diagnostic tool's CLI

The CLI is provided as a preview and is not yet stable. Commands or options may
change at any time.

The CLI executable is called `diagnosticClient` and is made available using a
standalone container:

`docker run --net host dockereng/network-diagnostic:onlyclient -v -net <full network id> -t sd`

The following flags are supported:

| Flag            | Description                                                  |
|-----------------|--------------------------------------------------------------|
| `-t <string>`   | Table to dump: `sd` (service discovery) or `overlay`.        |
| `-ip <string>`  | The IP address to query. Defaults to 127.0.0.1.              |
| `-net <string>` | The target network ID.                                       |
| `-port <int>`   | The target port. Defaults to 2000.                           |
| `-a`            | Join the network before querying it and leave it afterwards. |
| `-v`            | Enable verbose output.                                       |

*NOTE*
By default the tool does not try to join the network, so that running the
diagnostic client does not change the state of the node. This makes it safe to
run `diagnosticClient` against a running daemon, because it only dumps the
current state. When you run `diagnosticClient` in the containerized version,
however, you MUST pass the `-a` flag; otherwise the results are empty.
Conversely, passing `-a` against a loaded daemon has the undesirable side effect
of making it leave the network, which cuts the data path for that daemon.

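Because `-a` changes the node's state, a wrapper can make joining an explicit choice. This minimal sketch, `diag_client_args`, is hypothetical: it prints the client flags for a full network ID and table, and appends `-a` only when the third argument is `join`.

```bash
# diag_client_args: hypothetical helper, not part of the diagnostic client.
# Prints diagnosticClient flags for a full network ID and a table (sd or
# overlay). The -a flag (join, then leave) is added only when the third
# argument is "join"; the default "readonly" mode omits it.
diag_client_args() {
    local nid=$1 table=$2 mode=${3:-readonly}
    if [ -z "$nid" ]; then
        echo "a full network ID is required" >&2
        return 1
    fi
    case $table in
        sd|overlay) ;;
        *) echo "table must be sd or overlay" >&2; return 1 ;;
    esac
    case $mode in
        readonly) printf '%s\n' "-v -net $nid -t $table" ;;
        join)     printf '%s\n' "-v -net $nid -t $table -a" ;;
        *) echo "mode must be readonly or join" >&2; return 1 ;;
    esac
}

# Against a running daemon (no -a):
#   docker run --net host dockereng/network-diagnostic:onlyclient $(diag_client_args <full network id> sd)
```

The substitution is left unquoted on purpose so the shell splits it into separate flags; this is safe only because the network ID and table name contain no spaces.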
### Container version of the diagnostic tool

The CLI is also provided as a container with a 17.12 engine, which must run in
privileged mode.

*NOTE*
Remember that table operations have ownership, so any entry created with
`createentry` persists as long as the diagnostic container is part of the swarm.

1.  Make sure that the node where the diagnostic client will run is not part of
    the swarm. If it is, run `docker swarm leave -f`.

2.  To run the container, use a command like the following:

    ```bash
    $ docker container run --name net-diagnostic -d --privileged --network host dockereng/network-diagnostic:17.12-dind
    ```

3.  Connect to the container using `docker exec -it <container-ID> sh`,
    and start the server using the following command:

    ```bash
    $ kill -HUP 1
    ```

4.  Join the diagnostic container to the swarm, then run the diagnostic CLI
    within the container.

    ```bash
    $ ./diagnosticClient <flags>...
    ```

5.  When finished debugging, leave the swarm and stop the container.

### Examples

The following commands dump the service discovery table and verify node
ownership.

*NOTE*
Remember to use the full network ID. You can find it with
`docker network ls --no-trunc`.

**Service discovery and load balancer:**

```bash
$ diagnosticClient -t sd -v -net n8a8ie6tb3wr2e260vxj8ncy4 -a
```

**Overlay network:**

```bash
$ diagnosticClient -port 2001 -t overlay -v -net n8a8ie6tb3wr2e260vxj8ncy4 -a
```