---
description: Learn to use the built-in network debugger to debug overlay networking problems
keywords: network, troubleshooting, debug
title: Debug overlay or swarm networking issues
---

**WARNING**
This tool can change the internal state of the libnetwork API. Use it with care
and read this guide carefully first. Improper use can damage
or permanently destroy the network configuration.


Docker CE 17.12 and higher include a network debugging tool designed to help
debug issues with overlay networks and swarm services running on Linux hosts.
When enabled, a network diagnostic server listens on the specified port and
provides diagnostic information. The network debugging tool should only be
started to debug specific issues, and should not be left running all the time.

Information about networks is stored in the database, which can be examined using
the API. Currently the database contains information about the overlay network
as well as the service discovery data.

The Docker API exposes endpoints to query and control the network debugging
tool. CLI integration is provided as a preview, but the implementation is not
yet considered stable and commands and options may change without notice.

The tool is available in two forms:

1. Client only: `dockereng/network-diagnostic:onlyclient`
2. Docker-in-Docker version: `dockereng/network-diagnostic:17.12-dind`

The latter makes it possible to use the tool with a cluster running an engine older than 17.12.

## Enable the diagnostic server

The tool currently only works on Docker hosts running on Linux. To enable it on a node,
follow the steps below.

1.  Set the `network-diagnostic-port` to a port which is free on the Docker
    host, in the `/etc/docker/daemon.json` configuration file.

    ```json
    "network-diagnostic-port": <port>
    ```

2.  Get the process ID (PID) of the `dockerd` process. It is the second field in
    the output, and is typically a number from 2 to 6 digits long.

    ```bash
    $ ps aux | grep dockerd | grep -v grep
    ```

3.  Reload the Docker configuration without restarting Docker, by sending the
    `HUP` signal to the PID you found in the previous step.

    ```bash
    $ kill -HUP <pid-of-dockerd>
    ```

    If systemd is used, the command `systemctl reload docker` is enough.


A message like the following will appear in the Docker host logs:

```none
Starting the diagnostic server listening on <port> for commands
```

## Disable the diagnostic tool

Repeat these steps for each node participating in the swarm.

1.  Remove the `network-diagnostic-port` key from the `/etc/docker/daemon.json`
    configuration file.

2.  Get the process ID (PID) of the `dockerd` process. It is the second field in
    the output, and is typically a number from 2 to 6 digits long.

    ```bash
    $ ps aux | grep dockerd | grep -v grep
    ```

3.  Reload the Docker configuration without restarting Docker, by sending the
    `HUP` signal to the PID you found in the previous step.

    ```bash
    $ kill -HUP <pid-of-dockerd>
    ```

A message like the following will appear in the Docker host logs:

```none
Disabling the diagnostic server
```

## Access the diagnostic tool's API

The network diagnostic tool exposes its own RESTful API. To access the API,
send an HTTP request to the port where the tool is listening. The following
commands assume the tool is listening on port 2000.

Examples are not given for every endpoint.
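
Because every endpoint is reached through the same host and port, a small shell helper can assemble the request URL. The following is a minimal sketch, not part of the tool: the function name `diag_url` and the variables `DIAG_HOST` and `DIAG_PORT` are hypothetical names chosen for illustration, and the default port 2000 is the one assumed throughout this guide.

```bash
# Hypothetical helper: builds a URL for the diagnostic server.
# DIAG_HOST and DIAG_PORT are illustrative names, not settings of the tool.
DIAG_HOST="${DIAG_HOST:-localhost}"
DIAG_PORT="${DIAG_PORT:-2000}"

diag_url() {
    local endpoint query param
    endpoint="$1"
    shift
    query=""
    for param in "$@"; do
        # Join the key=value pairs with '&'.
        query="${query:+${query}&}${param}"
    done
    # Append '?query' only when at least one parameter was given.
    printf 'http://%s:%s/%s%s\n' "$DIAG_HOST" "$DIAG_PORT" "$endpoint" "${query:+?${query}}"
}
```

Quote the result when passing it to `curl`, for example `curl "$(diag_url help)"`, so that the shell does not interpret `?` or `&` in the URL.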

### Get help

```bash
$ curl localhost:2000/help

OK
/updateentry
/getentry
/gettable
/leavenetwork
/createentry
/help
/clusterpeers
/ready
/joinnetwork
/deleteentry
/networkpeers
/
/join
```

### Join or leave the network database cluster

```bash
$ curl "localhost:2000/join?members=ip1,ip2,..."
```

```bash
$ curl "localhost:2000/leave?members=ip1,ip2,..."
```

`ip1`, `ip2`, ... are the swarm node IPs (usually one is enough).

### Join or leave a network

```bash
$ curl "localhost:2000/joinnetwork?nid=<network id>"
```

```bash
$ curl "localhost:2000/leavenetwork?nid=<network id>"
```

`network id` can be retrieved on the manager with `docker network ls --no-trunc` and must
be the full-length identifier.

### List cluster peers

```bash
$ curl localhost:2000/clusterpeers
```

### List nodes connected to a given network

```bash
$ curl "localhost:2000/networkpeers?nid=<network id>"
```

`network id` can be retrieved on the manager with `docker network ls --no-trunc` and must
be the full-length identifier.

### Dump database tables

The tables are called `endpoint_table` and `overlay_peer_table`.
The `overlay_peer_table` contains all the overlay forwarding information.
The `endpoint_table` contains all the service discovery information.

```bash
$ curl "localhost:2000/gettable?nid=<network id>&tname=<table name>"
```

### Interact with a specific database table

The tables are called `endpoint_table` and `overlay_peer_table`.

```bash
$ curl "localhost:2000/<method>?nid=<network id>&tname=<table name>&key=<key>[&value=<value>]"
```

`<method>` is one of the entry endpoints listed by `/help`: `createentry`,
`updateentry`, `deleteentry`, or `getentry`. Quote the URL so that the shell
does not interpret the `&` characters.

Note:
Table operations have node ownership. Entries remain persistent only as long as
the node that inserted them is part of the cluster.

## Access the diagnostic tool's CLI

The CLI is provided as a preview and is not yet stable. Commands or options may
change at any time.

The CLI executable is called `diagnosticClient` and is made available using a
standalone container.

`docker run --net host dockereng/network-diagnostic:onlyclient -v -net <full network id> -t sd`

The following flags are supported:

| Flag            | Description                                     |
|-----------------|-------------------------------------------------|
| `-t <string>`   | The table to query: one of `sd` or `overlay`.   |
| `-ip <string>`  | The IP address to query. Defaults to 127.0.0.1. |
| `-net <string>` | The target network ID.                          |
| `-port <int>`   | The target port. Defaults to 2000.              |
| `-a`            | Join/leave the network.                         |
| `-v`            | Enable verbose output.                          |

*NOTE*
By default the tool does not try to join the network. The intent is to leave the node's
state unchanged when the diagnostic client runs, so it is safe to run `diagnosticClient`
against a running daemon: it only dumps the current state.
When using the containerized version of `diagnosticClient`, however, the `-a` flag MUST be passed
to avoid retrieving empty results. Conversely, using the `-a` flag against a loaded daemon
has the undesirable side effect of leaving the network, which cuts the data path for
that daemon.

### Container version of the diagnostic tool

The CLI is provided as a container with a 17.12 engine that needs to run using privileged mode.

*NOTE*
Remember that table operations have ownership, so any entry you create persists only as long as
the diagnostic container is part of the swarm.

1.  Make sure that the node where the diagnostic client will run is not part of the swarm. If it is, run `docker swarm leave -f`.

2.  To run the container, use a command like the following:

    ```bash
    $ docker container run --name net-diagnostic -d --privileged --network host dockereng/network-diagnostic:17.12-dind
    ```

3.  Connect to the container using `docker exec -it <container-ID> sh`,
    and start the server using the following command:

    ```bash
    $ kill -HUP 1
    ```

4.  Join the diagnostic container to the swarm, then run the diagnostic CLI within the container.

    ```bash
    $ ./diagnosticClient <flags>...
    ```

5.  When finished debugging, leave the swarm and stop the container.

### Examples

The following commands dump the service discovery table and verify node
ownership.

*NOTE*
Remember to use the full network ID, which you can find with `docker network ls --no-trunc`.

**Service discovery and load balancer:**

```bash
$ diagnosticClient -t sd -v -net n8a8ie6tb3wr2e260vxj8ncy4 -a
```

**Overlay network:**

```bash
$ diagnosticClient -port 2001 -t overlay -v -net n8a8ie6tb3wr2e260vxj8ncy4 -a
```
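
When running the client-only image instead, the same flags can be assembled by a small wrapper. The sketch below only prints the command so that it can be reviewed before it is run. The function name `diag_client_cmd` is hypothetical; the image name and flags are the ones documented above, and `-a` is deliberately left out so that the printed command does not join or leave the network.

```bash
# Hypothetical helper: prints (does not run) the client-only command
# that dumps one table for the given network.
diag_client_cmd() {
    local net_id table port
    net_id="$1"
    table="$2"
    port="${3:-2000}"   # default port used throughout this guide
    case "$table" in
        sd|overlay) ;;
        *)
            echo "table must be 'sd' or 'overlay'" >&2
            return 1
            ;;
    esac
    printf 'docker run --net host dockereng/network-diagnostic:onlyclient -v -net %s -t %s -port %s\n' \
        "$net_id" "$table" "$port"
}
```

For example, `diag_client_cmd n8a8ie6tb3wr2e260vxj8ncy4 overlay 2001` prints the command for the overlay table on port 2001; an unknown table name is rejected with a non-zero exit status.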