# Running Stable Diffusion on GPU with gVisor

gVisor is [starting to support GPU][gVisor GPU support] workloads. This post
showcases running the [Stable Diffusion] generative model from [Stability AI]
to generate images using a GPU from within gVisor. Both the
[Automatic1111 Stable Diffusion web UI][automatic1111/stable-diffusion-webui]
and the [PyTorch] code used by Stable Diffusion were run entirely within gVisor
while being able to leverage the NVIDIA GPU.

![A sandboxed GPU](/assets/images/2023-06-20-sandboxed-gpu.png "A sandboxed GPU.")
<span class="attribution">**Sand**boxing a GPU. Generated with Stable Diffusion
v1.5.<br/>This picture gets a lot deeper once you realize that GPUs are made
out of sand.</span>

--------------------------------------------------------------------------------

## Disclaimer

As of this writing (2023-06), [gVisor's GPU support][gVisor GPU support] is not
generalized. Only some PyTorch workloads have been tested on NVIDIA T4, L4,
A100, and H100 GPUs, using the specific driver versions `525.60.13` and
`525.105.17`. Contributions are welcome to expand this set to support other
GPUs and driver versions!

Additionally, while gVisor does its best to sandbox the workload, interacting
with the GPU inherently requires running code on GPU hardware, where isolation
is enforced by the GPU driver and hardware itself rather than by gVisor. More
to come soon on the value of the protection gVisor provides for GPU workloads.

In a few months, gVisor's GPU support will have broadened and become easier to
use, such that it will not be constrained to the specific sets of versions used
here. In the meantime, this blog stands as an example of what's possible today
with gVisor's GPU support.
![Various space suit helmets](/assets/images/2023-06-20-spacesuit-helmets.png "Various space suit helmets."){:width="100%"}
<span class="attribution">**A collection of astronaut helmets in various
styles**.<br/>Other than the helmet in the center, each helmet was generated
using Stable Diffusion v1.5.</span>

## Why even do this?

The recent explosion of machine learning models has led to a large number of
new open-source projects. Much like it is good practice to be careful about
running new software downloaded from the Internet, it is good practice to run
new open-source projects in a sandbox. For projects like the
[Automatic1111 Stable Diffusion web UI][automatic1111/stable-diffusion-webui],
which automatically download various models, components, and
[extensions][Stable Diffusion Web UI extensions] from external repositories as
the user enables them in the web UI, this principle applies all the more.

Additionally, within the machine learning space, tooling for packaging and
distributing models is still nascent. While some models (including Stable
Diffusion) are packaged using the more secure [safetensors] format, **the
majority of models available online today are distributed using the
[Pickle format], which can execute arbitrary Python code** upon
deserialization. As such, even when using trustworthy software, using
Pickle-formatted models may still be risky (**Edited 2024-04-04:
[this exact vulnerability vector was found in Hugging Face's Inference API](https://www.wiz.io/blog/wiz-and-hugging-face-address-risks-to-ai-infrastructure)**).
gVisor provides a layer of protection around this process which helps protect
the host machine.

Third, **machine learning applications are typically not I/O heavy**, which
means they tend not to experience a significant performance overhead. The
process of uploading code to the GPU does not involve a significant number of
system calls, and most communication to/from the GPU happens over shared
memory, where gVisor imposes no overhead. Therefore, the question is not so
much "why should I run this GPU workload in gVisor?" but rather "why not?".

![Cool astronauts don't look at explosions](/assets/images/2023-06-20-turbo.png "Cool astronauts don't look at explosions.")
<span class="attribution">**Cool astronauts don't look at explosions**.
Generated using Stable Diffusion v1.5.</span>

Lastly, running GPU workloads in gVisor is pretty cool.

## Setup

We use a Debian virtual machine on GCE. The machine needs to have a GPU and
sufficient RAM and disk space to handle Stable Diffusion and its large model
files. The following command creates a VM with 4 vCPUs, 15GiB of RAM, 64GB of
disk space, and an NVIDIA T4 GPU, running Debian 11 (bullseye). Since this is
just an experiment, the VM is set to self-destruct after 6 hours.

```shell
$ gcloud compute instances create stable-diffusion-testing \
    --zone=us-central1-a \
    --machine-type=n1-standard-4 \
    --max-run-duration=6h \
    --instance-termination-action=DELETE \
    --maintenance-policy TERMINATE \
    --accelerator=count=1,type=nvidia-tesla-t4 \
    --create-disk=auto-delete=yes,boot=yes,device-name=stable-diffusion-testing,image=projects/debian-cloud/global/images/debian-11-bullseye-v20230509,mode=rw,size=64
$ gcloud compute ssh --zone=us-central1-a stable-diffusion-testing
```

All further commands in this post are performed while SSH'd into the VM. We
first need to install the specific NVIDIA driver version that gVisor is
currently compatible with.
```shell
$ sudo apt-get update && sudo apt-get -y upgrade
$ sudo apt-get install -y build-essential linux-headers-$(uname -r)
$ DRIVER_VERSION=525.60.13
$ curl -fsSL -O "https://us.download.nvidia.com/tesla/$DRIVER_VERSION/NVIDIA-Linux-x86_64-$DRIVER_VERSION.run"
$ sudo sh NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
```

<!--
The above in a single line, for convenience:
DRIVER_VERSION=525.60.13; sudo apt-get update && sudo apt-get -y upgrade && sudo apt-get install -y build-essential linux-headers-$(uname -r) && curl -fsSL -O "https://us.download.nvidia.com/tesla/$DRIVER_VERSION/NVIDIA-Linux-x86_64-$DRIVER_VERSION.run" && sudo sh NVIDIA-Linux-x86_64-$DRIVER_VERSION.run
-->

Next, we install Docker, per [its instructions][Docker installation on Debian].

```shell
$ sudo apt-get install -y ca-certificates curl gnupg
$ sudo install -m 0755 -d /etc/apt/keyrings
$ curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor --batch --yes -o /etc/apt/keyrings/docker.gpg
$ sudo chmod a+r /etc/apt/keyrings/docker.gpg
$ echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
$ sudo apt-get update && sudo apt-get install -y docker-ce docker-ce-cli
```

<!--
The above in a single line, for convenience:
sudo apt-get install -y ca-certificates curl gnupg && sudo install -m 0755 -d /etc/apt/keyrings && curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor --batch --yes -o /etc/apt/keyrings/docker.gpg && sudo chmod a+r /etc/apt/keyrings/docker.gpg && echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null && sudo apt-get update && sudo apt-get install -y docker-ce docker-ce-cli
-->

We will also need the [NVIDIA container toolkit], which enables use of GPUs
with Docker. Per its
[installation instructions][NVIDIA container toolkit installation]:

```shell
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID) && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
```

Of course, we also need to [install gVisor][gVisor setup] itself.

```shell
$ sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
$ curl -fsSL https://gvisor.dev/archive.key | sudo gpg --dearmor -o /usr/share/keyrings/gvisor-archive-keyring.gpg
$ echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/gvisor-archive-keyring.gpg] https://storage.googleapis.com/gvisor/releases release main" | sudo tee /etc/apt/sources.list.d/gvisor.list > /dev/null
$ sudo apt-get update && sudo apt-get install -y runsc

# As gVisor does not yet enable GPU support by default, we need to set the
# flags that will enable it:
$ sudo runsc install -- --nvproxy=true --nvproxy-docker=true

$ sudo systemctl restart docker
```

Now, let's make sure everything works by running commands that involve more and
more of what we just set up.
```shell
# Check that the NVIDIA drivers are installed, with the right version, and
# with a supported GPU attached.
$ sudo nvidia-smi -L
GPU 0: Tesla T4 (UUID: GPU-6a96a2af-2271-5627-34c5-91dcb4f408aa)
$ sudo cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  525.60.13  Wed Nov 30 06:39:21 UTC 2022

# Check that Docker works.
$ sudo docker version
# [...]
Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
# [...]

# Check that gVisor works.
$ sudo docker run --rm --runtime=runsc debian:latest dmesg | head -1
[    0.000000] Starting gVisor...

# Check that Docker GPU support (without gVisor) works.
$ sudo docker run --rm --gpus=all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi -L
GPU 0: Tesla T4 (UUID: GPU-6a96a2af-2271-5627-34c5-91dcb4f408aa)

# Check that gVisor works with the GPU.
$ sudo docker run --rm --runtime=runsc --gpus=all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi -L
GPU 0: Tesla T4 (UUID: GPU-6a96a2af-2271-5627-34c5-91dcb4f408aa)
```

We're all set! Now we can actually get Stable Diffusion running.

We used the following `Dockerfile` to run Stable Diffusion and its web UI
within a GPU-enabled Docker container.

```dockerfile
FROM python:3.10

# Set of dependencies that are needed to make this work.
RUN apt-get update && apt-get install -y git wget build-essential \
    nghttp2 libnghttp2-dev libssl-dev ffmpeg libsm6 libxext6
# Clone the project at the revision used for this test.
RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git && \
    cd /stable-diffusion-webui && \
    git checkout baf6946e06249c5af9851c60171692c44ef633e0
# We don't want the build step to start the server.
RUN sed -i '/start()/d' /stable-diffusion-webui/launch.py
# Install some pip packages.
# Note that this command will run as part of the Docker build process,
# which is *not* sandboxed by gVisor.
RUN cd /stable-diffusion-webui && COMMANDLINE_ARGS=--skip-torch-cuda-test python launch.py
WORKDIR /stable-diffusion-webui
# This causes the web UI to use the Gradio service to create a public URL.
# Do not use this if you plan on leaving the container running long-term.
ENV COMMANDLINE_ARGS=--share
# Start the webui app.
CMD ["python", "webui.py"]
```

We build the image and create a container with it using the `docker`
command-line.

```shell
$ cat > Dockerfile
(... Paste the above contents...)
^D
$ sudo docker build --tag=sdui .
```

Finally, we can start the Stable Diffusion web UI. Note that it will take a
long time to start, as it has to download all the models from the Internet. To
keep this post simple, we didn't set up any kind of volume that would enable
data persistence, so it will do this every time the container starts.

```shell
$ sudo docker run --runtime=runsc --gpus=all --name=sdui --detach sdui

# Follow the logs:
$ sudo docker logs -f sdui
# [...]
Calculating sha256 for /stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors: Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://4446d982b4129a66d7.gradio.live

This share link expires in 72 hours.
# [...]
```

We're all set! Now we can browse to the Gradio URL shown in the logs and start
generating pictures, all within the secure confines of gVisor.

![Stable Diffusion Web UI](/assets/images/2023-06-20-stable-diffusion-web-ui.png "Stable Diffusion UI."){:width="100%"}
<span class="attribution">**Stable Diffusion Web UI screenshot.** Inner image
generated with Stable Diffusion v1.5.</span>

Happy sandboxing!
![Astronaut showing thumbs up](/assets/images/2023-06-20-astronaut-thumbs-up.png "Astronaut showing thumbs up.")
<span class="attribution">**Happy sandboxing!** Generated with Stable Diffusion
v1.5.</span>

[gVisor GPU support]: https://github.com/google/gvisor/blob/master/g3doc/proposals/nvidia_driver_proxy.md
[Stable Diffusion]: https://stability.ai/blog/stable-diffusion-public-release
[Stability AI]: https://stability.ai/
[automatic1111/stable-diffusion-webui]: https://github.com/AUTOMATIC1111/stable-diffusion-webui
[Stable Diffusion Web UI extensions]: https://github.com/AUTOMATIC1111/stable-diffusion-webui-extensions/blob/master/index.json
[PyTorch]: https://pytorch.org/
[safetensors]: https://github.com/huggingface/safetensors
[Pickle format]: https://www.splunk.com/en_us/blog/security/paws-in-the-pickle-jar-risk-vulnerability-in-the-model-sharing-ecosystem.html
[Docker installation on Debian]: https://docs.docker.com/engine/install/debian/
[NVIDIA container toolkit]: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html
[NVIDIA container toolkit installation]: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
[gVisor setup]: https://gvisor.dev/docs/user_guide/install/