## Develop eBPF locally

First set up Lima. It uses Nix and Devbox to manage most of the packages.
```sh
ln -s ~/c/kvisor /tmp/kvisor
limactl start ./tools/lima-ebpf.yaml
limactl shell lima-ebpf
cd /tmp/kvisor
devbox install
```

Now, inside the VM, you can run ebpftracer:
```sh
cd /tmp/kvisor/pkg/ebpftracer
go generate ./...
go test -v . -run=TestTracer
```

Trigger eBPF events:

```sh
limactl shell lima-ebpf
curl google.com
```

## Run E2E tests locally

You can run the tests on your local kind cluster.

```
KIND_CONTEXT=tilt IMAGE_TAG=local ./e2e/run.sh
```

## Colima

Colima is a wrapper around Lima for macOS and can be used as a Docker Desktop for Mac replacement.

```
colima start --cpu 2 --memory 4 --disk 100 -t vz --mount-type virtiofs
```

### Lima

Lima is a lower-level VM tool that allows creating customizable templates. It is recommended if you need to work with eBPF code.

```
ln -s ~/c/kvisor /tmp/kvisor
limactl start ./tools/lima-ebpf.yaml
```

## TILT local development

### 1. Install Docker. You need a 6+ kernel with BTF support.

### 2. Set up a local k8s cluster.

You can use kind or any other local k8s cluster. Kind is recommended.

```
kind create cluster
kubectl cluster-info --context kind-kind
```

### 3. Start tilt
```
tilt up
```

### 4. Port-forward the server API
```
kubectl port-forward svc/kvisord-server 6060:80 -n kvisord
```

### 5. Run the dashboard UI locally

```
cd ui
npm install
PUBLIC_API_BASE_URL=http://localhost:6060 npm run dev
```

## GKE Cluster

Create a cluster
```sh
export GCP_PROJECT="my-project"
export CLUSTER_NAME="my-cluster-name"
gcloud beta container --project $GCP_PROJECT \
  clusters create $CLUSTER_NAME \
  --zone "us-central1-c" \
  --cluster-version "1.25.8-gke.500" \
  --machine-type "e2-small" \
  --disk-type "pd-balanced" \
  --disk-size "100" \
  --num-nodes "2" \
  --node-locations "us-central1-c"
```

Connect
```sh
gcloud container clusters get-credentials $CLUSTER_NAME --zone us-central1-c --project $GCP_PROJECT
```

## eBPF

### Mount tracepoint events

Mount tracepoints if they are not mounted yet.
```
mount -t debugfs none /sys/kernel/debug
ls /sys/kernel/debug/tracing/events
```

### Print logs in eBPF

```c
bpf_printk("called");
```

```
cat /sys/kernel/debug/tracing/trace_pipe
```

## Clickhouse

Show per-column data sizes on disk:
```sql
select column, formatReadableSize(sum(column_bytes_on_disk)) bytes_on_disk, formatReadableSize(sum(column_data_uncompressed_bytes)) uncompressed
from system.parts_columns
where active = 1 and table like '%events%'
group by database, table, column
order by sum(column_bytes_on_disk) desc;
```

Query container resource usage stats:
```sql
select toStartOfMinute(ts) t,
       case when group = 'cpu' then toString(avg(value)) else formatReadableSize(avg(value)) end as val,
       group||'_'||subgroup name
from container_stats
where container_name='kvisor' and group!='syscall' and t > now() - interval 5 minute
group by t, group, name
order by t;
```

## Push chart to GCP artifact registry

Package and push the chart
```sh
helm package ./charts/kvisord
helm push kvisord-0.1.6.tgz oci://us-east4-docker.pkg.dev/kvisor/helm-charts
```

Test the chart template
```sh
helm template kvisord oci://us-east4-docker.pkg.dev/kvisor/helm-charts/kvisord --version 0.1.6
```

### Public demo

Install kvisord

```sh
helm upgrade --install kvisord oci://us-east4-docker.pkg.dev/kvisor/helm-charts/kvisord \
  --version 0.7.0 \
  --namespace kvisord --create-namespace \
  --set storage.resources.requests.cpu=2 \
  --set storage.resources.requests.memory=8Gi \
  --set agent.resources.requests.cpu=100m \
  --set agent.resources.requests.memory=128Mi \
  --set server.resources.requests.cpu=2 \
  --set server.resources.requests.memory=4Gi \
  --set server.extraArgs.events-batch-size=1000 \
  --set server.extraArgs.events-batch-queue-size=30000 \
  --set-string server.extraArgs.workload-profiles-enabled=false
```

Open a port-forward to the local dashboard

```sh
kubectl port-forward svc/kvisord-server 6060:80 -n kvisord
```

Delete kvisord

```sh
helm uninstall kvisord -n kvisord
```

### Integrate with CASTAI
```
helm upgrade --install castai-kvisor oci://us-east4-docker.pkg.dev/kvisor/helm-charts/castai-kvisor \
  --version 0.12.0 \
  --namespace kvisord --create-namespace \
  --set image.tag=0f07db05bbe55f9aba04952337f7023f3a4553e5 \
  --set castai.clusterID=<CLUSTER-ID> \
  --set castai.apiKey=<API-KEY> \
  --set agent.enabled=true \
  --set controller.extraArgs.image-scan-enabled=true \
  --set controller.extraArgs.kube-bench-enabled=true
```

## Misc

List available trace functions for ftrace.
```
cat /sys/kernel/debug/tracing/available_filter_functions | grep socket_connect
```

## Making a new release

1. Go to https://github.com/castai/kvisor/releases
2. Click "Draft a new release" (should open https://github.com/castai/kvisor/releases/new)
3. Choose a tag, or add a new one. Follow semver; for fixes, only bump the patch version.
4. Click "Generate release notes".
5. Publish the release.

## Testing netflow

Install kvisor with netflow export to a local ClickHouse.

```
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update castai-helm

helm upgrade --install castai-kvisor castai-helm/castai-kvisor \
  --namespace castai-agent --create-namespace \
  --set castai.enabled=false \
  --set agent.enabled=true \
  --set agent.extraArgs.netflow-enabled=true \
  --set clickhouse.enabled=true
```

Check that the pods are running

```
kubectl get pods -n castai-agent
```

You should see agent, clickhouse and controller pods

```
NAME                                        READY   STATUS    RESTARTS   AGE
castai-kvisor-agent-djjcq                   1/1     Running   0          67s
castai-kvisor-clickhouse-0                  2/2     Running   0          66s
castai-kvisor-controller-8697bbf8cd-sq6jp   1/1     Running   0          67s
```

### Query flows

Port-forward the ClickHouse connection

```
kubectl port-forward -n castai-agent svc/castai-kvisor-clickhouse 8123
```

Connect to ClickHouse with your favorite SQL client using these credentials:
```
Username: kvisor
Password: kvisor
Database: kvisor
```

Example query:

```sql
select toStartOfInterval(start, INTERVAL 1 HOUR) AS period,
       pod_name,
       namespace,
       workload_name,
       workload_kind,
       zone,
       dst_pod_name,
       dst_namespace,
       dst_domain,
       dst_workload_name,
       dst_workload_kind,
       dst_zone,
       formatReadableSize(sum(tx_bytes)) total_egress,
       formatReadableSize(sum(rx_bytes)) total_ingress
from netflows
group by period,
         pod_name,
         namespace,
         workload_name,
         workload_kind,
         zone,
         dst_pod_name,
         dst_namespace,
         dst_domain,
         dst_workload_name,
         dst_workload_kind,
         dst_zone
order by period;
```
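If you prefer the terminal to a GUI client, flows can also be queried through ClickHouse's HTTP interface on the forwarded port 8123. A minimal sketch, assuming the port-forward above is running and the default `kvisor`/`kvisor` credentials; `FLOW_QUERY` and `query_flows` are illustrative names, not part of the repo's tooling:

```shell
# Aggregate egress per destination domain from the netflows table.
# (Simplified variant of the example query above.)
FLOW_QUERY="
select dst_domain, formatReadableSize(sum(tx_bytes)) egress
from netflows
group by dst_domain
order by sum(tx_bytes) desc
limit 10"

# Send the query to the ClickHouse HTTP interface exposed by the
# port-forward (localhost:8123), using the default credentials.
query_flows() {
  curl -s 'http://localhost:8123/?database=kvisor' \
    --user kvisor:kvisor \
    --data-binary "$FLOW_QUERY"
}

# With the port-forward running:
#   query_flows
```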