## Develop eBPF locally

First, set up Lima. It uses Nix and devbox to manage most of the packages. The symlink below assumes the repository is checked out at `~/c/kvisor`; adjust the path to match your checkout.
```sh
ln -s ~/c/kvisor /tmp/kvisor
limactl start ./tools/lima-ebpf.yaml
limactl shell lima-ebpf
cd /tmp/kvisor
devbox install
```
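
Before starting, it can help to confirm that the tools used above are on your `PATH`. The following is a minimal sketch; the function name `missing_tools` is hypothetical, and the tool names in the usage comments are simply inferred from the commands in this section.

```sh
# Print the name of every command from the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage on the host, before starting the VM:
#   missing_tools limactl
# Usage inside the VM, after `devbox install`:
#   missing_tools devbox go
```

An empty result means every listed tool was found.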

Now, inside the VM, you can run the ebpftracer tests:
```sh
cd /tmp/kvisor/pkg/ebpftracer
go generate ./...
go test -v . -run=TestTracer
```

To trigger eBPF events, open another shell in the VM and generate some network traffic:

```sh
limactl shell lima-ebpf
curl google.com
```

## Run E2E tests locally

You can run the tests on your local kind cluster.

```
KIND_CONTEXT=tilt IMAGE_TAG=local ./e2e/run.sh
```

## Colima
Colima is a wrapper around Lima for macOS and can be used as a replacement for Docker Desktop on macOS.

```
colima start --cpu 2 --memory 4 --disk 100 -t vz --mount-type virtiofs
```

### Lima
Lima is a lower-level VM manager that lets you create customizable templates. It is recommended if you need to work with eBPF code.
```
ln -s ~/c/kvisor /tmp/kvisor
limactl start ./tools/lima-ebpf.yaml
```

## Tilt local development

### 1. Install Docker. You need a kernel version 6 or newer with BTF support.
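
The kernel requirement can be checked with a short script. This is a sketch under two assumptions: the function names are hypothetical, and BTF availability is detected through `/sys/kernel/btf/vmlinux`, the file the kernel exposes when it is built with BTF type information. Run it on the machine (or VM) whose kernel actually runs your containers.

```sh
# Extract the major version from a kernel release string such as "6.5.0-35-generic".
kernel_major() {
  printf '%s\n' "${1%%.*}"
}

# Succeed when the kernel major version is at least 6 and the BTF file exists.
# Arguments default to the running kernel and the standard sysfs BTF path.
kernel_supports_btf() {
  release="${1:-$(uname -r)}"
  btf_file="${2:-/sys/kernel/btf/vmlinux}"
  [ "$(kernel_major "$release")" -ge 6 ] && [ -e "$btf_file" ]
}

# Usage:
#   kernel_supports_btf && echo "kernel OK" || echo "kernel too old or BTF missing"
```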

### 2. Set up local k8s.

You can use kind or any other local k8s cluster. Kind is recommended.

```
kind create cluster
kubectl cluster-info --context kind-kind
```

### 3. Start Tilt
```
tilt up
```

### 4. Port-forward the server API
```
kubectl port-forward svc/kvisord-server 6060:80 -n kvisord
```

### 5. Run the dashboard UI locally

```
cd ui
npm install
PUBLIC_API_BASE_URL=http://localhost:6060 npm run dev
```

## GKE Cluster

Create a cluster:
```sh
export GCP_PROJECT="my-project"
export CLUSTER_NAME="my-cluster-name"
gcloud beta container --project "$GCP_PROJECT" \
  clusters create "$CLUSTER_NAME" \
  --zone "us-central1-c" \
  --cluster-version "1.25.8-gke.500" \
  --machine-type "e2-small" \
  --disk-type "pd-balanced" \
  --disk-size "100" \
  --num-nodes "2" \
  --node-locations "us-central1-c"
```

Connect to the cluster:
```sh
gcloud container clusters get-credentials "$CLUSTER_NAME" --zone us-central1-c --project "$GCP_PROJECT"
```

## eBPF

### Mount tracepoint events

Mount tracepoints if they are not mounted yet.
```
mount -t debugfs none /sys/kernel/debug
ls /sys/kernel/debug/tracing/events
```
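
The "if they are not mounted yet" condition can be scripted. The sketch below reads the mount table from `/proc/mounts`; the function names `is_mounted` and `ensure_debugfs` are hypothetical, and the optional second argument exists only so the check can be exercised against a sample file.

```sh
# Succeed when the given path appears as a mount point in the mounts table.
# The second argument defaults to /proc/mounts.
is_mounted() {
  awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Mount debugfs only when it is not mounted yet (requires root).
ensure_debugfs() {
  is_mounted /sys/kernel/debug || mount -t debugfs none /sys/kernel/debug
}

# Usage:
#   ensure_debugfs && ls /sys/kernel/debug/tracing/events
```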

### Print logs in eBPF

Add a `bpf_printk` call to the eBPF program:

```c
bpf_printk("called");
```

Then read the messages from the trace pipe:

```
cat /sys/kernel/debug/tracing/trace_pipe
```

## ClickHouse

Show the on-disk and uncompressed data size of each column:
```sql
select column, formatReadableSize(sum(column_bytes_on_disk)) bytes_on_disk, formatReadableSize(sum(column_data_uncompressed_bytes)) uncompressed
from system.parts_columns
where active = 1 and table like '%events%'
group by database,table, column
order by sum(column_bytes_on_disk) desc;
```

Query container resource usage stats:
```sql
select toStartOfMinute(ts) t,
       case when group = 'cpu' then toString(avg(value)) else formatReadableSize(avg(value)) end as val,
       group||'_'||subgroup name from container_stats
where container_name='kvisor' and group!='syscall' and t > now() - interval 5 minute
group by t, group, name
order by t;
```

## Push chart to GCP Artifact Registry

Package and push the chart:
```sh
helm package ./charts/kvisord
helm push kvisord-0.1.6.tgz oci://us-east4-docker.pkg.dev/kvisor/helm-charts
```

Test the chart template:
```sh
helm template kvisord oci://us-east4-docker.pkg.dev/kvisor/helm-charts/kvisord --version 0.1.6
```

### Public demo

Install kvisord:

```sh
helm upgrade --install kvisord oci://us-east4-docker.pkg.dev/kvisor/helm-charts/kvisord \
    --version 0.7.0 \
    --namespace kvisord --create-namespace \
    --set storage.resources.requests.cpu=2 \
    --set storage.resources.requests.memory=8Gi \
    --set agent.resources.requests.cpu=100m \
    --set agent.resources.requests.memory=128Mi \
    --set server.resources.requests.cpu=2 \
    --set server.resources.requests.memory=4Gi \
    --set server.extraArgs.events-batch-size=1000 \
    --set server.extraArgs.events-batch-queue-size=30000 \
    --set-string server.extraArgs.workload-profiles-enabled=false
```

Open a port-forward to the dashboard:

```sh
kubectl port-forward svc/kvisord-server 6060:80 -n kvisord
```

Delete kvisord:

```sh
helm uninstall kvisord -n kvisord
```

### Integrate with CASTAI
```
helm upgrade --install castai-kvisor oci://us-east4-docker.pkg.dev/kvisor/helm-charts/castai-kvisor \
    --version 0.12.0 \
    --namespace kvisord --create-namespace \
    --set image.tag=0f07db05bbe55f9aba04952337f7023f3a4553e5 \
    --set castai.clusterID=<CLUSTER-ID> \
    --set castai.apiKey=<API-KEY> \
    --set agent.enabled=true \
    --set controller.extraArgs.image-scan-enabled=true \
    --set controller.extraArgs.kube-bench-enabled=true
```

## Misc

List the functions available for tracing with ftrace (here filtered by `socket_connect`):
```
cat /sys/kernel/debug/tracing/available_filter_functions | grep socket_connect
```

## Making a new release

1. Go to https://github.com/castai/kvisor/releases
2. Click "Draft a new release" (this should open https://github.com/castai/kvisor/releases/new).
3. Choose a tag: add a new tag that follows semver. For fix-only releases, bump only the patch version.
4. Click "Generate release notes".
5. Publish the release.
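
For step 3, the next patch tag can be computed from the latest one. This is a minimal sketch: the function name `bump_patch` is hypothetical, and it only handles plain `MAJOR.MINOR.PATCH` tags (with an optional leading `v`), not pre-release or build suffixes.

```sh
# Print the next patch version for a tag of the form vMAJOR.MINOR.PATCH.
# The leading "v" is optional and is preserved when present.
bump_patch() {
  tag="$1"
  prefix=""
  case "$tag" in
    v*) prefix="v"; tag="${tag#v}" ;;
  esac
  major="${tag%%.*}"   # text before the first dot
  rest="${tag#*.}"     # text after the first dot
  minor="${rest%%.*}"
  patch="${rest#*.}"
  printf '%s%s.%s.%s\n' "$prefix" "$major" "$minor" "$((patch + 1))"
}

# Usage:
#   bump_patch v1.7.1
```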

## Testing netflow

Install kvisor with netflow export to a local ClickHouse instance.

```
helm repo add castai-helm https://castai.github.io/helm-charts
helm repo update castai-helm

helm upgrade --install castai-kvisor castai-helm/castai-kvisor \
    --namespace castai-agent --create-namespace \
    --set castai.enabled=false \
    --set agent.enabled=true \
    --set agent.extraArgs.netflow-enabled=true \
    --set clickhouse.enabled=true
```

Check that the pods are running in the namespace used by the install command above:

```
kubectl get pods -n castai-agent
```

You should see the agent, clickhouse and controller pods:

```
NAME                                        READY   STATUS    RESTARTS   AGE
castai-kvisor-agent-djjcq                   1/1     Running   0          67s
castai-kvisor-clickhouse-0                  2/2     Running   0          66s
castai-kvisor-controller-8697bbf8cd-sq6jp   1/1     Running   0          67s
```
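
To check the listing automatically instead of by eye, the output can be filtered for pods that are not ready. This is a sketch that parses the default `kubectl get pods` column layout shown above; the function name `pods_not_ready` is hypothetical, and pods in other healthy states (for example `Completed`) would also be reported.

```sh
# Read `kubectl get pods` output on stdin and print the name of every pod
# that is not Running or whose READY column is not fully satisfied (n/n).
pods_not_ready() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage (replace <namespace> with the namespace you installed into):
#   kubectl get pods -n <namespace> | pods_not_ready
```

Empty output means every listed pod is running with all of its containers ready.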

### Query flows

Port-forward the ClickHouse connection (use the namespace from the install command above):

```
kubectl port-forward -n castai-agent svc/castai-kvisor-clickhouse 8123
```
Connect to ClickHouse with your favorite SQL client using these credentials:
```
Username: kvisor
Password: kvisor
Database: kvisor
```
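
If no SQL client is at hand, port 8123 is ClickHouse's HTTP interface, so `curl` can be used instead. The sketch below assumes the port-forward is active on `localhost:8123` and uses the credentials listed above; the function names `clickhouse_url` and `clickhouse_query` are hypothetical.

```sh
# Build the ClickHouse HTTP endpoint URL for a host, port and database.
clickhouse_url() {
  printf 'http://%s:%s/?database=%s\n' "$1" "$2" "$3"
}

# Send the SQL statement given as the first argument to the port-forwarded
# ClickHouse HTTP interface and print the result.
clickhouse_query() {
  curl --silent --show-error --user kvisor:kvisor \
    --data-binary "$1" "$(clickhouse_url localhost 8123 kvisor)"
}

# Usage:
#   clickhouse_query 'select count() from netflows'
```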

Example query:

```sql
select toStartOfInterval(start, INTERVAL 1 HOUR) AS period,
       pod_name,
       namespace,
       workload_name,
       workload_kind,
       zone,
       dst_pod_name,
       dst_namespace,
       dst_domain,
       dst_workload_name,
       dst_workload_kind,
       dst_zone,
       formatReadableSize(sum(tx_bytes)) total_egress,
       formatReadableSize(sum(rx_bytes)) total_ingress
from netflows
group by period,
         pod_name,
         namespace,
         workload_name,
         workload_kind,
         zone,
         dst_pod_name,
         dst_namespace,
         dst_domain,
         dst_workload_name,
         dst_workload_kind,
         dst_zone
order by period;
```