# Developer Guide

Kubeflow Training Operator is currently at v1.

## Requirements

- [Go](https://golang.org/) (1.20 or later)

## Building the operator

Create a symbolic link inside your GOPATH that points to the location where you checked out the code:

```sh
mkdir -p "$(go env GOPATH)/src/github.com/kubeflow"
ln -sf "${GIT_TRAINING}" "$(go env GOPATH)/src/github.com/kubeflow/training-operator"
```

- `GIT_TRAINING` should be the location where you checked out https://github.com/kubeflow/training-operator

Install dependencies:

```sh
go mod vendor
```

Build it:

```sh
go install github.com/kubeflow/training-operator/cmd/training-operator.v1
```

## Running the Operator Locally

Running the operator locally (as opposed to deploying it on a K8s cluster) is convenient for debugging and development.

### Run a Kubernetes cluster

First, you need to run a Kubernetes cluster locally. There are several options:

- [local-up-cluster.sh in Kubernetes](https://github.com/kubernetes/kubernetes/blob/master/hack/local-up-cluster.sh)
- [minikube](https://github.com/kubernetes/minikube)

`local-up-cluster.sh` runs a single-node Kubernetes cluster directly on your machine, while Minikube runs a single-node Kubernetes cluster inside a VM. Either is compatible with the operator, but the Kubernetes version should be `1.8` or above.

Note: If you use `local-up-cluster.sh`, make sure that kube-dns is up; see [kubernetes/kubernetes#47739](https://github.com/kubernetes/kubernetes/issues/47739) for more details.

### Configure KUBECONFIG and KUBEFLOW_NAMESPACE

The operator can run locally and use the configuration in your kubeconfig to communicate with a K8s cluster.
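Before setting the variables, it can help to confirm which kubeconfig file will be read. The snippet below is a minimal sketch: it assumes `KUBECONFIG` holds a single path (kubectl also accepts a colon-separated list, which this check does not handle), falls back to `~/.kube/config`, kubectl's default location, and uses an illustrative variable name, `kubeconfig`.

```sh
# Resolve the kubeconfig path: use $KUBECONFIG if it is set, otherwise
# kubectl's default location. Assumes a single path, not a colon-separated list.
kubeconfig="${KUBECONFIG:-$HOME/.kube/config}"

# Only report whether the file is readable; nothing is modified.
if [ -r "$kubeconfig" ]; then
  echo "kubeconfig found: $kubeconfig"
else
  echo "kubeconfig not readable: $kubeconfig" >&2
fi
```

If the file is readable, `kubectl config current-context` shows which cluster it points at, and `kubectl cluster-info` confirms that the cluster is reachable.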
Set your environment:

```sh
export KUBECONFIG="$HOME/.kube/config"
export KUBEFLOW_NAMESPACE=your_namespace  # replace with your namespace
```

- `KUBEFLOW_NAMESPACE` is used when the operator is deployed on Kubernetes: the operator creates its internal resources (e.g. the resource lock) in this namespace. It is optional; the `default` namespace is used if it is not set.

### Create the TFJob CRD

After the cluster is up, create the TFJob CRD on the cluster:

```bash
make install
```

### Run Operator

Now you are ready to run the operator locally:

```sh
make run
```

To verify that the local operator is working, create an example job; you should see the operator pick it up and create pods for it.

```sh
cd ./examples/v1/dist-mnist
docker build -f Dockerfile -t kubeflow/tf-dist-mnist-test:1.0 .
kubectl create -f ./tf_job_mnist.yaml
```

## Go version

On Ubuntu, the default Go package appears to be `gccgo-go`, which has problems (see this [issue](https://github.com/golang/go/issues/15429)). The `golang-go` package is also quite old, so install Go from the official tarballs instead.

## Generate Python SDK

To generate the Python SDK for the operator, run:

```sh
./hack/python-sdk/gen-sdk.sh
```

This command regenerates the API and model files, together with the documentation and model tests.
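Because the generator rewrites files in place, reviewing its output before committing helps catch unexpected changes. A minimal sketch, assuming it is run from the repository root inside a git checkout; `sdk_changes` is only an illustrative variable name:

```sh
# Run from the repository root after ./hack/python-sdk/gen-sdk.sh.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  # --short lists modified files and new untracked paths under sdk/python.
  sdk_changes="$(git status --short -- sdk/python)"
  echo "${sdk_changes:-no changes under sdk/python}"
else
  sdk_changes=""
  echo "not inside a git work tree; run this from the repository root" >&2
fi
```

`git diff -- sdk/python` then shows the changes to tracked files line by line.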
The following files and folders in `sdk/python` are auto-generated and should not be modified directly:

```
sdk/python/docs
sdk/python/kubeflow/training/models
sdk/python/kubeflow/training/*.py
sdk/python/test/*.py
```

The Training Operator client and public APIs are located here:

```
sdk/python/kubeflow/training/api
```

## Code Style

### Python

- Use [yapf](https://github.com/google/yapf) to format Python code
- The `yapf` style is configured in the `.style.yapf` file
- To autoformat code (`-r` recurses into subdirectories, which `py/**/*.py` does not do in a default bash shell):

  ```sh
  yapf -i -r py/
  ```

- To sort imports:

  ```sh
  isort path/to/module.py
  ```
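For a check that reports problems without rewriting files (for example, before opening a pull request), both tools offer a dry-run mode. The function below is a hypothetical helper, not part of the repository; it assumes `yapf` and `isort` are installed (e.g. with `pip install yapf isort`) and relies on yapf's `--diff` and isort's `--check-only --diff` options.

```sh
# Hypothetical helper (not part of the repository): report formatting and
# import-order problems under a directory without modifying any files.
check_python_style() {
  local dir="$1"
  local result=0

  if [ ! -d "$dir" ]; then
    echo "directory not found: $dir" >&2
    return 2
  fi

  # --diff prints the changes yapf would make instead of rewriting files.
  yapf --diff --recursive "$dir" || result=1
  # --check-only --diff shows unsorted imports without fixing them.
  isort --check-only --diff "$dir" || result=1

  # A non-zero exit from either tool marks the whole check as failed.
  return "$result"
}
```

For example, `check_python_style py` checks the same directory that the autoformat command targets. Keeping the check separate from the `-i` autoformat step means it can run in CI without touching the working tree.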