# Kueue

[![GoReport Widget]][GoReport Status]
[Latest release](https://github.com/kubernetes-sigs/kueue/releases/latest)

[GoReport Widget]: https://goreportcard.com/badge/github.com/kubernetes-sigs/kueue
[GoReport Status]: https://goreportcard.com/report/github.com/kubernetes-sigs/kueue

<img src="https://github.com/kubernetes-sigs/kueue/blob/main/site/static/images/logo.svg" width="100" alt="kueue logo">

Kueue is a set of APIs and a controller for [job](https://kueue.sigs.k8s.io/docs/concepts/workload)
[queueing](https://kueue.sigs.k8s.io/docs/concepts#queueing). It is a job-level manager that decides when
a job should be [admitted](https://kueue.sigs.k8s.io/docs/concepts#admission) to start (as in pods can be
created) and when it should stop (as in active pods should be deleted).

Read the [overview](https://kueue.sigs.k8s.io/docs/overview/) to learn more.

## Features overview

- **Job management:** Support job queueing based on [priorities](https://kueue.sigs.k8s.io/docs/concepts/workload/#priority) with different [strategies](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#queueing-strategy): `StrictFIFO` and `BestEffortFIFO`.
- **Resource management:** Support resource fair sharing and [preemption](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#preemption) with a variety of policies between different tenants.
- **Dynamic resource reclaim:** A mechanism to [release](https://kueue.sigs.k8s.io/docs/concepts/workload/#dynamic-reclaim) quota as the pods of a Job complete.
- **Resource flavor fungibility:** Quota [borrowing or preemption](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#flavorfungibility) in ClusterQueue and Cohort.
- **Integrations:** Built-in support for popular jobs, e.g. [BatchJob](https://kueue.sigs.k8s.io/docs/tasks/run_jobs/), [Kubeflow training jobs](https://kueue.sigs.k8s.io/docs/tasks/run_kubeflow_jobs/), [RayJob](https://kueue.sigs.k8s.io/docs/tasks/run_rayjobs/), [RayCluster](https://kueue.sigs.k8s.io/docs/tasks/run_rayclusters/), [JobSet](https://kueue.sigs.k8s.io/docs/tasks/run_jobsets/), [plain Pod](https://kueue.sigs.k8s.io/docs/tasks/run_plain_pods/).
- **System insight:** Built-in [Prometheus metrics](https://kueue.sigs.k8s.io/docs/reference/metrics/) to help monitor the state of the system, as well as Conditions.
- **AdmissionChecks:** A mechanism for internal or external components to influence whether a workload can be [admitted](https://kueue.sigs.k8s.io/docs/concepts/admission_check/).
- **Advanced autoscaling support:** Integration with cluster-autoscaler's [provisioningRequest](https://kueue.sigs.k8s.io/docs/admission-check-controllers/provisioning/#job-using-a-provisioningrequest) via admissionChecks.
- **Sequential admission:** A simple implementation of [all-or-nothing scheduling](https://kueue.sigs.k8s.io/docs/tasks/setup_sequential_admission/).
- **Partial admission:** Allows jobs to run with a [smaller parallelism](https://kueue.sigs.k8s.io/docs/tasks/run_jobs/#partial-admission), based on available quota, if the application supports it.

## Production Readiness status

- ✔️ API version: v1beta1, respecting [Kubernetes Deprecation Policy](https://kubernetes.io/docs/reference/using-api/deprecation-policy/)
- ✔️ Up-to-date [documentation](https://kueue.sigs.k8s.io/docs).
- ✔️ Test Coverage:
  - ✔️ Unit Test [testgrid](https://testgrid.k8s.io/sig-scheduling#periodic-kueue-test-unit-main).
  - ✔️ Integration Test [testgrid](https://testgrid.k8s.io/sig-scheduling#periodic-kueue-test-integration-main)
  - ✔️ E2E Tests for Kubernetes
    [1.25](https://testgrid.k8s.io/sig-scheduling#periodic-kueue-test-e2e-main-1-25),
    [1.26](https://testgrid.k8s.io/sig-scheduling#periodic-kueue-test-e2e-main-1-26),
    [1.27](https://testgrid.k8s.io/sig-scheduling#periodic-kueue-test-e2e-main-1-27),
    [1.28](https://testgrid.k8s.io/sig-scheduling#periodic-kueue-test-e2e-main-1-28)
    on Kind.
- ✔️ Scalability verification via [performance tests](https://github.com/kubernetes-sigs/kueue/tree/main/test/performance).
- ✔️ Monitoring via [metrics](https://kueue.sigs.k8s.io/docs/reference/metrics).
- ✔️ Security: RBAC-based access control.
- ✔️ Stable release cycle (2-3 months) for new features, bugfixes, cleanups.
- ✔️ [Adopters](https://kueue.sigs.k8s.io/docs/adopters/) running in production.

_Based on community feedback, we continue to simplify and evolve the API to
address new use cases_.

## Installation

**Requires Kubernetes 1.22 or newer**.

To install the latest release of Kueue in your cluster, run the following command:

```shell
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.6.2/manifests.yaml
```

The controller runs in the `kueue-system` namespace.

Read the [installation guide](https://kueue.sigs.k8s.io/docs/installation/) to learn more.

## Usage

A minimal configuration can be set by running the [examples](examples):

```shell
kubectl apply -f examples/admin/single-clusterqueue-setup.yaml
```

Then you can run a job with:

```shell
kubectl create -f examples/jobs/sample-job.yaml
```

Learn more about:

- Kueue [concepts](https://kueue.sigs.k8s.io/docs/concepts).
- Common and advanced [tasks](https://kueue.sigs.k8s.io/docs/tasks).
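
To make the usage steps above more concrete, here is a minimal sketch of what a Job submitted through Kueue might look like. It is not a copy of `examples/jobs/sample-job.yaml`. The queue name `user-queue`, the `busybox` image, the resource requests, and the `KUEUE_SUBMIT` environment variable are all assumptions made for illustration; check the example files in the repository for the actual values. The script only writes the manifest to a temporary file unless submission is explicitly enabled.

```shell
#!/bin/sh
# Sketch only: writes a hypothetical Job manifest that targets a LocalQueue.
# The file location is temporary; nothing is sent to a cluster by default.
MANIFEST="$(mktemp "${TMPDIR:-/tmp}/kueue-sample-job.XXXXXX")"

cat > "$MANIFEST" <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  namespace: default
  labels:
    # Assumption: a LocalQueue named "user-queue" exists in this namespace.
    kueue.x-k8s.io/queue-name: user-queue
spec:
  parallelism: 3
  completions: 3
  # The Job is created suspended so that no pods start before admission.
  suspend: true
  template:
    spec:
      containers:
      - name: dummy-job
        image: busybox   # assumption: any image that can sleep works here
        command: ["sleep", "30"]
        resources:
          requests:
            cpu: "1"
            memory: "200Mi"
      restartPolicy: Never
EOF

echo "Wrote Job manifest to $MANIFEST"

# Submission is opt-in (hypothetical KUEUE_SUBMIT switch) so the script can
# be run without a cluster. `kubectl create` is used because the manifest
# relies on generateName.
if [ "${KUEUE_SUBMIT:-0}" = "1" ]; then
  kubectl create -f "$MANIFEST"
fi
```

With Kueue installed and a matching queue configured, running the script with `KUEUE_SUBMIT=1` would create the Job, and `kubectl get jobs -n default` can then be used to watch whether it has been started.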

## Architecture

<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo -->

Learn more about the architecture of Kueue with the following design docs:

- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) discusses the API proposal and a
  high-level description of how Kueue operates. Join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
  to get document access.
- [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design)
  presents the detailed design of the controller.

## Roadmap

This is a high-level overview of the main priorities for 2023, in expected order of release:

- Cooperative preemption support for workloads that implement checkpointing [#477](https://github.com/kubernetes-sigs/kueue/issues/477)
- Flavor assignment strategies, e.g. _minimizing cost_ vs _minimizing borrowing_ [#312](https://github.com/kubernetes-sigs/kueue/issues/312)
- Integration with cluster-autoscaler for guaranteed resource provisioning
- Integration with common custom workloads [#74](https://github.com/kubernetes-sigs/kueue/issues/74):
  - Kubeflow (TFJob, MPIJob, etc.)
  - Spark
  - Ray
  - Workflows (Tekton, Argo, etc.)

These are features that we aim to have in the long-term, in no particular order:

- Budget support [#28](https://github.com/kubernetes-sigs/kueue/issues/28)
- Dashboard for management and monitoring for administrators
- Multi-cluster support

## Community, discussion, contribution, and support

Learn how to engage with the Kubernetes community on the [community page](http://kubernetes.io/community/)
and the [contributor's guide](CONTRIBUTING.md).

You can reach the maintainers of this project at:

- [Slack](https://kubernetes.slack.com/messages/wg-batch)
- [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-batch)

### Code of conduct

Participation in the Kubernetes community is governed by the [Kubernetes Code of Conduct](code-of-conduct.md).