# Kueue

[![GoReport Widget]][GoReport Status]
[![Latest Release](https://img.shields.io/github/v/release/kubernetes-sigs/kueue?include_prereleases)](https://github.com/kubernetes-sigs/kueue/releases/latest)

[GoReport Widget]: https://goreportcard.com/badge/github.com/kubernetes-sigs/kueue
[GoReport Status]: https://goreportcard.com/report/github.com/kubernetes-sigs/kueue

<img src="https://github.com/kubernetes-sigs/kueue/blob/main/site/static/images/logo.svg" width="100" alt="kueue logo">

Kueue is a set of APIs and a controller for [job](https://kueue.sigs.k8s.io/docs/concepts/workload)
[queueing](https://kueue.sigs.k8s.io/docs/concepts#queueing). It is a job-level manager that decides when
a job should be [admitted](https://kueue.sigs.k8s.io/docs/concepts#admission) to start (that is, when its pods can be
created) and when it should stop (that is, when its active pods should be deleted).

Read the [overview](https://kueue.sigs.k8s.io/docs/overview/) to learn more.

## Features overview

- **Job management:** Supports job queueing based on [priorities](https://kueue.sigs.k8s.io/docs/concepts/workload/#priority) with different [strategies](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#queueing-strategy): `StrictFIFO` and `BestEffortFIFO`.
- **Resource management:** Supports fair sharing of resources and [preemption](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#preemption) between tenants, with a variety of policies.
- **Dynamic resource reclaim:** A mechanism to [release](https://kueue.sigs.k8s.io/docs/concepts/workload/#dynamic-reclaim) quota as the pods of a Job complete.
- **Resource flavor fungibility:** Quota [borrowing or preemption](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#flavorfungibility) in ClusterQueue and Cohort.
- **Integrations:** Built-in support for popular jobs, e.g. [BatchJob](https://kueue.sigs.k8s.io/docs/tasks/run_jobs/), [Kubeflow training jobs](https://kueue.sigs.k8s.io/docs/tasks/run_kubeflow_jobs/), [RayJob](https://kueue.sigs.k8s.io/docs/tasks/run_rayjobs/), [RayCluster](https://kueue.sigs.k8s.io/docs/tasks/run_rayclusters/), [JobSet](https://kueue.sigs.k8s.io/docs/tasks/run_jobsets/), and [plain Pod](https://kueue.sigs.k8s.io/docs/tasks/run_plain_pods/).
- **System insight:** Built-in [Prometheus metrics](https://kueue.sigs.k8s.io/docs/reference/metrics/) to help monitor the state of the system, as well as Conditions.
- **AdmissionChecks:** A mechanism for internal or external components to influence whether a workload can be [admitted](https://kueue.sigs.k8s.io/docs/concepts/admission_check/).
- **Advanced autoscaling support:** Integration with cluster-autoscaler's [provisioningRequest](https://kueue.sigs.k8s.io/docs/admission-check-controllers/provisioning/#job-using-a-provisioningrequest) via admissionChecks.
- **Sequential admission:** A simple implementation of [all-or-nothing scheduling](https://kueue.sigs.k8s.io/docs/tasks/setup_sequential_admission/).
- **Partial admission:** Allows jobs to run with a [smaller parallelism](https://kueue.sigs.k8s.io/docs/tasks/run_jobs/#partial-admission), based on available quota, if the application supports it.

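
Several of the features above (queueing strategy, preemption, flavor fungibility, cohorts) are configured on a ClusterQueue. As a rough sketch only, the shell helper below prints an illustrative manifest. The helper name `example_cluster_queue`, the object names (`team-a-cq`, `team-cohort`, `default-flavor`), and the quota values are all hypothetical, and the field values shown should be checked against the linked concept pages for your Kueue version.

```shell
# Hypothetical helper (not part of Kueue): prints an illustrative ClusterQueue.
# All object names and quota values below are made up for this sketch.
example_cluster_queue() {
  cat <<'EOF'
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  cohort: team-cohort               # ClusterQueues in the same cohort can borrow unused quota
  queueingStrategy: BestEffortFIFO  # the alternative is StrictFIFO
  namespaceSelector: {}             # accept workloads from any namespace
  preemption:
    reclaimWithinCohort: Any
    withinClusterQueue: LowerPriority
  flavorFungibility:
    whenCanBorrow: Borrow
    whenCanPreempt: TryNextFlavor
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 9
      - name: memory
        nominalQuota: 36Gi
EOF
}
```

To try it against a cluster that already has Kueue and a matching ResourceFlavor, pipe the output to kubectl: `example_cluster_queue | kubectl apply -f -`.
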
## Production Readiness status

- ✔️ API version: v1beta1, respecting the [Kubernetes Deprecation Policy](https://kubernetes.io/docs/reference/using-api/deprecation-policy/).
- ✔️ Up-to-date [documentation](https://kueue.sigs.k8s.io/docs).
- ✔️ Test coverage:
  - ✔️ Unit tests [testgrid](https://testgrid.k8s.io/sig-scheduling#periodic-kueue-test-unit-main).
  - ✔️ Integration tests [testgrid](https://testgrid.k8s.io/sig-scheduling#periodic-kueue-test-integration-main).
  - ✔️ E2E tests for Kubernetes
    [1.25](https://testgrid.k8s.io/sig-scheduling#periodic-kueue-test-e2e-main-1-25),
    [1.26](https://testgrid.k8s.io/sig-scheduling#periodic-kueue-test-e2e-main-1-26),
    [1.27](https://testgrid.k8s.io/sig-scheduling#periodic-kueue-test-e2e-main-1-27),
    [1.28](https://testgrid.k8s.io/sig-scheduling#periodic-kueue-test-e2e-main-1-28)
    on Kind.
- ✔️ Scalability verification via [performance tests](https://github.com/kubernetes-sigs/kueue/tree/main/test/performance).
- ✔️ Monitoring via [metrics](https://kueue.sigs.k8s.io/docs/reference/metrics).
- ✔️ Security: RBAC-based access control.
- ✔️ Stable release cycle (2-3 months) for new features, bug fixes, and cleanups.
- ✔️ [Adopters](https://kueue.sigs.k8s.io/docs/adopters/) running in production.

  _Based on community feedback, we continue to simplify and evolve the API to
  address new use cases_.

## Installation

**Requires Kubernetes 1.22 or newer**.

To install the latest release of Kueue in your cluster, run the following command:

```shell
kubectl apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.6.2/manifests.yaml
```

The controller runs in the `kueue-system` namespace.

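One way to confirm that the installation finished is to wait for the controller to become available. The helper below is a minimal sketch, not part of Kueue: the function name `kueue_wait_ready` is hypothetical, and it assumes the controller Deployment is named `kueue-controller-manager` (check with `kubectl get deploy --namespace kueue-system` if unsure).

```shell
# Hypothetical helper (not part of Kueue): waits for the controller Deployment
# to report the Available condition, then lists the pods in kueue-system.
# Assumption: the Deployment is named kueue-controller-manager.
# Optional first argument: timeout passed to kubectl wait (default 5m).
kueue_wait_ready() {
  kubectl wait deploy/kueue-controller-manager \
    --namespace kueue-system \
    --for=condition=available \
    --timeout="${1:-5m}" &&
    kubectl get pods --namespace kueue-system
}
```

Run `kueue_wait_ready` after applying the manifests, or `kueue_wait_ready 10m` to allow more time on a slow cluster.
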
Read the [installation guide](https://kueue.sigs.k8s.io/docs/installation/) to learn more.

## Usage

To set up a minimal configuration, apply one of the [examples](examples):

```shell
kubectl apply -f examples/admin/single-clusterqueue-setup.yaml
```

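For orientation, a single-ClusterQueue setup generally consists of three objects: a ResourceFlavor, a ClusterQueue that holds the quota, and a namespaced LocalQueue that points at it. The sketch below prints such a setup; it is an approximation and not a copy of the example file, and the helper name `minimal_kueue_setup`, the object names, and the quota values are assumptions made for illustration.

```shell
# Hypothetical helper (not part of Kueue): prints a minimal
# ResourceFlavor + ClusterQueue + LocalQueue setup.
# Object names and quota values are illustrative assumptions.
minimal_kueue_setup() {
  cat <<'EOF'
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}   # match all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 9
      - name: memory
        nominalQuota: 36Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: default
  name: user-queue
spec:
  clusterQueue: cluster-queue
EOF
}
```

The output can be reviewed first and then applied with `minimal_kueue_setup | kubectl apply -f -`.
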
Then you can run a job with:

```shell
kubectl create -f examples/jobs/sample-job.yaml
```

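A job is handed to Kueue by labelling it with the name of a LocalQueue and creating it suspended; Kueue unsuspends it once it is admitted. The following sketch prints such a Job. It is not the contents of the sample file: the helper name `sample_kueue_job`, the default queue name `user-queue`, the `busybox` image, and the resource requests are assumptions chosen for illustration.

```shell
# Hypothetical helper (not part of Kueue): prints a suspended batch Job that
# targets a LocalQueue. Optional first argument: LocalQueue name
# (defaults to user-queue, an assumed name).
sample_kueue_job() {
  cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: ${1:-user-queue}
spec:
  parallelism: 3
  completions: 3
  suspend: true
  template:
    spec:
      containers:
      - name: dummy-job
        image: busybox
        command: ["sleep", "30"]
        resources:
          requests:
            cpu: 1
            memory: "200Mi"
      restartPolicy: Never
EOF
}
```

Because the manifest uses `generateName`, submit it with `sample_kueue_job | kubectl create -f -` rather than `kubectl apply`. Assuming the Kueue CRDs are installed, `kubectl get workloads --namespace default` should then list the corresponding workload.
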
Learn more about:

- Kueue [concepts](https://kueue.sigs.k8s.io/docs/concepts).
- Common and advanced [tasks](https://kueue.sigs.k8s.io/docs/tasks).

## Architecture

<!-- TODO(#64) Remove links to google docs once the contents have been migrated to this repo -->

Learn more about the architecture of Kueue with the following design docs:

- [bit.ly/kueue-apis](https://bit.ly/kueue-apis) discusses the API proposal and gives a
  high-level description of how Kueue operates. Join the [mailing list](https://groups.google.com/a/kubernetes.io/g/wg-batch)
  to get document access.
- [bit.ly/kueue-controller-design](https://bit.ly/kueue-controller-design)
  presents the detailed design of the controller.

## Roadmap

This is a high-level overview of the main priorities for 2023, in expected order of release:

- Cooperative preemption support for workloads that implement checkpointing [#477](https://github.com/kubernetes-sigs/kueue/issues/477)
- Flavor assignment strategies, e.g. _minimizing cost_ vs _minimizing borrowing_ [#312](https://github.com/kubernetes-sigs/kueue/issues/312)
- Integration with cluster-autoscaler for guaranteed resource provisioning
- Integration with common custom workloads [#74](https://github.com/kubernetes-sigs/kueue/issues/74):
  - Kubeflow (TFJob, MPIJob, etc.)
  - Spark
  - Ray
  - Workflows (Tekton, Argo, etc.)

These are features that we aim to have in the long term, in no particular order:

- Budget support [#28](https://github.com/kubernetes-sigs/kueue/issues/28)
- Dashboard for management and monitoring for administrators
- Multi-cluster support

## Community, discussion, contribution, and support

Learn how to engage with the Kubernetes community on the [community page](http://kubernetes.io/community/)
and the [contributor's guide](CONTRIBUTING.md).

You can reach the maintainers of this project at:

- [Slack](https://kubernetes.slack.com/messages/wg-batch)
- [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-batch)

### Code of conduct

Participation in the Kubernetes community is governed by the [Kubernetes Code of Conduct](code-of-conduct.md).