github.com/kubeflow/training-operator@v1.7.0/README.md

# Kubeflow Training Operator

[![Build Status](https://github.com/kubeflow/training-operator/actions/workflows/test-go.yaml/badge.svg?branch=master)](https://github.com/kubeflow/training-operator/actions/workflows/test-go.yaml?branch=master)
[![Coverage Status](https://coveralls.io/repos/github/kubeflow/training-operator/badge.svg?branch=master)](https://coveralls.io/github/kubeflow/training-operator?branch=master)
[![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/training-operator)](https://goreportcard.com/report/github.com/kubeflow/training-operator)

## Overview

Starting from v1.3, the training operator provides Kubernetes custom resources that make it easy to
run distributed or non-distributed TensorFlow/PyTorch/Apache MXNet/XGBoost/MPI jobs on Kubernetes.

> Note: Releases up to and including v1.2 support only TFJob on Kubernetes.

- For a complete reference of the custom resource definitions, please refer to the API definitions:
  - [TensorFlow API Definition](pkg/apis/kubeflow.org/v1/tensorflow_types.go)
  - [PyTorch API Definition](pkg/apis/kubeflow.org/v1/pytorch_types.go)
  - [Apache MXNet API Definition](pkg/apis/kubeflow.org/v1/mxnet_types.go)
  - [XGBoost API Definition](pkg/apis/kubeflow.org/v1/xgboost_types.go)
  - [MPI API Definition](pkg/apis/kubeflow.org/v1/mpi_types.go)
  - [PaddlePaddle API Definition](pkg/apis/kubeflow.org/v1/paddlepaddle_types.go)
- For details on API design, please refer to the [v1alpha2 design doc](https://github.com/kubeflow/community/blob/master/proposals/tf-operator-design-v1alpha2.md).
- For details of all-in-one operator design, please refer to the [All-in-one Kubeflow Training Operator](https://docs.google.com/document/d/1x1JPDQfDMIbnoQRftDH1IzGU0qvHGSU4W6Jl4rJLPhI/edit#heading=h.e33ufidnl8z6) design doc.
- For details on its observability, please refer to the [monitoring design doc](docs/monitoring/README.md).
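
To give a feel for what one of these custom resources looks like, the following is a minimal sketch that builds a PyTorchJob manifest as a plain Python dict. It does not use the Python SDK. The job name, image, and helper name `pytorch_job_manifest` are hypothetical, and the fields shown are only a small subset of the v1 API types linked above, so treat it as an illustration and not as a complete or authoritative spec.

```python
import json


def pytorch_job_manifest(name, image, workers=2, namespace="default"):
    """Build a PyTorchJob custom resource as a plain dict (illustrative helper)."""

    def replica(count):
        # Each replica type carries a replica count, a restart policy,
        # and an ordinary Kubernetes pod template.
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    "containers": [
                        # The training container is conventionally named "pytorch".
                        {"name": "pytorch", "image": image}
                    ]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "pytorchReplicaSpecs": {
                "Master": replica(1),
                "Worker": replica(workers),
            }
        },
    }


if __name__ == "__main__":
    # The image below is a placeholder, not a published image.
    manifest = pytorch_job_manifest("pytorch-example", "example.com/train:latest")
    print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed output could be piped to `kubectl apply -f -` on a cluster where the operator is installed.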

## Prerequisites

- Kubernetes cluster and `kubectl`, both version 1.23 or later
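
One way to check this prerequisite is to inspect the output of `kubectl version -o json`. The sketch below assumes that output contains `clientVersion` and `serverVersion` objects with string `major` and `minor` fields. The helper names are illustrative and are not part of the operator or its SDK.

```python
import json
import re

MIN_MINOR = 23  # minimum Kubernetes 1.x minor version stated above


def minor_version(info):
    """Extract the integer minor version from one kubectl version entry."""
    # Some distributions report the minor version with a suffix, e.g. "27+".
    match = re.match(r"\d+", info["minor"])
    if match is None:
        raise ValueError(f"unrecognized minor version: {info['minor']!r}")
    return int(match.group())


def is_supported(info):
    """Return True if the entry describes Kubernetes 1.23 or later."""
    major = int(info["major"])
    return major > 1 or (major == 1 and minor_version(info) >= MIN_MINOR)


def meets_prerequisites(version_json):
    """Check both client and server versions in `kubectl version -o json` output."""
    versions = json.loads(version_json)
    for key in ("clientVersion", "serverVersion"):
        info = versions.get(key)
        # serverVersion is absent when kubectl cannot reach a cluster.
        if info is None or not is_supported(info):
            return False
    return True


if __name__ == "__main__":
    # Trimmed sample of the JSON output, for illustration only.
    sample = json.dumps({
        "clientVersion": {"major": "1", "minor": "27"},
        "serverVersion": {"major": "1", "minor": "22+"},
    })
    print(meets_prerequisites(sample))  # False: the server is older than 1.23
```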

## Installation

### Master Branch

```bash
kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone"
```

### Stable Release

```bash
kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.5.0"
```

### TensorFlow Release Only

For users who prefer the original TensorFlow controllers, please check out the `v1.2-branch`. Bug-fix patches are still accepted on this branch.

```bash
kubectl apply -k "github.com/kubeflow/training-operator/manifests/overlays/standalone?ref=v1.2.0"
```
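
The three install commands above differ only in the optional `?ref=` suffix on the kustomize target. As a small sketch for scripting the installation, the hypothetical helper below builds the `kubectl` argument list for a given ref. It only prints the command, so it is safe to run without a cluster.

```python
# Kustomize target used by the install commands above.
BASE = "github.com/kubeflow/training-operator/manifests/overlays/standalone"


def install_command(ref=None):
    """Return the kubectl argv for the standalone overlay (illustrative helper).

    With ref=None the target has no ?ref= suffix, matching the master branch
    command; otherwise the given tag or branch is pinned.
    """
    target = BASE if ref is None else f"{BASE}?ref={ref}"
    return ["kubectl", "apply", "-k", target]


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(" ".join(install_command("v1.5.0")))
    # To run it for real, one option is:
    #   import subprocess; subprocess.run(install_command("v1.5.0"), check=True)
```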

### Python SDK for Kubeflow Training Operator

The Training Operator provides a Python SDK for the custom resources. More documentation is available in the [sdk/python](sdk/python) folder.

Use the `pip install` command to install the latest release of the SDK:

```bash
pip install kubeflow-training
```

## Quick Start

Please refer to [quick-start-v1.md](docs/quick-start-v1.md) and the [Kubeflow Training User Guide](https://www.kubeflow.org/docs/guides/components/tftraining/) for more information.

## API Documentation

Please refer to the following API documentation:

- [Kubeflow.org v1 API Documentation](docs/api/kubeflow.org_v1_generated.asciidoc)

## Community

You can:

- Join our [Slack](https://www.kubeflow.org/docs/about/community/#kubeflow-slack) channel.
- Check out [who is using this operator](./docs/adopters.md).

This project is part of Kubeflow, so please see the [readme in kubeflow/kubeflow](https://github.com/kubeflow/kubeflow#get-involved) to get in touch with the community.

## Contributing

Please refer to the [DEVELOPMENT](docs/development/developer_guide.md) guide.

## Change Log

Please refer to the [CHANGELOG](CHANGELOG.md).

## Version Matrix

The following table lists recent operator versions with their API version and minimum supported Kubernetes version.

| Operator Version       | API Version | Kubernetes Version |
| ---------------------- | ----------- | ------------------ |
| `v1.0.x`               | `v1`        | 1.16+              |
| `v1.1.x`               | `v1`        | 1.16+              |
| `v1.2.x`               | `v1`        | 1.16+              |
| `v1.3.x`               | `v1`        | 1.18+              |
| `v1.4.x`               | `v1`        | 1.23+              |
| `v1.5.x`               | `v1`        | 1.23+              |
| `latest` (master HEAD) | `v1`        | 1.23+              |
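
For automated checks, the matrix can be transcribed into a small lookup. The sketch below is illustrative only: the names `MIN_KUBERNETES_MINOR`, `release_series`, and `is_compatible` are hypothetical, and versions not listed in the table raise an error instead of being guessed.

```python
# Minimum Kubernetes 1.x minor version per operator release series,
# transcribed from the table above.
MIN_KUBERNETES_MINOR = {
    "v1.0": 16,
    "v1.1": 16,
    "v1.2": 16,
    "v1.3": 18,
    "v1.4": 23,
    "v1.5": 23,
    "latest": 23,
}


def release_series(operator_version):
    """Map a full version such as 'v1.5.0' to its series 'v1.5'."""
    if operator_version == "latest":
        return operator_version
    return ".".join(operator_version.split(".")[:2])


def is_compatible(operator_version, kubernetes_minor):
    """Return True if Kubernetes 1.<kubernetes_minor> satisfies the matrix."""
    series = release_series(operator_version)
    if series not in MIN_KUBERNETES_MINOR:
        raise KeyError(f"{operator_version} is not listed in the version matrix")
    return kubernetes_minor >= MIN_KUBERNETES_MINOR[series]


if __name__ == "__main__":
    print(is_compatible("v1.5.0", 23))
    print(is_compatible("v1.3.1", 17))
```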

## Acknowledgement

This project started as a distributed training operator for TensorFlow. We later merged efforts from other Kubeflow training operators to provide a unified and simplified experience for both users and developers. We are very grateful to all who filed issues or helped resolve them, asked and answered questions, and took part in inspiring discussions. We would also like to thank everyone who has contributed to and maintained the original operators.

- PyTorch Operator: [list of contributors](https://github.com/kubeflow/pytorch-operator/graphs/contributors) and [maintainers](https://github.com/kubeflow/pytorch-operator/blob/master/OWNERS).
- MPI Operator: [list of contributors](https://github.com/kubeflow/mpi-operator/graphs/contributors) and [maintainers](https://github.com/kubeflow/mpi-operator/blob/master/OWNERS).
- XGBoost Operator: [list of contributors](https://github.com/kubeflow/xgboost-operator/graphs/contributors) and [maintainers](https://github.com/kubeflow/xgboost-operator/blob/master/OWNERS).
- MXNet Operator: [list of contributors](https://github.com/kubeflow/mxnet-operator/graphs/contributors) and [maintainers](https://github.com/kubeflow/mxnet-operator/blob/master/OWNERS).
- Common library: [list of contributors](https://github.com/kubeflow/common/graphs/contributors) and [maintainers](https://github.com/kubeflow/common/blob/master/OWNERS).