# Overview

When you deploy a Kubernetes application, such as Pachyderm, it typically
cannot be accessed from outside the Kubernetes cluster right away. This
default helps protect your cluster from malicious attacks. In the
simplest case, such as running a Pachyderm cluster locally, implicit and
explicit port-forwarding enables you to communicate with `pachd`, the Pachyderm
daemon pod, and `pachd-dash`, the Pachyderm UI. Port-forwarding can be used in
cloud environments as well, but a production environment might require you to
define more sophisticated inbound connection rules.

Kubernetes provides multiple ways to deliver external traffic to a service,
including the following:

* Through a Service with `type: NodePort`. A NodePort service provides basic
access to services. By default, the `pachd` service is deployed as `NodePort`
to simplify interaction with your localhost. `NodePort` is a limited solution
that might not be considered reliable or secure enough in production
deployments.

* Through a Service with `type: LoadBalancer`. A Kubernetes service with
`type: LoadBalancer` can perform basic load balancing. Typically, if you
change the `pachd` service type to `LoadBalancer` on a cloud provider, the
cloud provider automatically deploys a load balancer to serve your
application. The downside of this approach is that you have to change
all your services to the `LoadBalancer` type and maintain a separate load
balancer for each service. This can become difficult to manage long term.

* Through the `Ingress` Kubernetes resource. An Ingress resource is
completely independent of the services that you deploy. Because an Ingress
resource provides advanced routing capabilities, it is the recommended option
to use with Pachyderm. The only complication of this approach is that you need
to deploy an ingress controller, such as NGINX or Traefik, in your Kubernetes
cluster. Such an ingress controller is not deployed by default with the
`pachctl deploy` command.

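To make the difference between the first two options concrete, the following
is a minimal sketch of a `pachd` Service manifest. The `app: pachd` selector,
the port name, and port `650` are assumptions for illustration. Compare them
with the manifest that your own deployment generates before you change
anything.

```yaml
# Minimal sketch of a pachd Service; the selector and ports are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: pachd
spec:
  # NodePort is the default described above. Changing this value to
  # LoadBalancer asks the cloud provider to provision an external
  # load balancer for this one service.
  type: NodePort
  selector:
    app: pachd            # assumed label on the pachd pod
  ports:
    - name: api-grpc-port # assumed port name
      port: 650           # assumed gRPC port of pachd
      targetPort: 650
```

An Ingress resource, by contrast, is a separate object that routes to
services like this one, so the Service itself does not need to be exposed
directly.
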
## Pachyderm Ingress Requirements

Kubernetes supports multiple ingress controller options, and you are free to
pick the one that works best for your environment. However, not all of them
might be fully compatible with Pachyderm. Moreover, incorrectly exposing your
cluster through an ingress might make your Pachyderm cluster and your
data insecure and vulnerable to external attacks. Regardless of your
choice of ingress controller, your environment must meet the
following security requirements to protect your data:

* **Use a secure connection**

  Exposing your application to the outside world might pose a security
  risk to your data and organization. Make sure that you have Transport
  Layer Security (TLS) enabled for Ingress connections.

* **Use Pachyderm authentication**

  Pachyderm authentication must be enabled and access provided to a
  verified list of users. Pachyderm authentication is an additional
  security layer to protect your data from malicious attacks.
  If you cannot use Pachyderm authentication providers, we highly recommend
  that you use Pachyderm port-forwarding for security reasons. Exposing Pachyderm
  services through an ingress without Pachyderm authentication might result in
  your Pachyderm and Kubernetes clusters being compromised, along with your data.

* **The ingress controller must support the gRPC protocol and WebSockets**

  Ingress controllers that support gRPC include NGINX and Traefik.

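As an illustration of the TLS requirement, the following sketch shows an
Ingress resource that terminates TLS and routes traffic to the Pachyderm UI.
The host name, the Secret name, the `traefik` ingress class, and the `dash`
service on port `8080` are all assumptions made for this example. The sketch
uses the `networking.k8s.io/v1` API, while older Kubernetes versions use
`networking.k8s.io/v1beta1` with a different backend syntax.

```yaml
# Sketch only: names, host, and ports below are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pachyderm-dash          # hypothetical resource name
spec:
  ingressClassName: traefik     # assumption: must match your ingress controller
  tls:
    - hosts:
        - dash.example.com      # placeholder host name
      secretName: dash-tls      # hypothetical Secret that holds the certificate
  rules:
    - host: dash.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: dash      # assumed name of the Pachyderm UI service
                port:
                  number: 8080  # assumed HTTP port of the UI service
```

This resource alone does not satisfy the authentication and gRPC
requirements above. Those depend on your Pachyderm configuration and on the
ingress controller that you choose.
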
## Ingress Configuration Workflow

This section outlines the general workflow for ingress configuration.
Depending on your use case, you might need to start from the bottom of
this list and determine your firewall and whitelist requirements first.
More commonly, you start by deciding which ingress controller
you want to use. In any case, read and understand the requirements
outlined below before you proceed with any configuration.

A general workflow for enabling external traffic into a Pachyderm
cluster includes the following steps:

* **Configure Kubernetes networking.**

  You can use one of the following options:

  * (Recommended) Deploy an ingress controller and configure an ingress
    resource.

    Pachyderm supports the [Traefik](https://docs.traefik.io/)
    ingress controller. For more information, see
    [Expose a Pachyderm UI Through an Ingress](../expose-pach-ui-ingress/).

  * Configure the `pachd` service as a `LoadBalancer` by changing
    `type: NodePort` to `type: LoadBalancer` in the `pachd` service
    manifest. As mentioned above, this is the simplest way to expose
    Pachyderm services to the outside world, but it does not provide
    any sophisticated control over load balancing. This option works
    on most cloud platforms, such as AWS and GKE, as well as in
    minikube, and is mostly used for internal access.

* **Configure access to your ingress public IP addresses through firewalls
and whitelisting.**

  If you are deploying Pachyderm on a cloud provider, you need to make sure
  that the ingress IP is available to external users. For example, in AWS, you can
  configure access through security groups in the Virtual Private Cloud (VPC)
  in which the Kubernetes cluster with Pachyderm runs. Other cloud providers have
  similar functionality.

* **Secure the connection end-to-end.**

  If you run Pachyderm on a cloud platform, the cloud provider is responsible
  for securing the underlying infrastructure, such as the Kubernetes control
  plane. Most cloud providers have a security compliance program that addresses
  these issues. If you are running Kubernetes locally, the security of
  Kubernetes APIs, kubelet, and other components becomes your responsibility.
  See security recommendations in the [Kubernetes documentation](https://kubernetes.io/docs/tasks/administer-cluster/securing-a-cluster/).

  As for Pachyderm, you need to make sure that you deploy Pachyderm with
  TLS enabled. You can deploy `pachd` and `dash` with different certificates
  if required. Self-signed certificates might require additional configuration.
  For instructions on deployment with TLS, see [Deploy Pachyderm with TLS](https://docs.pachyderm.com/latest/deploy-manage/deploy/deploy_w_tls/).

  In addition, you must have administrative access to the Domain Name
  System (DNS) server that you will use to access Pachyderm. If you are deploying
  Pachyderm to an internal site with a self-signed certificate, contact our
  support organization for assistance.

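If you choose the `LoadBalancer` option, one possible way to express the
whitelisting step in Kubernetes itself is the `loadBalancerSourceRanges`
field of the Service. The fragment below is a sketch rather than a complete
manifest. The CIDR range is a placeholder, and whether the field is honored
depends on your cloud provider, so you might still need provider-level rules
such as security groups.

```yaml
# Fragment of the pachd Service spec; merge it into your full manifest.
spec:
  type: LoadBalancer
  # Only clients from these ranges can reach the load balancer,
  # on providers that support this field.
  loadBalancerSourceRanges:
    - 203.0.113.0/24   # placeholder range; replace with your trusted network
```
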
!!! note "See Also"

    - [Expose a Pachyderm UI Through an Ingress](../expose-pach-ui-ingress/)