---
layout: "guides"
page_title: "Apache Spark Integration - Getting Started"
sidebar_current: "guides-spark-pre"
description: |-
  Get started with the Nomad/Spark integration.
---

# Getting Started

To get started, you can use Nomad's example Terraform configuration to
automatically provision an environment in AWS, or you can manually provision a
cluster.

## Provision a Cluster in AWS

Nomad's [Terraform configuration](https://github.com/hashicorp/nomad/tree/master/terraform)
can be used to quickly provision a Spark-enabled Nomad environment in AWS. The
embedded [Spark example](https://github.com/hashicorp/nomad/tree/master/terraform/examples/spark)
provides a quickstart experience that can be used in conjunction with this
guide. When you have a cluster up and running, you can proceed to
[Submitting applications](/guides/spark/submit.html).
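
For reference, a typical workflow with the example Terraform configuration looks
roughly like the following. This is a minimal sketch; the `aws/env/us-east`
environment directory and the required variables are described in the
repository's README and may differ in your checkout:

```shell
# Clone the Nomad repository and provision the example AWS environment.
# Directory names and variables are assumptions; consult the README.
$ git clone https://github.com/hashicorp/nomad.git
$ cd nomad/terraform/aws/env/us-east
$ terraform init
$ terraform apply
```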

## Manually Provision a Cluster

To manually provision a cluster, see the Nomad
[Getting Started](/intro/getting-started/install.html) guide. There are two
basic prerequisites to using the Spark integration once you have a cluster up
and running:

- Access to a [Spark distribution](https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz)
built with Nomad support. This is required for the machine that will submit
applications as well as the Nomad tasks that will run the Spark executors.

- A Java runtime environment (JRE) for the submitting machine and the executors.

The subsections below explain further.

### Configure the Submitting Machine

To run Spark applications on Nomad, the submitting machine must have access to
the cluster and have the Nomad-enabled Spark distribution installed. The code
snippets below walk through installing Java and Spark on Ubuntu:

Install Java:

```shell
$ sudo add-apt-repository -y ppa:openjdk-r/ppa
$ sudo apt-get update
$ sudo apt-get install -y openjdk-8-jdk
$ JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
```
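
As a quick sanity check, you can confirm the runtime is installed and that
`JAVA_HOME` points at it (persist the variable in your shell profile if you
want it to survive new sessions):

```shell
$ java -version
$ echo $JAVA_HOME
```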

Install Spark:

```shell
$ wget -O - https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz \
  | sudo tar xz -C /usr/local
$ export PATH=$PATH:/usr/local/spark-2.1.0-bin-nomad/bin
```
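
To confirm that the Nomad-enabled distribution is the one on your `PATH`, print
its version:

```shell
$ spark-submit --version
```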

Export NOMAD_ADDR to point Spark to your Nomad cluster:

```shell
$ export NOMAD_ADDR=http://NOMAD_SERVER_IP:4646
```
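
As a quick connectivity check, you can query the Nomad HTTP API at that address;
for example, the `/v1/status/leader` endpoint returns the current cluster leader:

```shell
$ curl $NOMAD_ADDR/v1/status/leader
```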

### Executor Access to the Spark Distribution

When running on Nomad, Spark creates Nomad tasks to run executors for use by the
application's driver program. The executor tasks need access to a JRE, a Spark
distribution built with Nomad support, and (in cluster mode) the Spark
application itself. By default, Nomad will only place Spark executors on client
nodes that have the Java runtime installed (version 7 or higher).
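
One way to confirm that a client node is eligible is to inspect its fingerprinted
attributes and look for the Java driver. The command and attribute names below
assume a recent Nomad CLI; `<node-id>` is a placeholder:

```shell
# List client nodes, then inspect one node's attributes for the Java driver.
$ nomad node status
$ nomad node status -verbose <node-id> | grep driver.java
```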

In this example, the Spark distribution and the Spark application JAR file are
being pulled from Amazon S3:

```shell
$ spark-submit \
    --class org.apache.spark.examples.JavaSparkPi \
    --master nomad \
    --deploy-mode cluster \
    --conf spark.executor.instances=4 \
    --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz \
    https://s3.amazonaws.com/nomad-spark/spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```
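
In cluster mode, the driver and executors run as Nomad jobs, so once the
submission is accepted you can monitor them with the regular Nomad CLI (the job
names are generated by the integration and will differ in your cluster):

```shell
$ nomad status            # list jobs, including those created for the Spark application
$ nomad status <job-id>   # inspect a specific job's task groups and allocations
```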

### Using a Docker Image

An alternative to installing the JRE on every client node is to set the
[spark.nomad.dockerImage](/guides/spark/configuration.html#spark-nomad-dockerimage)
configuration property to the URL of a Docker image that has the Java runtime
installed. If set, Nomad will use the `docker` driver to run Spark executors in
a container created from the image. The
[spark.nomad.dockerAuth](/guides/spark/configuration.html#spark-nomad-dockerauth)
configuration property can be set to a JSON object to provide Docker repository
authentication configuration.

When using a Docker image, both the Spark distribution and the application
itself can be included (in which case local URLs can be used for `spark-submit`).
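
As an illustration only, an image that bundles a JRE, the Spark distribution, and
the application JAR at the paths used in the example below could be built roughly
as follows. The base image and image name are assumptions, not an image published
for this guide:

```shell
# Sketch of building a Docker image containing the JRE, the Nomad-enabled Spark
# distribution, and the application JAR (base image and tag are illustrative).
$ cat > Dockerfile <<'EOF'
FROM openjdk:8-jre
ADD https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz /spark-2.1.0-bin-nomad.tgz
ADD https://s3.amazonaws.com/nomad-spark/spark-examples_2.11-2.1.0-SNAPSHOT.jar /spark-examples_2.11-2.1.0-SNAPSHOT.jar
EOF
$ docker build -t your-repo/spark .
$ docker push your-repo/spark
```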

Here, we include [spark.nomad.dockerImage](/guides/spark/configuration.html#spark-nomad-dockerimage)
and use local paths for
[spark.nomad.sparkDistribution](/guides/spark/configuration.html#spark-nomad-sparkdistribution)
and the application JAR file:

```shell
$ spark-submit \
    --class org.apache.spark.examples.JavaSparkPi \
    --master nomad \
    --deploy-mode cluster \
    --conf spark.nomad.dockerImage=rcgenova/spark \
    --conf spark.executor.instances=4 \
    --conf spark.nomad.sparkDistribution=/spark-2.1.0-bin-nomad.tgz \
    /spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```

## Next Steps

Learn how to [submit applications](/guides/spark/submit.html).