---
layout: "guides"
page_title: "Apache Spark Integration - Getting Started"
sidebar_current: "guides-spark-pre"
description: |-
  Get started with the Nomad/Spark integration.
---

# Getting Started

To get started, you can use Nomad's example Terraform configuration to
automatically provision an environment in AWS, or you can manually provision a
cluster.

## Provision a Cluster in AWS

Nomad's [Terraform configuration](https://github.com/hashicorp/nomad/tree/master/terraform)
can be used to quickly provision a Spark-enabled Nomad environment in
AWS. The embedded [Spark example](https://github.com/hashicorp/nomad/tree/master/terraform/examples/spark)
provides a quickstart experience that can be used in conjunction with
this guide. When you have a cluster up and running, you can proceed to
[Submitting applications](/guides/spark/submit.html).

## Manually Provision a Cluster

To manually provision a cluster, see the Nomad
[Getting Started](/intro/getting-started/install.html) guide. There are two
basic prerequisites to using the Spark integration once you have a cluster up
and running:

- Access to a [Spark distribution](https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz)
  built with Nomad support. This is required both for the machine that will submit
  applications and for the Nomad tasks that will run the Spark executors.

- A Java runtime environment (JRE) for the submitting machine and the executors.

The subsections below explain each prerequisite in more detail.

### Configure the Submitting Machine

To run Spark applications on Nomad, the submitting machine must have access to
the cluster and have the Nomad-enabled Spark distribution installed.
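As a quick sanity check, cluster access can be verified with a plain HTTP request against Nomad's API before installing anything (`NOMAD_SERVER_IP` is a placeholder for one of your server addresses):

```shell
# Ask the cluster for its current leader; a well-formed response means the
# submitting machine can reach Nomad's HTTP API on the default port 4646.
$ curl http://NOMAD_SERVER_IP:4646/v1/status/leader
```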
The code snippets below walk through installing Java and Spark on Ubuntu.

Install Java:

```shell
$ sudo add-apt-repository -y ppa:openjdk-r/ppa
$ sudo apt-get update
$ sudo apt-get install -y openjdk-8-jdk
$ JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
```

Install Spark:

```shell
$ wget -O - https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz \
  | sudo tar xz -C /usr/local
$ export PATH=$PATH:/usr/local/spark-2.1.0-bin-nomad/bin
```

Export `NOMAD_ADDR` to point Spark to your Nomad cluster:

```shell
$ export NOMAD_ADDR=http://NOMAD_SERVER_IP:4646
```

### Executor Access to the Spark Distribution

When running on Nomad, Spark creates Nomad tasks to run executors for use by the
application's driver program. The executor tasks need access to a JRE, a Spark
distribution built with Nomad support, and (in cluster mode) the Spark
application itself. By default, Nomad will only place Spark executors on client
nodes that have the Java runtime installed (version 7 or higher).

In this example, the Spark distribution and the Spark application JAR file are
being pulled from Amazon S3:

```shell
$ spark-submit \
  --class org.apache.spark.examples.JavaSparkPi \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.executor.instances=4 \
  --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz \
  https://s3.amazonaws.com/nomad-spark/spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```

### Using a Docker Image

An alternative to installing the JRE on every client node is to set the
[spark.nomad.dockerImage](/guides/spark/configuration.html#spark-nomad-dockerimage)
configuration property to the URL of a Docker image that has the Java runtime
installed. If set, Nomad will use the `docker` driver to run Spark executors in
a container created from the image.
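A suitable image only needs a JRE and, optionally, the Spark distribution and application. The following is a sketch, not the published image used in this guide; the `openjdk:8-jre` base and the in-image paths are illustrative assumptions, chosen so that `spark.nomad.sparkDistribution` and the application JAR can later be referenced by local path:

```dockerfile
# Sketch of a minimal Spark executor image (base image is an assumption).
FROM openjdk:8-jre

# Bundle the Nomad-enabled Spark distribution at the image root, so
# spark.nomad.sparkDistribution can point at a local path in the container.
ADD https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz /spark-2.1.0-bin-nomad.tgz

# Optionally bundle the application itself for cluster-mode submission.
COPY spark-examples_2.11-2.1.0-SNAPSHOT.jar /spark-examples_2.11-2.1.0-SNAPSHOT.jar
```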
The
[spark.nomad.dockerAuth](/guides/spark/configuration.html#spark-nomad-dockerauth)
configuration property can be set to a JSON object to provide Docker repository
authentication configuration.

When using a Docker image, both the Spark distribution and the application
itself can be included (in which case local URLs can be used for `spark-submit`).

Here, we include [spark.nomad.dockerImage](/guides/spark/configuration.html#spark-nomad-dockerimage)
and use local paths for
[spark.nomad.sparkDistribution](/guides/spark/configuration.html#spark-nomad-sparkdistribution)
and the application JAR file:

```shell
$ spark-submit \
  --class org.apache.spark.examples.JavaSparkPi \
  --master nomad \
  --deploy-mode cluster \
  --conf spark.nomad.dockerImage=rcgenova/spark \
  --conf spark.executor.instances=4 \
  --conf spark.nomad.sparkDistribution=/spark-2.1.0-bin-nomad.tgz \
  /spark-examples_2.11-2.1.0-SNAPSHOT.jar 100
```

## Next Steps

Learn how to [submit applications](/guides/spark/submit.html).