---
layout: "guides"
page_title: "Apache Spark Integration - Submitting Applications"
sidebar_current: "guides-spark-submit"
description: |-
  Learn how to submit Spark jobs that run on a Nomad cluster.
---

# Submitting Applications

The [`spark-submit`](https://spark.apache.org/docs/latest/submitting-applications.html)
script located in Spark’s `bin` directory is used to launch applications on a
cluster. Spark applications can be submitted to Nomad in either `client` mode
or `cluster` mode.

## Client Mode

In `client` mode, the application driver runs on a machine that is not
necessarily in the Nomad cluster. The driver’s `SparkContext` creates a Nomad
job to run Spark executors. The executors connect to the driver and run Spark
tasks on behalf of the application. When the driver’s `SparkContext` is
stopped, the executors are shut down. Note that the machine running the driver
or `spark-submit` needs to be reachable from the Nomad clients so that the
executors can connect to it.

In `client` mode, application resources must initially be present on the
submitting machine, so JAR files (both the primary JAR and those added with the
`--jars` option) cannot be specified using `http:` or `https:` URLs. You can
either use files on the submitting machine (either as raw paths or `file:`
URLs), or use `local:` URLs to indicate that the files are independently
available on both the submitting machine and all of the Nomad clients where the
executors might run.

In this mode, the `spark-submit` invocation doesn’t return until the
application has finished running, and killing the `spark-submit` process kills
the application.

In the following example, the `spark-submit` command is used to run the
`SparkPi` sample application in `client` mode:

```shell
$ spark-submit --class org.apache.spark.examples.SparkPi \
    --master nomad \
    --conf spark.nomad.sparkDistribution=https://s3.amazonaws.com/nomad-spark/spark-2.1.0-bin-nomad.tgz \
    lib/spark-examples*.jar \
    10
```

## Cluster Mode

In `cluster` mode, the `spark-submit` process creates a Nomad job to run the
Spark application driver itself. The driver’s `SparkContext` then adds Spark
executors to the Nomad job. The executors connect to the driver and run Spark
tasks on behalf of the application. When the driver’s `SparkContext` is
stopped, the executors are shut down.

In `cluster` mode, application resources need to be hosted somewhere accessible
to the Nomad cluster, so JARs (both the primary JAR and those added with the
`--jars` option) can’t be specified using raw paths or `file:` URLs. You can
either use `http:` or `https:` URLs, or use `local:` URLs to indicate that the
files are independently available on all of the Nomad clients where the driver
and executors might run.

Note that in `cluster` mode, the Nomad master URL needs to be routable from
both the submitting machine and the Nomad client node that runs the driver. If
the Nomad cluster is integrated with Consul, you may want to use a DNS name for
the Nomad service served by Consul.
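As an illustrative sketch of that setup, a Consul-served name could stand in
for a fixed server address in the master URL, mirroring the cluster-mode
example below. This assumes Consul’s DNS interface is enabled with the default
`nomad` service name and HTTP port, and that the integration accepts a Nomad
API address appended to the master URL in the form `nomad:<address>` (both are
assumptions about your particular cluster and Spark distribution):

```shell
# Illustrative only: nomad.service.consul is resolved by Consul DNS to a
# live Nomad server, so the URL keeps working as servers come and go.
$ spark-submit --class org.apache.spark.examples.SparkPi \
    --master nomad:http://nomad.service.consul:4646 \
    --deploy-mode cluster \
    --conf spark.nomad.sparkDistribution=http://example.com/spark.tgz \
    http://example.com/spark-examples.jar \
    10
```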
For example, to submit an application in cluster mode:

```shell
$ spark-submit --class org.apache.spark.examples.SparkPi \
    --master nomad \
    --deploy-mode cluster \
    --conf spark.nomad.sparkDistribution=http://example.com/spark.tgz \
    http://example.com/spark-examples.jar \
    10
```
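Because a cluster-mode application runs as a regular Nomad job, the standard
Nomad CLI can be used to verify the submission. A minimal sketch; `spark-pi`
is a hypothetical placeholder, since the actual job name is generated by the
integration for each application:

```shell
# List registered jobs; the submitted Spark application appears among them
$ nomad status

# Inspect the application's job and its allocations
# ("spark-pi" is a placeholder for the generated job name)
$ nomad status spark-pi
```

## Next Steps

Learn how to [customize applications](/guides/spark/customizing.html).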