---
layout: "aws"
page_title: "AWS: aws_emr_cluster"
sidebar_current: "docs-aws-resource-emr-cluster"
description: |-
  Provides an Elastic MapReduce Cluster
---

# aws\_emr\_cluster

Provides an Elastic MapReduce Cluster, a web service that makes it easy to
process large amounts of data efficiently. See [Amazon Elastic MapReduce Documentation](https://aws.amazon.com/documentation/elastic-mapreduce/)
for more information.

## Example Usage

```hcl
resource "aws_emr_cluster" "emr-test-cluster" {
  name          = "emr-test-arn"
  release_label = "emr-4.6.0"
  applications  = ["Spark"]

  termination_protection            = false
  keep_job_flow_alive_when_no_steps = true

  ec2_attributes {
    subnet_id                         = "${aws_subnet.main.id}"
    emr_managed_master_security_group = "${aws_security_group.sg.id}"
    emr_managed_slave_security_group  = "${aws_security_group.sg.id}"
    instance_profile                  = "${aws_iam_instance_profile.emr_profile.arn}"
  }

  master_instance_type = "m3.xlarge"
  core_instance_type   = "m3.xlarge"
  core_instance_count  = 1

  tags {
    role = "rolename"
    env  = "env"
  }

  bootstrap_action {
    path = "s3://elasticmapreduce/bootstrap-actions/run-if"
    name = "runif"
    args = ["instance.isMaster=true", "echo running on master node"]
  }

  configurations = "test-fixtures/emr_configurations.json"

  service_role = "${aws_iam_role.iam_emr_service_role.arn}"
}
```

The `aws_emr_cluster` resource typically requires two IAM roles, one for the EMR Cluster
to use as a service, and another to place on your Cluster Instances to interact
with AWS from those instances. The suggested role policy template for the EMR service is `AmazonElasticMapReduceRole`,
and `AmazonElasticMapReduceforEC2Role` for the EC2 profile.
See the [Getting
Started](https://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-gs-launch-sample-cluster.html)
guide for more information on these IAM roles. There is also a fully-bootable
example Terraform configuration at the bottom of this page.

## Argument Reference

The following arguments are supported:

* `name` - (Required) The name of the job flow
* `release_label` - (Required) The release label for the Amazon EMR release
* `master_instance_type` - (Required) The EC2 instance type of the master node
* `service_role` - (Required) IAM role that will be assumed by the Amazon EMR service to access AWS resources
* `security_configuration` - (Optional) The security configuration name to attach to the EMR cluster. Only valid for EMR clusters with `release_label` 4.8.0 or greater
* `core_instance_type` - (Optional) The EC2 instance type of the slave nodes
* `core_instance_count` - (Optional) Number of Amazon EC2 instances used to execute the job flow. EMR will use one node as the cluster's master node and use the remainder of the nodes (`core_instance_count`-1) as core nodes. Default `1`
* `log_uri` - (Optional) S3 bucket to write the log files of the job flow. If a value
  is not provided, logs are not created
* `applications` - (Optional) A list of applications for the cluster. Valid values are: `Flink`, `Hadoop`, `Hive`, `Mahout`, `Pig`, and `Spark`. Case insensitive
* `termination_protection` - (Optional) Whether termination protection is switched on (default is off)
* `keep_job_flow_alive_when_no_steps` - (Optional) Whether the cluster keeps running when it has no steps or when all steps are complete (default is on)
* `ec2_attributes` - (Optional) Attributes for the EC2 instances running the job
  flow. Defined below
* `bootstrap_action` - (Optional) List of bootstrap actions that will be run before Hadoop is started on
  the cluster nodes.
  Defined below
* `configurations` - (Optional) List of configurations supplied for the EMR cluster you are creating
* `visible_to_all_users` - (Optional) Whether the job flow is visible to all IAM users of the AWS account associated with the job flow. Default `true`
* `autoscaling_role` - (Optional) An IAM role for automatic scaling policies. The IAM role provides permissions that the automatic scaling feature requires to launch and terminate EC2 instances in an instance group.
* `tags` - (Optional) List of tags to apply to the EMR Cluster


## ec2\_attributes

Attributes for the Amazon EC2 instances running the job flow

* `key_name` - (Optional) Amazon EC2 key pair that can be used to ssh to the master
  node as the user called `hadoop`
* `subnet_id` - (Optional) VPC subnet id where you want the job flow to launch.
  You cannot specify the `cc1.4xlarge` instance type for nodes of a job flow launched in an Amazon VPC
* `additional_master_security_groups` - (Optional) List of additional Amazon EC2 security group IDs for the master node
* `additional_slave_security_groups` - (Optional) List of additional Amazon EC2 security group IDs for the slave nodes
* `emr_managed_master_security_group` - (Optional) Identifier of the Amazon EC2 security group for the master node
* `emr_managed_slave_security_group` - (Optional) Identifier of the Amazon EC2 security group for the slave nodes
* `service_access_security_group` - (Optional) Identifier of the Amazon EC2 service-access security group. Required when the cluster runs on a private subnet
* `instance_profile` - (Required) Instance profile for the EC2 instances of the cluster; the instances assume this role


## bootstrap\_action

* `name` - (Required) Name of the bootstrap action
* `path` - (Required) Location of the script to run during a bootstrap action.
  Can be either a location in Amazon S3 or on a local file system
* `args` - (Optional) List of command line arguments to pass to the bootstrap action script

## Attributes Reference

The following attributes are exported:

* `id` - The ID of the EMR Cluster
* `name` - The name of the cluster.
* `release_label` - The release label for the Amazon EMR release.
* `master_instance_type` - The EC2 instance type of the master node.
* `master_public_dns` - The public DNS name of the master EC2 instance.
* `core_instance_type` - The EC2 instance type of the slave nodes.
* `core_instance_count` - The number of slave nodes, i.e. EC2 instance nodes.
* `log_uri` - The path to the Amazon S3 location where logs for this cluster are stored.
* `applications` - The applications installed on this cluster.
* `ec2_attributes` - Provides information about the EC2 instances in a cluster grouped by category: key name, subnet ID, IAM instance profile, and so on.
* `bootstrap_action` - A list of bootstrap actions that will be run before Hadoop is started on the cluster nodes.
* `configurations` - The list of Configurations supplied to the EMR cluster.
* `service_role` - The IAM role that will be assumed by the Amazon EMR service to access AWS resources on your behalf.
* `visible_to_all_users` - Indicates whether the job flow is visible to all IAM users of the AWS account associated with the job flow.
* `tags` - The list of tags associated with a cluster.


## Example bootable config

**NOTE:** This configuration demonstrates a minimal configuration needed to
boot an example EMR Cluster. It is not meant to display best practices. Please
use at your own risk.

```hcl
provider "aws" {
  region = "us-west-2"
}

resource "aws_emr_cluster" "tf-test-cluster" {
  name          = "emr-test-arn"
  release_label = "emr-4.6.0"
  applications  = ["Spark"]

  ec2_attributes {
    subnet_id                         = "${aws_subnet.main.id}"
    emr_managed_master_security_group = "${aws_security_group.allow_all.id}"
    emr_managed_slave_security_group  = "${aws_security_group.allow_all.id}"
    instance_profile                  = "${aws_iam_instance_profile.emr_profile.arn}"
  }

  master_instance_type = "m3.xlarge"
  core_instance_type   = "m3.xlarge"
  core_instance_count  = 1

  tags {
    role     = "rolename"
    dns_zone = "env_zone"
    env      = "env"
    name     = "name-env"
  }

  bootstrap_action {
    path = "s3://elasticmapreduce/bootstrap-actions/run-if"
    name = "runif"
    args = ["instance.isMaster=true", "echo running on master node"]
  }

  configurations = "test-fixtures/emr_configurations.json"

  service_role = "${aws_iam_role.iam_emr_service_role.arn}"
}

resource "aws_security_group" "allow_all" {
  name        = "allow_all"
  description = "Allow all inbound traffic"
  vpc_id      = "${aws_vpc.main.id}"

  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  depends_on = ["aws_subnet.main"]

  lifecycle {
    ignore_changes = ["ingress", "egress"]
  }

  tags {
    name = "emr_test"
  }
}

resource "aws_vpc" "main" {
  cidr_block           = "168.31.0.0/16"
  enable_dns_hostnames = true

  tags {
    name = "emr_test"
  }
}

resource "aws_subnet" "main" {
  vpc_id     = "${aws_vpc.main.id}"
  cidr_block = "168.31.0.0/20"

  tags {
    name = "emr_test"
  }
}

resource "aws_internet_gateway" "gw" {
  vpc_id = "${aws_vpc.main.id}"
}

resource "aws_route_table" "r" {
  vpc_id = "${aws_vpc.main.id}"

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.gw.id}"
  }
}

resource "aws_main_route_table_association" "a" {
  vpc_id         = "${aws_vpc.main.id}"
  route_table_id = "${aws_route_table.r.id}"
}

###

# IAM Role setups

###

# IAM role for EMR Service
resource "aws_iam_role" "iam_emr_service_role" {
  name = "iam_emr_service_role"

  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "elasticmapreduce.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy" "iam_emr_service_policy" {
  name = "iam_emr_service_policy"
  role = "${aws_iam_role.iam_emr_service_role.id}"

  policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Resource": "*",
        "Action": [
            "ec2:AuthorizeSecurityGroupEgress",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:CancelSpotInstanceRequests",
            "ec2:CreateNetworkInterface",
            "ec2:CreateSecurityGroup",
            "ec2:CreateTags",
            "ec2:DeleteNetworkInterface",
            "ec2:DeleteSecurityGroup",
            "ec2:DeleteTags",
            "ec2:DescribeAvailabilityZones",
            "ec2:DescribeAccountAttributes",
            "ec2:DescribeDhcpOptions",
            "ec2:DescribeInstanceStatus",
            "ec2:DescribeInstances",
            "ec2:DescribeKeyPairs",
            "ec2:DescribeNetworkAcls",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribePrefixLists",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSpotInstanceRequests",
            "ec2:DescribeSpotPriceHistory",
            "ec2:DescribeSubnets",
            "ec2:DescribeVpcAttribute",
            "ec2:DescribeVpcEndpoints",
            "ec2:DescribeVpcEndpointServices",
            "ec2:DescribeVpcs",
            "ec2:DetachNetworkInterface",
            "ec2:ModifyImageAttribute",
            "ec2:ModifyInstanceAttribute",
            "ec2:RequestSpotInstances",
            "ec2:RevokeSecurityGroupEgress",
            "ec2:RunInstances",
            "ec2:TerminateInstances",
            "ec2:DeleteVolume",
            "ec2:DescribeVolumeStatus",
            "ec2:DescribeVolumes",
            "ec2:DetachVolume",
            "iam:GetRole",
            "iam:GetRolePolicy",
            "iam:ListInstanceProfiles",
            "iam:ListRolePolicies",
            "iam:PassRole",
            "s3:CreateBucket",
            "s3:Get*",
            "s3:List*",
            "sdb:BatchPutAttributes",
            "sdb:Select",
            "sqs:CreateQueue",
            "sqs:Delete*",
            "sqs:GetQueue*",
            "sqs:PurgeQueue",
            "sqs:ReceiveMessage"
        ]
    }]
}
EOF
}

# IAM Role for EC2 Instance Profile
resource "aws_iam_role" "iam_emr_profile_role" {
  name = "iam_emr_profile_role"

  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_instance_profile" "emr_profile" {
  name  = "emr_profile"
  roles = ["${aws_iam_role.iam_emr_profile_role.name}"]
}

resource "aws_iam_role_policy" "iam_emr_profile_policy" {
  name = "iam_emr_profile_policy"
  role = "${aws_iam_role.iam_emr_profile_role.id}"

  policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Resource": "*",
        "Action": [
            "cloudwatch:*",
            "dynamodb:*",
            "ec2:Describe*",
            "elasticmapreduce:Describe*",
            "elasticmapreduce:ListBootstrapActions",
            "elasticmapreduce:ListClusters",
            "elasticmapreduce:ListInstanceGroups",
            "elasticmapreduce:ListInstances",
            "elasticmapreduce:ListSteps",
            "kinesis:CreateStream",
            "kinesis:DeleteStream",
            "kinesis:DescribeStream",
            "kinesis:GetRecords",
            "kinesis:GetShardIterator",
            "kinesis:MergeShards",
            "kinesis:PutRecord",
            "kinesis:SplitShard",
            "rds:Describe*",
            "s3:*",
            "sdb:*",
            "sns:*",
            "sqs:*"
        ]
    }]
}
EOF
}
```