---
layout: "aws"
page_title: "AWS: aws_emr_cluster"
sidebar_current: "docs-aws-resource-emr-cluster"
description: |-
  Provides an Elastic MapReduce Cluster
---

# aws\_emr\_cluster

Provides an Elastic MapReduce Cluster. Amazon Elastic MapReduce (EMR) is a web service that makes it easy to
process large amounts of data efficiently. See [Amazon Elastic MapReduce Documentation](https://aws.amazon.com/documentation/elastic-mapreduce/)
for more information.

## Example Usage

```hcl
resource "aws_emr_cluster" "emr-test-cluster" {
  name          = "emr-test-arn"
  release_label = "emr-4.6.0"
  applications  = ["Spark"]

  termination_protection            = false
  keep_job_flow_alive_when_no_steps = true

  ec2_attributes {
    subnet_id                         = "${aws_subnet.main.id}"
    emr_managed_master_security_group = "${aws_security_group.sg.id}"
    emr_managed_slave_security_group  = "${aws_security_group.sg.id}"
    instance_profile                  = "${aws_iam_instance_profile.emr_profile.arn}"
  }

  master_instance_type = "m3.xlarge"
  core_instance_type   = "m3.xlarge"
  core_instance_count  = 1

  tags {
    role = "rolename"
    env  = "env"
  }

  bootstrap_action {
    path = "s3://elasticmapreduce/bootstrap-actions/run-if"
    name = "runif"
    args = ["instance.isMaster=true", "echo running on master node"]
  }

  configurations = "test-fixtures/emr_configurations.json"

  service_role = "${aws_iam_role.iam_emr_service_role.arn}"
}
```

The `aws_emr_cluster` resource typically requires two IAM roles. The EMR service uses
one of them. The other is placed on your cluster instances so they can interact
with AWS. The suggested managed policy for the EMR service role is `AmazonElasticMapReduceRole`,
and for the EC2 instance profile role it is `AmazonElasticMapReduceforEC2Role`.
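As a sketch, you could attach those AWS managed policies to the two roles with `aws_iam_role_policy_attachment` instead of maintaining inline policies. This assumes the role resource names used in the bootable example at the bottom of this page (`iam_emr_service_role` and `iam_emr_profile_role`). The attachment resource names `emr_service` and `emr_ec2_profile` are hypothetical.

```hcl
# Hypothetical attachment names; the roles are the ones defined in the
# bootable example on this page.
resource "aws_iam_role_policy_attachment" "emr_service" {
  # `role` takes the role *name*, not its ARN
  role       = "${aws_iam_role.iam_emr_service_role.name}"
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole"
}

resource "aws_iam_role_policy_attachment" "emr_ec2_profile" {
  role       = "${aws_iam_role.iam_emr_profile_role.name}"
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role"
}
```

AWS maintains managed policies and may change them over time. Inline policies, like the ones in the bootable example, pin the exact permission set. That is the main trade-off between the two approaches.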
See the [Getting
Started](https://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-gs-launch-sample-cluster.html)
guide for more information on these IAM roles. There is also a fully bootable
example Terraform configuration at the bottom of this page.

## Argument Reference

The following arguments are supported:

* `name` - (Required) The name of the job flow
* `release_label` - (Required) The release label for the Amazon EMR release
* `master_instance_type` - (Required) The EC2 instance type of the master node
* `service_role` - (Required) IAM role that will be assumed by the Amazon EMR service to access AWS resources
* `core_instance_type` - (Optional) The EC2 instance type of the slave nodes
* `core_instance_count` - (Optional) Number of Amazon EC2 instances used to execute the job flow. EMR will use one node as the cluster's master node and use the remainder of the nodes (`core_instance_count`-1) as core nodes. Default `1`
* `log_uri` - (Optional) S3 bucket to which the log files of the job flow are written. If a value
is not provided, logs are not created
* `applications` - (Optional) A list of applications for the cluster. Valid values are: `Flink`, `Hadoop`, `Hive`, `Mahout`, `Pig`, and `Spark`. Case insensitive
* `termination_protection` - (Optional) Switch termination protection on or off (default is off)
* `keep_job_flow_alive_when_no_steps` - (Optional) Whether to keep the cluster running when it has no steps or after all steps have completed (default is on)
* `ec2_attributes` - (Optional) Attributes for the EC2 instances running the job
flow. Defined below
* `bootstrap_action` - (Optional) List of bootstrap actions that will be run before Hadoop is started on
the cluster nodes.
Defined below
* `configurations` - (Optional) List of configurations supplied for the EMR cluster you are creating
* `visible_to_all_users` - (Optional) Whether the job flow is visible to all IAM users of the AWS account associated with the job flow. Default `true`
* `autoscaling_role` - (Optional) An IAM role for automatic scaling policies. The IAM role provides permissions that the automatic scaling feature requires to launch and terminate EC2 instances in an instance group.
* `tags` - (Optional) List of tags to apply to the EMR Cluster


## ec2\_attributes

Attributes for the Amazon EC2 instances running the job flow

* `key_name` - (Optional) Amazon EC2 key pair that can be used to SSH into the master
node as the user called `hadoop`
* `subnet_id` - (Optional) VPC subnet ID where you want the job flow to launch.
You cannot specify the `cc1.4xlarge` instance type for nodes of a job flow launched in an Amazon VPC
* `additional_master_security_groups` - (Optional) List of additional Amazon EC2 security group IDs for the master node
* `additional_slave_security_groups` - (Optional) List of additional Amazon EC2 security group IDs for the slave nodes
* `emr_managed_master_security_group` - (Optional) Identifier of the Amazon EC2 security group for the master node
* `emr_managed_slave_security_group` - (Optional) Identifier of the Amazon EC2 security group for the slave nodes
* `service_access_security_group` - (Optional) Identifier of the Amazon EC2 service-access security group. Required when the cluster runs on a private subnet
* `instance_profile` - (Required) Instance profile for the EC2 instances of the cluster to assume


## bootstrap\_action

* `name` - (Required) Name of the bootstrap action
* `path` - (Required) Location of the script to run during a bootstrap action.
Can be either a location in Amazon S3 or on a local file system
* `args` - (Optional) List of command line arguments to pass to the bootstrap action script

## Attributes Reference

The following attributes are exported:

* `id` - The ID of the EMR Cluster
* `name` - The name of the cluster.
* `release_label` - The release label for the Amazon EMR release.
* `master_instance_type` - The EC2 instance type of the master node.
* `master_public_dns` - The public DNS name of the master EC2 instance.
* `core_instance_type` - The EC2 instance type of the slave nodes.
* `core_instance_count` - The number of slave nodes, i.e. EC2 instance nodes.
* `log_uri` - The path to the Amazon S3 location where logs for this cluster are stored.
* `applications` - The applications installed on this cluster.
* `ec2_attributes` - Provides information about the EC2 instances in a cluster grouped by category: key name, subnet ID, IAM instance profile, and so on.
* `bootstrap_action` - A list of bootstrap actions that will be run before Hadoop is started on the cluster nodes.
* `configurations` - The list of Configurations supplied to the EMR cluster.
* `service_role` - The IAM role that will be assumed by the Amazon EMR service to access AWS resources on your behalf.
* `visible_to_all_users` - Indicates whether the job flow is visible to all IAM users of the AWS account associated with the job flow.
* `tags` - The list of tags associated with a cluster.


## Example bootable config

**NOTE:** This configuration demonstrates a minimal configuration needed to
boot an example EMR Cluster. It is not meant to display best practices. Please
use at your own risk.
```hcl
provider "aws" {
  region = "us-west-2"
}

resource "aws_emr_cluster" "tf-test-cluster" {
  name          = "emr-test-arn"
  release_label = "emr-4.6.0"
  applications  = ["Spark"]

  ec2_attributes {
    subnet_id                         = "${aws_subnet.main.id}"
    emr_managed_master_security_group = "${aws_security_group.allow_all.id}"
    emr_managed_slave_security_group  = "${aws_security_group.allow_all.id}"
    instance_profile                  = "${aws_iam_instance_profile.emr_profile.arn}"
  }

  master_instance_type = "m3.xlarge"
  core_instance_type   = "m3.xlarge"
  core_instance_count  = 1

  tags {
    role     = "rolename"
    dns_zone = "env_zone"
    env      = "env"
    name     = "name-env"
  }

  bootstrap_action {
    path = "s3://elasticmapreduce/bootstrap-actions/run-if"
    name = "runif"
    args = ["instance.isMaster=true", "echo running on master node"]
  }

  configurations = "test-fixtures/emr_configurations.json"

  service_role = "${aws_iam_role.iam_emr_service_role.arn}"
}

resource "aws_security_group" "allow_all" {
  name        = "allow_all"
  description = "Allow all inbound traffic"
  vpc_id      = "${aws_vpc.main.id}"

  ingress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  depends_on = ["aws_subnet.main"]

  lifecycle {
    ignore_changes = ["ingress", "egress"]
  }

  tags {
    name = "emr_test"
  }
}

resource "aws_vpc" "main" {
  cidr_block           = "168.31.0.0/16"
  enable_dns_hostnames = true

  tags {
    name = "emr_test"
  }
}

resource "aws_subnet" "main" {
  vpc_id     = "${aws_vpc.main.id}"
  cidr_block = "168.31.0.0/20"

  tags {
    name = "emr_test"
  }
}

resource "aws_internet_gateway" "gw" {
  vpc_id = "${aws_vpc.main.id}"
}

resource "aws_route_table" "r" {
  vpc_id = "${aws_vpc.main.id}"

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = "${aws_internet_gateway.gw.id}"
  }
}

resource "aws_main_route_table_association" "a" {
  vpc_id         = "${aws_vpc.main.id}"
  route_table_id = "${aws_route_table.r.id}"
}

###

# IAM Role setups

###

# IAM role for EMR Service
resource "aws_iam_role" "iam_emr_service_role" {
  name = "iam_emr_service_role"

  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "elasticmapreduce.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_role_policy" "iam_emr_service_policy" {
  name = "iam_emr_service_policy"
  role = "${aws_iam_role.iam_emr_service_role.id}"

  policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Resource": "*",
        "Action": [
            "ec2:AuthorizeSecurityGroupEgress",
            "ec2:AuthorizeSecurityGroupIngress",
            "ec2:CancelSpotInstanceRequests",
            "ec2:CreateNetworkInterface",
            "ec2:CreateSecurityGroup",
            "ec2:CreateTags",
            "ec2:DeleteNetworkInterface",
            "ec2:DeleteSecurityGroup",
            "ec2:DeleteTags",
            "ec2:DescribeAvailabilityZones",
            "ec2:DescribeAccountAttributes",
            "ec2:DescribeDhcpOptions",
            "ec2:DescribeInstanceStatus",
            "ec2:DescribeInstances",
            "ec2:DescribeKeyPairs",
            "ec2:DescribeNetworkAcls",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribePrefixLists",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSpotInstanceRequests",
            "ec2:DescribeSpotPriceHistory",
            "ec2:DescribeSubnets",
            "ec2:DescribeVpcAttribute",
            "ec2:DescribeVpcEndpoints",
            "ec2:DescribeVpcEndpointServices",
            "ec2:DescribeVpcs",
            "ec2:DetachNetworkInterface",
            "ec2:ModifyImageAttribute",
            "ec2:ModifyInstanceAttribute",
            "ec2:RequestSpotInstances",
            "ec2:RevokeSecurityGroupEgress",
            "ec2:RunInstances",
            "ec2:TerminateInstances",
            "ec2:DeleteVolume",
            "ec2:DescribeVolumeStatus",
            "ec2:DescribeVolumes",
            "ec2:DetachVolume",
            "iam:GetRole",
            "iam:GetRolePolicy",
            "iam:ListInstanceProfiles",
            "iam:ListRolePolicies",
            "iam:PassRole",
            "s3:CreateBucket",
            "s3:Get*",
            "s3:List*",
            "sdb:BatchPutAttributes",
            "sdb:Select",
            "sqs:CreateQueue",
            "sqs:Delete*",
            "sqs:GetQueue*",
            "sqs:PurgeQueue",
            "sqs:ReceiveMessage"
        ]
    }]
}
EOF
}

# IAM Role for EC2 Instance Profile
resource "aws_iam_role" "iam_emr_profile_role" {
  name = "iam_emr_profile_role"

  assume_role_policy = <<EOF
{
  "Version": "2008-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

resource "aws_iam_instance_profile" "emr_profile" {
  name  = "emr_profile"
  roles = ["${aws_iam_role.iam_emr_profile_role.name}"]
}

resource "aws_iam_role_policy" "iam_emr_profile_policy" {
  name = "iam_emr_profile_policy"
  role = "${aws_iam_role.iam_emr_profile_role.id}"

  policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Resource": "*",
        "Action": [
            "cloudwatch:*",
            "dynamodb:*",
            "ec2:Describe*",
            "elasticmapreduce:Describe*",
            "elasticmapreduce:ListBootstrapActions",
            "elasticmapreduce:ListClusters",
            "elasticmapreduce:ListInstanceGroups",
            "elasticmapreduce:ListInstances",
            "elasticmapreduce:ListSteps",
            "kinesis:CreateStream",
            "kinesis:DeleteStream",
            "kinesis:DescribeStream",
            "kinesis:GetRecords",
            "kinesis:GetShardIterator",
            "kinesis:MergeShards",
            "kinesis:PutRecord",
            "kinesis:SplitShard",
            "rds:Describe*",
            "s3:*",
            "sdb:*",
            "sns:*",
            "sqs:*"
        ]
    }]
}
EOF
}
```
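The exported attributes listed under Attributes Reference can be used elsewhere in a configuration. Below is a minimal sketch that exposes two of them as outputs. It assumes the `tf-test-cluster` resource from the bootable example. The output names `emr_cluster_id` and `emr_master_public_dns` are hypothetical.

```hcl
# Hypothetical output names; `id` and `master_public_dns` are attributes
# exported by aws_emr_cluster.
output "emr_cluster_id" {
  value = "${aws_emr_cluster.tf-test-cluster.id}"
}

output "emr_master_public_dns" {
  value = "${aws_emr_cluster.tf-test-cluster.master_public_dns}"
}
```

After `terraform apply`, running `terraform output emr_master_public_dns` prints the master node's public DNS name. The bootable example does not set `key_name`, so you would need to add it to `ec2_attributes` before you could SSH to that host as the `hadoop` user.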