When Amazon ECS was first released back in April 2015, it left a lot to be desired: tasks and services could only be run on a cluster you managed, clusters had limited support for autoscaling and spot instances, and so on. Amazon filled these gaps over the next couple of years with support for scaling policies, great blog posts on integrating spot fleets with ECS, and now even a wizard cluster builder for EC2-only clusters. But even while the tools improved, you were still managing cluster capacity, which is a pain. ECS really came into its own with the release of Fargate, which allows you to run ECS tasks on Amazon-managed “virtual capacity” in your cluster so you can finally stop counting servers.
While Fargate is great, it still costs more than running on Spot Fleets, and it can take a few minutes to start tasks. This isn’t a problem for most static workloads, but the delay in particular can be irritating for dynamic loads, especially when users are waiting for containers to start and stop. As a result, my team has started keeping a reasonably-sized EC2 Spot Fleet cluster warm, and then using Fargate for overflow. This provides the best possible user experience: most things start quickly and run cheaply, but nothing fails due to insufficient capacity.
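The overflow idea can be sketched in a few lines of Python. This is a minimal illustration rather than our production code: it assumes `ecs` behaves like a boto3 ECS client (`boto3.client("ecs")`), and the function name `run_with_overflow` and its arguments are hypothetical. It tries the warm EC2 capacity first and only falls back to Fargate when placement fails.

```python
from typing import Any, Dict, List


def run_with_overflow(
    ecs: Any,
    cluster: str,
    task_definition: str,
    subnets: List[str],
    security_groups: List[str],
) -> Dict[str, Any]:
    """Try the warm EC2 capacity first, then overflow to Fargate.

    `ecs` is assumed to expose run_task() like boto3.client("ecs").
    """
    request = {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "count": 1,
        # Tasks in awsvpc network mode need a network configuration
        # regardless of which launch type they end up on.
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": subnets,
                "securityGroups": security_groups,
            }
        },
    }

    # First attempt: the cheap, already-warm EC2 (Spot Fleet) capacity.
    response = ecs.run_task(launchType="EC2", **request)
    if response.get("tasks"):
        return {"launchType": "EC2", "task": response["tasks"][0]}

    # Placement failed; keep the reasons (for example "RESOURCE:ENI").
    ec2_reasons = [f.get("reason") for f in response.get("failures", [])]

    # Second attempt: overflow onto Fargate.
    response = ecs.run_task(launchType="FARGATE", **request)
    if response.get("tasks"):
        return {
            "launchType": "FARGATE",
            "task": response["tasks"][0],
            "ec2Failures": ec2_reasons,
        }

    raise RuntimeError(
        f"Task failed on EC2 ({ec2_reasons}) "
        f"and on Fargate ({response.get('failures')})"
    )
```

Passing the client in as an argument keeps the sketch easy to exercise with a stand-in object, and logging `ec2Failures` over time is one way to tell whether the warm cluster is sized sensibly.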
The trick with this configuration, though, is that you need to configure your Spot Fleet so that the same tasks can run with LaunchType set to EC2 or FARGATE. It’s not trivial; here are some of our lessons learned.
- Start by reading this article
- Both the EC2 and ECS Spot Fleet Builders allow you to export your configuration as a CloudFormation template. The ECS output is an excellent starting point for provisioning Spot Fleet cluster capacity using CloudFormation.
- Fargate requires tasks to use awsvpc network mode, so we’ll want our cluster to accept tasks run in awsvpc network mode, too. This requires you to run a fairly modern version of the ECS agent, so pick your AMI from this list of the latest releases for each region.
- This “task networking mode” attaches an Elastic Network Interface (ENI) to each task, and each cluster member will only support a limited number of interfaces. Functionally, this is one more resource you’ll need to track, just like memory and CPU. If you see a failure like RESOURCE:ENI, you’ll know you’ve tried to schedule a task on a box with insufficient networking slots. You’ll need to update either your placement strategy or your cluster.
- Amazon manages these ENIs and assigns them private IP addresses automatically. That means you’ll need to run them in a private subnet in a VPC. In practice, you’ll need to perform the following steps for the single VPC in which you want your EC2 instances to live:
  - Create a new “public subnet” with “Assign Public IP” set to true
  - Create a new “private subnet” with “Assign Public IP” set to false
  - Create a new NAT Gateway located in the public subnet
  - Create a routing table that routes non-local traffic through the NAT Gateway
  - Assign the routing table to the private subnet
- Because these instances are in a private subnet, you’ll need to use a “bastion host” to SSH into these hosts to troubleshoot. Essentially, just create a new host in the public subnet from above, SSH into that box, and SSH into your ECS instance from there.
- Bear in mind that VPC Security Groups include rules for outgoing traffic as well as incoming. You’ll need to attach a security group to your ECS instances that allows not only incoming traffic to the ports on your host (don’t forget SSH!) but also outgoing traffic for your app and for the ECS agent. I found this StackOverflow thread to be useful. If you’re having trouble and you suspect networking, it’s worth temporarily allowing all traffic in and out to confirm the source of the issue; if that fixes the problem, you’ll know it’s related to networking, and likely to ports specifically.
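For readers who prefer to script the subnet steps from the list above, here is a rough Python sketch. It assumes `ec2` behaves like a boto3 EC2 client (`boto3.client("ec2")`) and that the VPC already has an internet gateway with a route for the public subnet, which the steps above do not cover. The function name `build_private_networking` and the CIDR arguments are hypothetical.

```python
from typing import Any, Dict


def build_private_networking(
    ec2: Any,
    vpc_id: str,
    public_cidr: str,
    private_cidr: str,
) -> Dict[str, str]:
    """Create a public subnet, a private subnet, and a NAT route for the latter.

    `ec2` is assumed to expose the same methods as boto3.client("ec2").
    Assumption: the VPC already routes the public subnet to an internet gateway.
    """
    # 1. Public subnet with "Assign Public IP" set to true.
    public_id = ec2.create_subnet(VpcId=vpc_id, CidrBlock=public_cidr)["Subnet"]["SubnetId"]
    ec2.modify_subnet_attribute(SubnetId=public_id, MapPublicIpOnLaunch={"Value": True})

    # 2. Private subnet with "Assign Public IP" set to false.
    private_id = ec2.create_subnet(VpcId=vpc_id, CidrBlock=private_cidr)["Subnet"]["SubnetId"]
    ec2.modify_subnet_attribute(SubnetId=private_id, MapPublicIpOnLaunch={"Value": False})

    # 3. NAT Gateway in the public subnet; it needs an Elastic IP allocation.
    allocation_id = ec2.allocate_address(Domain="vpc")["AllocationId"]
    nat_id = ec2.create_nat_gateway(
        SubnetId=public_id, AllocationId=allocation_id
    )["NatGateway"]["NatGatewayId"]
    # Wait until the gateway is usable before routing traffic through it.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

    # 4. Routing table that sends non-local traffic through the NAT Gateway.
    route_table_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
    ec2.create_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_id,
    )

    # 5. Assign the routing table to the private subnet.
    ec2.associate_route_table(RouteTableId=route_table_id, SubnetId=private_id)

    return {
        "public_subnet_id": public_id,
        "private_subnet_id": private_id,
        "nat_gateway_id": nat_id,
        "route_table_id": route_table_id,
    }
```

The private subnet ID returned here is the kind of value you would then pass as SubnetIds to the cluster template, while the bastion host would live in the public subnet.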
Additionally, here is the CloudFormation template we use to set up new stacks. It’s edited from the ECS Spot Fleet wizard builder mentioned above. It allows on-demand ASG cluster building as well as Spot Fleet cluster building.
AWSTemplateFormatVersion: '2010-09-09'
Description: >
  AWS CloudFormation template to create a new VPC
  or use an existing VPC for ECS deployment
  in Create Cluster Wizard. Requires exactly 4
  Instance Types for a Spot Request.
Parameters:
  EcsClusterName:
    Type: String
    Description: >
      Specifies the ECS Cluster Name with which the resources would be
      associated
    Default: default
  EcsAmiId:
    Type: String
    Description: Specifies the AMI ID for your container instances.
  EcsInstanceType:
    Type: CommaDelimitedList
    Description: >
      Specifies the EC2 instance type for your container instances.
      Defaults to m4.large
    Default: m4.large
    ConstraintDescription: must be a valid EC2 instance type.
  KeyName:
    Type: String
    Description: >
      Optional - Specifies the name of an existing Amazon EC2 key pair
      to enable SSH access to the EC2 instances in your cluster.
    Default: ''
  VpcId:
    Type: String
    Description: >
      Optional - Specifies the ID of an existing VPC in which to launch
      your container instances. If you specify a VPC ID, you must specify a list of
      existing subnets in that VPC. If you do not specify a VPC ID, a new VPC is created
      with atleast 1 subnet.
    Default: ''
    AllowedPattern: "^(?:vpc-[0-9a-f]{8}|)$"
    ConstraintDescription: >
      VPC Id must begin with 'vpc-' or leave blank to have a
      new VPC created
  SubnetIds:
    Type: CommaDelimitedList
    Description: >
      Optional - Specifies the Comma separated list of existing VPC Subnet
      Ids where ECS instances will run
    Default: ''
  SecurityGroupId:
    Type: String
    Description: >
      Optional - Specifies the Security Group Id of an existing Security
      Group. Leave blank to have a new Security Group created
    Default: ''
  AsgMaxSize:
    Type: Number
    Description: >
      Specifies the number of instances to launch and register to the cluster.
      Defaults to 1.
    Default: '1'
  IamRoleInstanceProfile:
    Type: String
    Description: >
      Specifies the Name or the Amazon Resource Name (ARN) of the instance
      profile associated with the IAM role for the instance
  EcsEndpoint:
    Type: String
    Description: >
      Optional - Specifies the ECS Endpoint for the ECS Agent to connect to
    Default: ''
  EbsVolumeSize:
    Type: Number
    Description: >
      Optional - Specifies the Size in GBs, of the newly created Amazon
      Elastic Block Store (Amazon EBS) volume
    Default: '0'
  EbsVolumeType:
    Type: String
    Description: Optional - Specifies the Type of (Amazon EBS) volume
    Default: ''
    AllowedValues:
      - ''
      - standard
      - io1
      - gp2
      - sc1
      - st1
    ConstraintDescription: Must be a valid EC2 volume type.
  DeviceName:
    Type: String
    Description: Optional - Specifies the device mapping for the Volume
  UseSpot:
    Type: String
    Default: 'false'
  IamSpotFleetRoleArn:
    Type: String
    Default: ''
  SpotPrice:
    Type: String
    Default: ''
  SpotAllocationStrategy:
    Type: String
    Default: 'diversified'
    AllowedValues:
      - 'lowestPrice'
      - 'diversified'
  UserData:
    Type: String
    Description: Optional base-64 encoded User Data for created instances
    Default: ''
  IsWindows:
    Type: String
    Default: 'false'
Conditions:
  CreateEC2LCWithKeyPair: !Not [!Equals [!Ref KeyName, '']]
  SetEndpointToECSAgent: !Not [!Equals [!Ref EcsEndpoint, '']]
  CreateWithSpot: !Equals [!Ref UseSpot, 'true']
  CreateWithASG: !Not [!Condition CreateWithSpot]
  CreateWithSpotPrice: !Not [!Equals [!Ref SpotPrice, '']]
  DefaultUserData: !Equals [!Ref UserData, '']
Resources:
  EcsInstanceLc:
    Type: AWS::AutoScaling::LaunchConfiguration
    Condition: CreateWithASG
    Properties:
      ImageId: !Ref EcsAmiId
      InstanceType: !Select [ 0, !Ref EcsInstanceType ]
      AssociatePublicIpAddress: true
      IamInstanceProfile: !Ref IamRoleInstanceProfile
      KeyName: !If [ CreateEC2LCWithKeyPair, !Ref KeyName, !Ref "AWS::NoValue" ]
      SecurityGroups: !Ref SecurityGroupId
      BlockDeviceMappings:
        - DeviceName: !Ref DeviceName
          Ebs:
            VolumeSize: !Ref EbsVolumeSize
            VolumeType: !Ref EbsVolumeType
      UserData: !If
        - DefaultUserData
        - Fn::Base64: !Sub "#!/bin/bash\necho \"ECS_CLUSTER=${EcsClusterName}\" >> /etc/ecs/ecs.config"
        - !Ref UserData
  EcsInstanceAsg:
    Type: AWS::AutoScaling::AutoScalingGroup
    Condition: CreateWithASG
    Properties:
      VPCZoneIdentifier: !Ref SubnetIds
      LaunchConfigurationName: !Ref EcsInstanceLc
      MinSize: '0'
      MaxSize: !Ref AsgMaxSize
      DesiredCapacity: !Ref AsgMaxSize
      Tags:
        - Key: Name
          Value: !Sub "ECS Instance - ${AWS::StackName}"
          PropagateAtLaunch: 'true'
        - Key: Description
          Value: "This instance is the part of the Auto Scaling group which was created through ECS Console"
          PropagateAtLaunch: 'true'
  EcsSpotFleet:
    Condition: CreateWithSpot
    Type: AWS::EC2::SpotFleet
    Properties:
      SpotFleetRequestConfigData:
        AllocationStrategy: !Ref SpotAllocationStrategy
        IamFleetRole: !Ref IamSpotFleetRoleArn
        TargetCapacity: !Ref AsgMaxSize
        SpotPrice: !If [ CreateWithSpotPrice, !Ref SpotPrice, !Ref 'AWS::NoValue' ]
        TerminateInstancesWithExpiration: true
        LaunchSpecifications:
          - IamInstanceProfile:
              Arn: !Ref IamRoleInstanceProfile
            ImageId: !Ref EcsAmiId
            InstanceType: !Select [ 0, !Ref EcsInstanceType ]
            KeyName: !If [ CreateEC2LCWithKeyPair, !Ref KeyName, !Ref "AWS::NoValue" ]
            Monitoring:
              Enabled: true
            SecurityGroups:
              - GroupId: !Ref SecurityGroupId
            SubnetId: !Join [ "," , !Ref SubnetIds ]
            BlockDeviceMappings:
              - DeviceName: !Ref DeviceName
                Ebs:
                  VolumeSize: !Ref EbsVolumeSize
                  VolumeType: !Ref EbsVolumeType
            UserData: !If
              - DefaultUserData
              - Fn::Base64: !Sub "#!/bin/bash\necho \"ECS_CLUSTER=${EcsClusterName}\" >> /etc/ecs/ecs.config"
              - !Ref UserData
          - IamInstanceProfile:
              Arn: !Ref IamRoleInstanceProfile
            ImageId: !Ref EcsAmiId
            InstanceType: !Select [ 1, !Ref EcsInstanceType ]
            KeyName: !If [ CreateEC2LCWithKeyPair, !Ref KeyName, !Ref "AWS::NoValue" ]
            Monitoring:
              Enabled: true
            SecurityGroups:
              - GroupId: !Ref SecurityGroupId
            SubnetId: !Join [ "," , !Ref SubnetIds ]
            BlockDeviceMappings:
              - DeviceName: !Ref DeviceName
                Ebs:
                  VolumeSize: !Ref EbsVolumeSize
                  VolumeType: !Ref EbsVolumeType
            UserData: !If
              - DefaultUserData
              - Fn::Base64: !Sub "#!/bin/bash\necho \"ECS_CLUSTER=${EcsClusterName}\" >> /etc/ecs/ecs.config"
              - !Ref UserData
          - IamInstanceProfile:
              Arn: !Ref IamRoleInstanceProfile
            ImageId: !Ref EcsAmiId
            InstanceType: !Select [ 2, !Ref EcsInstanceType ]
            KeyName: !If [ CreateEC2LCWithKeyPair, !Ref KeyName, !Ref "AWS::NoValue" ]
            Monitoring:
              Enabled: true
            SecurityGroups:
              - GroupId: !Ref SecurityGroupId
            SubnetId: !Join [ "," , !Ref SubnetIds ]
            BlockDeviceMappings:
              - DeviceName: !Ref DeviceName
                Ebs:
                  VolumeSize: !Ref EbsVolumeSize
                  VolumeType: !Ref EbsVolumeType
            UserData: !If
              - DefaultUserData
              - Fn::Base64: !Sub "#!/bin/bash\necho \"ECS_CLUSTER=${EcsClusterName}\" >> /etc/ecs/ecs.config"
              - !Ref UserData
          - IamInstanceProfile:
              Arn: !Ref IamRoleInstanceProfile
            ImageId: !Ref EcsAmiId
            InstanceType: !Select [ 3, !Ref EcsInstanceType ]
            KeyName: !If [ CreateEC2LCWithKeyPair, !Ref KeyName, !Ref "AWS::NoValue" ]
            Monitoring:
              Enabled: true
            SecurityGroups:
              - GroupId: !Ref SecurityGroupId
            SubnetId: !Join [ "," , !Ref SubnetIds ]
            BlockDeviceMappings:
              - DeviceName: !Ref DeviceName
                Ebs:
                  VolumeSize: !Ref EbsVolumeSize
                  VolumeType: !Ref EbsVolumeType
            UserData: !If
              - DefaultUserData
              - Fn::Base64: !Sub "#!/bin/bash\necho \"ECS_CLUSTER=${EcsClusterName}\" >> /etc/ecs/ecs.config"
              - !Ref UserData
Outputs:
  EcsInstanceAsgName:
    Condition: CreateWithASG
    Description: Auto Scaling Group Name for ECS Instances
    Value: !Ref EcsInstanceAsg
  EcsSpotFleetRequestId:
    Condition: CreateWithSpot
    Description: Spot Fleet Request for ECS Instances
    Value: !Ref EcsSpotFleet
  UsedByECSCreateCluster:
    Description: Flag used by ECS Create Cluster Wizard
    Value: 'true'
  TemplateVersion:
    Description: The version of the template used by Create Cluster Wizard
    Value: '2.0.0'
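One easy mistake with this template is requesting a Spot Fleet without supplying four instance types, since the Spot branch selects entries 0 through 3 of EcsInstanceType. The following Python sketch shows one way to catch that before creating a stack. The helper name `build_stack_parameters` is hypothetical, and the role ARN in the usage note is a placeholder; the output is shaped like the `Parameters` list that CloudFormation's CreateStack API expects.

```python
from typing import Dict, List


def build_stack_parameters(values: Dict[str, str]) -> List[Dict[str, str]]:
    """Validate Spot-related inputs and build a CloudFormation Parameters list."""
    if values.get("UseSpot", "false") == "true":
        # The Spot Fleet branch of the template uses !Select with indexes 0-3,
        # and its description asks for exactly four instance types.
        instance_types = [
            t.strip()
            for t in values.get("EcsInstanceType", "").split(",")
            if t.strip()
        ]
        if len(instance_types) != 4:
            raise ValueError(
                f"UseSpot=true needs exactly 4 instance types, got {len(instance_types)}"
            )
        # IamFleetRole is required for a Spot Fleet request, but the
        # template parameter defaults to an empty string.
        if not values.get("IamSpotFleetRoleArn"):
            raise ValueError("UseSpot=true needs IamSpotFleetRoleArn")

    # Sorted only to make the output deterministic.
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(values.items())
    ]
```

For example, calling it with `{"EcsClusterName": "demo", "UseSpot": "true", "EcsInstanceType": "m4.large,c4.large,m5.large,c5.large", "IamSpotFleetRoleArn": "arn:aws:iam::123456789012:role/hypothetical-spot-fleet-role"}` returns a list you could hand to `create_stack(..., Parameters=...)` on a boto3 CloudFormation client, alongside the remaining required parameters such as EcsAmiId.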
Hopefully this example helps others set up cross-compatible EC2-Fargate tasks. It’s not hard, but it requires getting a lot of small things right, as we learned the hard way.