When Amazon ECS was first released back in April 2015, it left a lot to be desired: tasks and services could only be run on a cluster you managed, clusters had limited support for autoscaling and spot instances, and so on. Amazon filled these gaps over the next couple of years with support for scaling policies, great blog posts on integrating spot fleets with ECS, and now even a cluster-builder wizard for EC2-only clusters. But even while the tools improved, you were still managing cluster capacity, which is a pain. ECS really came into its own with the release of Fargate, which allows you to run ECS tasks on Amazon-managed “virtual capacity” in your cluster so you can finally stop counting servers.
While Fargate is great, it still costs more than running on Spot Fleets, and it can take a few minutes to start tasks. This isn’t a problem for most static workloads, but the delay in particular can be irritating for dynamic loads, especially when users are waiting for containers to start and stop. As a result, my team has started keeping a reasonably-sized EC2 Spot Fleet cluster warm, and then using Fargate for overflow. This provides the best possible user experience: most things start quickly and run cheaply, but nothing fails due to insufficient capacity.
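To make the overflow idea concrete, here is a minimal sketch of the fallback logic, not our production code. It assumes an ECS client object that exposes `run_task` the way boto3's ECS client does, a task definition registered as compatible with both EC2 and Fargate, and `awsvpc` networking (which is why the same network configuration is passed for both launch types). The function name `run_with_overflow` and its parameters are made up for illustration.

```python
def run_with_overflow(ecs, cluster, task_definition, network_configuration):
    """Try the warm EC2 capacity first; overflow to Fargate if nothing was placed.

    `ecs` is assumed to behave like boto3.client("ecs"): run_task returns a dict
    with a "tasks" list (placed tasks) and a "failures" list (placement failures).
    """
    last_failures = []
    for launch_type in ("EC2", "FARGATE"):
        response = ecs.run_task(
            cluster=cluster,
            taskDefinition=task_definition,
            launchType=launch_type,
            networkConfiguration=network_configuration,
        )
        tasks = response.get("tasks", [])
        if tasks:
            # The task was placed; report which launch type accepted it.
            return launch_type, tasks[0]["taskArn"]
        # Nothing was placed (for example, the EC2 instances are full).
        last_failures = response.get("failures", [])

    reasons = [failure.get("reason", "unknown") for failure in last_failures]
    raise RuntimeError(f"Task could not be placed on EC2 or Fargate: {reasons}")
```

With boto3 installed, you would pass in `boto3.client("ecs")` as `ecs`. Taking the client as an argument also makes the fallback easy to exercise against a stub without touching AWS.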
The trick with this configuration, though, is you need to configure your Spot Fleet so that the same tasks can run with LaunchType set to FARGATE. It’s not trivial; here are some of our lessons learned.