Using Spot Instance Fleets the Right Way

chrisnie4
Apr 23
1 min read

You've set up a Spot Fleet with a few instance types. You've enabled Capacity Rebalancing. Your workload is still getting interrupted constantly. Here's what's actually happening.

Spot interruptions don't happen one node at a time.

A Spot capacity pool is a single instance type in a single Availability Zone, such as m5.large in us-east-1a. When that pool dries up, AWS tends to reclaim many of its instances at once. You lose nothing for weeks, then 20 nodes disappear in the same minute.

A fleet of ten instances of the same type in one AZ is still a single capacity pool. All you've done is make your single point of failure bigger.
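To put rough numbers on this, here is a minimal Python sketch. It assumes a capacity pool is one instance type in one AZ, that the fleet is spread as evenly as possible across its pools, and that only one pool is reclaimed at a time. That last part is a simplification: real reclaims can be partial or hit several pools. The function names are illustrative.

```python
import math


def pools(instance_types: int, azs: int) -> int:
    # Assumption: one capacity pool per (instance type, AZ) pair.
    return instance_types * azs


def worst_case_loss(fleet_size: int, instance_types: int, azs: int) -> float:
    """Fraction of the fleet lost if the most-loaded pool is fully reclaimed,
    assuming instances are spread as evenly as possible across pools."""
    biggest_pool = math.ceil(fleet_size / pools(instance_types, azs))
    return biggest_pool / fleet_size


# Compare a 20-node fleet under different layouts.
for types, azs in [(1, 1), (4, 1), (4, 3), (12, 3)]:
    print(f"{types:>2} types x {azs} AZs -> {pools(types, azs):>2} pools, "
          f"worst single-pool loss: {worst_case_loss(20, types, azs):.0%}")
```

For a 20-node fleet, a single pool means one reclaim takes out the whole fleet, while 12 types across 3 AZs caps the worst single-pool loss at one node.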

The fix is pool-level diversification:

- 10+ different instance types (mix generations and CPU vendors: m5, m5a, m6i, and m6a have similar specs but sit in different pools)

- Spread across all available AZs

- Use the price-capacity-optimized allocation strategy: AWS picks the pools with the most available capacity, then the cheapest among those, rather than simply the cheapest pool
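As a minimal sketch, the checklist above could translate into an EC2 Fleet request like the one below. It is built as plain Python dicts in the shape boto3's `create_fleet` accepts. EC2 Fleet is a swapped-in choice here; Spot Fleet's `request_spot_fleet` takes similar settings, with camel-case strategy names such as `priceCapacityOptimized`. The launch template ID and subnet IDs are hypothetical placeholders. Not every instance type is offered in every AZ, so check availability in your own region.

```python
import itertools

# Hypothetical placeholders: substitute your own launch template and subnets.
LAUNCH_TEMPLATE_ID = "lt-0123456789abcdef0"
SUBNETS_BY_AZ = {
    "us-east-1a": "subnet-aaaa1111",
    "us-east-1b": "subnet-bbbb2222",
    "us-east-1c": "subnet-cccc3333",
}

# Similar-sized (2 vCPU / 8 GiB) types across generations and CPU vendors.
INSTANCE_TYPES = [
    "m5.large", "m5a.large", "m5d.large", "m5n.large",
    "m6i.large", "m6a.large", "m6id.large", "m6in.large",
    "m7i.large", "m7a.large", "m7i-flex.large", "m5ad.large",
]


def build_fleet_request(target_capacity: int) -> dict:
    # One override per (instance type, subnet) pair, i.e. one per capacity pool.
    overrides = [
        {"InstanceType": itype, "SubnetId": subnet}
        for itype, subnet in itertools.product(INSTANCE_TYPES, SUBNETS_BY_AZ.values())
    ]
    return {
        "Type": "maintain",  # keep replacing lost capacity
        "TargetCapacitySpecification": {
            "TotalTargetCapacity": target_capacity,
            "DefaultTargetCapacityType": "spot",
        },
        "SpotOptions": {
            "AllocationStrategy": "price-capacity-optimized",
            "MaintenanceStrategies": {
                # Launch a replacement when AWS signals elevated interruption risk.
                "CapacityRebalance": {"ReplacementStrategy": "launch"},
            },
        },
        "LaunchTemplateConfigs": [{
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": LAUNCH_TEMPLATE_ID,
                "Version": "$Latest",
            },
            "Overrides": overrides,
        }],
    }


request = build_fleet_request(target_capacity=20)
# With AWS credentials configured, this dict could be passed to boto3:
#   boto3.client("ec2").create_fleet(**request)
```

Each override is one instance type in one subnet, so 12 types across 3 AZs gives the fleet 36 capacity pools to draw from. `Type` is set to `maintain` because Capacity Rebalancing only applies to fleets that maintain their target capacity.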

When one pool dries up, the fleet replaces the lost capacity from the remaining pools. Most interruptions become invisible to your workload.

On Kubernetes, Karpenter handles this natively, but it needs at least 15 candidate instance types to do spot-to-spot consolidation properly. The default configs I see in the wild use 3–4. That's why you're still getting hit.
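A quick way to see whether a NodePool is flexible enough is to count how many instance types its requirements actually admit. The sketch below is hypothetical. It loosely mimics how NodePool requirements filter instance types using a few of Karpenter's well-known labels (`karpenter.k8s.aws/instance-category`, `karpenter.k8s.aws/instance-generation`, `karpenter.k8s.aws/instance-cpu`, `node.kubernetes.io/instance-type`) over a small made-up inventory. It is not Karpenter's own code, and a real NodePool would also pin `karpenter.sh/capacity-type` to `spot`.

```python
# Illustrative inventory only: real availability differs by region and account.
FAMILIES = ["m5", "m5a", "m6i", "m6a", "m7i", "m7a",
            "c5", "c5a", "c6i", "c6a", "c7i",
            "r5", "r5a", "r6i", "r7i"]
VCPUS_BY_SIZE = {"large": 2, "xlarge": 4, "2xlarge": 8}
INVENTORY = [f"{fam}.{size}" for fam in FAMILIES for size in VCPUS_BY_SIZE]

MIN_SPOT_TO_SPOT_TYPES = 15  # threshold cited in the article


def labels_for(instance_type: str) -> dict[str, str]:
    """Derive a subset of Karpenter's well-known labels from a type name."""
    family, size = instance_type.split(".")
    return {
        "node.kubernetes.io/instance-type": instance_type,
        "karpenter.k8s.aws/instance-category": family[0],
        "karpenter.k8s.aws/instance-family": family,
        "karpenter.k8s.aws/instance-generation": family[1],
        "karpenter.k8s.aws/instance-cpu": str(VCPUS_BY_SIZE[size]),
    }


def matches(labels: dict[str, str], req: dict) -> bool:
    value = labels.get(req["key"])
    op, values = req["operator"], req["values"]
    if op == "In":
        return value in values
    if op == "NotIn":
        return value not in values
    if op in ("Gt", "Lt"):
        if value is None:
            return False
        limit = int(values[0])
        return int(value) > limit if op == "Gt" else int(value) < limit
    raise ValueError(f"operator not handled in this sketch: {op}")


def allowed_types(requirements: list[dict]) -> list[str]:
    # An instance type is a candidate only if it satisfies every requirement.
    return [t for t in INVENTORY
            if all(matches(labels_for(t), r) for r in requirements)]


def check(name: str, requirements: list[dict]) -> int:
    n = len(allowed_types(requirements))
    verdict = "ok" if n >= MIN_SPOT_TO_SPOT_TYPES else "too narrow for spot-to-spot"
    print(f"{name}: {n} instance types -> {verdict}")
    return n


# Typical "in the wild" config: a short, hard-coded list of types.
pinned = [{"key": "node.kubernetes.io/instance-type", "operator": "In",
           "values": ["m5.large", "m5a.large", "m6i.large", "m6a.large"]}]
# Constrain by shape instead of by name.
flexible = [
    {"key": "karpenter.k8s.aws/instance-category", "operator": "In", "values": ["c", "m", "r"]},
    {"key": "karpenter.k8s.aws/instance-generation", "operator": "Gt", "values": ["4"]},
    {"key": "karpenter.k8s.aws/instance-cpu", "operator": "In", "values": ["2", "4"]},
]
check("pinned", pinned)
check("flexible", flexible)
```

Pinning a handful of instance types by name caps the NodePool at that handful, while constraining by category, generation, and size lets the candidate list grow as new types appear.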

Spot only works reliably when you treat it as a pool of capacity, not a list of machines.
